Data Literacy

General information on data literacy

Statistical Definitions

Descriptive Statistics

Central Tendencies
(an estimate of the "center" of a distribution of values)

Median: The "middle" of a sorted list of numbers
  • Found by ordering the numbers and finding the middle number
  • When the list has an even number of values, the 2 middle numbers are averaged to give the median
  • Also known as the 50th percentile
Mean: What most people call the average.
  • Found by adding up a list of numbers and dividing by the count of numbers in the list
Mode: The most commonly occurring value
  • Found by counting how often each value occurs
  • The value or values that occur most often are the "mode"
  • There may be more than one "mode"
  • There may be no mode
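
The three measures above can be sketched with Python's standard `statistics` module. This is a minimal illustration; the `values` list is made-up sample data, not taken from any study.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical sample data

# Median: the list has an even number of values,
# so the two middle values (3 and 5) are averaged.
median = statistics.median(values)

# Mean: the sum of the values divided by how many there are.
mean = sum(values) / len(values)

# Mode: multimode returns every value tied for most frequent,
# so it also covers the "more than one mode" case.
modes = statistics.multimode(values)

print(median, mean, modes)
```

Note that when no value repeats, `multimode` returns every value in the list, which corresponds to the "no mode" case described above.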

Dispersion
(spread of the values around the central tendency)

Standard Deviation: A measure of how far the values in a group, taken as a whole, deviate from the mean
  • Find the mean
  • Subtract the mean from the value of each item
  • Square each result
  • Sum all the squared results
  • Divide that sum by a number that is 1 less than the total number of items
  • Take the square root of the quotient
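
The steps for the standard deviation can be followed one at a time in code. This is a sketch using made-up `values`; the final line compares the result with the standard library's `statistics.stdev`, which also divides by one less than the number of items.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data

mean = sum(values) / len(values)                   # find the mean
squared_diffs = [(x - mean) ** 2 for x in values]  # subtract the mean, then square
total = sum(squared_diffs)                         # sum the squared results
variance = total / (len(values) - 1)               # divide by (number of items - 1)
std_dev = math.sqrt(variance)                      # take the square root

# Cross-check against the library's sample standard deviation.
print(std_dev, statistics.stdev(values))
```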
Quartiles: Along with the median, these are the values that divide your sorted data into 4 equal parts.
  • For the first quartile (Q1) find the middle number of the values between the smallest number and the median
  • For the third quartile (Q3) find the middle number of the values between the largest number and the median
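
A minimal sketch of the quartile method described above: split the sorted data at the median, then take the median of each half. The function name `quartiles` and the sample data are illustrative assumptions. The sketch assumes at least two values and leaves the median itself out of both halves when the count is odd; other conventions exist and can give slightly different results.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # values between the smallest number and the median
    upper = data[-half:]  # values between the median and the largest number
    return (
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
    )

sample = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical sample data
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)
```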


Distribution
(summary of the frequency of individual values or ranges of values for a variable)

Frequency Distribution: A table showing the values in a sample and how often they occur
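
As an illustration, a frequency distribution can be built by counting each value. The `responses` list below is made-up sample data.

```python
from collections import Counter

responses = ["B", "A", "C", "B", "B", "A"]  # hypothetical sample data

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(responses)

# Print the table, one row per value.
for value, count in sorted(frequency.items()):
    print(f"{value}\t{count}")
```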


Inferential Statistics

Population: The entire group of subjects that a particular study is about
Parameter: A characteristic or property of a population
Sample: A subset of the population; the part from which data is actually collected
Statistic: A characteristic of a sample

Estimation of parameters
Confidence intervals: A range of values that gives the best estimate of a population parameter given the sample value.* It is equivalent to the mean plus or minus the margin of error.
Margin of error:
  • Calculated by taking the standard deviation
  • Dividing it by the square root of the number of observations, which gives the standard error (use more than 30 observations if the standard deviation comes from the sample)
  • Multiplying the result by the critical value for the chosen confidence level (1.96 for a 95% confidence level and 2.576 for a 99% confidence level)
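
The calculation above can be sketched as follows. The function name `confidence_interval`, the parameter `z` (the multiplier 1.96 or 2.576), and the sample data are illustrative assumptions; the sketch uses the sample standard deviation, so it assumes more than 30 observations.

```python
import math
import statistics

def confidence_interval(values, z=1.96):
    """Return (low, high) as the mean plus or minus the margin of error."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard deviation divided by the square root of the number of observations.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Multiply by 1.96 for a 95% level, or 2.576 for a 99% level.
    margin = z * standard_error
    return mean - margin, mean + margin

values = [50 + (i % 7) for i in range(35)]  # hypothetical sample of 35 observations
low, high = confidence_interval(values)
print(low, high)
```

Passing `z=2.576` gives the wider 99% interval for the same data.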
Hypothesis testing
Null hypothesis: When comparing two populations, the assumption that there is no difference between them.

Regression analysis: A statistical process for estimating relationships among variables. It is used to determine which independent variables (x) have an impact on the main factor you are trying to understand (the dependent variable, y).
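
As one simple case, a regression with a single independent variable fits a straight line y = slope * x + intercept by least squares. The sketch below is only an illustration of that case; the function name `fit_line` and the data are assumptions, and real analyses usually involve several x variables and a statistics package.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable (x)
ys = [3, 5, 7, 9, 11]  # hypothetical dependent variable (y)
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```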

T-test: Used to determine if there is a significant difference between the means of two groups.
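
The guide does not say which form of the t-test is meant. As an assumption, the sketch below computes the t statistic for Welch's version, which does not require the two groups to have equal variances. The function name `welch_t` and the data are illustrative, and judging significance still requires comparing the statistic with a t distribution, which is not shown here.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """t statistic for the difference between two group means (Welch's form)."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Each group's sample variance divided by its size.
    var_term_a = statistics.variance(group_a) / len(group_a)
    var_term_b = statistics.variance(group_b) / len(group_b)
    return (mean_a - mean_b) / math.sqrt(var_term_a + var_term_b)

group_a = [5, 6, 7, 8, 9]  # hypothetical measurements
group_b = [1, 2, 3, 4, 5]  # hypothetical measurements
t_stat = welch_t(group_a, group_b)
print(t_stat)
```

The further the statistic is from zero, the stronger the evidence against the null hypothesis of no difference.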

ANOVA: Analysis of Variance. A test for the difference between two or more means*
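
For illustration, the F statistic of a one-way ANOVA compares the variation between group means with the variation within groups. The function name `one_way_f` and the data are assumptions; as with the t-test, deciding significance means comparing the statistic with an F distribution, which this sketch leaves out.

```python
import statistics

def one_way_f(*groups):
    """F statistic for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

f_stat = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])  # hypothetical groups
print(f_stat)
```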