
Data Literacy

General information on data literacy

Statistical Definitions

Descriptive Statistics

Central Tendencies
(an estimate of the "center" of a distribution of values)

Median: The "middle" of a sorted list of numbers
  • Found by ordering the numbers and finding the middle number
  • When the list contains an even count of numbers, the 2 middle numbers are averaged for the median
  • Also known as the 50th percentile
Mean: What most people call the average
  • Found by adding up a list of numbers and dividing by the count of numbers in the list
Mode: The most commonly occurring value
  • Found by counting how many times each value appears
  • The value that appears most often is the "mode"
  • There may be more than one "mode" when several values tie for the highest count
  • There may be no mode when no value repeats
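
The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the `scores` list are made up for the example, and returning an empty list when no value repeats is one reasonable reading of "there may be no mode".

```python
from collections import Counter


def median(values):
    """Middle of the sorted list; average the two middle numbers if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    """Sum of the numbers divided by how many there are."""
    return sum(values) / len(values)


def modes(values):
    """All values tied for the highest count; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


scores = [2, 3, 3, 5, 7, 10]  # made-up sample data
print(median(scores))  # 4.0
print(mean(scores))    # 5.0
print(modes(scores))   # [3]
```
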
Dispersion
(spread of the values around the central tendency)

Standard Deviation: A measure of how far the values in a group typically fall from the mean
  • Find the Mean
  • Subtract the mean from the value of each item
  • Square each result
  • Sum all the squared results
  • Divide that sum by a number that is 1 less than the total number of items
  • Take the square root of the quotient
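
The steps above translate almost line for line into code. The sketch below is a minimal example, assuming the data is a sample (hence dividing by one less than the number of items); the function name and the `data` list are illustrative only.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation, following the listed steps one by one."""
    n = len(values)
    mean = sum(values) / n                              # find the mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # subtract the mean, square each result
    total = sum(squared_diffs)                          # sum the squared results
    variance = total / (n - 1)                          # divide by one less than the item count
    return math.sqrt(variance)                          # take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
print(round(sample_std_dev(data), 3))  # 2.138
```

Python's standard library offers the same calculation as `statistics.stdev`, which can be used to check a hand-written version.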
Quartiles: Along with the Median, these are values that divide your sorted data into 4 equal parts.
  • For the first quartile (Q1) find the median of the values below the overall median
  • For the third quartile (Q3) find the median of the values above the overall median
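
A short sketch of the quartile calculation follows. It assumes one common convention: when the count of values is odd, the overall median is left out of both halves. Other conventions exist and statistical software may give slightly different answers. The function names and data are illustrative, and the list needs at least two values.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # assumption: skip the middle value itself when the count is odd
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up sample data
print(quartiles(data))  # (3.5, 7, 11.0)
```
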

 

Distribution
(summary of the frequency of individual values or ranges of values for a variable)

Frequency Distribution: A table showing the values in a sample and how often they occur
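
As a small illustration, a frequency table can be built by counting each distinct value. The `responses` list and the function name below are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())


responses = [3, 1, 2, 3, 3, 2, 5]  # made-up survey answers
for value, count in frequency_table(responses):
    print(f"{value}\t{count}")
# 1	1
# 2	2
# 3	3
# 5	1
```
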

 

Inferential Statistics

Population: The entire group of subjects that a particular study aims to describe
Parameter: A characteristic or property of a population
Sample: A subset of the population; the part from which data is actually collected
Statistic: A characteristic of a sample

Estimation of parameters
Confidence intervals: A range of values that is likely to contain the population parameter, estimated from the sample value.* For a mean, it is equivalent to the sample Mean plus or minus the margin of error.
Margin of error:
  • Calculated by taking the standard deviation
  • Dividing it by the square root of the number of observations, which gives the standard error (use more than 30 observations if the std dev is from the sample)
  • Multiplying the result by the critical value for the chosen confidence level (1.96 for a 95% confidence level and 2.576 for a 99% confidence level)
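
The margin-of-error steps above can be sketched as follows. The multipliers 1.96 and 2.576 are the ones quoted in the text; the function names and the example figures (mean 50, standard deviation 10, 100 observations) are made up for illustration.

```python
import math

# Multipliers quoted in the text for each confidence level (percent)
CRITICAL_VALUES = {95: 1.96, 99: 2.576}


def margin_of_error(std_dev, n, confidence=95):
    """Standard deviation / sqrt(n), times the multiplier for the confidence level."""
    standard_error = std_dev / math.sqrt(n)
    return CRITICAL_VALUES[confidence] * standard_error


def confidence_interval(mean, std_dev, n, confidence=95):
    """Mean plus or minus the margin of error."""
    moe = margin_of_error(std_dev, n, confidence)
    return mean - moe, mean + moe


# Hypothetical sample: mean 50, standard deviation 10, 100 observations
low, high = confidence_interval(50, 10, 100)
print(f"{low:.2f} to {high:.2f}")  # 48.04 to 51.96
```
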
Hypothesis testing
Null hypothesis: When comparing two populations, the assumption that there will be no difference between them.

Regression analysis: A statistical process for estimating relationships among variables. It is used to determine which independent variables (x) have an impact on the dependent variable (y), the main factor you are trying to understand.
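
As a minimal sketch, the simplest case is a straight line fitted to one independent variable by ordinary least squares. This is only one form of regression, and the function name and the `hours` and `scores` data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # how x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # made-up independent variable (x)
scores = [52, 54, 56, 58, 60]  # made-up dependent variable (y)
slope, intercept = fit_line(hours, scores)
print(slope, intercept)  # 2.0 50.0
```

Here each extra hour is associated with 2 more points, starting from a baseline of 50.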

T-test: Used to determine if there is a significant difference between the means of two groups.
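
To make the idea concrete, the sketch below computes only the t statistic, using Welch's version of the test (which does not assume the two groups have equal variances). Deciding whether the difference is significant still requires comparing this statistic against the t distribution, which is not shown here. The function name and group data are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    # variance() is the sample variance (divides by n - 1)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


group_a = [1, 2, 3, 4, 5]  # made-up measurements
group_b = [2, 3, 4, 5, 6]
print(welch_t(group_a, group_b))  # -1.0
```

A statistic further from zero indicates a larger difference relative to the spread within the groups.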

ANOVA: Analysis of Variance. A test for the difference between two or more means*
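
As a rough sketch of what a one-way ANOVA calculates, the F statistic compares the variation between group means with the variation inside the groups. Only the statistic is computed here; judging significance requires the F distribution, which is not shown. The function name and group data are hypothetical.

```python
def anova_f(*groups):
    """One-way ANOVA F statistic for two or more groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # variation of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Three made-up groups
print(anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```
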