LibGuides: Data Literacy: Statistical Definitions

Statistical Definitions

Descriptive Descriptions

Central Tendencies

(an estimate of the "center" of a distribution of values)

Median: The "middle" of a sorted list of numbers

Found by ordering the numbers and finding the middle number
When the list contains an even number of values, the 2 middle numbers are averaged to get the median
Also known as the 50th percentile
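
The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name median and the sample lists are made up for the example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # order the numbers first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle number is the median.
        return ordered[mid]
    # Even count: average the 2 middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # sorted: 1, 5, 7
print(median([8, 2, 4, 6]))   # sorted: 2, 4, 6, 8
```

Python's built-in statistics.median function follows the same rule and is usually the better choice in practice.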

Mean: What most people call the average.

Found by adding up a list of numbers and dividing by the count of numbers in the list
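
As a small sketch of that calculation (the function name mean and the sample list are illustrative only):

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Add up the numbers, then divide by how many there are.
    return sum(values) / len(values)


print(mean([2, 4, 6, 8]))  # 20 divided by 4
```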

Mode: The most commonly occurring value

Found by counting those numbers that repeat
Those that repeat the most are the "mode"
There may be more than one "mode"
There may be no mode
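
One possible way to count repeats in Python is shown below. It is a sketch under one assumption: "no mode" is taken to mean that no value occurs more than once. The function name modes is invented for this example.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Assumption for this sketch: if no value repeats, there is no mode
    and an empty list is returned.
    """
    counts = Counter(values)          # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                     # nothing repeats: no mode
    # Keep every value tied for the highest count (there may be several).
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3, 3]))  # two modes
print(modes([1, 2, 3]))        # no mode
```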

Dispersion

(spread of the values around the central tendency)

Standard Deviation: a measure of how far the values in a group typically fall from the mean

Find the Mean
Subtract the mean from the value of each item
Square each result
When finished, sum all the squared results
Divide that answer by a number that is 1 less than the total number of items
Take the square root of the quotient
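
The six steps above translate almost line for line into Python. This sketch computes the sample standard deviation (dividing by one less than the number of items, as described); the function name and data are illustrative.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation of a list with at least 2 numbers."""
    n = len(values)
    mean = sum(values) / n                           # find the mean
    squared = [(x - mean) ** 2 for x in values]      # subtract the mean, square
    total = sum(squared)                             # sum the squared results
    variance = total / (n - 1)                       # divide by n - 1
    return math.sqrt(variance)                       # take the square root


print(sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```

The standard library function statistics.stdev performs the same calculation.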

Quartiles: Along with the Median, these are values that divide your data into 4 parts.

For the first quartile (Q1) find the middle number of the values between the smallest number and the median
For the third quartile (Q3) find the middle number of the values between the median and the largest number
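
A sketch of this approach in Python follows. It assumes one common convention: when the list has an odd number of values, the median itself is left out of both halves. Statistical software often uses other conventions (such as interpolation), so results can differ slightly. The function names are invented for the example.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) for a list with at least 2 numbers."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                 # values below the median
    # Assumption: for an odd count, skip the median value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
print(quartiles([1, 2, 3, 4, 5, 6, 7]))
```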

Distribution

(summary of the frequency of individual values or ranges of values for a variable)

Frequency Distribution: A table showing the values in a sample and how often they occur
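
As an illustration, a simple frequency table can be built in Python by counting each value. The function name frequency_table and the sample data are made up for this sketch.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)       # count how often each value occurs
    return sorted(counts.items())


# Print the table with one row per distinct value.
print("Value\tCount")
for value, count in frequency_table([3, 1, 2, 3, 3, 2]):
    print(f"{value}\t{count}")
```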

Inferential Descriptions

Population: The entire group of subjects that a particular study is about
Parameter: A characteristic or property of a population
Sample: A subset of the population; the part from which data is actually collected
Statistic: A characteristic of a sample

Estimation of parameters
Confidence interval: a range of values, calculated from a sample, that is likely to contain the population parameter.* For a mean, it is equal to the sample mean plus or minus the margin of error.
Margin of error:

Calculated by taking the standard deviation
Dividing it by the square root of the number of observations (> 30 if the std dev is from the sample), which gives the standard error
Multiplying the result by the critical value for the chosen confidence level (1.96 for a 95% confidence level and 2.576 for a 99% confidence level)
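
The calculation above can be sketched in Python. The multipliers 1.96 and 2.576 come from the text; the function names, the lookup table, and the example numbers (a sample mean of 50, a standard deviation of 10, and 100 observations) are assumptions made for illustration.

```python
import math

# Multipliers quoted in the text for each confidence level.
Z_SCORES = {0.95: 1.96, 0.99: 2.576}


def margin_of_error(std_dev, n, confidence=0.95):
    """Multiplier times the standard deviation over the square root of n."""
    standard_error = std_dev / math.sqrt(n)
    return Z_SCORES[confidence] * standard_error


def confidence_interval(mean, std_dev, n, confidence=0.95):
    """Return (low, high): the mean minus and plus the margin."""
    margin = margin_of_error(std_dev, n, confidence)
    return mean - margin, mean + margin


low, high = confidence_interval(mean=50, std_dev=10, n=100)
print(low, high)
```

With these example numbers the margin works out to 1.96 x 10 / 10 = 1.96, so the 95% interval runs from about 48.04 to 51.96.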

Hypothesis testing
Null hypothesis: When comparing two populations, the assumption that there will be no difference between them.

Regression analysis: A statistical process for estimating relationships among variables. It is used to determine which independent variables (x) will have an impact on the main factor (dependent variable - y) you are trying to understand.
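
As a rough illustration, the simplest case (one independent variable x and one dependent variable y, fitted with a straight line by ordinary least squares) can be sketched in Python. The text does not name a specific regression method, so this choice, the function name fit_line, and the data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

A slope near zero would suggest that x has little linear relationship with y in the sample.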

T-test: Used to determine if there is a significant difference between the means of two groups.
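
A hedged sketch of the test statistic is shown below. It uses Welch's version of the t-test (which does not assume equal variances in the two groups); the text does not say which version is intended, so that choice, the function name, and the data are assumptions. The sketch stops at the t statistic and does not compute a p-value.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Welch's t statistic for two groups, each with at least 2 values."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Sample variances (divide by n - 1).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Standard error of the difference between the two means.
    std_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_error


print(welch_t_statistic([5, 6, 7], [1, 2, 3]))
```

The further the statistic is from zero, the stronger the evidence that the two means differ; deciding whether the difference is significant requires comparing it against the t distribution.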

ANOVA: Analysis of Variance. A test for the difference between two or more means*
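
For illustration, the F statistic for a one-way ANOVA can be sketched in Python as the ratio of variation between group means to variation within groups. The one-way design, the function name, and the data are assumptions of this sketch; it does not compute a p-value.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups."""
    all_values = [x for group in groups for x in group]
    n = len(all_values)           # total number of observations
    k = len(groups)               # number of groups
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (x - m) ** 2
        for group, m in zip(groups, group_means)
        for x in group
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F value suggests that at least one group mean differs from the others.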