Descriptive Statistics
Central Tendencies
(an estimate of the "center" of a distribution of values)
Median: The "middle" of a sorted list of numbers
- Found by ordering the numbers and finding the middle number
- When the list has an even count of numbers, the 2 middle numbers are averaged for the median
- Also known as the 50th percentile
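The median steps above can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name `median` is chosen here for the example.

```python
def median(values):
    """Return the middle of the sorted values, averaging the two middle ones for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle number exists.
        return ordered[mid]
    # Even count: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```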
Mean: What most people call the average.
- Found by adding up a list of numbers and dividing by the count of numbers in the list
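As a small sketch of that calculation (the function name `mean` is chosen here for the example):

```python
def mean(values):
    """Add up the numbers and divide by how many there are."""
    if len(values) == 0:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


print(mean([2, 4, 6, 8]))  # 5.0
```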
Mode: The most commonly occurring value
- Found by counting those numbers that repeat
- Those that repeat the most are the "mode"
- There may be more than one "mode"
- There may be no mode
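The counting approach above might look like this in Python. The sketch assumes the convention from these notes that a list in which nothing repeats has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] when nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No value repeats, so by the convention in these notes there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3, 3]))  # two modes
print(modes([1, 2, 3]))        # no mode
```

Returning a list keeps the "more than one mode" and "no mode" cases in a single return type.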
Dispersion
(spread of the values around the central tendency)
Standard Deviation: a measure of how far the values in a group typically fall from the mean
- Find the Mean
- Subtract the mean from the value of each item
- Square each result
- When finished, sum all the squared results
- Divide that sum by a number that is 1 less than the total number of items (n - 1)
- Take the square root of the quotient
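The steps above describe the sample standard deviation (dividing by n - 1). A minimal Python sketch that follows them one by one; the function name `sample_std_dev` is chosen here for illustration:

```python
import math


def sample_std_dev(values):
    """Sample standard deviation, following the listed steps in order."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to divide by n - 1")
    mean = sum(values) / n                                 # find the mean
    squared_deviations = [(x - mean) ** 2 for x in values]  # subtract the mean, then square
    total = sum(squared_deviations)                         # sum all the squared results
    variance = total / (n - 1)                              # divide by one less than the count
    return math.sqrt(variance)                              # take the square root


print(sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the data shown, the mean is 5 and the squared deviations sum to 32, so the result is the square root of 32 / 7.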
Quartiles: Along with the Median, these are values that divide your sorted data into 4 parts.
- For the first quartile (Q1) find the median of the values between the smallest number and the overall median
- For the third quartile (Q3) find the median of the values between the overall median and the largest number
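One way to sketch the quartile steps in Python is the "median of each half" method. Several quartile conventions exist and they can give slightly different answers; this sketch assumes the convention that leaves the overall median out of both halves when the count is odd. The names `quartiles` and `_median` are chosen here for the example.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are needed")
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))   # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```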
Distribution
(summary of the frequency of individual values or ranges of values for a variable)
Frequency Distribution: A table showing the values in a sample and how often they occur
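A frequency distribution for a small sample can be built by counting each value. A minimal sketch; the function name `frequency_table` and the sample data are made up for illustration:

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())


# Print the table as two columns: value, then how often it occurs.
for value, count in frequency_table([3, 1, 2, 3, 3, 1]):
    print(f"{value}\t{count}")
```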
Inferential Statistics
Population: The entire group of subjects that a particular study aims to describe
Parameter: A characteristic or property of a population
Sample: A subset of the population. This is the group from which data is actually collected
Statistic: A characteristic of a sample
Estimation of parameters
Confidence intervals: a range of values that is likely to contain the population parameter, estimated from the sample statistic.* It is equivalent to the Mean plus or minus the margin of error.
Margin of error:
- Take the standard deviation
- Divide it by the square root of the number of observations (> 30 if the std dev is from the sample); the result is the standard error
- Multiply the result by the critical value for the confidence level (1.96 for a 95% confidence level and 2.576 for a 99% confidence level)
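The calculation above can be sketched in Python using the two multipliers listed (1.96 and 2.576). The function names and the `Z_SCORES` lookup table are assumptions made for this example, and only the two confidence levels from these notes are supported.

```python
import math

# Multipliers from the notes: 95% and 99% confidence levels.
Z_SCORES = {0.95: 1.96, 0.99: 2.576}


def margin_of_error(std_dev, n, confidence=0.95):
    """Standard deviation divided by sqrt(n), scaled by the multiplier for the confidence level."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = Z_SCORES[confidence]
    standard_error = std_dev / math.sqrt(n)
    return z * standard_error


def confidence_interval(sample_mean, std_dev, n, confidence=0.95):
    """Return (lower, upper) as the mean minus and plus the margin."""
    margin = margin_of_error(std_dev, n, confidence)
    return sample_mean - margin, sample_mean + margin


low, high = confidence_interval(sample_mean=50, std_dev=10, n=100)
print(low, high)
```

With a standard deviation of 10 and 100 observations, the standard error is 1, so the 95% interval is roughly 50 plus or minus 1.96.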
Hypothesis testing
Null hypothesis: When comparing two populations, the assumption that there is no difference between them.
Regression analysis: A statistical process for estimating relationships among variables. It is used to determine which independent variables (x) will have an impact on the main factor (dependent variable, y) you are trying to understand.
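As a sketch of the simplest case, one independent variable x and one dependent variable y, an ordinary least-squares line can be fitted by hand. This covers only simple linear regression, not regression analysis in general; the function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

A slope near zero would suggest that x has little linear impact on y in the sample.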
T-test: Used to determine if there is a significant difference between the means of two groups.
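The test statistic behind a two-sample t-test can be sketched with the standard library. This example assumes Welch's version, which does not require the two groups to have equal variances, and it stops at the t statistic: turning it into a p-value needs the t distribution, which is not shown here. The function name `welch_t_statistic` is chosen for the example.

```python
import math
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its estimated standard error (Welch's form)."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two values")
    # variance() is the sample variance (divides by n - 1).
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


print(welch_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

The further the statistic is from zero, the stronger the evidence against the null hypothesis of equal means.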
ANOVA: Analysis of Variance. A test for the difference between two or more means*
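A one-way ANOVA compares the variation between group means with the variation inside each group. The sketch below computes only the F statistic; it assumes a one-way design, and the function name `one_way_anova_f` is made up for this example. Judging significance would additionally require the F distribution.

```python
def one_way_anova_f(*groups):
    """F statistic: mean square between groups divided by mean square within groups."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no variation within groups, so F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```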