118 Cards in this Set
- Front
- Back
Statistics
|
the science of conducting studies to organize, collect, summarize, analyze, and draw conclusions from data.
|
|
Variable
|
a characteristic that can assume different values
|
|
Data
|
the values that the variables can assume
|
|
Descriptive Statistics
|
the collection, organization, summarization, and presentation of data
|
|
Inferential Statistics
|
generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions
|
|
Population
|
all subjects that are being studied
|
|
Sample
|
group of subjects selected from a population
|
|
hypothesis testing
|
decision making process for evaluating claims about a population, based on information obtained from samples
|
|
qualitative variable
|
variables that can be placed in categories, according to a characteristic, not a number
|
|
quantitative variable
|
numerical and can be ordered or ranked
|
|
Discrete variable
|
values that can be counted
|
|
Continuous variable
|
assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals
|
|
Nominal level of measurement
|
classifies data into mutually exclusive, exhaustive categories in which no order or ranking can be imposed on the data
|
|
ordinal level of measurement
|
classifies data into categories that can be ranked. precise differences between ranks don't exist
|
|
interval level measurement
|
ranks data, and precise differences between units of measure do exist. no meaningful zero
|
|
ratio level of measurement
|
possesses all the characteristics of interval measurement, and there exists a true zero. in addition, true ratios exist when the same variable is measured on two different members of the population
|
|
observational study
|
the researcher observes what is happening or what has happened in the past and tries to draw conclusions based on these observations
|
|
experimental study
|
the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables
|
|
random sampling
|
selected by using chance methods or random numbers
|
|
systematic sampling
|
each subject of the population is numbered and then every kth subject is selected
|
|
stratified sampling
|
the population is divided into groups according to some characteristic that is important to the study, then subjects are sampled from each group
|
|
cluster sampling
|
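the population is divided into groups called clusters by some means; some of the clusters are then randomly selected, and all members of the selected clusters are used as the sample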
|
|
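The four sampling methods above can be compared side by side in a short sketch. The population of 20 numbered subjects, the odd/even grouping, the cluster names, and the helper function names are all hypothetical, chosen only for illustration. The cluster helper assumes the usual textbook procedure of keeping every member of each selected cluster.

```python
import random

def systematic_sample(population, k, start):
    """Take every k-th subject, beginning at index `start` (0 <= start < k)."""
    return population[start::k]

def stratified_sample(population, key, per_group, rng):
    """Group subjects by `key`, then draw `per_group` at random from each group."""
    strata = {}
    for subject in population:
        strata.setdefault(key(subject), []).append(subject)
    return {name: rng.sample(members, per_group) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
subjects = list(range(1, 21))   # hypothetical population numbered 1..20

random_pick = rng.sample(subjects, 5)                      # random sampling
every_fourth = systematic_sample(subjects, k=4, start=2)   # [3, 7, 11, 15, 19]
by_parity = stratified_sample(
    subjects, lambda s: "even" if s % 2 == 0 else "odd", 3, rng
)
two_clusters = cluster_sample({"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}, 2, rng)
```

In practice the starting point of a systematic sample is itself chosen at random from the first k subjects; it is fixed here so the result is easy to check by hand.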
categorical frequency distribution
|
used for data that can be placed in specific categories, such as nominal- or ordinal-level data
|
|
grouped frequency distribution
|
used when the data must be grouped into classes that are more than one unit in width because the range of the data is large
|
|
class boundaries
|
used to separate the classes so that there are no gaps in the frequency distribution
|
|
class limits
|
represent the smallest and largest data value included in the class
|
|
histogram
|
a graph that displays the data by using contiguous vertical bars of various heights to represent the frequencies of the classes
|
|
cumulative frequency polygon
|
also called an ogive. a graph that represents the cumulative frequencies for the classes in a frequency distribution
|
|
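A small sketch of how class limits, class boundaries, frequencies, and cumulative frequencies (the values an ogive plots) fit together. The data set, the class width of 10, and the function name are hypothetical. The boundaries are placed 0.5 below and above the limits, which assumes whole-number data.

```python
from itertools import accumulate

def grouped_frequency(values, lowest_limit, width, n_classes):
    """Build a grouped frequency table for whole-number data."""
    rows = []
    for i in range(n_classes):
        lower = lowest_limit + i * width
        upper = lower + width - 1              # class limits are actual data values
        freq = sum(lower <= v <= upper for v in values)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # no gaps between classes
            "frequency": freq,
        })
    # Running totals of the frequencies give the cumulative frequencies.
    for row, total in zip(rows, accumulate(r["frequency"] for r in rows)):
        row["cumulative"] = total
    return rows

data = [12, 15, 17, 21, 22, 25, 28, 31, 33, 38]   # hypothetical data
table = grouped_frequency(data, lowest_limit=10, width=10, n_classes=3)
for row in table:
    print(row)
```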
stem and leaf plot
|
a data plot that uses part of the data as the stem and part of the data value as the leaf to form groups or classes
|
|
pareto chart (bar graph)
|
used to represent a frequency distribution for a categorical variable; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest
|
|
pie chart
|
a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution
|
|
scatter plot
|
a graph of ordered pairs of data values that is used to determine if a relationship exists between the two variables
|
|
statistic
|
a characteristic or measure obtained by using the data values from a sample
|
|
parameter
|
a characteristic or measure obtained by using all the data values from a specific population
|
|
mean
|
the arithmetic average. found by adding the values of the data and dividing by the total number of values
|
|
median
|
the halfway point in the data set. first data must be arranged in order
|
|
mode
|
the data value that occurs most often
|
|
range
|
the highest value minus the lowest value
|
|
variance
|
the average of the squares of the distance each value is from the mean
|
|
standard deviation
|
the square root of the variance
|
|
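The measures on the last six cards can be checked with Python's standard `statistics` module. The data set is hypothetical. The population versions (`pvariance`, `pstdev`) are used because they match the "average of the squares" wording on the variance card; a sample variance divides by n - 1 instead (`statistics.variance`).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set

mean = statistics.mean(data)            # sum of the values / number of values
median = statistics.median(data)        # halfway point of the sorted data
mode = statistics.mode(data)            # value that occurs most often
data_range = max(data) - min(data)      # highest value minus lowest value
variance = statistics.pvariance(data)   # average squared distance from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(mean, median, mode, data_range, variance, std_dev)
```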
percentile
|
divide the data set into 100 equal parts
|
|
standard score (z score)
|
obtained by subtracting the mean from the value and dividing the result by the standard deviation
|
|
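As a quick illustration of the z score definition, here is a minimal sketch. The function name and the test scores are made up for the example.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

above = z_score(65, mean=50, std_dev=10)   # 1.5 standard deviations above the mean
below = z_score(30, mean=40, std_dev=5)    # 2 standard deviations below the mean
print(above, below)
```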
outlier
|
an extremely high or an extremely low data value when compared with the rest of the data values
|
|
box plot
|
a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing through the median or Q2
|
|
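A box plot is drawn from five numbers: the minimum, Q1, the median, Q3, and the maximum. The sketch below finds them with the median-of-halves method, which is one of several quartile conventions and is an assumption here. The outlier check uses the common 1.5 x IQR rule, which the cards do not state, so treat it as an added convention. The data set is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

def outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3 (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [5, 6, 12, 13, 15, 18, 22, 50]   # hypothetical data
summary = five_number_summary(data)
flagged = outliers(data)
print(summary, flagged)
```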
Probability Experiment
|
a chance process that leads to well-defined results called outcomes
|
|
outcome
|
the result of a single trial of a probability experiment
|
|
sample space
|
the set of all possible outcomes of a probability experiment
|
|
tree diagram
|
a device consisting of line segments emanating from a starting point and also from the outcome point. It is used to determine all possible outcomes of a probability experiment
|
|
event
|
a set of outcomes of a probability experiment
|
|
classical probability
|
uses sample spaces to determine the numerical probability that an event will happen
|
|
empirical probability
|
relies on actual experience to determine the likelihood of outcomes. P(E) = f/n, the frequency of the class divided by the total of the frequencies
|
|
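The difference between classical and empirical probability can be shown with a short sketch. The classical part enumerates the sample space for rolling two dice; the empirical part uses a made-up frequency table of blood types (the counts are hypothetical, not real data).

```python
from fractions import Fraction
from itertools import product

# Classical probability: count outcomes in the sample space.
sample_space = list(product(range(1, 7), repeat=2))          # 36 outcomes for two dice
event = [outcome for outcome in sample_space if sum(outcome) == 7]
p_sum_is_7 = Fraction(len(event), len(sample_space))         # n(E) / n(S)

# Empirical probability: f / n from observed frequencies (hypothetical counts).
observed = {"A": 22, "B": 5, "AB": 2, "O": 21}
n = sum(observed.values())
p_type_o = Fraction(observed["O"], n)                        # f / n

print(p_sum_is_7, p_type_o)
```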
subjective probability
|
uses a probability value based on an educated guess or estimate, employing opinions and inexact information
|
|
four probability rules
|
the probability of any event E is a number between 0 and 1, inclusive; if an event E cannot occur, its probability is 0; if an event E is certain, its probability is 1; the sum of the probabilities of all the outcomes in the sample space is 1
|
|
rule for complementary events
|
if the probability of an event or the probability of its complement is known, then the other can be found by subtracting the probability from 1
|
|
mutually exclusive
|
probability events that cannot occur at the same time
|
|
addition rule
|
when two events A and B are mutually exclusive, the probability that A or B will occur is P(A or B)=P(A)+P(B)
|
|
addition rule
|
if A and B aren't mutually exclusive, then P(A or B)=P(A)+P(B)-P(A and B)
|
|
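Both addition rules can be checked by counting cards in a standard 52-card deck. This is only a sketch; the helper names are hypothetical. Kings and queens are mutually exclusive, while kings and hearts overlap in the king of hearts, so the overlap has to be subtracted once.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def prob(event):
    """Classical probability of `event`, a function that returns True for matching cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "hearts"

# Mutually exclusive events: P(A or B) = P(A) + P(B)
p_king_or_queen = prob(is_king) + prob(is_queen)

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
p_king_and_heart = prob(lambda c: is_king(c) and is_heart(c))
p_king_or_heart = prob(is_king) + prob(is_heart) - p_king_and_heart

print(p_king_or_queen, p_king_or_heart)
```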
dependent events
|
when the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed
|
|
independent events
|
two events A and B are independent if the fact that A occurs doesn't affect the probability of B occurring
|
|
multiplication rule when events are independent
|
P(A and B)=P(A)*P(B)
|
|
multiplication rule when events are dependent
|
P(A and B)=P(A)*P(B|A)
|
|
rule for "at least" problems
|
P(E)=1-P(E'), where E' is the complement of E; for example, P(at least one)=1-P(none)
|
|
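A short worked sketch of the two multiplication rules and the complement shortcut for "at least one" questions. The scenarios (a coin and a die, drawing two aces without replacement, three coin tosses) are illustrative choices, not taken from the cards.

```python
from fractions import Fraction

# Independent events: P(A and B) = P(A) * P(B)
# Tossing a head and then rolling a 6.
p_head_and_six = Fraction(1, 2) * Fraction(1, 6)

# Dependent events: P(A and B) = P(A) * P(B|A)
# Drawing two aces in a row without replacing the first card.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first

# "At least one" via the complement: P(E) = 1 - P(E')
# At least one head in three tosses = 1 - P(no heads).
p_at_least_one_head = 1 - Fraction(1, 2) ** 3

print(p_head_and_six, p_two_aces, p_at_least_one_head)
```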
fundamental rule for counting
|
in a sequence of n events in which the first one has k1 possibilities, the second event has k2, the third has k3, and so forth, the total number of possibilities of the sequence will be k1*k2*k3...kn
|
|
factorial formula
|
for any counting number n, n!=n(n-1)(n-2)...1; 0!=1
|
|
permutation formula
|
nPr=n!/(n-r)!
|
|
combination formula
|
nCr=n!/[(n-r)!r!]
|
|
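The counting formulas above map directly onto functions in Python's `math` module (`math.perm`, `math.comb`, and `math.prod` need Python 3.8 or later). The menu sizes in the first example are hypothetical.

```python
import math

# Fundamental counting rule: multiply the number of possibilities of each event.
# Hypothetical menu: 2 soups, 3 main dishes, 4 desserts.
meals = math.prod([2, 3, 4])

# Factorial: n! = n(n-1)(n-2)...1, with 0! = 1
five_factorial = math.factorial(5)
zero_factorial = math.factorial(0)

# Permutations: nPr = n! / (n-r)!  (order matters)
n, r = 5, 3
permutations = math.perm(n, r)
assert permutations == math.factorial(n) // math.factorial(n - r)

# Combinations: nCr = n! / ((n-r)! r!)  (order does not matter)
combinations = math.comb(n, r)
assert combinations == math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

print(meals, five_factorial, zero_factorial, permutations, combinations)
```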
random variable
|
a variable whose values are determined by chance
|
|
probability distribution
|
the values a random variable can assume and the corresponding probabilities of the values
|
|
discrete probability distribution
|
consists of the values a random variable can assume and the corresponding probabilities of the values. the probabilities are determined theoretically or by observation
|
|
binomial experiment
|
a probability experiment that satisfies four requirements: there is a fixed number of trials; each trial can have only two outcomes, or outcomes that can be reduced to two outcomes; the outcomes of each trial must be independent of one another; the probability of a success must remain the same for each trial
|
|
binomial distribution
|
the outcomes of a binomial experiment and the corresponding probabilities of these outcomes
|
|
probability problems using formula
|
P(X)=n!/[(n-x)!x!] * p^x * q^(n-x), where q=1-p
|
|
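The binomial formula can be written as a small function. This is a sketch with a hypothetical function name; `math.comb` supplies the n!/[(n-x)!x!] part and needs Python 3.8 or later.

```python
from math import comb

def binomial_probability(n, x, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    q = 1 - p                      # probability of failure on a single trial
    return comb(n, x) * p**x * q**(n - x)

# Exactly 2 heads in 3 tosses of a fair coin: 3 * 0.25 * 0.5
p_two_heads = binomial_probability(n=3, x=2, p=0.5)

# The probabilities of all possible outcomes of a distribution add up to 1.
total = sum(binomial_probability(10, x, 0.3) for x in range(11))

print(p_two_heads, total)
```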
probability problems using table b
|
in Table B (Appendix C), go to the section for the given n, find the row for the correct x, then go across to the column for the correct p; the entry there is the answer
|
|
normal distribution
|
when the data values are evenly distributed about the mean
|
|
skewed distribution
|
when the majority of the data values fall to the left or right of the mean
|
|
properties of theoretical normal distribution
|
a normal distribution curve is bell-shaped. the mean, median, and mode are equal and are located at the center of the distribution. the curve is unimodal. the curve is symmetric about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center. the curve is continuous: no gaps or holes. the curve never touches the x axis. the total area under a normal distribution curve is equal to 1.00. the area under the part of a normal curve that lies within 1 standard deviation of the mean is approximately 0.68; within 2 standard deviations, about 0.95; and within 3 standard deviations, about 0.997.
|
|
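The 0.68, 0.95, and 0.997 areas quoted above can be reproduced with `statistics.NormalDist` (Python 3.8 or later). The sketch uses a mean of 0 and a standard deviation of 1 for convenience; the same areas hold for any normal distribution when distances are measured in standard deviations.

```python
from statistics import NormalDist

normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return normal.cdf(k) - normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```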
standard normal distribution
|
a normal distribution with a mean of 0 and a standard deviation of 1
|
|
sampling distribution of sample means
|
a distribution using the means computed from all possible random samples of a specific size taken from a population
|
|
properties of the distribution of sample means
|
the mean of the sample means will be the same as the population mean; the standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size.
|
|
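Both properties can be verified exactly on a tiny population by listing every possible sample. The population of four values and the sample size of 2 are hypothetical; samples are taken with replacement, so there are 4 x 4 = 16 of them.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical population
n = 2                       # sample size

# Every possible sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(sample) for sample in samples]

mean_of_means = mean(sample_means)            # should equal the population mean
sd_of_means = pstdev(sample_means)            # standard deviation of the sample means
expected_sd = pstdev(population) / sqrt(n)    # population sd / sqrt(sample size)

print(mean(population), mean_of_means)
print(sd_of_means, expected_sd)
```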
central limit theorem
|
as the sample size n increases without limit, the shape of the distribution of the sample means taken with replacement from a population with mean μ and standard deviation σ will approach a normal distribution
|
|
parameter compared to statistics
|
parameter is for population, statistic is for sample
|
|
properties of a good estimator
|
unbiased, consistent, relatively efficient
|
|
interval estimate
|
an interval or range of values used to estimate the parameter
|
|
point estimate
|
a specific numerical value estimate of a parameter
|
|
confidence level
|
the probability that the interval estimate will contain the parameter, assuming that a large number of samples are selected and that the estimation process on the same parameter is repeated
|
|
t distribution compared to z distribution
|
the variance of the t distribution is greater than 1; the t distribution is actually a family of curves based on the concept of degrees of freedom; as the sample size increases, the t distribution approaches the standard normal distribution
|
|
statistical hypothesis
|
a conjecture about a population parameter; may or may not be true
|
|
null hypothesis
|
H0; a statistical hypothesis that states that there is no difference between a parameter and a specific value, or between two parameters
|
|
alternative hypothesis
|
H1; a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters
|
|
statistical test
|
uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected
|
|
test value
|
the numerical value obtained from a statistical test
|
|
Type I Error
|
occurs if you reject the null hypothesis when it is true
|
|
Type II Error
|
occurs if you don't reject the null hypothesis when it's false
|
|
Level of significance
|
the maximum probability of committing a type I error. alpha
|
|
critical value
|
separates the critical region from the noncritical region
|
|
critical region
|
(rejection region) the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected
|
|
noncritical region
|
the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis shouldn't be rejected
|
|
one-tailed test
|
indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean
|
|
two-tailed test
|
the null hypothesis should be rejected when the test value is in either of the two critical regions
|
|
z-test
|
a statistical test for the mean of a population. used when n is greater than or equal to 30 or when the population is normally distributed and population standard deviation is known
|
|
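A sketch of how the test value, critical value, and critical region fit together in a two-tailed z-test. The formula z = (sample mean - claimed mean) / (σ / √n) is the standard one-sample z statistic and is not written out on the cards, so it is an addition here. The numbers (claimed mean 50, sample mean 52, σ = 6, n = 36, α = 0.05) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_value(sample_mean, claimed_mean, population_sd, n):
    """Test value: distance of the sample mean from the claim, in standard errors."""
    return (sample_mean - claimed_mean) / (population_sd / sqrt(n))

alpha = 0.05                                      # level of significance
critical = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value, about 1.96

z = z_test_value(sample_mean=52, claimed_mean=50, population_sd=6, n=36)

# Reject H0 when the test value falls in either critical region.
reject_null = abs(z) > critical
print(z, round(critical, 2), reject_null)
```

For a one-tailed test, the whole of α sits in one tail, so the critical value would come from `inv_cdf(1 - alpha)` (or its negative) instead.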
t-test
|
a statistical test for the mean of a population, used when the population is normally or approximately normally distributed and the population standard deviation is unknown
|
|
large independent samples
|
large samples that are not related
|
|
small independent samples
|
small samples that are not related
|
|
dependent samples
|
samples in which the subjects are paired or matched in some way, i.e. the samples are related
|
|
correlation
|
correlation is a statistical method used to determine whether a relationship between variables exists.
|
|
regression
|
regression is a statistical method used to describe the nature of the relationship between variables
|
|
simple relationships
|
(simple regression) there is one independent variable that is used to predict the dependent variable
|
|
multiple relationships (multiple regression)
|
two or more independent variables are used to predict one dependent variable
|
|
positive relationship
|
exists when both variables increase or decrease at the same time
|
|
negative relationship
|
as one variable increases, the other variable decreases, and vice versa
|
|
independent variable
|
explanatory variable or a predictor variable
|
|
dependent variable
|
a response variable
|
|
scatter plot
|
a graph of the ordered pairs (x,y) of numbers consisting of the independent variable x and the dependent variable y
|
|
PPMC (Pearson product moment correlation coefficient)
|
a statistic used to determine the strength of a relationship when the variables are normally distributed
|
|
Correlation coefficient
|
a statistic or parameter that measures the strength and direction of a linear relationship between two variables
|
|
regression line
|
the line of best fit of the data
|
|
coefficient of determination
|
a measure of the variation of the dependent variable that is explained by the regression line and the independent variable
|
|
standard error of estimate
|
the standard deviation of the observed y values about the predicted y' values in regression and correlation analysis
|
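The correlation coefficient, regression line, coefficient of determination, and standard error of estimate can all be computed from the same sums of squares. The sketch below uses the standard least-squares formulas, which the cards do not spell out, and divides by n - 2 for the standard error of estimate. The five data pairs and the function name are hypothetical.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares line y' = intercept + slope * x, plus r, r^2, and the standard error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r = sxy / sqrt(sxx * syy)                 # strength and direction of the linear relationship
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Spread of the observed y values about the predicted y' values.
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_error = sqrt(residual_ss / (n - 2))

    return {
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r ** 2,                  # share of the variation in y explained by the line
        "std_error": std_error,
    }

fit = linear_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # hypothetical (x, y) data
print(fit)
```

Here r is positive, so the fitted line slopes upward: a positive relationship.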