26 Cards in this Set
- Front
- Back
Define regression
|
A method for determining the mathematical formula relating the variables
|
|
Define correlation
|
A method for determining the strength of the relationship between the variables.
|
|
What are applications for regression analysis?
|
Forecasting
|
|
Define cross-sectional data
|
the observations relate to different people at one point in time
|
|
Define time-series data
|
each observation relates to a different point in time
|
|
When the association between the variables is high with high and low with low, what type of correlation is this called?
|
Positive correlation
|
|
When the association between the variables is high with low and vice versa, what type of correlation is this?
|
Negative Correlation
|
|
Define simple linear regression
|
The task of finding the values of a and b which provide the best connection between the two variables.
|
|
Please explain the meaning of the constants in the equation: y = a + bx
|
a is the intercept on the y axis.
b is the slope of the line.
|
|
A positive slope vs a negative slope - what does that mean for correlation?
|
A positive slope means there is a positive correlation.
A negative slope means there is a negative correlation.
|
|
Define residual
|
The difference between the actual and fitted values for y.
|
|
Simple linear regression is the process of
|
deciding which is the best straight line through a set of points, so that the residuals are as small as possible.
|
|
What are the possible approaches to deciding which is the best straight line through a set of points?
|
1. Make the sum of the residuals as small as possible.
2. Make the sum of the absolute values of the residuals as small as possible.
3. (traditional) Use the least squares method.
|
|
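The three criteria above can be compared on a small example. The sketch below is illustrative only: the data set and the helper names `residuals` and `criteria` are made up for this card set, not taken from it. It suggests why the plain sum of residuals is a weak criterion: positive and negative residuals cancel, so two quite different lines can both score zero.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Actual y minus fitted y for the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def criteria(a, b):
    """Score a line under each of the three approaches."""
    res = residuals(a, b)
    return {
        "sum": sum(res),                      # approach 1
        "sum_abs": sum(abs(r) for r in res),  # approach 2
        "sum_sq": sum(r * r for r in res),    # approach 3 (least squares)
    }

flat = criteria(4.0, 0.0)    # horizontal line through the mean of y
sloped = criteria(2.2, 0.6)  # the least squares line for this data

print(flat)
print(sloped)
```

Both lines have a residual sum of (effectively) zero, but the sloped line has the smaller sum of absolute residuals and the smaller sum of squared residuals.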
Define the Least Squares method
|
Choose the line for which the sum of the squared residuals is a minimum.
|
|
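As a minimal sketch of the least squares method for the line y = a + bx, the standard closed-form solution can be written in a few lines. The function name `fit_line` and the sample data are assumptions made for illustration; the cards themselves do not give a formula.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
print(residuals)
```

For this data the intercept a is about 2.2 and the slope b is about 0.6. The residuals of a least squares line sum to zero, apart from floating-point rounding.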
How is correlation measured?
|
By calculating the correlation coefficient, r. r can take any value from -1 to +1. Values close to -1 or +1 indicate a strong correlation; values close to 0 indicate a weak correlation.
|
|
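A short sketch of how r might be computed, assuming the usual Pearson product-moment definition (the cards do not name a specific formula). The function name `correlation` and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_flipped = correlation(xs, [-y for y in ys])       # high with low
r_perfect = correlation(xs, [3 + 2 * x for x in xs])  # exact straight line
print(r, r_flipped, r_perfect)
```

Here r is roughly 0.77 (a positive correlation), flipping the sign of y gives roughly -0.77, and points lying exactly on a rising straight line give r = 1 up to rounding.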
Define the explained variation
|
The variation in y that is accounted for by its association with x.
|
|
How is explained variation measured?
|
From the difference between the fitted y value and the average y value.
|
|
What is the essence of correlation?
|
When regression analysis is carried out, the variation in y is split in two:
a. a part that is 'explained' by associating the y values with the x values, and
b. a part that is unexplained, since the relationship is only approximate and there are residuals.
|
|
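The split of the variation in y can be checked numerically. The sketch below assumes the usual sum-of-squares form: total variation from (actual y minus average y), explained variation from (fitted y minus average y), and unexplained variation from the residuals. The data and variable names are hypothetical.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Least squares slope and intercept for y = a + b*x.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
b = sxy / sxx
a = y_mean - b * x_mean
fitted = [a + b * x for x in xs]

total = sum((y - y_mean) ** 2 for y in ys)
explained = sum((f - y_mean) ** 2 for f in fitted)
unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))

# Share of the variation in y that is explained; equals r squared
# for a simple least squares line.
share_explained = explained / total
print(total, explained, unexplained, share_explained)
```

For this data the total variation of 6 splits into 3.6 explained and 2.4 unexplained, so about 60% of the variation in y is explained.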
What are the general limits for correlation?
|
r >= 0.75: highly satisfactory
r from 0.5 to 0.75: adequate
r < 0.5: serious doubts
|
|
What further tests must be made to confirm that the linear equation sufficiently describes the relationship between the variables?
|
1. a visual check of the randomness of the residuals, as plotted on a graph of the linear equation.
2. a scatter diagram of the residuals against the fitted y values
|
|
List the 4 steps in regression and correlation.
|
1. inspecting the scatter diagram
2. calculating the regression coefficients
3. calculating the correlation coefficient
4. checking the residuals for randomness
|
|
What is a serial correlation in the residual?
|
A pattern in which successive residuals are related to one another rather than random. It occurs particularly in time-series data, where there may be some time-related cycle.
|
|
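One simple way to look for serial correlation, offered here only as an illustrative sketch and not as a method from these cards, is to correlate each residual with the one before it (a lag-1 autocorrelation). The function name `lag1_autocorrelation` and both residual series are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the previous one."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    numerator = sum(dev[t] * dev[t - 1] for t in range(1, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

# Hypothetical residuals in time order, following a slow cycle.
cyclic = [1.0, 0.8, 0.5, -0.2, -0.9, -1.1, -0.6, 0.1, 0.7, 1.0]
# Hypothetical residuals that flip sign at every step.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

rho_cyclic = lag1_autocorrelation(cyclic)
rho_alternating = lag1_autocorrelation(alternating)
print(rho_cyclic, rho_alternating)
```

A value well above zero (as for the cyclic series) or well below zero (as for the alternating series) suggests the residuals are not random; a value near zero is what random residuals would tend to give.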
Define heteroscedasticity
|
A tendency for the residuals to vary in size at different parts of the line. It is likely to occur in cross-sectional data when the size of the residuals is related to the x value.
|
|
Explain reservations about Regression and Correlation
|
1. Causality: the largest source of confusion and error. While association can be determined by analysis, causality cannot be confirmed.
2. Spurious regressions: the correlation coefficient is high, but there is no real relationship. This is an error in setting up the model.
3. Extrapolation (using the equation outside the range of the data) should be avoided. It is done in forecasting, but one must be wary.
4. Regression on a single set of data, when perhaps two lines are more appropriate.
5. The least squares criterion can be misleading by being too precise.
6. Least squares has been applied to the regression of y on x.
|
|
The term 'regression analysis' is used ...
|
to include both regression and correlation
|
|
Forecasting is based on ...
|
regression analysis
|