39 Cards in this Set
- Front
- Back
Classical Test Theory
|
Any given score is a combination of true score and error (X = T + E)
|
|
Classical Test Theory: Variability
|
Total variability in a group is a combination of true score variability (differences among test takers; 'reliability') and error variability.
|
|
Sources of Error in Variability
|
Content Sampling (items do or don't tap into the domain by chance)
Time Sampling
Test Heterogeneity (more domains tapped into = more error due to chance) |
|
Reliability Coefficient (rxx or rtt)
Vs. Pearson R |
rxx/rtt:
Range: 0.00 to 1.00; acceptable reliability: .80; true score variability read directly (.80 = 80% true score variability)
Pearson r: range -1 to +1; square it to get true variability |
|
Factors Affecting Reliability
|
Number of Items
Range of Scores (full range = higher reliability)
Homogeneity of Items
Ability to Guess |
|
Estimates of Reliability:
Test-Retest reliability (AKA coefficient of stability) |
Is the test steady over time?
Source of Error: Time Sampling |
|
Estimates of Reliability:
Parallel Forms Reliability (AKA Coefficient of Equivalence) |
2 different forms given to 1 group.
Sources of Error: time sampling, content sampling |
|
Estimates of Reliability:
Internal Consistency Reliability: split-half reliability; Kuder-Richardson/Cronbach's coefficient |
Consistency across items. Given 1 time to 1 group.
1. Split-half reliability - Spearman-Brown prophecy formula shows how much more reliable the test would be with all items (vs. 1/2, which has lower reliability). Inappropriate for speeded (vs. 'power') tests. Source of error: item/content sampling |
|
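Not part of the deck: the Spearman-Brown prophecy formula referenced above can be sketched in Python (the function name and the example half-test reliability of .60 are illustrative, not from the cards):

```python
def spearman_brown(r_half, n=2):
    # Predicted reliability when the test is lengthened by a factor n.
    # n=2 predicts full-test reliability from a split-half reliability.
    return (n * r_half) / (1 + (n - 1) * r_half)

# A half-test reliability of .60 predicts a full-test reliability of .75:
print(round(spearman_brown(0.60), 2))  # 0.75
```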
Estimates of Reliability:
Internal Consistency Reliability: Kuder-Richardson (KR-20 & KR-21) & Cronbach's coefficient alpha |
Compares all possible halves.
KR-20/21 used for dichotomous data (20 if items vary in difficulty, 21 if not)
Cronbach's coefficient alpha: for nondichotomous data (Likert-type questions) |
|
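Not part of the deck: a minimal Python sketch of Cronbach's coefficient alpha (helper names and the toy data are my own; population variance is used throughout):

```python
def cronbach_alpha(item_scores):
    # item_scores: one list per item, each holding every respondent's
    # score on that item (same respondent order in every list).
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):
        # Population variance.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    total_per_person = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(var(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / var(total_per_person))

# Two perfectly consistent items across three respondents -> alpha = 1.0:
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```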
Inter-rater Reliability
|
Used when ratings are subjectively scored.
Calculated with Pearson r, kappa, or Yule's Y. Improved by: group discussion, practice exercises, feedback. |
|
Standard Error of Measurement
|
Average amount of measurement error in any given test.
SEM = SDx * sqrt(1 - rxx) |
|
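Not part of the deck: the SEM formula as a quick Python check (the SD = 15 and rxx = .84 numbers are invented for illustration):

```python
import math

def sem(sd, rxx):
    # Standard error of measurement: SD of the test times sqrt(1 - reliability).
    return sd * math.sqrt(1 - rxx)

# An IQ-style test (SD = 15) with reliability .84: SEM = 15 * 0.4 = 6.
print(round(sem(15, 0.84), 4))  # 6.0
```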
Standard Error of Measurement Range
|
0.0 (perfect) up to the SD of the test (whatever that SD is), from the formula SD * sqrt(1 - rxx)
|
|
*Calculating Confidence Bands:
-need to know (2 things): |
Person's score on the test (ex. 120)
Person's standard error of measurement (ex. 6). Add/subtract the standard error of measurement. 68%: 114-126; 95%: 108-132; 99%: 102-138 |
|
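Not part of the deck: the confidence-band arithmetic above, sketched in Python (the helper name is my own; the score 120 and SEM 6 are the deck's example):

```python
def confidence_band(score, sem, n_sems):
    # Band of +/- n_sems standard errors of measurement around the
    # obtained score (1 SEM ~ 68%, 2 ~ 95%, 3 ~ 99%).
    return (score - n_sems * sem, score + n_sems * sem)

for n, pct in [(1, 68), (2, 95), (3, 99)]:
    lo, hi = confidence_band(120, 6, n)
    print(f"{pct}%: {lo}-{hi}")
```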
Validity - what are the three subtypes?
|
Content: is the tool measuring the skills it should? (expert validated)
Criterion: is the test an accurate predictor? 2 subtypes (concurrent/predictive)
Construct: is the tool measuring the trait it should? |
|
Criterion Related Validity:
Calculated by? Subtypes definitions? |
Pearson r between scores on X and Y (range -1 to +1); valid = .20 or higher.
Variance accounted for is found by squaring: validity = .5 means 25% of the variability in Y scores is accounted for by X. |
|
Criterion Related Validity:
2 Subtypes |
Concurrent: test and criterion are given/measured at about the same time.
Predictive: the criterion is measured long after the predictor is given (ex. SAT > college GPA). |
|
Review:
Standard Error of the Mean
Standard Error of Measurement
Standard Error of Estimate |
Error in a group mean vs. the population mean
Error in scores on a given test
Error in predicting the criterion from the predictor |
|
Standard Error of Estimate formula
|
Sest = SDy * sqrt(1 - rxy^2)
|
|
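Not part of the deck: the SEE formula as a Python sketch (the SD = 10 and rxy = .60 numbers are illustrative):

```python
import math

def see(sd_y, rxy):
    # Standard error of estimate: SD of the criterion (Y) times sqrt(1 - rxy^2).
    return sd_y * math.sqrt(1 - rxy ** 2)

# Criterion SD of 10, validity coefficient .60: SEE = 10 * 0.8 = 8.
print(round(see(10, 0.60), 4))  # 8.0
```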
Standard Error of Estimate range
|
0 - SD of criterion
|
|
Mnemonic estimate goes w/Y (criterion) - repairs, cry about estimate. Y so High?
Measurement goes w/X |
that's it
|
|
Criterion Related Validity Coefficient application:
Expectancy Table |
Table that shows the probability that a person whose score falls in a given range of the predictor will achieve a given criterion outcome.
|
|
*Taylor Russell Tables
Base Rate, Selection Ratio, Incremental Validity |
Table that outlines how much better hiring decisions will be when using a test vs. no test.
Base Rate: optimal .5 (moderate; 50% of employees successful)
Selection Ratio: optimal .1 (large pool; 10 applicants per opening)
Incremental Validity: degree to which a new predictor improves prediction of the criterion. |
|
*Taylor Russell Tables - Base Rate
|
Rate of successful employees without using any test.
ex. 80% turn out to be good employees (base rate = .80) |
|
*Taylor Russell Tables - Selection Ratio
|
# of openings / # of applicants
ex. 1 opening, 10 applicants: 1/10 = .10 selection ratio (low) |
|
Incremental Validity
|
Amount of improvement when using predictor test vs. no test
|
|
Taylor Russell Table
Incremental Validity Calculation |
Base rate (ex. .40): 40% are good without the test. With the test, 65%.
Incremental validity: .25 (amount of improvement) |
|
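Not part of the deck: the subtraction above as a one-line Python helper (the function name is my own; the numbers are the deck's example):

```python
def incremental_validity(base_rate, rate_with_test):
    # Improvement in the proportion of successful hires when the test is used.
    return rate_with_test - base_rate

print(round(incremental_validity(0.40, 0.65), 2))  # 0.25
```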
Taylor Russell Tables:
Factors Affecting Incremental Validity |
1. Criterion-related validity of the instrument (rxy)
2. Base rate
3. Selection ratio |
|
How to optimize incremental validity
|
*Moderate base rate (.5)
*Low selection ratio (.1) |
|
Decision Making Theory:
4 options in predictions |
True positive (predicted success, succeeded)
False positive (predicted success, failed)
True negative (predicted failure, failed)
False negative (predicted failure, succeeded) |
|
Decision Making Theory:
How do you decrease false positives? |
Raise the predictor cutoff (1st choice; sometimes you can't change the criterion)
Lower the criterion cutoff |
|
Developing a Predictor Test
|
1. Conceptualization (objective, administration, etc.)
2. Test construction: choose item format, write items. |
|
Item difficulty
|
Want it between .3 and .8 (30%-80% of test takers get it right)
|
|
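Not part of the deck: item difficulty is just the proportion answering correctly; a minimal Python sketch (helper name and data are illustrative):

```python
def item_difficulty(responses):
    # Proportion of test takers answering the item correctly
    # (responses: list of 0/1 scores for one item).
    return sum(responses) / len(responses)

p = item_difficulty([1, 1, 0, 1, 0])
print(p, 0.3 <= p <= 0.8)  # 0.6 True
```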
Item Characteristic Curve
|
Shows the degree to which an item indicates that the test taker has the trait being measured
|
|
Item Response Theory
|
Used to develop individually tailored, adaptive tests; each answer determines the following items.
|
|
Test Revision
|
Items are retained after validation, then cross-validated. Results in SHRINKAGE (the validity coefficient is smaller on the new sample)
|
|
Factors Affecting Validity Coefficient
|
Range of Scores (want a broad, unrestricted range)
Reliability of Predictor (caps validity)
Reliability of Predictor and Criterion (correction for attenuation)
Criterion Contamination: Y (criterion) is subjectively scored and the rater has knowledge of the predictor; inflates validity |
|
*Correction for Attenuation
|
Tells you how much more valid the instrument would be if X or Y were perfectly reliable
|
|
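The deck doesn't give the formula; the standard correction for attenuation is rxy / sqrt(rxx * ryy). A hedged Python sketch (the example numbers are invented):

```python
import math

def correct_for_attenuation(rxy, rxx=1.0, ryy=1.0):
    # Estimated validity if the predictor (rxx) and/or criterion (ryy)
    # were perfectly reliable: rxy / sqrt(rxx * ryy).
    return rxy / math.sqrt(rxx * ryy)

# An observed validity of .40 with predictor reliability .64: .40 / .80 = .50.
print(round(correct_for_attenuation(0.40, rxx=0.64), 4))  # 0.5
```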
Construct Validity (trait)
2 subtypes |
Convergent (degree to which test aligns with similar instruments - monotrait): want moderate/high correlation
Divergent (alignment with tests measuring different traits - heterotrait): want low correlation |
|
Construct Validity - what table shows different validity types?
|
Multitrait-Multimethod Matrix
|