28 Cards in this Set
- Front
- Back
RANSAC |
RANdom SAmple Consensus 1. Sample the minimum number of points needed to estimate the model (e.g. two points for a line) 2. Count the number of inliers (or the posterior likelihood) for that model 3. Repeat and choose the model that maximizes the number of inliers. (In LMedS, Least Median of Squares, steps 2-3 are replaced by: calculate the error of all data and choose the model that minimizes the median of the errors) |
|
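A minimal Python sketch of RANSAC line fitting, assuming 2D points; the iteration count and inlier threshold are illustrative choices, not from the card:

```python
import numpy as np

def ransac_line(points, n_iters=100, inlier_thresh=0.1, rng=np.random.default_rng(0)):
    """Fit a line y = a*x + b to 2D points by RANSAC: sample 2 points,
    count inliers, keep the hypothesis with the most inliers."""
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):            # degenerate sample, skip
            continue
        a = (y2 - y1) / (x2 - x1)         # slope from the 2-point sample
        b = y1 - a * x1                   # intercept
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        n_inliers = np.sum(residuals < inlier_thresh)
        if n_inliers > best_inliers:      # keep hypothesis with most inliers
            best_inliers, best_model = n_inliers, (a, b)
    return best_model, best_inliers
```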
Robust regression |
Gives good model estimate even if many outliers |
|
LMS |
(Least Median of Squares) Sample the minimum number of matches needed to estimate the model, calculate the error (residual) for all data, and choose the sample whose model minimizes the median of the residuals |
|
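A sketch of the LMedS selection rule for the same line-fitting setting, reusing the illustrative 2-point sampling from the RANSAC sketch above:

```python
import numpy as np

def lmeds_line(points, n_iters=100, rng=np.random.default_rng(0)):
    """Least Median of Squares: keep the 2-point hypothesis whose
    median squared residual over all points is smallest."""
    best_med, best_model = np.inf, None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):            # degenerate sample, skip
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        med = np.median((points[:, 1] - (a * points[:, 0] + b)) ** 2)
        if med < best_med:                # minimize the median of residuals
            best_med, best_model = med, (a, b)
    return best_model
```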
Problems with regression methods |
RANSAC: need to choose a good inlier threshold. LMedS: no good solution if outliers exceed 50% of the data. k-NN: performs worse in higher dimensions (curse of dimensionality) |
|
k-NN regression |
Similar to the k-NN classifier: choose the k closest training points and take the average of their responses. Larger k = lower variance (but higher bias) |
|
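A minimal sketch of k-NN regression, assuming NumPy arrays and Euclidean distance:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    """Predict the response at x_query as the average response of the
    k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]       # indices of the k closest points
    return y_train[nearest].mean()        # larger k -> smoother, lower variance
```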
Ridge regression |
Adds an L2 penalty to least squares: sacrifices some bias to decrease variance. Shrinks the coefficients toward zero |
|
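A closed-form ridge sketch; the penalty strength lam is an arbitrary example value and the intercept term is omitted for brevity:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: minimize ||y - Xw||^2 + lam * ||w||^2.
    Closed form: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```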
Lasso |
Least Absolute Shrinkage and Selection Operator. Like ridge regression but with an L1 penalty; it can drive coefficients exactly to zero and thus performs feature selection |
|
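A small illustration using scikit-learn's Lasso; the synthetic data and the alpha value are made up for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# only the first two features are informative
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)   # most coefficients are exactly 0 -> implicit feature selection
```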
Probability Based Methods: Pros/Cons |
Pros: all aspects of learning, modelling and inference can be cast under the same theory. Cons: hard to derive closed-form solutions (approximations are needed); inefficient for large datasets |
|
Inference |
Given training data D, estimate posterior probability of answer y: P(y|x, D) |
|
Inference parametric/non-parametric |
Parametric: estimate the optimal parameter θ̂ from the data and use it to compute the posterior, P(y | x, D) ≈ P(y | x, θ̂). Non-parametric: estimate the posterior by marginalizing out the parameter θ, P(y | x, D) = ∫ P(y | x, θ) P(θ | D) dθ (the data are used directly) |
|
Occam's razor |
Choose the simplest explanation for the observed data |
|
Naive bayes: What if none of instances with target val y have attr. xi? |
Add pseudocounts to every count (Laplace/additive smoothing, a form of regularization), so that no conditional probability estimate becomes exactly zero |
|
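A sketch of additive (Laplace) smoothing for one conditional probability estimate, assuming discrete attribute values; the function name and example counts are hypothetical:

```python
def smoothed_cond_prob(count_xy, count_y, n_values, alpha=1.0):
    """P(x_i = v | y) with pseudocounts:
    (count + alpha) / (class count + alpha * number of attribute values).
    Never returns 0, even if value v was never seen together with class y."""
    return (count_xy + alpha) / (count_y + alpha * n_values)

# attribute value never observed with this class:
print(smoothed_cond_prob(count_xy=0, count_y=20, n_values=3))  # 1/23, not 0
```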
Perceptron learning |
Incremental learning; the weights only change when the output is wrong: wi ← wi + η(t − o)xi, where o = sign(wᵀx). Always converges if the problem is linearly separable |
|
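A sketch of the perceptron rule, assuming ±1 targets and NumPy arrays; the learning rate and epoch count are arbitrary:

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, n_epochs=100):
    """Perceptron learning rule: w_i <- w_i + eta * (t - o) * x_i,
    where o = sign(w^T x). Weights change only on misclassified samples."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, t):       # targets assumed to be +1 / -1
            o = 1.0 if w @ x > 0 else -1.0
            w += eta * (target - o) * x   # zero update when o == target
    return w
```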
Delta rule |
Incremental learning; the weights change on every sample: wi ← wi + η(t − wᵀx)xi. Converges in the mean to the minimum-squared-error solution, even if the problem is not linearly separable |
|
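A corresponding sketch of the delta (Widrow-Hoff) rule under the same assumptions:

```python
import numpy as np

def delta_rule_train(X, t, eta=0.01, n_epochs=100):
    """Delta rule: w_i <- w_i + eta * (t - w^T x) * x_i.
    The update uses the linear output, so the weights move on every sample."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, t):
            w += eta * (target - w @ x) * x   # gradient step on the squared error
    return w
```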
Bagging |
Bootstrap AGGregatING (only helps high-variance, low-bias classifiers). Create bootstrap replicates of the training data by sampling with replacement, train one model on each replicate, and aggregate their predictions (vote or average) |
|
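A bagging sketch using scikit-learn decision trees as the high-variance base model; the ensemble size and the ±1 labels are assumptions of the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=25, rng=np.random.default_rng(0)):
    """Train one tree per bootstrap replicate (sampling with replacement)."""
    models, n = [], len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate by majority vote over the ensemble (assumes +1 / -1 labels)."""
    votes = np.stack([m.predict(X) for m in models])
    return np.sign(votes.sum(axis=0))
```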
Decision trees - Variance/Bias? |
High variance (strongly dependent on the particular training data); low bias (the learned decision boundaries are, on average, good approximations to the true boundary), which is why averaging many trees, as in bagging, works well |
|
Boosting |
Loop: 1. Apply the learner to the weighted training samples 2. Increase the weights of misclassified samples so the next learner concentrates on them. The final classifier is a weighted combination of the weak learners |
|
Adaboost |
1. Train a weak classifier on the weighted data (choose the one with the lowest weighted error ε) 2. Compute its reliability coefficient α = ½ ln((1 − ε)/ε) (ε must be below 0.5, otherwise stop) 3. Update the sample weights (increase the weights of misclassified samples) 4. Normalize the weights |
|
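A compact AdaBoost sketch with decision stumps (scikit-learn trees of depth 1) as weak classifiers, assuming ±1 labels stored in NumPy arrays; the number of rounds is arbitrary:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """AdaBoost: train a weak classifier on weighted data, compute its
    reliability alpha = 0.5 * ln((1 - err) / err), reweight, normalize."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # uniform initial weights
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified samples
        w /= w.sum()                              # normalize the weights
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted vote of the weak classifiers."""
    return np.sign(sum(a * s.predict(X) for a, s in ensemble))
```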
Boosting properties |
Training error goes to 0 exponentially fast; the test error often keeps decreasing even after the training error reaches 0. Why? The algorithm is not satisfied with getting 0 training error: it keeps increasing the classification margins |
|
Weak classifier |
h_t(x) = +1 if f_{j_t}(x) > θ_t, and −1 otherwise. Corresponds to a filter type (one feature) and a threshold, i.e. a decision stump |
|
Dropout |
During training, randomly drop (set to zero) a fraction of the units in a layer; reduces co-adaptation and overfitting in neural networks, including ConvNets. At test time all units are used |
|
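A sketch of (inverted) dropout applied to a layer's activations; the drop probability is an example value:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors so the expected activation matches test time."""
    if not training:
        return activations                       # use all units at test time
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)
```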
Random forests |
1. Sample the training data with bootstrap replicates (same as bagging) 2. At each node of each tree, choose the best split among a random subset of the features. Predict by voting/averaging over the trees |
|
PCA |
Two equivalent criteria can be used: 1. Maximize the variance of the projected data 2. Minimize the average squared error between x and its low-dimensional approximation |
|
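A PCA sketch via eigendecomposition of the covariance matrix, illustrating the variance-maximization view:

```python
import numpy as np

def pca(X, n_components):
    """PCA: the leading eigenvectors of the covariance matrix are the
    directions of maximum variance (equivalently, they minimize the
    average squared reconstruction error)."""
    Xc = X - X.mean(axis=0)                       # center the data
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    components = eigvecs[:, ::-1][:, :n_components]
    return Xc @ components, components            # projected data, basis
```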
Information compression |
Extract class characteristics, throw away the rest |
|
CLAFIC |
CLAss-Featuring Information Compression, a subspace method for classification |
|
Describe subspace methods for classification |
For each class, compute a low-dimensional subspace that represents the distribution of that class. Determine the class of an unknown input by checking which subspace best approximates the input (e.g. largest projection onto the subspace) |
|
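A sketch of this idea, assuming one SVD-derived subspace per class (uncentered, roughly in the spirit of CLAFIC) and classification by largest squared projection length; the subspace dimension is arbitrary:

```python
import numpy as np

def fit_class_subspaces(X, y, dim=5):
    """For each class, compute a low-dimensional subspace spanned by the
    leading right singular vectors of that class's samples."""
    subspaces = {}
    for c in np.unique(y):
        _, _, Vt = np.linalg.svd(X[y == c], full_matrices=False)
        subspaces[c] = Vt[:dim].T                 # (n_features, dim) orthonormal basis
    return subspaces

def classify_by_subspace(x, subspaces):
    """Assign x to the class whose subspace approximates it best
    (largest squared projection length)."""
    return max(subspaces, key=lambda c: np.sum((subspaces[c].T @ x) ** 2))
```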
RSS |
Residual sum of squares: RSS = Σᵢ (yᵢ − ŷᵢ)² |
|
EM |
Expectation Maximization |