43 Cards in this Set
- Front
- Back
Hypothesis space for a decision tree |
All possible decision trees |
|
Hypothesis space for a learner |
All possible outcomes for the specific learner |
|
How do you construct a decision tree? |
Ask questions about status/target/attributes sequentially |
|
What is the definition of entropy? |
Entropy = Sum[i]( -p_i * log2(p_i) ), where p_i is the probability of event i |
|
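A minimal sketch of the entropy sum above, assuming the probabilities are given as a plain list that sums to 1. The function name `entropy` is chosen here for illustration only.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the sum of -p_i * log2(p_i) over all events i."""
    # Events with p_i == 0 are skipped; they contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])    # two equally likely outcomes
four_way = entropy([0.25] * 4)     # four equally likely outcomes
certain = entropy([1.0, 0.0])      # no uncertainty at all
```

A uniform distribution gives the highest entropy for a given number of events, and a certain outcome gives zero.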
What is overfitting? |
Overfitting is when the learned models are overly specialized for the training samples. This leads to poor generalization |
|
Name two reasons for overfitting |
- Non-representative sample
- Noisy examples
- Overly complex model |
|
What is Occam's razor? |
"The simplest solution is often the correct one" |
|
How do you prevent overfitting? |
- Separate available data into two sets; one for training and one for validation. |
|
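A small sketch of the training/validation split described above. The helper name `train_validation_split`, the 25% validation share, and the fixed seed are assumptions made for illustration.

```python
import random

def train_validation_split(samples, validation_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

train, validation = train_validation_split(range(20))
```

The model is fitted on `train` only; a growing gap between training error and error on `validation` is a sign of overfitting.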
What is "bagging"? |
Bootstrap aggregation for decision trees. |
|
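A rough sketch of bagging, assuming a very simple base learner (a one-dimensional threshold "stump") in place of a full decision tree. The names `fit_stump`, `bagging`, and `predict`, and the toy data, are hypothetical.

```python
import random
from collections import Counter

def fit_stump(points):
    """Hypothetical base learner: pick the threshold with the fewest errors.
    The stump predicts 1 when x > threshold and 0 otherwise."""
    best = None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging(points, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each model sees a bootstrap sample: same size as the data, drawn with replacement.
    return [fit_stump([rng.choice(points) for _ in points]) for _ in range(n_models)]

def predict(thresholds, x):
    # Aggregate the individual predictions by majority vote.
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
ensemble = bagging(data)
```

Calling `predict(ensemble, x)` returns the class that most of the 25 stumps vote for.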
Which error is produced by bias? |
The difference between the average (expected) prediction of our model and the correct value. |
|
Which error is produced by variance? |
The variability of a model prediction for a given data point between different realizations of the model. |
|
What is linear regression? |
Linear regression tries to estimate a linear function f that predicts the output from the input values |
|
What is RANSAC an abbreviation for? |
RANdom SAmple Consensus |
|
Describe the RANSAC algorithm |
Randomly select a small sample of points and fit a model to it. Determine the set S of all data points that lie within a given distance of that model. If the number of points in S exceeds some threshold, re-estimate the model using the points in S. Repeat the above N times, select the largest set S (the consensus set), and re-estimate the model using this S. |
|
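A simplified sketch of RANSAC for fitting a line y = a*x + b. Several choices here are assumptions for illustration: the distance is measured vertically rather than perpendicular to the line, the model is re-estimated only once at the end from the largest consensus set, and the names and default parameters are made up.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b to the given points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def ransac_line(points, n_iterations=100, distance=0.5, min_consensus=5, seed=0):
    rng = random.Random(seed)
    best_set = []
    for _ in range(n_iterations):
        sample = rng.sample(points, 2)          # minimal sample for a line
        if sample[0][0] == sample[1][0]:
            continue                            # skip vertical pairs
        a, b = fit_line(sample)
        # Consensus set: all points close enough to the candidate line.
        consensus = [(x, y) for x, y in points if abs(y - (a * x + b)) <= distance]
        if len(consensus) >= min_consensus and len(consensus) > len(best_set):
            best_set = consensus
    # Final re-estimate using the largest consensus set found.
    return fit_line(best_set) if best_set else None

inliers = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(2.5, 30.0), (7.5, -20.0)]
a, b = ransac_line(inliers + outliers)
```

On this toy data the outliers never join a large consensus set, so the final fit is determined by the ten points on the line.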
What is the difference between Ridge Regression and Least Squares? |
Ridge Regression adds a shrinkage penalty (a factor times the sum of the squared coefficients) to the least-squares error. |
|
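To make the shrinkage penalty concrete, here is a sketch for the simplest possible case: one feature, one weight w, and no intercept. The function names and the penalty value are assumptions for illustration.

```python
def least_squares_slope(xs, ys):
    """Minimizes sum((y - w*x)^2) over a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge_slope(xs, ys, penalty):
    """Minimizes sum((y - w*x)^2) + penalty * w^2; the penalty shrinks w toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_ols = least_squares_slope(xs, ys)         # 28 / 14
w_ridge = ridge_slope(xs, ys, penalty=14)   # 28 / (14 + 14)
```

The ridge weight is smaller than the least-squares weight, but it does not reach zero for any finite penalty as long as the data and target are correlated.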
Which features does Ridge Regression include? |
All features are included when using Ridge Regression |
|
What does the acronym 'The Lasso' stand for? |
Least Absolute Shrinkage and Selection Operator |
|
What is a mathematical benefit of using Lasso over Ridge? |
Some of lasso's coefficients will be exactly zero |
|
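A sketch of why lasso can produce coefficients that are exactly zero, again for the one-feature, no-intercept case. It assumes the objective 0.5 * sum((y - w*x)^2) + penalty * |w|, whose minimizer is given by soft thresholding; the function names are illustrative.

```python
def soft_threshold(value, penalty):
    """Move value toward zero by penalty, and clamp to exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_slope(xs, ys, penalty):
    """Minimizes 0.5 * sum((y - w*x)^2) + penalty * |w| over a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, penalty) / sxx

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_small_penalty = lasso_slope(xs, ys, penalty=14)   # shrunk, but non-zero
w_large_penalty = lasso_slope(xs, ys, penalty=30)   # clamped to zero
```

Once the penalty exceeds the correlation between feature and target, the weight is set to zero, which effectively removes the feature from the model.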
What is a discrete value? |
A discrete value is a value from a predefined set |
|
How can we tell that two events are independent of each other? |
P(A|B) = P(A) |
|
Which type of value requires classification, discrete or continuous? |
Discrete-value problems use classification. |
|
Which type of value requires regression, discrete or continuous? |
Continuous-value problems use regression. |
|
On which assumption is the Naive Bayes Classifier based? |
That all features are conditionally independent of each other, given the class |
|
What is the basic premise of an artificial neuron? |
Using several inputs, construct a value representing all the inputs, compare against a threshold and return a +/- answer (usually). |
|
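A minimal sketch of such a neuron, assuming the combined value is a weighted sum of the inputs and the answer is +1 or -1. The weights and threshold below are picked by hand to model a logical AND.

```python
def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs, compared against a threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else -1

# Hand-picked parameters: fires only when both inputs are 1.
and_weights, and_threshold = [1.0, 1.0], 1.5
both_on = neuron([1, 1], and_weights, and_threshold)
one_on = neuron([1, 0], and_weights, and_threshold)
```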
When does Perceptron Learning converge? |
Always, if the problem is solvable, i.e. the classes are linearly separable. |
|
Using Perceptron Learning, when do the weights change? |
When the output is wrong. |
|
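A sketch of perceptron learning on a small, linearly separable problem (logical AND with targets +1 and -1). The learning rate, the bias term, and the function names are assumptions for illustration.

```python
def predict(weights, bias, x):
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1

def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            if predict(weights, bias, x) != target:
                # The weights are only touched when the output is wrong.
                weights = [w + learning_rate * target * xi for w, xi in zip(weights, x)]
                bias += learning_rate * target
                errors += 1
        if errors == 0:
            break   # a full pass without mistakes: learning has converged
    return weights, bias

and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
```

Because the AND problem is linearly separable, the loop stops after a handful of epochs with every sample classified correctly.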
What is another name for Delta Rule? |
LMS-rule |
|
Using the Delta Rule, when do weights change? |
Always, the separating plane is always nudged a little. |
|
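A sketch of a single Delta Rule (LMS) update, assuming a linear unit with a bias term and a hypothetical learning rate of 0.1. Unlike perceptron learning, the error is measured against the raw weighted sum, so the weights move even when the sign of the output is already correct.

```python
def delta_rule_step(weights, bias, x, target, learning_rate=0.1):
    """One LMS update: move the weights in proportion to (target - activation)."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = target - activation
    new_weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias + learning_rate * error

# First update: activation is 0, error is 1.
w1, b1 = delta_rule_step([0.0, 0.0], 0.0, x=(1, 1), target=1)
# Second update on the same sample: the activation is now positive (correct sign),
# but it is still below the target, so the weights are nudged again.
w2, b2 = delta_rule_step(w1, b1, x=(1, 1), target=1)
```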
When does LMS-rule converge? |
Only in the mean |
|
When does the Delta Rule converge? |
Only in the mean |
|
What is an advantage of using LMS over Perceptron? |
LMS will find an optimal solution even if the problem can't be fully solved. |
|
When using hyperplanes, one faces certain structural risks. What is one counter-measure? |
The use of margins, which allows for some buffer surrounding the hyperplane. |
|
Which problems might occur when scattering low-dimension data into higher dimensions? |
1. Many free parameters -> bad generalization
2. Extensive computation |
|
What is the main purpose of the kernel function when working with SVMs? |
To work with the data as if it had been transformed from low-dimensional to high-dimensional space, while only computing scalar products of the low-dimensional values. |
|
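A small sketch of this idea for a degree-2 polynomial kernel on 2-dimensional inputs. The explicit mapping `feature_map` is shown only for comparison; the names and the example vectors are chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, y):
    """Degree-2 polynomial kernel: uses only the low-dimensional scalar product."""
    return dot(x, y) ** 2

def feature_map(x):
    """Explicit 3-dimensional mapping that corresponds to the kernel above."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = polynomial_kernel(x, y)               # (1*3 + 2*4)^2
via_mapping = dot(feature_map(x), feature_map(y))  # scalar product in 3 dimensions
```

Both routes give the same value, but the kernel never constructs the high-dimensional vectors.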
Name two common types of kernel functions |
- Polynomial kernels
- Radial basis function (RBF) kernels |
|
What is the point of having multiple layers of artificial neurons? |
A layered network can create arbitrary decision surfaces, including non-linear ones. |
|
Are multi-layered ANNs continuous or discrete in values? |
The threshold function's output is continuous and the input signal may be of varying character. |
|
What is the difference between the output of single-layer and multi-layer artificial neural networks? |
The first has discrete value output, and the second continuous. |
|
What is the difference between Decision Forest and Bagging? |
Decision Forest, also known as Random Forest, is a combination of Bagging and a random feature selection. |
|
What are ensemble methods? |
Ensemble methods combine weak learners to harness their combined strengths |
|
What is the general idea behind the ensemble method Boosting? |
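Train weak learners one after another, each concentrating on the examples the earlier ones got wrong, and combine the resulting hypotheses by weighted voting |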
|
When using BackProp in an ANN, what is a hypothesis? |
A set of weights for all the connections |
|