24 Cards in this Set

  • Front
  • Back
Data mining definition
extracting previously unknown and potentially useful patterns from large amounts of data
Data mining goal
a single target or outcome variable
supervised learning
target outcome, training data, classification and prediction
unsupervised learning
segment data, no target, association, visualization, reduction
overfitting
model fits the training dataset too closely, so it won't fit new data well
address overfitting issue with
training and validation sets
normalizing data
puts all variables on the same scale
Association Rule, supervised or unsupervised?
Unsupervised
AR: interpreting confidence
a confidence of 60% means that 60% of customers who purchased A also bought B
Association rules: the IF and THEN parts are called...
antecedent and consequent
Confidence % =
support(a,b)/support(a)
AR Lift =
confidence/support(b)
If lift < 1 then...
the rule does worse than chance; you are better off choosing at random to get B
Supervised learning
you are trying to predict a variable, a specific outcome
This model can handle missing values
CART
Limitation of Logit
ANN is quicker; an ANN with no hidden layer behaves like logit or MLR
Calculate Logit P
p = 1/(1 + e^(-z))
A benefit of ANN
can compute MLR or Logit with no hidden layers
MAE (Mean Absolute Error)
average absolute value of errors
Average error
average of all errors
RMSE
square the errors, average them, then take the square root
odds
p/(1-p), where p is the probability
How many clusters if trying to extract subpopulations?
<10
How many clusters if trying to understand the major population?
>10
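
Worked example for the formula cards
The formulas on the cards above (confidence, lift, logit p, odds, average error, MAE, RMSE) can be checked with a short Python sketch. This is only an illustration: the function names and the toy basket data are made up for this example and do not come from the card set, and errors are assumed to be actual minus predicted.

```python
import math


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """Confidence of the rule IF a THEN b: support(a,b) / support(a)."""
    return support(transactions, [a, b]) / support(transactions, [a])


def lift(transactions, a, b):
    """Lift of the rule IF a THEN b: confidence / support(b)."""
    return confidence(transactions, a, b) / support(transactions, [b])


def logit_probability(z):
    """Logistic response: p = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds for a probability p (0 <= p < 1): p / (1 - p)."""
    return p / (1.0 - p)


def error_metrics(actual, predicted):
    """Average error, MAE, and RMSE, with error = actual - predicted (assumed)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "average_error": sum(errors) / n,                   # signed errors can cancel
        "mae": sum(abs(e) for e in errors) / n,             # average absolute error
        "rmse": math.sqrt(sum(e * e for e in errors) / n),  # square, average, sqrt
    }


# Hypothetical toy data: 10 baskets, A in 5, B in 4, A and B together in 3.
baskets = [
    {"A", "B"}, {"A", "B"}, {"A", "B"},
    {"A"}, {"A"},
    {"B"},
    {"C"}, {"C"}, {"C"}, {"C"},
]

if __name__ == "__main__":
    print("confidence(A -> B):", round(confidence(baskets, "A", "B"), 3))
    print("lift(A -> B):", round(lift(baskets, "A", "B"), 3))
    print("p at z = 0:", logit_probability(0))
    print("odds at p = 0.8:", round(odds(0.8), 3))
    print(error_metrics([3, 5, 2], [2, 7, 2]))
```

In this toy data the rule IF A THEN B has a confidence of about 0.6, which reads the same way as the 60% card: 60% of the baskets containing A also contain B. Its lift is about 1.5, which is above 1, so the rule does better than picking baskets at random.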