Machine Learning

by arielalvarez88, Nov. 2016

26 Cards in this Set

	Front	Back

	How do you obtain the probability of two specific outcomes both occurring if the events are independent?	Multiply the two probabilities: P(A and B) = P(A) * P(B).
	What is Bayes' equation for P(x|Oc)?	P(x|Oc) = P(Oc|x) * P(x) / P(Oc)
	In Bayes' equation for P(x|Oc), what is P(Oc) called?	The marginal likelihood (also called the evidence).
	What is the marginal likelihood, and why is it often unimportant?	In Bayes' equation for P(x|Oc), the marginal likelihood is P(Oc). When comparing two posterior probabilities for the same observation it matters little, because both share the same marginal likelihood, so the ratio between them is unchanged (P1 will still be 10 times greater than P2). If you want exact probability values rather than a comparison, it is important.
	What is the likelihood?	In Bayes' equation it is P(Oc|x), the probability of the observed data given the hypothesis. It can be derived from the data alone and does not take prior knowledge into account.
	What is the posterior probability?	The posterior probability P(x|Oc) depends not only on the data but also on prior knowledge, expressed as the prior P(x).
	What does PCA do?	It finds components (vectors) in the directions that maximize the variance within the data sample itself (not relative to a larger group). The components PCA finds are perpendicular (orthogonal) to each other.
	What is normalization?	Rescaling all values of a sample to lie between 0 and 1, i.e. (x - min) / (max - min).
	What is standardization?	Rescaling all values by subtracting the mean and dividing by the standard deviation. The rescaled values have a mean of 0 and a standard deviation of 1.
	What is t-SNE?	It is a technique for visualizing an N-dimensional dataset in 2D. It can show how close the samples are to each other and can even separate them visually into clusters.
	Which learning rate values should you try with t-SNE?	Between 50 and 200.
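	What is the null hypothesis of the Pearson correlation test?	There is no linear relationship between [insert variable 1] and [insert variable 2]. In other words, the population correlation coefficient between Variable 1 and Variable 2 is 0.
	What does a low p-value mean?	That the observed data would be unlikely if the null hypothesis were true, which is evidence against the null hypothesis. When the p-value falls below the chosen significance level (commonly 0.05), the null hypothesis is rejected. It does not prove that the null hypothesis is wrong.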
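	What is a confusion matrix?
	What is the formula for precision, and what does it mean?	Precision = TP / (TP + FP). High precision means that when you predict the class being evaluated, you are right most of the time. It measures how many of your positive predictions are correct out of all your positive predictions.
	What is the formula for recall?	Recall = TP / (TP + FN). High recall means that most actual positives (for example, most spam emails) are classified correctly.
	What is the formula for the F1 score?	F1 = 2 * precision * recall / (precision + recall)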
	What are the 4 C's of Machine Learning?	Correcting (fixing wrong values: an age of 800 is probably 80, and an N in the sex column is probably male), Completing (filling in missing values, for example fillna with the mean), Creating (deriving new features from existing ones, such as bins made with qcut and cut), Converting (formatting, for example string to categorical).
	What does pd.qcut do?	It divides the range of values in a sample (from min to max) into X ranges. All ranges hold approximately the same number of samples, but their widths can vary. In other words, it finds the limits of the quantiles. If X = 4, it finds the 4 value ranges bounded by the quartiles. Then it maps each value to the range it falls in.
	What does pd.cut do?	It divides the sample range (from min to max) into X ranges (X is a parameter) of equal width, (max - min) / X each, and maps each sample to the range it falls in.
	What is a quantile?	Quantiles divide a sample so that every division holds the same number of samples. When the sample is divided into 4 such divisions, they are called quartiles.
	How do you calculate the interquartile range (IQR)?	IQR = Q3 − Q1
	What is a confusion matrix?	See picture. "Reference" is the actual response (the y values) and "Prediction" is what the model predicted.
	What is a good tool for seeing which features are useful for clustering or classification?	The density plot. It shows how much the distribution of each feature overlaps across classes. Features that overlap little, e.g. Petal.length and Petal.width in the picture, are good for classification and clustering.
	When using a train/validation/test split, what temptation must you resist when comparing models?	If you have 2 models (whether they use different algorithms or the same algorithm with different hyperparameter values), the one that performs best on the validation data is the one you want to use. Resist the temptation to choose the one that performs best on the test data.
	Which R package can you use for k-fold cross-validation and tuning in classification and regression?	The caret package, through its train function:
		knn_fit <- train(as.factor(V11) ~ V1+V2+V3+V4+V5+V6+V7+V8+V9+V10,
		                 data,
		                 method = "knn",                     # choose the k-nearest-neighbors model
		                 trControl = trainControl(
		                   method = "repeatedcv",            # repeated k-fold cross-validation
		                   number = 10,                      # number of folds (k in cross-validation)
		                   repeats = 5),                     # number of times to repeat the k-fold cross-validation
		                 preProcess = c("center", "scale"),  # standardize the data
		                 tuneLength = kmax)                  # number of candidate neighbor counts (k in nearest neighbor) to try
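
The Bayes cards above say that the marginal likelihood P(Oc) matters little when two posteriors for the same observation are compared. The Python sketch below illustrates that point. The two hypotheses and every number are made up for illustration, and x1 and x2 are assumed to be the only possible hypotheses so that P(Oc) can be obtained by summing.

```python
# Hypothetical numbers: two candidate hypotheses for one observation Oc.
likelihood = {"x1": 0.8, "x2": 0.1}  # P(Oc|x), assumed values
prior = {"x1": 0.5, "x2": 0.5}       # P(x), assumed values

# Unnormalized scores: the numerator of Bayes' equation, P(Oc|x) * P(x).
score = {x: likelihood[x] * prior[x] for x in likelihood}

# Marginal likelihood P(Oc), assuming x1 and x2 are the only hypotheses.
marginal = sum(score.values())

# Exact posteriors P(x|Oc) require dividing by the marginal likelihood.
posterior = {x: score[x] / marginal for x in score}

# The ratio between the two hypotheses is the same with or without it.
ratio_unnormalized = score["x1"] / score["x2"]
ratio_posterior = posterior["x1"] / posterior["x2"]

print(posterior)
print(ratio_unnormalized, ratio_posterior)
```

Both ratios agree, so ranking hypotheses only needs the numerators, while reporting an exact probability needs P(Oc) as well.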
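
The normalization and standardization cards above can be sketched with the Python standard library alone. The function names and the sample values are illustrative assumptions, the population standard deviation is used so that the rescaled sample has a standard deviation of exactly 1, and the sample is assumed to contain at least two distinct values (otherwise both functions would divide by zero).

```python
from statistics import fmean, pstdev

def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean, then divide by the population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

sample = [2.0, 4.0, 6.0, 8.0]  # made-up sample
normalized = normalize(sample)
standardized = standardize(sample)

print(normalized)
print(fmean(standardized), pstdev(standardized))
```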
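
For the precision, recall and F1 cards above, here is a small Python sketch that computes all three from confusion-matrix counts. The function name and the spam-filter counts are hypothetical, and the sketch assumes at least one positive prediction and at least one actual positive, so it does not guard against division by zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # correct positive predictions / all positive predictions
    recall = tp / (tp + fn)     # correct positive predictions / all actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical spam filter: 40 spam emails caught, 10 false alarms, 20 spam emails missed.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(precision, recall, f1)
```

True negatives do not appear in any of the three formulas, which is why these metrics remain informative when the negative class is much larger than the positive one.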
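
The difference between pd.cut and pd.qcut described in the cards above is easiest to see on a sample with an outlier. This sketch assumes pandas is installed; the sample values are made up, and `labels=False` is used only so that each value is reported as the integer index of its bin.

```python
import pandas as pd

# Made-up sample: seven small values and one outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# pd.cut: 4 bins of equal width spanning min..max.
equal_width = pd.cut(values, bins=4, labels=False)

# pd.qcut: 4 bins bounded by the quartiles, so each holds the same number of samples.
equal_frequency = pd.qcut(values, q=4, labels=False)

print(equal_width.tolist())
print(equal_frequency.tolist())
```

With equal-width bins the outlier stretches the range so that almost every value lands in the first bin, while the quantile-based bins each receive two values.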
