Week 01
Dependence
A condition in which two random variables are not independent. X and Y are positively dependent if the conditional probability of X given Y, P(X|Y), is greater than the unconditional probability P(X), or equivalently if P(X&Y) > P(X)*P(Y). They are negatively dependent if the inequalities are reversed. In short, the outcome of one carries information about whether the other occurred.
Notice the edge case where P(X) or P(Y) (or both) is zero: the product P(X)*P(Y) is then zero, and since P(X&Y) can never exceed P(X) or P(Y), P(X&Y) is zero as well, so the strict inequality cannot hold even if the set X&Y is not empty.
Note that if X and Y are disjoint (mutually exclusive), then P(X&Y) is zero. Therefore disjoint events with positive probability are never independent: P(X|Y) = 0 < P(X), so they are negatively dependent.
Examples: the probability of a union (marriage) versus that of finding a single person, or of naturally occurring elements in combination versus their individual components.
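A minimal sketch of this comparison in Python; the three probabilities below are invented purely for illustration:

```python
# Compare P(X & Y) with P(X) * P(Y); all three numbers are made up.
p_x_and_y = 0.20   # P(X & Y)
p_x = 0.25         # P(X)
p_y = 0.50         # P(Y)

product = p_x * p_y            # 0.125

if p_x_and_y > product:
    print("positively dependent")   # fires here: 0.20 > 0.125
elif p_x_and_y < product:
    print("negatively dependent")
else:
    print("independent")

# Equivalent check via the conditional probability P(X | Y) = P(X & Y) / P(Y):
p_x_given_y = p_x_and_y / p_y  # 0.40, which exceeds P(X) = 0.25
print(p_x_given_y > p_x)       # True
```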
Week 02
Independence
Two processes are INDEPENDENT if knowing the outcome of one event provides no useful information about the outcome of the other event.
If P(A|B) = P(A), then A and B are independent. In other words, B has no bearing on the outcome of A.
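As a quick illustration (my own example, not from the card): for two fair dice, knowing the second die's outcome should not change the estimated probability of an event on the first die. A small simulation sketch:

```python
import random

random.seed(0)
trials = 100_000
count_a = count_b = count_ab = 0

for _ in range(trials):
    d1 = random.randint(1, 6)   # first die
    d2 = random.randint(1, 6)   # second die
    a = d1 % 2 == 0             # event A: first die is even
    b = d2 > 3                  # event B: second die shows 4, 5, or 6
    count_a += a
    count_b += b
    count_ab += a and b

p_a = count_a / trials
p_a_given_b = count_ab / count_b

# For independent dice, P(A | B) should be close to P(A) (about 0.5 each).
print(f"P(A)     = {p_a:.3f}")
print(f"P(A | B) = {p_a_given_b:.3f}")
```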
Complement
The complement of an event A consists of all outcomes that are not in A. An event and its complement together cover the whole sample space, so their probabilities sum to exactly 1: P(A) + P(not A) = 1.
Union
The joining of two sets to make one set where repeated (overlapping) elements are included only once.
A Venn diagram is a good visual example. In logic, the union corresponds to a disjunction (an OR statement), whose components are called disjuncts. For probabilities, the general addition rule applies: P(A or B) = P(A) + P(B) - P(A&B).
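A short sketch tying the complement and union definitions together, assuming one roll of a fair six-sided die (the events are illustrative):

```python
# Sample space for one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}   # event A: an even roll
b = {4, 5, 6}   # event B: a roll greater than 3

def p(event):
    # Probability under equally likely outcomes.
    return len(event) / len(omega)

# Complement: A and "not A" partition the sample space, so probabilities sum to 1.
not_a = omega - a
print(p(a) + p(not_a))   # 1.0

# Union: overlapping elements (4 and 6) are counted only once.
print(a | b)             # {2, 4, 5, 6}

# General addition rule, checked on counts to avoid floating-point noise:
# |A or B| = |A| + |B| - |A & B|
print(len(a | b) == len(a) + len(b) - len(a & b))   # True
```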
Law of Large Numbers
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times.
According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
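A small simulation sketch of the LLN for a fair die, whose expected value is 3.5 (the sample sizes are arbitrary):

```python
import random

random.seed(42)

# Average of n fair die rolls; the expected value is 3.5.
for n in (10, 100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(f"n = {n:>9,}: sample mean = {sum(rolls) / n:.4f}")
```

As n grows, the printed sample means settle ever closer to 3.5.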
p-value
P(observed OR more extreme outcome | H0 is true), where H0 is the null hypothesis.
The probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
Another way of saying this: the p-value is the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true, P(e|h), where e is the observed data (event or evidence) and h is the hypothesis.
We typically use a summary statistic, in some cases the mean, to help compute the p-value: we find the summary statistic's z-score and use it to evaluate the hypothesis. (See pg. 179 of OpenIntro Statistics for an example of a z-score and corresponding p-value.)
A researcher will often "reject the null hypothesis" when the p-value turns out to be less than a predetermined significance level, often 0.05[3][4] or 0.01. Such a result indicates that the observed result would be highly unlikely under the null hypothesis.
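A sketch of this computation for a hypothetical one-sample z-test; every number below is invented for illustration (this is not the OpenIntro pg. 179 example), and it assumes scipy is available:

```python
from scipy.stats import norm

# Hypothetical one-sample z-test; all numbers are made up.
sample_mean = 102.0
null_mean = 100.0      # mean under the null hypothesis H0
sd = 15.0              # known population standard deviation
n = 225                # sample size

standard_error = sd / n ** 0.5
z = (sample_mean - null_mean) / standard_error   # z = 2.0

# Two-sided p-value: P(a result at least this extreme | H0 is true).
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, p-value = {p_value:.4f}")    # about 0.0455

# Reject H0 at the 0.05 significance level?
print(p_value < 0.05)                             # True
```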
Boxplot
In descriptive statistics, a box plot or boxplot is a convenient way of graphically depicting groups of numerical data through their quartiles. It is a good way of comparing a numerical variable across the levels of a categorical variable. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points.
Box and whisker plots are uniform in their use of the box: the bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median). But the ends of the whiskers can represent several possible alternative values, among them:
* the minimum and maximum of all of the data[1]
* the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile (often called the Tukey boxplot)[2][3]
* one standard deviation above and below the mean of the data
* the 9th percentile and the 91st percentile
* the 2nd percentile and the 98th percentile
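A minimal matplotlib sketch with made-up data for three groups; by default matplotlib uses the Tukey convention for the whiskers:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Three made-up groups: one numerical variable split by a categorical one.
groups = [rng.normal(loc=mu, scale=1.0, size=50) for mu in (0, 1, 3)]

# matplotlib draws Tukey whiskers by default (1.5 * IQR past the quartiles)
# and shows points beyond the whiskers individually as outliers.
plt.boxplot(groups)
plt.xticks([1, 2, 3], ["A", "B", "C"])
plt.ylabel("value")
plt.show()
```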
Barplot
A bar plot (or bar chart) represents categorical data with rectangular bars whose heights are proportional to the values they represent. Unlike a histogram's bins, the bars correspond to discrete categories and are usually drawn with gaps between them.
Histogram
In statistics, a histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson.[1]
A histogram is a representation of tabulated frequencies, shown as adjacent rectangles or squares (in some situations), erected over discrete intervals (bins), with an area proportional to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval.
The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies: it then shows the proportion of cases that fall into each of several categories, with the total area equaling 1.
The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size.[2] The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.
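A small numpy sketch with made-up data, checking both normalizations described above:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=1_000)   # made-up sample of a continuous variable

# 20 adjacent, equal-width bins. With density=True the heights are
# frequency densities scaled so that the total area is 1.
densities, edges = np.histogram(data, bins=20, density=True)
print(densities @ np.diff(edges))   # 1.0 (up to floating point)

# Raw counts instead: the bin frequencies sum to the number of data points.
counts, _ = np.histogram(data, bins=20)
print(counts.sum())                 # 1000
```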
Normal Probability Plot
The normal probability plot is a graphical technique for identifying substantive departures from normality, including outliers, skewness, kurtosis, a need for transformations, and mixtures. Normal probability plots are made of raw data, residuals from model fits, and estimated parameters.
In a normal probability plot (also called a "normal plot"), the sorted data are plotted vs. values selected to make the resulting image look close to a straight line if the data are approximately normally distributed. Deviations from a straight line suggest departures from normality. The plotting can be manually performed by using a special graph paper, called normal probability paper. With modern computers normal plots are commonly made with software.
The normal probability plot is a special case of the Q–Q probability plot for a normal distribution. The theoretical quantiles are generally chosen to approximate either the mean or the median of the corresponding order statistics.
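A sketch using scipy's probplot (a Q–Q plot against normal quantiles) on made-up data, with one approximately normal sample and one skewed sample for contrast:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
normal_data = rng.normal(size=200)       # should hug the straight line
skewed_data = rng.exponential(size=200)  # should bend away from it

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
stats.probplot(normal_data, dist="norm", plot=ax1)
stats.probplot(skewed_data, dist="norm", plot=ax2)
ax1.set_title("approximately normal")
ax2.set_title("right-skewed")
plt.show()
```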