Normal distributions - Aidan Helfant's Digital Garden

up:: [[Statistics MOC]]
Tags::

# Normal distributions ^0dcb22

Normal distributions are distributions in which most of the values cluster in the middle and the rest taper off as you move toward the extremes. One of the most common distributions you will see when you convert the scores of a sample or population into [[Variance and Standard Deviation#^eb8a36|z scores]] (using its [[Measures of Central Tendency in Distributions#^deb895|mean]] and [[Variance and Standard Deviation#^36d688|standard deviation]]) is the normal distribution, which looks like this:

![[Pasted image 20220908081758.png]]

For a perfectly normal distribution, the proportions under the curve are always the same: 34.13% of scores fall between the mean and a z score of +1, 13.59% fall between z = +1 and z = +2, and 2.14% fall between z = +2 and z = +3, leaving about 0.13% beyond z = +3 (so 2.28% of scores lie above z = +2). The curve is symmetric, so the same proportions hold below the mean. Adding these areas to the 50% below the mean gives percentile ranks of about 84.13% for z = +1, 97.72% for z = +2, and 99.87% for z = +3. This lets you find the percentile rank of a score from its z score. But what if the z scores aren't whole numbers?

### The Unit Normal Table

The unit normal table lists the proportions of a perfectly normal distribution that correspond to each z score (usually to two decimal places). This allows you to find the relevant percentile rank for almost any score.

![[Pasted image 20220908082218.png]]

With the unit normal table you can find the proportion corresponding to a specific z score or a specific value of x, or the x value or z score corresponding to a particular proportion. Notice that finding these proportions is often the same as finding the [[Variables, Scales, Real Limits, Percentile ranks#^13693d|percentile ranks]], which give the cumulative percentage associated with a particular score.

The [[p-value]] is the probability of obtaining your test statistic, or a more extreme result, given that the null hypothesis is true. You can also find the p-value of a z score using the unit normal table. For example, find the p-value of z = ±1.55:

![[Pasted image 20220914154134.png]]

The proportion to the left of z = +1.55 is .9394, so the proportion in the tail beyond it is 1 − .9394 = .0606. Because this is a two-tailed test, we double that tail proportion, so the p-value corresponding to a z score of ±1.55 is 2 × .0606 ≈ .121.

## Assessing How Unusual a Given Sample Mean Is ^a3a2c3

All of the previous calculations assume a sample size of 1, that is, a single score. But what happens if we have a larger sample?

### Central Limit Theorem

The [[Central Limit Theorem (CLT)]] states that for any population with mean $\mu$ and standard deviation $\sigma$ (irrespective of the shape of the distribution!), the distribution of sample means for samples of size $n$ will have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$, and will approach a normal distribution as $n$ approaches infinity. In practice, we can treat the distribution of sample means as normal when $n$ is 30 or greater, or when the population itself is normally distributed (in which case it is normal for any $n$).

### Steps for Assessing How Unusual a Given Sample Mean Is

1. Find the standard error of the mean for that sample size with the equation $\sigma/\sqrt{n}$.
2. Find the deviation of the sample mean $M$ by subtracting the population mean from it: $M - \mu$.
3. Find the z score for the sample mean by dividing that deviation by the standard error of the mean: $z = \dfrac{M - \mu}{\sigma/\sqrt{n}}$.

## RStudio

### The Functions `pnorm` and `qnorm`

You can use the R functions `pnorm()` and `qnorm()` instead of the unit normal table. In its simplest form, `pnorm()` converts a z score to the area under the curve to the left of that z score. The `qnorm()` function does the opposite: given a probability, it returns the z score that has that probability to its left.

More generally, given a value X and the mean and standard deviation of a normal distribution, `pnorm()` returns the probability that a random value drawn from that normal distribution is less than X (or, if you set `lower.tail = FALSE`, the probability that a random value is greater than X).
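This note works in R, but as a rough sketch (assuming Python 3.8+) the same calculations can be mirrored with the standard library's `statistics.NormalDist`, whose `cdf()` and `inv_cdf()` methods play the roles of `pnorm()` and `qnorm()`. The helpers `pnorm`, `qnorm`, `two_tailed_p` and `sample_mean_z` below are hypothetical Python stand-ins, not R's API, and the sample-mean numbers (μ = 100, σ = 15, n = 36, M = 105) are made up for illustration. The last function follows the steps for assessing a sample mean: standard error, deviation, then z score.

```python
from math import sqrt
from statistics import NormalDist


def pnorm(x, mean=0.0, sd=1.0, lower_tail=True):
    """Hypothetical stand-in for R's pnorm(): area to the left of x,
    or to the right of x when lower_tail=False."""
    area_left = NormalDist(mu=mean, sigma=sd).cdf(x)
    return area_left if lower_tail else 1 - area_left


def qnorm(p, mean=0.0, sd=1.0):
    """Hypothetical stand-in for R's qnorm(): the value with proportion p to its left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)


def two_tailed_p(z):
    """Two-tailed p-value: double the tail area beyond |z|."""
    return 2 * pnorm(abs(z), lower_tail=False)


def sample_mean_z(sample_mean, mu, sigma, n):
    """z score of a sample mean, using the standard error sigma / sqrt(n) as the denominator."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


if __name__ == "__main__":
    print(round(pnorm(1), 4))            # 0.8413, percentile rank of z = +1 as a proportion
    print(round(two_tailed_p(1.55), 4))  # 0.1211, two-tailed p-value for z = +-1.55
    print(round(qnorm(0.975), 2))        # 1.96, z score with .975 of the area to its left

    # Made-up example: population mu = 100, sigma = 15, sample of n = 36 with mean 105
    z = sample_mean_z(105, mu=100, sigma=15, n=36)
    print(z)                                     # 2.0 (deviation 5 / standard error 2.5)
    print(round(pnorm(z, lower_tail=False), 4))  # 0.0228, proportion of sample means above 105
```

In R itself, the corresponding calls would be `pnorm(1)`, `2 * pnorm(1.55, lower.tail = FALSE)`, `qnorm(0.975)`, and `pnorm(105, mean = 100, sd = 15 / sqrt(36), lower.tail = FALSE)`.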
If you want to work with sample means instead of single scores, you use the distribution of sample means instead: pass the population mean $\mu$ as the mean and the standard error $\sigma/\sqrt{n}$ as the standard deviation.

You tell `pnorm()` three main things:

- The x value (abscissa) at which you want to draw your boundary.
- The mean of the normal distribution (defaults to 0).
- The standard deviation of the normal distribution (defaults to 1).

(REMEMBER: these two values, plus the knowledge that a distribution is normal, completely define the distribution and its underlying probabilities.)

Related:
Created: [[28-09-2022]]

___

# Resources