Normal Distribution

Pranav Garg
Published in Nerd For Tech, May 14, 2021

The Normal Distribution, also known as the Gaussian distribution or Bell Curve, is one of the most commonly used distributions in statistics and probability.

It has some basic properties:-
1. It is a continuous probability distribution.
2. A Normal Distribution can have any mean and any positive standard deviation, but a Normal Distribution with a mean of 0 and a standard deviation of 1 is called a Standard Normal Distribution.
3. Mean, Mode, and Median are the same.
4. Total area under the curve has a value of 1.
5. The curve is symmetrical around the mean and shaped like a bell, which is why it is also known as the Bell-Shaped Curve.
6. Around 68.3% of the data falls within one standard deviation of the mean.
7. Around 95% of the data falls within two standard deviations of the mean.
8. Around 99.7% of the data falls within three standard deviations of the mean.
9. Increasing the mean moves the curve right, while decreasing it moves the curve left.
10. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wider curve.
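Properties 6 to 8 can be checked numerically. The snippet below is a minimal sketch that uses the Standard Normal Distribution (mean 0, standard deviation 1); the dictionary name `coverage` is only an illustrative choice.

```python
from scipy.stats import norm

# Share of the distribution lying within k standard deviations of the mean,
# computed as cdf(k) - cdf(-k) on the standard normal distribution.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sd: {coverage[k] * 100:.2f}%")

# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```

The same percentages hold for any mean and standard deviation, because standardizing the data does not change the share of the area between the limits.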

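A Normal Distribution has 2 basic parameters, the mean (μ) and the standard deviation (σ), which are used for calculating the probability density function (pdf). Its formula is:-

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

pdf of Normal Distribution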
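As a rough check, the density can be written by hand and compared with SciPy. This sketch assumes the standard form of the normal density, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)); the helper name `normal_pdf` is hypothetical and not part of any library.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Evaluate the normal density at x (illustrative helper, not a SciPy function)."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * np.exp(exponent)

x = np.linspace(-3, 3, 7)                   # a few evaluation points
manual = normal_pdf(x, mu=0.0, sigma=1.0)   # hand-written formula
library = norm.pdf(x, loc=0.0, scale=1.0)   # SciPy's implementation

print(np.allclose(manual, library))  # True
```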

The Normal Distribution curve is as follows:-

Normal Distribution Curve

The Normal Distribution owes much of its importance to the Central Limit Theorem. It says that if we repeatedly take large samples from a population with replacement, then the distribution of the sample means will approximately follow the Normal Distribution Curve, whatever the shape of the population itself (provided it has a finite variance).

By convention, a sample size of 30 or more is considered large.
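A small simulation can make this concrete. The sketch below draws many samples of size 30 from an exponential population, which is clearly not normal, and looks at the means of those samples. The population, the seed, and the number of samples are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Assumed population: exponential with mean 2 (its standard deviation is also 2).
population_mean = 2.0
population_sd = 2.0
sample_size = 30       # "large" by the usual convention
n_samples = 10_000     # how many samples we draw

# Each row is one sample of 30 independent draws from the population.
samples = rng.exponential(scale=population_mean, size=(n_samples, sample_size))
sample_means = samples.mean(axis=1)

print("mean of sample means:", sample_means.mean())
print("sd of sample means:  ", sample_means.std())
print("sd predicted by CLT: ", population_sd / np.sqrt(sample_size))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to the population standard deviation divided by the square root of the sample size. Plotting a histogram of `sample_means` should show a roughly bell-shaped curve.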

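To convert a Normal Distribution into a Standard Normal Distribution, one has to standardize the data points so that the mean becomes 0 and the standard deviation becomes 1. For a data point x from a distribution with mean μ and standard deviation σ, the formula is:-

$$z = \frac{x - \mu}{\sigma}$$

Standardization ( Z-value )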
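Standardization subtracts the mean from each data point and divides by the standard deviation, z = (x − μ) / σ. A minimal sketch of that step follows; the values in `data` are made up purely for illustration.

```python
import numpy as np

# Made-up data points, only for illustration.
data = np.array([1000.0, 1100.0, 1120.0, 1250.0, 1400.0])

mu = data.mean()
sigma = data.std()       # population standard deviation (ddof=0)
z = (data - mu) / sigma  # z-value of every data point

# After standardization the mean is (numerically) 0 and the standard deviation is 1.
print(z.mean(), z.std())
```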

Let's see how to work with the Normal Distribution using Python libraries. We will cover scenarios such as calculating the probability of a value falling below a given point or within a given range.

Importing Libraries

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Plotting a Normal Distribution Curve

# plotting a normal distribution curve
x = np.arange(1, 10, 0.01)
mean = np.mean(x)
sd = np.std(x)
pdf = norm.pdf(x, loc=mean, scale=sd)
sns.lineplot(x=x, y=pdf)
plt.show()

In the above code, we create an array of values from 1 up to (but not including) 10 with a step size of 0.01. After that, we calculate the mean and standard deviation of the array and pass them, along with the data, to the norm.pdf function, which evaluates the Probability Density Function at each value. Here scale refers to the standard deviation and loc refers to the mean of the data.

Normal Distribution Curve

Let's now see how to get the probability of a value falling below a specific data point through Python code.

Consider a scenario where you want to know the probability of getting an SAT score of less than 1250. You know that the scores are normally distributed, with a sample mean of 1120 and a standard deviation of 140.

from scipy.stats import norm

mean = 1120
sd = 140
prob = norm(loc=mean, scale=sd).cdf(1250)
print(round(prob * 100, 2))

There is an 82.34% probability that your SAT score will be less than 1250.
Technically, we are calculating the area under the curve up to 1250, which is the integral of the density function from -∞ to 1250.
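To see that the cdf really is this area, the density can be integrated numerically. The sketch below is one possible check, not the only way to do it: it standardizes the score first and then integrates the standard normal density with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mean = 1120
sd = 140

# Standardize the score, then integrate the standard normal density from -inf to z.
z = (1250 - mean) / sd
area, abs_error = quad(norm.pdf, -np.inf, z)

print(round(area * 100, 2))                               # 82.34
print(round(norm(loc=mean, scale=sd).cdf(1250) * 100, 2)) # 82.34
```

Integrating in z units keeps the integrand centred near zero, which tends to be friendlier to numerical integration over an infinite range than a curve centred far from the origin.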

Consider another scenario, where you want to know the probability of getting an SAT score between 1100 and 1250.

from scipy.stats import norm

mean = 1120
sd = 140
upper_prob = norm(loc=mean, scale=sd).cdf(1250)
lower_prob = norm(loc=mean, scale=sd).cdf(1100)
prob = upper_prob - lower_prob
print(round(prob * 100, 2))

There is a 38.02% probability of getting a score between 1100 and 1250.

Consider one more scenario, where you want to find the value that sits at a given percentile of the distribution. Using the same data as above, let's calculate the 82.34th percentile of the distribution. There are 2 methods to do this.

# method 1
val1 = norm.ppf(q=0.8234, loc=mean, scale=sd)
print(round(val1, 1))

# method 2
val2 = (norm.ppf(0.8234) * sd) + mean
print(round(val2, 1))

In method 1, we call the ppf function of norm and pass it the percentile (as a fraction), the mean, and the standard deviation. This directly gives the value of the data, i.e. 1250.0
In method 2, we call ppf with only the percentile, which returns a z value, then multiply it by the standard deviation and add the mean of the distribution. It gives the same result, i.e. 1250.0

norm.ppf(.9)

This call returns the z value for the 90th percentile of the Standard Normal Distribution.
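Since ppf is the inverse of cdf, passing the z value back into cdf should return the original percentile. A short sketch of that round trip, reusing the SAT mean of 1120 and standard deviation of 140 from above (the variable names are only illustrative):

```python
from scipy.stats import norm

z_90 = norm.ppf(0.9)              # z value at the 90th percentile
print(round(z_90, 4))             # 1.2816
print(round(norm.cdf(z_90), 4))   # 0.9

# Converting the z value back to the SAT scale used earlier.
mean = 1120
sd = 140
score_90 = z_90 * sd + mean
print(round(score_90, 1))
```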

