Effortlessly Using the Normal Cumulative Distribution Function in Python

Introduction to Statistics in Python Course Outline

Summary Statistics

  • In this chapter, you’ll explore summary statistics including mean, median, and standard deviation.
  • Learn how to accurately interpret summary statistics.
  • Develop critical thinking skills to choose the best summary statistics for your data.

Random Numbers and Probability

  • Learn how to generate random samples and measure chance using probability.
  • Work with real-world sales data to calculate the probability of a salesperson being successful.
  • Use the binomial distribution to model events with binary outcomes.

More Distributions and the Central Limit Theorem

  • Explore the normal distribution, one of the most important probability distributions in statistics.
  • Create histograms to plot normal distributions.
  • Gain an understanding of the central limit theorem.
  • Expand your knowledge of statistical functions by adding the Poisson, exponential, and t-distributions to your repertoire.

Distribution of Amir’s Sales

A histogram is a quick way to check whether Amir’s sales look approximately normal, that is, roughly symmetric and bell-shaped. Here’s sample Python code that plots a histogram of Amir’s sales data:

import matplotlib.pyplot as plt
# Example sales amounts
sales_data = [30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
# Plot the histogram with 5 equal-width bins
plt.hist(sales_data, bins=5, edgecolor='black')
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.title("Distribution of Amir's Sales")
plt.show()

Probabilities from the Normal Distribution

To calculate probabilities from the normal distribution, we can use the norm object from the scipy.stats module. Its cdf method returns the probability that a value is less than or equal to a given point, so subtracting two CDF values gives the probability of falling within a range:

from scipy.stats import norm
mean = 70
std = 10
# Calculate the probability of sales between 60 and 80
prob = norm.cdf(80, mean, std) - norm.cdf(60, mean, std)
print(f"The probability of sales between 60 and 80 is: {prob:.2f}")
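The same cumulative distribution function also answers one-sided questions. As a rough sketch reusing the mean of 70 and standard deviation of 10 from the example above (the thresholds 60 and 80 and the 90th percentile are chosen here purely for illustration), norm.cdf gives the probability of falling below a value, norm.sf gives the upper tail, and norm.ppf inverts the CDF:

```python
from scipy.stats import norm

mean = 70
std = 10

# P(sales < 60): area under the curve to the left of 60
prob_below_60 = norm.cdf(60, loc=mean, scale=std)

# P(sales > 80): survival function, equivalent to 1 - cdf
prob_above_80 = norm.sf(80, loc=mean, scale=std)

# Sales level that 90% of outcomes fall below (inverse of the CDF)
sales_p90 = norm.ppf(0.9, loc=mean, scale=std)

print(f"P(sales < 60) = {prob_below_60:.4f}")
print(f"P(sales > 80) = {prob_above_80:.4f}")
print(f"90th percentile = {sales_p90:.2f}")
```

Because the normal distribution is symmetric around its mean, the two tail probabilities here are equal.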

Simulating Sales under New Market Conditions

To simulate sales under new market conditions, you can use the norm.rvs method to generate a random sample from a normal distribution with a chosen mean and standard deviation. Here’s an example:

from scipy.stats import norm
mean = 70
std = 10
# Simulate sales for 100 days
sales_simulated = norm.rvs(loc=mean, scale=std, size=100)
print("Simulated sales data for 100 days:")
print(sales_simulated)
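Because norm.rvs draws random values, each run prints different numbers. The sketch below fixes the seed with the random_state argument and, as a purely hypothetical scenario that is not taken from the course, assumes the new market raises the mean by 20% and the standard deviation by 30%:

```python
from scipy.stats import norm

mean = 70
std = 10

# Hypothetical new market conditions (assumed for illustration only)
new_mean = mean * 1.2
new_std = std * 1.3

# random_state makes the draw reproducible across runs
sales_new = norm.rvs(loc=new_mean, scale=new_std, size=100, random_state=42)

print(f"Sample mean: {sales_new.mean():.2f}")
print(f"Sample standard deviation: {sales_new.std(ddof=1):.2f}")
```

With only 100 draws, the sample mean and standard deviation will land near, but not exactly on, the values passed to loc and scale.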

The Central Limit Theorem

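The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, provided the samples are independent and drawn from a distribution with finite variance. This holds even when the underlying distribution is not normal, such as the uniform distribution used in the following code: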

import numpy as np
import matplotlib.pyplot as plt
# Generate 1000 samples of size 30 from a uniform distribution
samples = [np.random.uniform(0, 1, 30) for _ in range(1000)]
# Calculate the mean of each sample
means = [sample.mean() for sample in samples]
# Plot the sampling distribution
plt.hist(means, bins=30, edgecolor='black')
plt.xlabel('Mean')
plt.ylabel('Frequency')
plt.title('Sampling Distribution')
plt.show()
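A numerical check can complement the histogram. For a uniform distribution on [0, 1], the population mean is 0.5 and the standard deviation is 1/sqrt(12); the central limit theorem suggests that means of samples of size n should have a standard deviation near sigma/sqrt(n). A minimal sketch, with a seeded generator added here for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility

n = 30              # size of each sample
num_samples = 1000  # number of samples

# Each row is one sample of size n drawn from Uniform(0, 1)
samples = rng.uniform(0, 1, size=(num_samples, n))
means = samples.mean(axis=1)

# Theoretical values for Uniform(0, 1)
population_mean = 0.5
population_std = 1 / np.sqrt(12)
expected_std_of_means = population_std / np.sqrt(n)

print(f"Mean of sample means: {means.mean():.4f} (theory: {population_mean})")
print(f"Std of sample means:  {means.std(ddof=1):.4f} (theory: {expected_std_of_means:.4f})")
```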

The Mean of Means

The mean of the sample means is an estimate of the population mean: on average it equals the population mean, and it tends to get closer as more samples are drawn. Here’s a code snippet to demonstrate this:

import numpy as np
population = np.random.normal(70, 10, 1000)
# Generate 50 samples of size 100 from the population
samples = [np.random.choice(population, size=100) for _ in range(50)]
# Calculate the mean of each sample
sample_means = [sample.mean() for sample in samples]
# Calculate the average of sample means
mean_of_means = np.mean(sample_means)
print(f"The mean of means is: {mean_of_means:.2f}")
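To make the comparison explicit, the sketch below prints the mean of means next to the mean of the population the samples were drawn from. The seed and the use of numpy.random.default_rng are additions for reproducibility and are not part of the original example:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

population = rng.normal(70, 10, 1000)

# 50 samples of size 100, each drawn with replacement from the population
sample_means = [rng.choice(population, size=100).mean() for _ in range(50)]
mean_of_means = np.mean(sample_means)

population_mean = population.mean()
print(f"Population mean: {population_mean:.2f}")
print(f"Mean of means:   {mean_of_means:.2f}")
print(f"Difference:      {abs(mean_of_means - population_mean):.2f}")
```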

The Poisson Distribution

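The Poisson distribution models the number of events occurring in a fixed interval of time or space. Its single parameter, lambda, is the average number of events per interval. The following code draws random counts from a Poisson distribution with lambda = 5 and plots them; the histogram peaks near lambda: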

import numpy as np
import matplotlib.pyplot as plt
# Generate 1000 random numbers from a Poisson distribution with lambda = 5
data = np.random.poisson(lam=5, size=1000)
# Plot the histogram with one bin centered on each integer count
plt.hist(data, bins=np.arange(data.max() + 2) - 0.5, edgecolor='black')
plt.xlabel('Number of Events')
plt.ylabel('Frequency')
plt.title('Poisson Distribution')
plt.show()
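For a Poisson distribution, lambda equals the mean, so the sample mean is a natural estimate of it. The sketch below estimates lambda from simulated data and then uses scipy.stats.poisson for exact probabilities; the seed and the event counts queried are illustrative choices:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)  # arbitrary seed
data = rng.poisson(lam=5, size=1000)

# The sample mean estimates lambda
lambda_hat = data.mean()
print(f"Estimated lambda: {lambda_hat:.2f}")

# Exact probabilities for lambda = 5 (SciPy names the parameter mu)
prob_exactly_5 = poisson.pmf(5, mu=5)   # P(X = 5)
prob_at_most_2 = poisson.cdf(2, mu=5)   # P(X <= 2)
prob_more_than_8 = poisson.sf(8, mu=5)  # P(X > 8)

print(f"P(X = 5)  = {prob_exactly_5:.4f}")
print(f"P(X <= 2) = {prob_at_most_2:.4f}")
print(f"P(X > 8)  = {prob_more_than_8:.4f}")
```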

Tracking Lead Responses

The exponential distribution models the time between events, such as how long a lead takes to respond. The following code snippet generates random lead response times:

from scipy.stats import expon
# Generate 1000 random lead response times from an exponential distribution with rate lambda = 0.5
# (SciPy's scale parameter is 1/lambda, the mean time between events)
response_times = expon.rvs(scale=1/0.5, size=1000)
print("Lead Response Times:")
print(response_times)
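Probabilities for response times come from the exponential CDF. The sketch below keeps the rate of 0.5 from the example above, so the mean response time is 2 time units; the cutoffs of 1 and 4 are picked only for illustration:

```python
from scipy.stats import expon

rate = 0.5        # lambda: expected responses per unit of time
scale = 1 / rate  # mean time between responses, as SciPy expects

# P(response takes less than 1 time unit)
prob_under_1 = expon.cdf(1, scale=scale)

# P(response takes more than 4 time units)
prob_over_4 = expon.sf(4, scale=scale)

# P(response takes between 1 and 4 time units)
prob_1_to_4 = expon.cdf(4, scale=scale) - expon.cdf(1, scale=scale)

print(f"P(time < 1)     = {prob_under_1:.4f}")
print(f"P(time > 4)     = {prob_over_4:.4f}")
print(f"P(1 < time < 4) = {prob_1_to_4:.4f}")
```

The three probabilities cover every possible response time, so they sum to 1.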

More Probability Distributions

In addition to the normal and Poisson distributions, the scipy.stats module provides many other probability distributions. Examples include the exponential, gamma, and beta distributions.

The t-Distribution

The t-distribution is typically used when the sample size is small and the population standard deviation is unknown. It resembles the normal distribution but has heavier tails, controlled by its degrees of freedom. To work with it, you can use the t object from the scipy.stats module. Here’s an example:

from scipy.stats import t
# Calculate the probability of a t-value larger than 2 for a t-distribution with 10 degrees of freedom
prob = 1 - t.cdf(2, df=10)
print(f"The probability of a t-value larger than 2 is: {prob:.2f}")
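To see how the degrees of freedom affect the tails, the following sketch compares the probability of exceeding 2 under t-distributions with several degrees of freedom against the standard normal. The specific df values are arbitrary picks for illustration, and t.sf is used as a shorthand for 1 - t.cdf:

```python
from scipy.stats import norm, t

threshold = 2

# Upper-tail probability under the standard normal distribution
normal_tail = norm.sf(threshold)

# Upper-tail probabilities under t-distributions with increasing degrees of freedom
for df in (5, 10, 30, 100):
    t_tail = t.sf(threshold, df=df)
    print(f"df={df:>3}: P(T > 2) = {t_tail:.4f}")

print(f"normal: P(Z > 2) = {normal_tail:.4f}")
```

The tail probability shrinks toward the normal value as the degrees of freedom grow, which is why the two distributions are nearly interchangeable for large samples.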

Correlation and Experimental Design

In this chapter, you’ll learn how to quantify the strength of a linear relationship between two variables. You’ll also explore how confounding variables can affect the relationship between two other variables. Furthermore, you’ll understand how a study’s design can influence its results and potentially affect the reliability of your conclusions.

Note: This article is a summary of the Python Learn course on Introduction to Statistics. The examples and code snippets are intended to help learners understand the concepts and apply them in Python programming.