Python Plotting Histogram: Effortlessly Visualize Data Distribution

Python Plotting Histogram: NumPy, Matplotlib, pandas & Seaborn

In this tutorial, we will explore how to create informative and visually appealing histogram plots in Python. We will use various libraries from the scientific stack, including NumPy, Matplotlib, pandas, and Seaborn, to build and plot histograms. This tutorial is aimed at individuals with introductory to intermediate knowledge in Python and statistics. By the end of this tutorial, you will be equipped with the skills to create production-quality and presentation-ready Python histogram plots.

Table of Contents

  • Histograms in Pure Python
  • Building Up From the Base: Histogram Calculations in NumPy
  • Visualizing Histograms with Matplotlib and pandas
  • Plotting a Kernel Density Estimate (KDE)
  • A Fancy Alternative with Seaborn
  • Other Tools in pandas
  • Alright, So Which Should I Use?

Histograms in Pure Python

To start with, let’s create histograms in pure Python without the use of any third-party libraries. In Python, we can build a histogram by reporting the frequency of each value in a sequence. We can use a Python dictionary to accomplish this task. Here’s an example:

# Need not be sorted, necessarily
a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
def count_elements(seq) -> dict:
    """Tally elements from `seq`."""
    hist = {}
    for i in seq:
        hist[i] = hist.get(i, 0) + 1
    return hist
counted = count_elements(a)
print(counted)

Output:

{0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}

In the above example, the count_elements() function takes a sequence as input and returns a dictionary with unique elements as keys and their frequencies as values. We iterate over the sequence and use the get() method to increment the corresponding value in the dictionary for each element.
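The tally can also be turned into a rough text chart. The following is a minimal sketch rather than a definitive implementation: ascii_histogram() is a hypothetical helper name introduced here for illustration, and it prints one row per unique value with one '+' per occurrence.

```python
def count_elements(seq) -> dict:
    """Tally elements from `seq` (same logic as the function above)."""
    hist = {}
    for item in seq:
        hist[item] = hist.get(item, 0) + 1
    return hist


def ascii_histogram(seq) -> str:
    """Render one row per unique value, with one '+' per occurrence.

    `ascii_histogram` is a hypothetical helper name used for illustration.
    """
    counted = count_elements(seq)
    # Sort the keys so the rows appear in ascending order of value
    rows = [f"{value:>3} {'+' * counted[value]}" for value in sorted(counted)]
    return "\n".join(rows)


a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
chart = ascii_histogram(a)
print(chart)
```

Each row is a bar whose length equals the frequency, which is all a histogram of discrete values really is.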

Alternatively, we can use Python’s collections.Counter class from the standard library to achieve the same result in a more concise manner. Here’s how:

from collections import Counter
recounted = Counter(a)
print(recounted)

Output:

Counter({1: 3, 7: 2, 0: 1, 2: 1, 3: 1, 23: 1})

As you can see, both methods produce the same counts; Counter simply displays them ordered from most to least common. The Counter class is a subclass of dict that adds conveniences for counting elements in a sequence.
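As a short sketch of those extra conveniences, the example below uses three standard Counter features: most_common(), a default count of zero for missing keys, and update() for adding further observations. The variable names are illustrative only.

```python
from collections import Counter

a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
recounted = Counter(a)

# The two most frequent values, as (value, count) pairs
top_two = recounted.most_common(2)
print(top_two)

# Looking up a value that never occurred returns 0 instead of raising KeyError
missing = recounted[42]
print(missing)

# update() adds to the existing counts rather than replacing them
recounted.update([7, 7])
print(recounted[7])
```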

Building Up From the Base: Histogram Calculations in NumPy

Now, let’s explore how to use the NumPy library to calculate histograms. NumPy provides the histogram() function, which takes an array-like object as input and returns the count of values in each bin together with the bin edges. Here’s an example:

import numpy as np
# Generate random data
np.random.seed(0)
data = np.random.randn(1000)
# Calculate histogram
hist, edges = np.histogram(data, bins=10)
# Print the histogram and bin edges
print('Histogram:', hist)
print('Bin Edges:', edges)

Output (illustrative; your exact counts and edges may differ):

Histogram: [ 7 17 72 139 212 215 192 102 38 6]
Bin Edges: [-3.11465103 -2.42060571 -1.72656039 -1.03251507 -0.33846975 0.35557557 1.04962089 1.74366621 2.43771153 3.13175685 3.82580217]

In the above example, we generate random data using np.random.randn() and then calculate the histogram using np.histogram(). We specify the number of bins using the bins parameter. The function returns two arrays: hist, which holds the number of values that fall into each bin, and edges, which holds the bin boundaries. Because 10 bins need 11 boundaries, edges has one more element than hist.
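To make the relationship between counts and edges easier to check by hand, here is a small sketch that passes explicit bin edges instead of a bin count. The data and the edges are chosen purely for illustration. Every bin is half-open, [left, right), except the last one, which also includes its right edge.

```python
import numpy as np

values = np.array([0, 1, 1, 1, 2, 3, 7, 7, 23])

# Explicit edges: [0, 5), [5, 10), [10, 15), [15, 20), [20, 25]
hist, edges = np.histogram(values, bins=[0, 5, 10, 15, 20, 25])
print(hist)        # counts per bin
print(len(edges))  # always len(hist) + 1

# Bin centers are handy as x-positions when plotting the counts yourself
centers = (edges[:-1] + edges[1:]) / 2
print(centers)
```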

Visualizing Histograms with Matplotlib and pandas

Next, we will explore how to visualize histograms using the Matplotlib and pandas libraries. Matplotlib provides the pyplot.hist() function, while pandas offers the DataFrame.plot.hist() method for histogram plotting. Here’s an example using both libraries:

import matplotlib.pyplot as plt
import pandas as pd
# Plot histogram with Matplotlib
plt.hist(data, bins=10)
plt.title('Histogram')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
# Plot histogram with pandas
df = pd.DataFrame(data, columns=['Values'])
df.plot.hist(bins=10)
plt.title('Histogram')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

The first section of the code uses Matplotlib to plot the histogram. We call the plt.hist() function and provide the data and number of bins as inputs. Then, we add a title, x-label, and y-label using the plt.title(), plt.xlabel(), and plt.ylabel() functions, respectively. Finally, we use plt.show() to display the plot.

The second section of the code uses pandas to plot the histogram. We create a DataFrame from the data and then call the plot.hist() method on the DataFrame. Again, we add a title, x-label, and y-label before displaying the plot.
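It can help to know that Matplotlib's hist() also returns the numbers it plotted. The sketch below, which assumes the same seeded data as earlier, captures those return values and checks them against np.histogram(). It uses the non-interactive Agg backend only so that it runs without a display; that line is an assumption for this example and can be removed when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.randn(1000)

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the drawn bar artists
counts, edges, patches = ax.hist(data, bins=10)
ax.set(title="Histogram", xlabel="Values", ylabel="Frequency")

# The plotted numbers agree with what NumPy computes on its own
np_counts, np_edges = np.histogram(data, bins=10)
same_counts = np.array_equal(counts, np_counts)
same_edges = np.allclose(edges, np_edges)
print(same_counts, same_edges)
```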

Plotting a Kernel Density Estimate (KDE)

In addition to histograms, we can also plot a Kernel Density Estimate (KDE) to visualize the underlying probability distribution. KDE provides a smooth estimate of the distribution, which can be helpful for analyzing continuous data. Here’s an example using the Seaborn library:

import seaborn as sns
# Plot a density-normalized histogram with a KDE overlay using Seaborn
sns.histplot(data, kde=True, stat='density')
plt.title('Histogram with KDE')
plt.xlabel('Values')
plt.ylabel('Density')
plt.show()

In the above example, we use the sns.histplot() function from Seaborn to plot the histogram with KDE. We set the kde parameter to True to overlay the KDE curve on the histogram, and stat='density' to normalize the bars so that the y-axis shows density rather than raw counts, matching the y-label. Again, we add a title, x-label, and y-label before displaying the plot.
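To illustrate what a KDE computes, here is a minimal NumPy-only sketch of a Gaussian kernel density estimate. It is not Seaborn's implementation: gaussian_kde_1d() is a hypothetical helper, and the bandwidth uses a common normal-reference rule of thumb chosen here as an assumption. The estimate is the average of one Gaussian bump centered on each sample.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE of `samples` at each point in `grid`.

    Hypothetical helper for illustration; not Seaborn's implementation.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth
    return kernels.sum(axis=1) / (samples.size * bandwidth)


rng = np.random.default_rng(0)
samples = rng.normal(size=500)

# Rule-of-thumb bandwidth (an assumption made for this sketch)
bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)

grid = np.linspace(samples.min() - 3 * bandwidth,
                   samples.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 over its support
area = density.sum() * (grid[1] - grid[0])
print(round(area, 2))
```

A smaller bandwidth follows the data more closely and gives a bumpier curve; a larger one smooths more aggressively.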

A Fancy Alternative with Seaborn

Seaborn provides additional tools for creating more elaborate histograms. For example, we can use the displot() function to draw a cumulative histogram, which approximates the cumulative distribution function (CDF), together with a KDE curve and a rug plot. Here’s an example:

sns.displot(data, kde=True, rug=True, cumulative=True, stat='density')
plt.title('Histogram with CDF and Rugplot')
plt.xlabel('Values')
plt.ylabel('Cumulative density')
plt.show()

In the above example, we use the sns.displot() function and set the kde, rug, and cumulative parameters to True to enable the KDE curve, the rug plot, and the cumulative histogram, respectively. Passing stat='density' normalizes the bars so that the cumulative histogram rises to 1 instead of to the total count. Note that Seaborn relies on SciPy to evaluate a cumulative KDE, so SciPy needs to be installed for this call. Once again, we add a title, x-label, and y-label before displaying the plot.
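The idea behind a cumulative histogram can be sketched without any plotting: take the per-bin counts and accumulate them. The example below uses NumPy with small illustrative data and explicit bin edges; dividing by the total turns the running counts into the fraction of values at or below each bin's right edge.

```python
import numpy as np

values = np.array([0, 1, 1, 1, 2, 3, 7, 7, 23])
counts, edges = np.histogram(values, bins=[0, 5, 10, 15, 20, 25])

# Running total of the counts, bin by bin
cumulative_counts = np.cumsum(counts)
print(cumulative_counts)

# Normalize so that the last bin reaches 1.0
cumulative_fraction = cumulative_counts / counts.sum()
print(cumulative_fraction)
```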

Other Tools in pandas

In addition to the histogram plotting capabilities shown above, pandas provides many other tools for data manipulation and analysis. Some of these tools include pivot tables, data filtering, and statistical calculations. These tools can be useful in exploring and gaining insights from your data before plotting histograms.
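As a brief sketch of that kind of exploration, the example below uses three standard pandas features on a small illustrative Series: describe() for summary statistics, boolean filtering, and pd.cut() to assign values to bins before counting them. The bin edges are chosen only for this example. Note that pd.cut() bins are closed on the right by default, and include_lowest=True makes the first bin include its left edge.

```python
import pandas as pd

s = pd.Series([0, 1, 1, 1, 2, 3, 7, 7, 23])

# Summary statistics: count, mean, std, min, quartiles, max
summary = s.describe()
print(summary)

# Filter out the outlier before binning, if desired
filtered = s[s < 10]
print(len(filtered))

# Assign each value to an interval, then count the values per interval
binned = pd.cut(s, bins=[0, 5, 10, 25], include_lowest=True)
bin_counts = binned.value_counts().sort_index()
print(bin_counts)
```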

Alright, So Which Should I Use?

The right tool depends on what you need. If you only need to tally values and want no third-party dependencies, pure Python with a dictionary or collections.Counter is sufficient. If you need binned counts for numeric data, use NumPy’s histogram() function. To draw the histogram, use Matplotlib or the pandas plotting methods, and reach for Seaborn when you want a KDE overlay or other statistical extras.

Ultimately, the choice depends on your data, the complexity of the analysis, and the level of visual sophistication you aim to achieve. Experiment with different libraries, explore their documentation and examples, and choose the one that best fits your needs.

I hope this tutorial has provided you with a comprehensive understanding of how to plot histograms in Python using various libraries. Have fun experimenting with different techniques and customizing your plots!