Effortlessly Create Histograms using pyplot hist

Python Histogram Plotting: NumPy, Matplotlib, pandas & Seaborn

In this tutorial, we will explore how to create histogram plots in Python using libraries from the scientific stack: NumPy, Matplotlib, pandas, and Seaborn. Histograms are a simple way to visualize how a dataset is distributed, and a wide range of audiences can read them. Whether you are an intermediate Python programmer interested in data visualization or a data scientist looking for production-quality plots, this article covers the main options for building and plotting histograms in Python.

Histograms in Pure Python

To understand the fundamentals of constructing histograms, let’s start by building them in pure Python without the use of any third-party libraries.

First, we need to define our data. For example, let’s consider the following sequence of commute times:

a = (0, 1, 1, 1, 2, 3, 7, 7, 23)

Now, we can define a function called count_elements() that will tally the frequency of each unique value in the sequence:

def count_elements(seq) -> dict:
    """Tally elements from seq."""
    hist = {}
    for i in seq:
        hist[i] = hist.get(i, 0) + 1
    return hist

This function uses a dictionary to store the frequencies, with each unique element in the sequence as the key and its count as the value. Using a loop, we iterate over the sequence and increment the corresponding value in the dictionary.
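As a quick check of the tallying logic described above, the following sketch repeats the function and runs it on the sample sequence. The variable name counted is an assumption introduced here for illustration.

```python
def count_elements(seq) -> dict:
    """Tally elements from seq."""
    hist = {}
    for i in seq:
        # get() returns 0 the first time a value is seen.
        hist[i] = hist.get(i, 0) + 1
    return hist


a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
counted = count_elements(a)
print(counted)  # {0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}
```

Each key is a distinct value from the sequence, and the counts add up to the length of the input.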

Alternatively, we can achieve the same result using the collections.Counter class from Python’s standard library:

from collections import Counter
recounted = Counter(a)

The Counter class subclasses a Python dictionary and provides a convenient way to count the frequency of elements in a sequence.
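A minimal sketch of what Counter gives us for the same sequence is shown here. The names as_dict and top_two are illustrative and not part of the original example.

```python
from collections import Counter

a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
recounted = Counter(a)

# Converting to a plain dict shows the same tallies a hand-written loop would produce.
as_dict = dict(recounted)
print(as_dict)  # {0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}

# most_common(n) returns the n most frequent (value, count) pairs.
top_two = recounted.most_common(2)
print(top_two)  # [(1, 3), (7, 2)]
```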

Now that we have the frequency distribution, we can proceed to plot the histogram using various libraries.
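Before reaching for a plotting library, a frequency table can already be rendered as text. The helper below, ascii_histogram(), is a hypothetical name used for this sketch. It prints one row per distinct value, with one plus sign per occurrence.

```python
from collections import Counter


def ascii_histogram(seq) -> str:
    """Return a text histogram with one row per distinct value (illustrative helper)."""
    counts = Counter(seq)
    # Right-align each value in a 3-character column, then draw one '+' per occurrence.
    rows = [f"{value:>3} {'+' * counts[value]}" for value in sorted(counts)]
    return "\n".join(rows)


a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
text_plot = ascii_histogram(a)
print(text_plot)
```

This treats every distinct value as its own bar. It does not group values into ranges, which is what the binned histograms in the following sections do.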

Building Up From the Base: Histogram Calculations in NumPy

NumPy is a popular library for scientific computing in Python, and it provides efficient ways to process large datasets. One of its functionalities is calculating histograms.

To compute a histogram with NumPy, we can use the numpy.histogram() function. It takes the data and, by default, divides the range between the minimum and maximum values into 10 equal-width bins, then counts the values that fall into each bin. Here’s an example:

import numpy as np
frequencies, bins = np.histogram(a)

The np.histogram() function returns two arrays, in this order: the frequencies and the bin edges. The frequencies array contains the number of values that fall within each bin, and the bins array contains the edges of the bins, so it has one more element than the frequencies array.
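The sketch below runs np.histogram() on the sample sequence to show the shape of its return value. It assumes NumPy's default of 10 equal-width bins, and the names counts and edges are chosen here for illustration.

```python
import numpy as np

a = (0, 1, 1, 1, 2, 3, 7, 7, 23)

# np.histogram returns the per-bin counts first, then the bin edges.
counts, edges = np.histogram(a)

print(counts)      # [5 1 0 2 0 0 0 0 0 1]
print(len(edges))  # 11 edges delimit 10 bins
```

Every bin includes its left edge and excludes its right edge, except the last bin, which also includes its right edge. That is why the maximum value 23 is counted in the final bin.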

Now, we can visualize the histogram using Matplotlib.

Visualizing Histograms with Matplotlib and pandas

Matplotlib is a powerful plotting library that integrates well with NumPy. It provides a variety of functions and options for customizing plots, including histograms.

To plot a histogram using Matplotlib, we can use the matplotlib.pyplot.hist() function. This function takes in the data and automatically calculates the histogram. Here’s an example:

import matplotlib.pyplot as plt
plt.hist(a, bins='auto')
plt.show()

The plt.hist() function computes the histogram from the data and draws it in one step. The bins='auto' argument is passed through to NumPy, which chooses the bin width as the smaller of the widths suggested by the Sturges and Freedman-Diaconis estimators.
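The hist() call also returns the values it computed, which can be useful for inspecting the binning. The following sketch uses the Axes-level ax.hist() method and, as an assumption made so it runs without a display, selects the non-interactive Agg backend. The axis labels are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

a = (0, 1, 1, 1, 2, 3, 7, 7, 23)

fig, ax = plt.subplots()
# hist() returns the per-bin counts, the bin edges, and the drawn bar patches.
n, bins, patches = ax.hist(a, bins="auto")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")

# The edges should match what NumPy computes for the same 'auto' rule.
expected_edges = np.histogram_bin_edges(a, bins="auto")
print(np.allclose(bins, expected_edges))
```

With an interactive backend, calling plt.show() afterwards would display the figure.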

Additionally, we can use pandas, a library built on top of NumPy that uses Matplotlib as its default plotting backend, to plot histograms directly from a DataFrame. Here’s an example:

import pandas as pd
df = pd.DataFrame({'data': a})
df['data'].plot.hist(bins='auto')
plt.show()

The df['data'].plot.hist() method is a convenient way to plot a histogram directly from a DataFrame column. It returns a Matplotlib Axes object, so the plot can be customized further. We can specify the number of bins using the bins argument.
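As a small variation, here is a sketch that passes an integer to bins and works with the returned Axes object. The choice of 5 bins, the axis label, and the Agg backend are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import pandas as pd

a = [0, 1, 1, 1, 2, 3, 7, 7, 23]
df = pd.DataFrame({"data": a})

# An integer bins value asks for that many equal-width bins.
ax = df["data"].plot.hist(bins=5)
ax.set_xlabel("Value")

# One rectangle patch is drawn per bin.
print(len(ax.patches))
```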

A Fancy Alternative with Seaborn

Seaborn is a high-level data visualization library built on top of Matplotlib. It offers polished default styles, integrates well with pandas, and provides additional functionality for plotting histograms.

To plot a histogram using Seaborn, we can use the seaborn.histplot() function. This function takes in the data and automatically calculates the histogram. Here’s an example:

import seaborn as sns
sns.histplot(a, bins='auto')
plt.show()

The sns.histplot() function draws a histogram with Seaborn’s default styling. Passing kde=True additionally overlays a kernel density estimate on the bars.

Other Tools in pandas

In addition to histograms, pandas provides other useful tools for data analysis and visualization. These include scatter plots, box plots, and line plots. By leveraging the power of pandas, we can easily explore and visualize our data in various ways.

Alright, So Which Should I Use?

Now that you have learned how to create histograms using different libraries, you might be wondering which one you should use. The answer depends on your specific needs and preferences.

If you are looking for a simple and lightweight solution, pure Python with dictionaries or the collections.Counter class can be sufficient. However, these methods tally exact values rather than grouping them into bins, and they might not be as efficient when dealing with large datasets.

If you need more advanced functionalities and customization options, NumPy and Matplotlib provide a comprehensive set of tools for histogram calculations and plotting. These libraries are widely used in the scientific community and offer extensive documentation and support.

If aesthetics and ease of use are important to you, Seaborn is a great choice. It builds on top of Matplotlib and provides visually appealing plots with minimal effort.

Ultimately, the choice is yours. Experiment with different libraries and find the one that suits your needs best.

In conclusion, Python offers a variety of options for building and plotting histograms. Whether you prefer a minimalistic approach or a visually appealing plot, the scientific stack libraries have got you covered. Armed with this knowledge, you can create stunning and informative histogram plots using Python.