Effortlessly Sort Dataframe by Column

pandas Sort: Your Guide to Sorting Data in Python

Learning the pandas sort methods is a good way to start doing, or to practice, basic data analysis in Python. Data analysis is most commonly done with spreadsheets, SQL, or pandas. One advantage of pandas is that it handles large datasets and offers fast data manipulation.

In this tutorial, you’ll learn how to use .sort_values() and .sort_index(), which will enable you to sort data efficiently in a DataFrame.

By the end of this tutorial, you’ll know how to:

  • Sort a pandas DataFrame by the values of one or more columns
  • Use the ascending parameter to change the sort order
  • Sort a DataFrame by its index using .sort_index()
  • Organize missing data while sorting values
  • Sort a DataFrame in place using inplace set to True

To follow along with this tutorial, you’ll need a basic understanding of pandas DataFrames and some familiarity with reading in data from files.

Getting Started With Pandas Sort Methods

As a quick reminder, a DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table.

Preparing the Dataset

To demonstrate the pandas sort methods, we’ll first need a dataset. Let’s create a simple DataFrame using the following code:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

This DataFrame contains three columns: ‘name’, ‘age’, and ‘city’. It has four rows, each representing a different person.

Getting Familiar With .sort_values()

Now that we have our dataset, let’s dive into sorting. One of the most commonly used methods for sorting in pandas is .sort_values(). This method allows you to sort the DataFrame by the values of one or more columns.

To use .sort_values(), you need to specify the column(s) you want to sort by. Let’s sort our DataFrame by the ‘age’ column in ascending order:

sorted_df = df.sort_values('age')
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The rows were already in ascending order of ‘age’, so the output matches the original DataFrame. Note that .sort_values() returns a new, sorted DataFrame and leaves df itself unchanged.
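
Because the tutorial’s dataset is already ordered by age, the effect of the sort is easier to see on data that starts out of order. The sketch below uses a small made-up DataFrame (the names and ages are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data, deliberately out of order by age.
people = pd.DataFrame({
    "name": ["Dana", "Eli", "Fay"],
    "age": [41, 23, 35],
})

# sort_values() returns a new DataFrame; `people` is left untouched.
by_age = people.sort_values("age")

print(by_age)
# Each row keeps its original index label, so the index is now out of order.
print(list(by_age.index))
```

The rows move together with their index labels, which is why a sorted DataFrame often shows a shuffled index.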

Getting Familiar With .sort_index()

Another useful pandas method for sorting is .sort_index(). This method allows you to sort the DataFrame based on its index.

To use .sort_index(), you simply call the method on the DataFrame:

sorted_df = df.sort_index()
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The DataFrame remains the same since we didn’t change the index in our example. However, if you have a different index, .sort_index() will sort the DataFrame based on that index.
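
To see .sort_index() actually reorder rows, the index has to start out of order. Here is a minimal sketch with a made-up DataFrame whose index labels are shuffled on purpose:

```python
import pandas as pd

# Hypothetical data with index labels that are not in ascending order.
shuffled = pd.DataFrame(
    {"name": ["Charlie", "Alice", "David", "Bob"]},
    index=[2, 0, 3, 1],
)

# Rows are reordered so that the index labels ascend.
restored = shuffled.sort_index()
print(restored)
```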

Sorting Your DataFrame on a Single Column

Now that you’re familiar with the basic sorting methods, let’s explore how to sort a DataFrame on a single column.

Sorting by a Column in Ascending Order

To sort a DataFrame by a specific column in ascending order, you can use the following code:

sorted_df = df.sort_values('column_name')
print(sorted_df)

Replace 'column_name' with the actual name of the column you want to sort by. Here’s an example sorting the DataFrame by the ‘name’ column:

sorted_df = df.sort_values('name')
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The names were already in alphabetical order, so sorting in ascending order by the ‘name’ column leaves the rows where they were.

Changing the Sort Order

By default, .sort_values() sorts in ascending order. If you want to sort in descending order, you can use the ascending parameter and set it to False. Here’s an example:

sorted_df = df.sort_values('name', ascending=False)
print(sorted_df)

Output:

      name  age         city
3    David   40      Houston
2  Charlie   35      Chicago
1      Bob   30  Los Angeles
0    Alice   25     New York

The DataFrame is now sorted in descending order based on the ‘name’ column.

Choosing a Sorting Algorithm

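By default, .sort_values() uses the quicksort algorithm. You can choose a different one with the kind parameter, which accepts 'quicksort', 'mergesort', 'heapsort', or 'stable'. For a DataFrame, pandas applies kind only when you sort by a single column or label.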

sorted_df = df.sort_values('column_name', kind='algorithm_name')
print(sorted_df)

Replace 'algorithm_name' with the desired sorting algorithm. For example, to use the MergeSort algorithm, you can do the following:

sorted_df = df.sort_values('name', kind='mergesort')
print(sorted_df)

Output:

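      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The result is the same as with the default algorithm, because every name is unique. The choice of algorithm affects the row order only when the sort column contains ties: mergesort is stable, meaning tied rows keep their original relative order, while quicksort makes no such guarantee.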
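
The difference between algorithms only shows up when the sort column has duplicate values. The following sketch uses a made-up DataFrame with tied priorities to illustrate what a stable sort guarantees:

```python
import pandas as pd

# Hypothetical data: 'priority' contains ties.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "cat", "dan"],
    "priority": [2, 1, 2, 1],
})

# mergesort is stable: rows with equal priority keep their original
# relative order (bob before dan, ann before cat).
stable = orders.sort_values("priority", kind="mergesort")
print(stable)
```

A stable sort is useful when the data was already ordered by some other criterion that you want to preserve among ties.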

Sorting Your DataFrame on Multiple Columns

In many cases, you may need to sort your DataFrame based on multiple columns. pandas allows you to do this by specifying a list of columns to .sort_values().

Sorting by Multiple Columns in Ascending Order

To sort by multiple columns in ascending order, you can pass a list of column names to .sort_values(). The DataFrame will be sorted first by the first column in the list, and then by the second column. Here’s an example:

sorted_df = df.sort_values(['column1', 'column2'])
print(sorted_df)

Replace 'column1' and 'column2' with the actual names of the columns you want to sort by. Here’s an example sorting the DataFrame first by the ‘city’ column and then by the ‘age’ column:

sorted_df = df.sort_values(['city', 'age'])
print(sorted_df)

Output:

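      name  age         city
2  Charlie   35      Chicago
3    David   40      Houston
1      Bob   30  Los Angeles
0    Alice   25     New York

The DataFrame is now sorted by the ‘city’ column in ascending (alphabetical) order. The ‘age’ column would only be used to break ties between rows with the same city. Every city in this dataset is unique, so the result is the same as sorting by ‘city’ alone.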
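
The second column only matters when the first one contains repeated values. A minimal sketch, using a made-up DataFrame in which two people share each city:

```python
import pandas as pd

# Hypothetical data with repeated cities so the second key has work to do.
staff = pd.DataFrame({
    "name": ["Ava", "Ben", "Cy", "Di"],
    "age": [40, 25, 30, 35],
    "city": ["Chicago", "Houston", "Chicago", "Houston"],
})

# Rows are grouped by city first; within each city they are ordered by age.
by_city_age = staff.sort_values(["city", "age"])
print(by_city_age)
```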

Changing the Column Sort Order

You can also specify the sort order for each individual column by passing a list of booleans to the ascending parameter. Each boolean corresponds to a column, and True indicates ascending order, while False indicates descending order. Here’s an example:

sorted_df = df.sort_values(['column1', 'column2'], ascending=[True, False])
print(sorted_df)

Replace 'column1' and 'column2' with the actual names of the columns you want to sort by, and adjust the ascending list accordingly. Here’s an example sorting the DataFrame first by the ‘city’ column in ascending order, and then by the ‘age’ column in descending order:

sorted_df = df.sort_values(['city', 'age'], ascending=[True, False])
print(sorted_df)

Output:

      name  age         city
2  Charlie   35      Chicago
3    David   40      Houston
1      Bob   30  Los Angeles
0    Alice   25     New York

The DataFrame is now sorted by the ‘city’ column in ascending order. The descending order for ‘age’ would apply only among rows that share a city, and this dataset has none, so it has no visible effect here.
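
To make the per-column sort order visible, the first column needs ties. The sketch below uses a made-up DataFrame with repeated cities and sorts city ascending, age descending:

```python
import pandas as pd

# Hypothetical data with repeated cities.
staff = pd.DataFrame({
    "name": ["Ava", "Ben", "Cy", "Di"],
    "age": [40, 25, 30, 35],
    "city": ["Chicago", "Houston", "Chicago", "Houston"],
})

# One boolean per column: city ascending, age descending within each city.
mixed = staff.sort_values(["city", "age"], ascending=[True, False])
print(mixed)
```

The ascending list must have the same length as the list of columns, with each boolean applying to the column in the same position.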

Sorting by Multiple Columns in Descending Order

To sort by multiple columns in descending order, you can pass a list of column names to .sort_values() and set the ascending parameter to False. Here’s an example:

sorted_df = df.sort_values(['column1', 'column2'], ascending=False)
print(sorted_df)

Replace 'column1' and 'column2' with the actual names of the columns you want to sort by. Here’s an example sorting the DataFrame first by the ‘city’ column and then by the ‘age’ column, both in descending order:

sorted_df = df.sort_values(['city', 'age'], ascending=False)
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
3    David   40      Houston
2  Charlie   35      Chicago

The DataFrame is now sorted by the ‘city’ column in descending (reverse alphabetical) order. The ‘age’ column, also in descending order, would break ties between rows with the same city.

Sorting by Multiple Columns With Different Sort Orders

You can also mix ascending and descending sort orders when sorting by multiple columns. Simply adjust the ascending list accordingly. Here’s an example:

sorted_df = df.sort_values(['column1', 'column2'], ascending=[True, False])
print(sorted_df)

Replace 'column1' and 'column2' with the actual names of the columns you want to sort by, and adjust the ascending list accordingly. Here’s an example sorting the DataFrame first by the ‘city’ column in ascending order, and then by the ‘age’ column in descending order:

sorted_df = df.sort_values(['city', 'age'], ascending=[True, False])
print(sorted_df)

Output:

      name  age         city
2  Charlie   35      Chicago
3    David   40      Houston
1      Bob   30  Los Angeles
0    Alice   25     New York

The DataFrame is now sorted by the ‘city’ column in ascending order, with ‘age’ in descending order used only to order rows that share a city.

Sorting Your DataFrame on Its Index

In addition to sorting by column values, you can also sort a DataFrame based on its index. This can be useful when your index holds special meaning or when you want to organize data based on the index values.

Sorting by Index in Ascending Order

To sort a DataFrame by its index in ascending order, you can use the following code:

sorted_df = df.sort_index()
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

This code snippet sorts the DataFrame by its index in ascending order.

Sorting by Index in Descending Order

To sort a DataFrame by its index in descending order, you can pass the ascending parameter to .sort_index() and set it to False. Here’s an example:

sorted_df = df.sort_index(ascending=False)
print(sorted_df)

Output:

      name  age         city
3    David   40      Houston
2  Charlie   35      Chicago
1      Bob   30  Los Angeles
0    Alice   25     New York

The DataFrame is now sorted by its index in descending order.

Exploring Advanced Index-Sorting Concepts

Sorting by the index is not limited to single-level indexes. pandas allows you to sort by indexes that are multi-level. This can be useful when dealing with hierarchical data. However, exploring advanced index-sorting concepts is outside the scope of this tutorial. If you’re interested in learning more about multi-level indexing and sorting, check out the pandas documentation.
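
For orientation only, here is a minimal sketch of sorting a two-level index. The region and quarter labels and the revenue figures are made up for illustration. The level parameter of .sort_index() selects which level to sort by first:

```python
import pandas as pd

# Hypothetical two-level index: (region, quarter).
index = pd.MultiIndex.from_tuples(
    [("west", 2), ("east", 2), ("west", 1), ("east", 1)],
    names=["region", "quarter"],
)
sales = pd.DataFrame({"revenue": [40, 20, 30, 10]}, index=index)

# Without arguments, levels are sorted left to right: region, then quarter.
by_both = sales.sort_index()

# Sort by quarter first; the remaining level (region) breaks ties by default.
by_quarter = sales.sort_index(level="quarter")

print(by_both)
print(by_quarter)
```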

Sorting the Columns of Your DataFrame

So far, we’ve focused on sorting the rows of a DataFrame. However, you can also reorder its columns. Both sort methods accept an axis parameter for this, and setting it to 1 makes them act on the columns.

Working With the DataFrame Axis

The axis parameter specifies whether the rows (axis 0, the default) or the columns (axis 1) are reordered. With axis=1, the first argument to .sort_values() must be a row (index) label rather than a column name, and the columns are reordered by the values in that row:

sorted_df = df.sort_values(row_label, axis=1)
print(sorted_df)

Replace row_label with a label from the DataFrame’s index. Passing a column name such as 'name' here raises a KeyError. Our example DataFrame can’t be sorted this way at all, because each of its rows mixes strings and integers, which can’t be compared with each other. Here’s an example that instead uses a small all-numeric DataFrame, made up for illustration, and sorts its columns by the values in row 0:

scores = pd.DataFrame({'math': [90, 70], 'art': [60, 95], 'music': [75, 80]})
sorted_df = scores.sort_values(0, axis=1)
print(sorted_df)

Output:

   art  music  math
0   60     75    90
1   95     80    70

The columns are now ordered by their values in row 0: 60, 75, and 90.

Using Column Labels to Sort

Alternatively, you can sort the columns of your DataFrame by their labels by using .sort_index() along the columns axis. Here’s an example:

sorted_df = df.sort_index(axis=1)
print(sorted_df)

Output:

   age         city     name
0   25     New York    Alice
1   30  Los Angeles      Bob
2   35      Chicago  Charlie
3   40      Houston    David

The columns of the DataFrame are now sorted alphabetically.

Working With Missing Data When Sorting in Pandas

When sorting a DataFrame that contains missing values or NaNs, you may need to specify how to handle those missing values during the sorting process. pandas provides the na_position parameter, which allows you to specify where the missing values should be placed.

Understanding the na_position Parameter in .sort_values()

By default, na_position is set to 'last', which means that missing values will be placed at the end of the sorted DataFrame. You can change this behavior by setting na_position to 'first'. Here’s an example:

sorted_df = df.sort_values('column_name', na_position='first')
print(sorted_df)

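Replace 'column_name' with the actual name of the column you want to sort by. Our example DataFrame has no missing values, so the example below first builds a copy in which Bob’s age is missing, then sorts it with missing values placed at the beginning:

df_missing = df.assign(age=[25, float('nan'), 35, 40])
sorted_df = df_missing.sort_values('age', na_position='first')
print(sorted_df)

Output:

      name   age         city
1      Bob   NaN  Los Angeles
0    Alice  25.0     New York
2  Charlie  35.0      Chicago
3    David  40.0      Houston

The DataFrame is now sorted by the ‘age’ column, and the row with the missing value (NaN) is placed at the beginning. The remaining ages are displayed as floats because NaN is a floating-point value.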
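
The following sketch compares the two na_position settings side by side on a made-up DataFrame. It also shows that with the default 'last', missing values stay at the end even when the sort is descending:

```python
import pandas as pd

# Hypothetical readings; sensor "b" has a missing value.
readings = pd.DataFrame({
    "sensor": ["a", "b", "c", "d"],
    "value": [3.0, float("nan"), 1.0, 2.0],
})

default_last = readings.sort_values("value")                       # NaN at the end
nan_first = readings.sort_values("value", na_position="first")     # NaN at the start
descending = readings.sort_values("value", ascending=False)        # NaN still at the end

print(list(default_last["sensor"]))
print(list(nan_first["sensor"]))
print(list(descending["sensor"]))
```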

Understanding the na_position Parameter in .sort_index()

Similar to .sort_values(), .sort_index() also has the na_position parameter. It allows you to specify where missing index values should be placed during sorting.

sorted_df = df.sort_index(na_position='first')
print(sorted_df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The DataFrame remains the same since we don’t have any missing index values in our example. However, if you do have missing index values, you can adjust the na_position parameter to handle them.
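
As a minimal sketch of what a missing index value looks like in practice, the made-up DataFrame below has NaN as one of its index labels:

```python
import pandas as pd

# Hypothetical data whose index contains a missing label.
labeled = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=[2.0, float("nan"), 1.0],
)

nan_last = labeled.sort_index()                       # default: NaN label at the end
nan_first = labeled.sort_index(na_position="first")   # NaN label at the start

print(nan_last)
print(nan_first)
```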

Using Sort Methods to Modify Your DataFrame

So far, we’ve been using .sort_values() and .sort_index() to create sorted copies of our DataFrame. If you want to sort the DataFrame in place and modify it directly, you can use the inplace parameter.

Using .sort_values() In Place

To sort a DataFrame in place using .sort_values(), you can set the inplace parameter to True. Here’s an example:

df.sort_values('column_name', inplace=True)
print(df)

Replace 'column_name' with the actual name of the column you want to sort by. Here’s an example sorting our DataFrame by the ‘age’ column in place:

df.sort_values('age', inplace=True)
print(df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The DataFrame itself is now sorted by the ‘age’ column. With inplace=True, .sort_values() modifies df directly and returns None, so don’t assign its result back to df.
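
A common mistake is to assign the result of an in-place sort to a variable. The short sketch below, using a made-up two-row DataFrame, shows what happens:

```python
import pandas as pd

# Hypothetical data, out of order by age.
df = pd.DataFrame({"name": ["Bob", "Alice"], "age": [30, 25]})

# With inplace=True the method returns None and reorders df itself.
result = df.sort_values("age", inplace=True)

print(result)
print(df)
```

If you prefer to keep the original data untouched, leave inplace at its default of False and assign the returned DataFrame to a new name instead.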

Using .sort_index() In Place

Similarly, you can use .sort_index() to sort a DataFrame in place by setting the inplace parameter to True. Here’s an example:

df.sort_index(inplace=True)
print(df)

Output:

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

The DataFrame remains the same since its index was already in ascending order. When the index is out of order, .sort_index() with inplace=True reorders the DataFrame directly instead of returning a sorted copy.

Conclusion

Sorting data is a fundamental skill in data analysis, and pandas provides powerful methods to easily sort DataFrames. In this tutorial, you learned how to use .sort_values() and .sort_index() to sort a DataFrame based on column values or its index, respectively. You also learned how to sort a DataFrame on multiple columns, change the sort order, choose sorting algorithms, sort the columns of a DataFrame, and handle missing data during sorting.

By utilizing these pandas sort methods, you can efficiently organize and analyze your data to gain valuable insights. Remember to practice and experiment with different sorting techniques to become more proficient in working with pandas DataFrames.