Effortlessly check if string contains substring in Python

How to Check if a Python String Contains a Substring

by Martin Breuss

If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python. Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.

In this tutorial, we will focus on the most Pythonic way to tackle this task, using the membership operator in. Additionally, we’ll learn how to identify the right string methods for related, but different, use cases. Finally, we’ll also learn how to find substrings in pandas columns, which is helpful if you need to search through data from a CSV file.

How to Confirm That a Python String Contains Another String

If you need to check whether a string contains a substring, use Python’s membership operator in. It’s the recommended way to confirm that a substring exists in a string, and it keeps the check quick and readable.

For example, if we have a string raw_file_content and we want to check if it contains the substring “secret”, we can do the following:

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
if "secret" in raw_file_content:
    print("Found!")

Output:

Found!

As you can see, the in operator returns True if Python found the substring “secret” in raw_file_content. We can use this intuitive syntax in conditional statements to make decisions in our code.
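Because in is an ordinary expression, its result can also be stored in a variable rather than used directly in an if statement. The following is a minimal sketch; the names text and has_secret are illustrative and not part of the example above:

```python
text = "This is a special hidden file with a SECRET secret."

# `in` evaluates to a bool, so the result can be stored and reused.
has_secret = "secret" in text
print(has_secret)          # True

# The comparison is exact: "Secret" with a capital S is not in this text.
print("Secret" in text)    # False

# The empty string counts as a substring of every string.
print("" in text)          # True
```

The last case is worth remembering when the search term comes from user input, since an empty input always reports a match.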

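If we want to check whether a substring is not in the string, we can use the not in operator. Because “secret” does appear in raw_file_content, this example searches for “password”, which doesn’t:

if "password" not in raw_file_content:
    print("Not found!")

Output:

Not found!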

Generalize Your Check by Removing Case Sensitivity

By default, the in operator is case sensitive. This means that if you’re searching for a substring in a string and the case doesn’t match exactly, the in operator will return False. If you want to make your check case insensitive, you can convert both the string and the substring to lowercase before performing the check:

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
if "secret" in raw_file_content.lower():
    print("Found!")

Output:

Found!

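Here the search term “secret” is already lowercase, so only the text needs converting with the lower() method. If the search term can vary in case too, call lower() on it as well. In this particular text the lowercase word “secret” already appears, so the check would succeed either way; lowercasing matters when the text contains only variants such as “SECRET” or “Secret”.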
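The idea of normalizing both sides can be wrapped in a small helper. This is a sketch, not a standard function: contains_ignore_case is a hypothetical name, and it uses str.casefold(), a more aggressive variant of lower() that Python provides for caseless comparisons:

```python
def contains_ignore_case(text: str, substring: str) -> bool:
    """Return True if substring occurs in text, ignoring case."""
    # Normalize both sides so neither the text nor the search term
    # has to be written in a particular case.
    return substring.casefold() in text.casefold()


line = "I don't want to tell you The Secret,"

print("secret" in line)                       # False
print(contains_ignore_case(line, "secret"))   # True
print(contains_ignore_case(line, "SECRET"))   # True
```

For plain ASCII text, lower() and casefold() give the same result; they differ only for some non-ASCII characters, such as the German “ß”.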

Learn More About the Substring

If you want to know not only if a string contains a substring but also its position within the string, you can use the find() method. The find() method returns the index of the first occurrence of the substring in the string, or -1 if the substring is not found. Here’s an example:

sentence = "The brown fox jumps over the lazy dog."
index = sentence.find("fox")
if index != -1:
    print(f"The substring 'fox' was found at index {index}")
else:
    print("The substring was not found")

Output:

The substring 'fox' was found at index 10

In this example, the find() method returns 10 because “fox” is found at index 10 of the string sentence. If the substring is not found, the method returns -1.
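Since find() only reports the first occurrence, two related tools are useful when a substring may appear several times: str.count() and the optional start argument of find(). The sketch below is illustrative; find_all is a hypothetical helper, not a built-in:

```python
def find_all(haystack: str, needle: str) -> list[int]:
    """Return the start index of every occurrence of needle."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume the search one character past the previous hit.
        start = haystack.find(needle, start + 1)
    return positions


text = "secret, Secret, and secretly"
lowered = text.lower()

print(lowered.count("secret"))      # 3
print(find_all(lowered, "secret"))  # [0, 8, 20]
```

A related method, str.index(), behaves like find() but raises a ValueError instead of returning -1 when the substring is missing.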

Find a Substring With Conditions Using Regex

If you need to find a substring that matches a specific pattern or condition, regular expressions (regex) can be a powerful tool. The re module in Python provides functions for working with regular expressions. Here’s an example that demonstrates how to find all occurrences of a substring that starts with a capital letter and ends with a period:

import re
sentence = "Hello! This is a sentence. Another sentence is coming."
matches = re.findall(r"\b[A-Z][^.]*\.", sentence)
if matches:
    print("Matches found:")
    for match in matches:
        print(match)
else:
    print("No matches found")

Output:

Matches found:
Hello! This is a sentence.
Another sentence is coming.

In this example, the findall() function from the re module finds all non-overlapping matches of the pattern. The pattern r"\b[A-Z][^.]*\." matches a word boundary (\b), followed by an uppercase letter ([A-Z]), any number of non-period characters ([^.]*), and a literal period (\.). Because the exclamation mark isn’t a period, the first match runs from “Hello” through to the end of “This is a sentence.” rather than stopping at “Hello!”.
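When the position of each match matters as well as its text, re.search() and re.finditer() return match objects instead of plain strings. The following sketch uses an illustrative text and the pattern r"secret\w*", both chosen for this example only, to find “secret” plus any word characters that follow it, regardless of case:

```python
import re

text = "This file holds a SECRET secret. I secretly have one."
pattern = r"secret\w*"

# re.search() returns a match object for the first hit, or None.
first = re.search(pattern, text, flags=re.IGNORECASE)
if first:
    print(first.group(), first.span())  # SECRET (18, 24)

# re.finditer() yields one match object per non-overlapping hit.
words = [m.group() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
print(words)  # ['SECRET', 'secret', 'secretly']
```

The re.IGNORECASE flag offers another route to the case-insensitive check without calling lower() on the text first.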

Find a Substring in a pandas DataFrame Column

If you’re working with tabular data and want to find substrings in one or more columns of a pandas DataFrame, you can use the str.contains() method. This method returns a Boolean Series indicating whether each element of the column contains the substring. Here’s an example:

import pandas as pd
data = {
    "Name": ["John Doe", "Jane Smith", "Mike Johnson", "Emily Brown"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
}
df = pd.DataFrame(data)
substring = "Jo"
contains_substring = df["Name"].str.contains(substring)
print(df[contains_substring])

Output:

           Name  Age      City
0      John Doe   25  New York
2  Mike Johnson   35     Paris

In this example, the str.contains() method is used to create a Boolean Series indicating whether each element of the “Name” column contains the substring “Jo”. The resulting Boolean Series is then used to filter the DataFrame using boolean indexing.
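A few optional parameters of str.contains() are worth knowing for real-world data. By default the method treats its argument as a regular expression and matches case-sensitively, and missing values produce missing results. The sketch below uses a small made-up DataFrame to show case=False, regex=False, and na=False together:

```python
import pandas as pd

# Illustrative data with inconsistent casing and one missing value.
df = pd.DataFrame({"Name": ["John Doe", "Jane Smith", None, "mike johnson"]})

mask = df["Name"].str.contains(
    "jo",
    case=False,   # ignore case when matching
    regex=False,  # treat "jo" as literal text, not a regex pattern
    na=False,     # count missing values as "no match"
)

print(mask.tolist())  # [True, False, False, True]
print(df[mask])
```

Setting na=False keeps the mask purely Boolean, so it can be used for filtering even when the column contains missing values.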

Key Takeaways

Checking if a Python string contains a substring can be done using the membership operator in. By default, the in operator is case sensitive, but you can make your check case insensitive by converting both the string and the substring to lowercase. If you need more information about the substring, such as its position within the string, you can use the find() method. If you need to find substrings that match specific patterns or conditions, regular expressions (regex) can be a powerful tool. And if you’re working with tabular data in a pandas DataFrame, you can use the str.contains() method to find substrings in specific columns.

Now that you have a good understanding of how to check if a Python string contains a substring, you can use this knowledge to perform various string operations and make your code more powerful and flexible.