
Easily Check if a String Contains a Pattern in Python


How to Check if a Python String Contains a Substring

If you’re new to Python programming or if you’re coming from a different programming language, you might be wondering how to check if a string contains a substring in Python. This can be useful when working with text content from a file or when dealing with user input. You may need to perform different actions based on whether a substring is present in a string or not.

In this tutorial, we will explore the most Pythonic way to accomplish this task, using the membership operator in. Additionally, we will learn about different string methods that can be used for related but different use cases. We will also see how to find substrings in pandas DataFrame columns, which can be particularly helpful when working with tabular data.


How to Confirm That a Python String Contains Another String

To check whether a string contains a substring in Python, we can use the membership operator in. This operator provides a quick and readable way to determine if a substring is present in a string.

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
print("secret" in raw_file_content)

Output:

True

Here, we are using the in operator to check if the substring "secret" is present in the raw_file_content string. The expression returns True because the substring exists in the string.

If we want to check if the substring is not present in the string, we can use the not in operator:

print("secret" not in raw_file_content)

Output:

False

Since the substring “secret” is present in raw_file_content, the not in operator returns False.

The in operator returns a boolean value: True if the substring is found in the string, and False if it is not found. We can use this in conditional statements to make decisions based on the presence of a substring:

if "secret" in raw_file_content:
    print("Found!")

Output:

Found!

In the above code snippet, we check if the substring “secret” is present in raw_file_content. If it is, we print the message “Found!” to the terminal.
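The same membership test can be wrapped in a small function when the result drives a decision. The following is a minimal sketch; the function name classify_line and the sample strings are hypothetical and only serve as illustration.

```python
def classify_line(line: str, keyword: str = "secret") -> str:
    """Return a label depending on whether keyword occurs in line."""
    if keyword in line:
        return "contains keyword"
    return "clean"


print(classify_line("This file has a secret."))
print(classify_line("Nothing to see here."))

# An empty string counts as a substring of every string, so it is worth
# validating user input before using it as the keyword.
print("" in "any text")
```

Keep in mind that the check is case-sensitive, so classify_line("SECRET") would be labeled as clean.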

Generalize Your Check by Removing Case Sensitivity

By default, the in operator is case-sensitive. This means that it will only consider a match if the substring appears with the exact same case in the string. If you want to perform a case-insensitive check, you can modify the strings to be compared using the lower() method:

raw_file_content = raw_file_content.lower()
substring = "secret"
print(substring.lower() in raw_file_content)

Output:

True

Here, we convert both the raw_file_content string and the substring to lowercase using the lower() method, so the check no longer depends on case. Note that the first line overwrites raw_file_content with its lowercase version, so the examples that follow work on the lowercased text.
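If you would rather keep the original text untouched, the comparison can be done on temporary lowercase copies. The sketch below is one possible approach, not the only one; the helper name contains_ignore_case is hypothetical, and it uses str.casefold(), a built-in method that works like lower() but also handles special cases such as the German "ß".

```python
def contains_ignore_case(text: str, substring: str) -> bool:
    """Case-insensitive substring check that leaves both inputs unchanged."""
    return substring.casefold() in text.casefold()


sample = "I don't want to tell you The Secret"
print(contains_ignore_case(sample, "SECRET"))
print(sample)  # the original casing is preserved

# casefold() maps "ß" to "ss", which lower() does not do.
print(contains_ignore_case("Straße", "STRASSE"))
print("STRASSE".lower() in "Straße".lower())
```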

Learn More About the Substring

If you want to retrieve additional information about the substring, such as its index or the number of occurrences in the string, you can use the find() and count() methods.

The find() method returns the index of the first occurrence of the substring in the string, or -1 if the substring is not found:

index = raw_file_content.find("secret")
print(index)

Output:

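59

In this example, the find() method returns 59. Because raw_file_content was converted to lowercase in the previous section, the first occurrence is the word that was originally written as “SECRET”. On the original mixed-case text, the same call would return 66, the index of the first lowercase “secret”.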
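Two details of find() are easy to miss: the -1 return value is truthy while index 0 is falsy, and an optional second argument sets where the search starts. The sketch below illustrates both; the helper find_all and the sample text are hypothetical.

```python
def find_all(text: str, substring: str) -> list:
    """Collect the start index of every non-overlapping occurrence."""
    if not substring:
        raise ValueError("substring must not be empty")
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match that was found.
        start = text.find(substring, start + len(substring))
    return positions


sample = "a secret is a secret"
print(sample.find("hidden"))        # -1 signals "not found"
print(find_all(sample, "secret"))   # [2, 14]

# Compare against -1 explicitly: a match at index 0 is falsy,
# while the "not found" value -1 is truthy.
print("secret".find("secret") != -1)
```

For a plain yes-or-no answer, the in operator remains the clearer choice; find() is useful when the position itself matters.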

The count() method returns the number of occurrences of the substring in the string:

count = raw_file_content.count("secret")
print(count)

Output:

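4

Here, the count() method returns 4 because raw_file_content is now all lowercase: “secret” appears twice in the second line, once in “the secret”, and once inside “secretly”. On the original mixed-case text, the result would be 2, since only the lowercase “secret” and the beginning of “secretly” would match.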
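It helps to know that count() only counts non-overlapping occurrences and that it accepts optional start and end positions. The short sketch below shows both behaviors on hypothetical sample strings.

```python
# Matches are non-overlapping: "aa" is found at index 0 and 2, not at 1.
overlapping = "aaaa".count("aa")
print(overlapping)

text = "secret secretly secret"
total = text.count("secret")
print(total)

# Optional start and end arguments restrict the search to text[0:15].
first_part = text.count("secret", 0, 15)
print(first_part)
```

As the second result shows, count() also counts the match inside “secretly”, because it looks for substrings rather than whole words.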

Find a Substring With Conditions Using Regex

In some cases, you may need to find a substring that matches particular conditions. This is where regular expressions, or regex, can be useful. Python provides the re module for working with regular expressions.

import re

if re.search(r"\b(secret|hidden)\b", raw_file_content):
    print("Found!")

Output:

Found!

In this example, we use the re.search() function to search for the regular expression pattern \b(secret|hidden)\b in the raw_file_content string. This pattern matches the words “secret” or “hidden” only when they appear as separate words, surrounded by word boundaries, so “secretly” on its own would not count as a match.

If the pattern is found, we print the message “Found!” to the terminal.

Note that we use the r prefix before the regular expression pattern to create a raw string. Without it, Python would interpret \b as a backspace character instead of passing it to the regex engine as a word boundary.
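re.search() returns a match object rather than a plain boolean, so you can also inspect what was matched and where. The following sketch uses a hypothetical one-line sample text and adds the re.IGNORECASE flag to collect every whole-word match regardless of case.

```python
import re

text = "This is a special hidden file with a SECRET secret."
pattern = r"\b(secret|hidden)\b"

match = re.search(pattern, text)
if match:
    # group() is the matched text, span() its start and end index.
    print(match.group(), match.span())

# findall() returns the captured group for every match in the text.
all_words = re.findall(pattern, text, flags=re.IGNORECASE)
print(all_words)

# Word boundaries prevent a match inside a longer word.
print(re.search(r"\bsecret\b", "secretly"))
```

When no match is found, re.search() returns None, which is why the result can be used directly in an if statement.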

Find a Substring in a pandas DataFrame Column

If you’re working with tabular data and need to find substrings in pandas DataFrame columns, you can use the .str.contains() method. This method allows you to search for a substring within each element of a column.

First, let’s create a sample DataFrame:

import pandas as pd

data = {
    "Name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "Email": ["john@example.com", "jane@example.com", "bob@example.com"],
}
df = pd.DataFrame(data)

The DataFrame df has two columns: “Name” and “Email”. We can use the .str.contains() method on the “Name” column to check if a substring is present in each name:

substring = "Doe"
filtered_df = df[df["Name"].str.contains(substring)]
print(filtered_df)

Output:

       Name             Email
0  John Doe  john@example.com

Here, we use the .str.contains() method to create a boolean mask that indicates whether the substring “Doe” is present in each element of the “Name” column. We then use this mask to filter the DataFrame and obtain the rows where the substring is found.
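Real-world columns often contain missing values, mixed case, or characters that have a special meaning in regular expressions. The sketch below shows how the case, na, and regex parameters of .str.contains() can help in those situations; the DataFrame people and its contents are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["John Doe", "jane doe", None, "Bob (Jr.) Johnson"]}
)

# case=False ignores case; na=False treats missing names as "no match"
# so the result is a clean boolean mask.
mask = people["Name"].str.contains("doe", case=False, na=False)
print(people[mask])

# By default the pattern is treated as a regular expression.
# regex=False searches for the literal text, parentheses included.
literal = people["Name"].str.contains("(Jr.)", regex=False, na=False)
print(people[literal])
```

Passing regex=False is a reasonable default whenever the search text comes from user input and is meant to be matched literally.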

Key Takeaways

In this tutorial, we have learned how to check if a Python string contains a substring. We use the membership operator in to check for the presence of a substring in a string. We can also use the not in operator to check if a substring is not present.

To generalize the check and make it case-insensitive, we can convert both the string and the substring to lowercase using the lower() method.

We explored additional string methods like find() and count() to retrieve information about the substring, such as its index and the number of occurrences in the string.

We also saw how to use regular expressions to find substrings that match specific conditions.

Finally, we learned how to find substrings in pandas DataFrame columns using the .str.contains() method.

With these techniques, you can check for substrings in Python strings and in pandas DataFrame columns, and pick the approach that fits the question you are asking of your text.