
Comparing Substrings in Python

CodeMDD.io

How to Check if a Python String Contains a Substring

If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python. Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.

In this tutorial, you’ll focus on the most Pythonic way to tackle this task, using the membership operator in. Additionally, you’ll learn how to identify the right string methods for related, but different, use cases. Finally, you’ll also learn how to find substrings in pandas columns. This is helpful if you need to search through data from a CSV file. You could use the approach that you’ll learn in the next section, but if you’re working with tabular data, it’s best to load the data into a pandas DataFrame and search for substrings in pandas.

How to Confirm That a Python String Contains Another String

If you need to check whether a string contains a substring, use Python’s membership operator in. In Python, this is the recommended way to confirm the existence of a substring in a string:

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
"secret" in raw_file_content  # True

The in membership operator gives you a quick and readable way to check whether a substring is present in a string.
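Here is a minimal sketch of how in behaves in a few edge cases. It matches an exact, case-sensitive sequence of characters, and the empty string counts as a substring of every string. The variable text is a stand-in for a single line of raw_file_content, chosen only for this illustration.

```python
# One line of the tutorial's sample text, used as a stand-in for raw_file_content
text = "This is a special hidden file with a SECRET secret."

print("secret" in text)       # True: exact, case-sensitive match
print("Secret" in text)       # False: only "SECRET" and "secret" occur
print("hidden file" in text)  # True: a substring can span spaces
print("" in text)             # True: the empty string is in every string
```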

Note: If you want to check whether the substring is not in the string, then you can use not in:

"secret" not in raw_file_content

Because the substring "secret" is present in raw_file_content, the not in operator returns False.
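Since not in is simply the negation of in, the two checks below always agree. A common use is a guard that runs only when something is missing; the word "password" is an arbitrary example, not part of the original text.

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# not in gives the same result as negating in
print("secret" not in raw_file_content)    # False
print(not ("secret" in raw_file_content))  # False

# A typical use: act only when something is missing
if "password" not in raw_file_content:
    print("No password mentioned.")
```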

When you use in, the expression returns a Boolean value:

  • True if Python found the substring
  • False if Python didn’t find the substring

You can use this intuitive syntax in conditional statements to make decisions in your code:

if "secret" in raw_file_content:
    print("Found!")

In this code snippet, you use the membership operator to check whether "secret" is a substring of raw_file_content. If it is, then you’ll print a message to the terminal.
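The same check works inside a loop. This sketch checks raw_file_content line by line and reports which lines mention "secret", with an else branch for the lines that don't:

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# Check each line separately and report which ones mention "secret"
for line_number, line in enumerate(raw_file_content.splitlines(), start=1):
    if "secret" in line:
        print(f"Line {line_number}: {line}")
    else:
        print(f"Line {line_number}: no match")
```

Only lines 2 and 4 match. Line 3 contains "Secret" with a capital S, so the case-sensitive check skips it.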

Generalize Your Check by Removing Case Sensitivity

In some cases, you may want to check whether a substring is present in a string while disregarding the case of the characters. You can achieve this by converting the text to lowercase using the lower() method and searching for a lowercase substring:

file_content = raw_file_content.lower()
if "secret" in file_content:
    print("Found!")

Because you search a lowercased copy of raw_file_content, the check becomes case-insensitive: "secret" now also matches "SECRET" and "Secret". Storing the copy in a new variable keeps the original text intact for the examples that follow. If the substring comes from user input, lowercase it as well. Keep in mind that this makes the search less specific, because you can no longer tell which capitalization appeared in the original text.
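For plain English text, lower() is usually enough. Python also provides str.casefold(), which applies more aggressive case folding intended for caseless comparisons. The sketch below wraps it in contains_ignore_case, a hypothetical helper name used only for illustration, and shows a case where lower() and casefold() disagree.

```python
def contains_ignore_case(text: str, substring: str) -> bool:
    """Case-insensitive substring check (hypothetical helper for illustration)."""
    # Fold the case of both sides so capitalization differences don't matter
    return substring.casefold() in text.casefold()


print(contains_ignore_case("I don't want to tell you The Secret,", "SECRET"))  # True
print("STRASSE".lower() in "Straße".lower())      # False: lower() keeps "ß"
print(contains_ignore_case("Straße", "STRASSE"))  # True: casefold() maps "ß" to "ss"
```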

Learn More About the Substring

If you want to gain more insights into the substring, such as its index position, you can use the find() or index() methods. Both methods return the index of the first occurrence of the substring within the string:

position = raw_file_content.find("secret")
print(position)

This prints the index at which the substring "secret" first appears. If the substring isn't present, the two methods differ: find() returns -1, while index() raises a ValueError.
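The following sketch compares find() and index() on a substring that is missing. It also uses count(), a related string method that reports how many non-overlapping matches exist. The word "treasure" is just an arbitrary substring that doesn't occur in the text.

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# find() returns the index of the first match, or -1 if there is none
print(raw_file_content.find("secret"))
print(raw_file_content.find("treasure"))  # -1

# count() reports how many non-overlapping matches exist
print(raw_file_content.count("secret"))   # 2: "secret" and the start of "secretly"

# index() behaves like find() on success but raises ValueError on failure
try:
    raw_file_content.index("treasure")
except ValueError as error:
    print(f"index() raised ValueError: {error}")
```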

Additionally, you can extract the substring itself using slicing. Slicing allows you to extract a specific portion of the string:

substring = raw_file_content[position:position + len("secret")]
print(substring)

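This prints the extracted substring. The slice starts at position and stops at position + len("secret"). Because the stop index is exclusive, the slice contains exactly the characters of the match. Only slice like this after confirming that find() didn't return -1; otherwise, the slice won't contain the substring.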
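find() only reports the first match, but it also accepts an optional start index. By calling it repeatedly, you can locate every occurrence. Here's a minimal sketch; find_all is a hypothetical helper written for this example, not a built-in string method.

```python
from typing import List


def find_all(text: str, substring: str) -> List[int]:
    """Return the start index of every non-overlapping match.

    find_all is a hypothetical helper written for this example.
    """
    if not substring:
        # An empty substring would match at every index and never advance
        raise ValueError("substring must not be empty")
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Resume searching just past the end of the current match
        start = text.find(substring, start + len(substring))
    return positions


raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

for position in find_all(raw_file_content, "secret"):
    # The stop index of the slice is exclusive, so this yields exactly the match
    print(position, raw_file_content[position:position + len("secret")])
```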

Find a Substring With Conditions Using Regex

Python’s re module provides powerful tools for working with regular expressions. If your substring search requires complex patterns or conditions, you can leverage regular expressions to find substrings. Here’s an example that uses regular expressions to find substrings that start with an uppercase letter:

import re
list_of_words = re.findall(r'\b[A-Z]\w+', raw_file_content)
print(list_of_words)

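This prints a list of the words that start with an uppercase letter. Strictly speaking, the pattern only matches words that are at least two characters long, because \w+ requires at least one more word character after the capital letter, so a standalone “I” isn't included. Regular expressions offer a wide range of options and flexibility for substring searches, but patterns can become complex quickly, so it's worth understanding the regular expression syntax before relying on them.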
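Regular expressions also handle the two earlier concerns in one pattern: re.IGNORECASE removes case sensitivity, and \b word boundaries stop "secret" from matching inside "secretly". The sketch below shows both, along with re.search(), which returns a match object that exposes the matched text and its position.

```python
import re

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# \b marks word boundaries, so "secretly" is not matched
pattern = re.compile(r"\bsecret\b", flags=re.IGNORECASE)

print(pattern.findall(raw_file_content))  # ['SECRET', 'secret', 'Secret']

# search() returns a Match object for the first hit, or None if nothing matches
match = pattern.search(raw_file_content)
if match:
    print(match.group(), match.start(), match.end())
```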

Find a Substring in a pandas DataFrame Column

If you’re working with tabular data in pandas and need to search for substrings, you can use the str.contains() method. It checks every value in a column and returns a Boolean Series, with True for each row that contains the substring. Note that, by default, it interprets the pattern as a regular expression:

import pandas as pd
df = pd.DataFrame({"text": ["Hello, world!", "Welcome to Python", "Python is great!"]})
if df["text"].str.contains("Python").any():
    print("Found!")

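Because str.contains() returns one Boolean per row, .any() reduces that Series to a single value. This code prints “Found!” if the substring “Python” appears in at least one row of the “text” column of the DataFrame.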
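Often you want the matching rows themselves rather than a single yes or no. This sketch uses the Boolean Series as a mask to filter the DataFrame. It passes three documented str.contains() parameters: case=False ignores case, regex=False treats the pattern as a literal string, and na=False counts missing values as non-matches instead of propagating NaN. The extra None row is added only to demonstrate na.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["Hello, world!", "Welcome to Python", "Python is great!", None]}
)

# Case-insensitive, literal (non-regex) search that treats missing values as False
mask = df["text"].str.contains("python", case=False, regex=False, na=False)

print(mask.tolist())  # [False, True, True, False]
print(df[mask])       # only the rows that mention Python
```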

Key Takeaways

Now you know how to check if a Python string contains a substring using the membership operator in. You’ve also learned how to generalize the check by removing case sensitivity, how to gain more insights about the substring, and how to find substrings using regular expressions in complex cases. Finally, you’ve learned how to find substrings in pandas DataFrame columns.

By understanding these concepts, you’ll be able to efficiently search for substrings in Python strings and manipulate them according to your program’s logic.