How to Check if a Python String Contains a Substring

CodeMDD.io

If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python.

Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.

In this tutorial, you’ll focus on the most Pythonic way to tackle this task, using the membership operator in. Additionally, you’ll learn how to identify the right string methods for related, but different, use cases.

Finally, you’ll also learn how to find substrings in pandas columns. This is helpful if you need to search through data from a CSV file. You could use the approach that you’ll learn in the next section, but if you’re working with tabular data, it’s best to load the data into a pandas DataFrame and search for substrings in pandas.

How to Confirm That a Python String Contains Another String

If you need to check whether a string contains a substring, use Python’s membership operator in:

>>> raw_file_content = """Hi there and welcome.
... This is a special hidden file with a SECRET secret.
... I don't want to tell you The Secret,
... but I do want to secretly tell you that I have one."""
>>> "secret" in raw_file_content
True

The in membership operator gives you a quick and readable way to check whether a substring is present in a string. You may notice that the line of code almost reads like English. If you want to check that a substring is absent instead, you can use the not in operator:

>>> "secret" not in raw_file_content
False

Because the substring “secret” is present in raw_file_content, the not in operator returns False.

When you use in, the expression returns a Boolean value:

  • True if Python found the substring
  • False if Python didn’t find the substring

You can use this intuitive syntax in conditional statements to make decisions in your code:

>>> if "secret" in raw_file_content:
...     print("Found!")
...
Found!

In this code snippet, you use the membership operator to check whether “secret” is a substring of raw_file_content. If it is, then you’ll print a message to the terminal.
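One detail worth keeping in mind is that in compares characters, not whole words. The short sketch below illustrates this with a hypothetical message string that isn’t part of the file content above:

```python
# Hypothetical sample text, used only for illustration.
message = "I will secretly tell you."

# "secret" is found because it is part of the word "secretly".
print("secret" in message)  # True

# The empty string counts as a substring of every string.
print("" in message)  # True

# The characters must appear next to each other, in order.
print("sly" in message)  # False
```

If you only want to match whole words, a plain in check is too permissive, and a pattern-based search is usually a better fit.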

Generalize Your Check by Removing Case Sensitivity

By default, the membership operator in is case sensitive. This means that when you use in, Python will only return True if the substring you’re searching for appears in exactly the same case in the string you’re searching.

If you want to perform a case-insensitive check, you can convert both the string and the substring to either lowercase or uppercase. Then, you can use the in operator to check whether the lowercase or uppercase substring exists in the lowercase or uppercase string.

Here’s an example:

>>> raw_file_content = """Hi there and welcome.
... This is a special hidden file with a SECRET secret.
... I don't want to tell you The Secret,
... but I do want to secretly tell you that I have one."""
>>> "secret" in raw_file_content.lower()
True

In this example, raw_file_content.lower() returns a lowercase copy of the entire string. Then, you use in to check whether “secret” exists in that lowercase copy. The check returns True, and it would still return True if the text contained only “SECRET” or “Secret”, because both become “secret” after the conversion.

By performing a lowercase or uppercase conversion, you can generalize your check and make it case-insensitive.
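A minimal sketch of this idea as a reusable helper might look like the following. The function name contains_ignore_case is hypothetical, and the sketch uses str.casefold(), a built-in string method that behaves like .lower() for plain English text but also handles some non-English characters, instead of the .lower() call shown above:

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""


def contains_ignore_case(text, substring):
    """Return True if substring occurs in text, ignoring case.

    Hypothetical helper: both arguments are normalized with
    str.casefold() before the membership check.
    """
    return substring.casefold() in text.casefold()


# A plain check fails because the case doesn't match exactly.
print("THE SECRET" in raw_file_content)  # False

# Normalizing both sides makes the check case-insensitive.
print(contains_ignore_case(raw_file_content, "THE SECRET"))  # True

# casefold() also maps the German "ß" to "ss", which lower() doesn't.
print(contains_ignore_case("Straße", "STRASSE"))  # True
print("STRASSE".lower() in "Straße".lower())  # False
```

Normalizing both the text and the substring matters here: if you only convert the text, a search term typed in uppercase will never match.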

Learn More About the Substring

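If you need to know where a substring is located, and not just whether it exists, you can use the string method .index(), which returns the position of the first match. Here’s an example: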

>>> raw_file_content = """Hi there and welcome.
... This is a special hidden file with a SECRET secret.
... I don't want to tell you The Secret,
... but I do want to secretly tell you that I have one."""
>>> substring = "secret"
>>> index = raw_file_content.index(substring)
>>> index
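66

In this example, raw_file_content.index(substring) returns the index position of the first occurrence of "secret" in raw_file_content. The index position is 66, counting from zero. Because the search is case sensitive, the uppercase "SECRET" that appears just before it doesn’t count as a match.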
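The related methods .find() and .count() are also worth a look. The sketch below is one way you might combine them; the helper name locate and the search term "treasure" are hypothetical and chosen only for illustration:

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""


def locate(text, substring):
    """Return the index of the first match, or None if there is none."""
    position = text.find(substring)  # .find() returns -1 instead of raising
    return None if position == -1 else position


position = locate(raw_file_content, "secret")
# Slicing from the reported position gives back the substring itself.
print(raw_file_content[position:position + len("secret")])  # secret

# A missing substring yields None rather than an exception.
print(locate(raw_file_content, "treasure"))  # None

# .index() raises ValueError when the substring is missing.
try:
    raw_file_content.index("treasure")
except ValueError:
    print("index() raised ValueError")

# .count() reports the number of non-overlapping matches.
print(raw_file_content.lower().count("secret"))  # 4
```

Returning None instead of -1 is a design choice in this sketch: -1 is a valid index in Python, so passing it on by accident could silently point at the last character of the string.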

Find a Substring With Conditions Using Regex

If you need to find a substring with specific conditions, such as finding all occurrences that match a certain pattern or extracting substrings based on a pattern, regular expressions can be a powerful tool.

Here’s an example that demonstrates how to find all occurrences of a substring that match a specific pattern using regular expressions:

import re
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
pattern = r"\b[A-Z][A-Za-z]+\b"
matches = re.findall(pattern, raw_file_content)
print(matches)

In this example, re.findall(pattern, raw_file_content) searches for all occurrences of the substring that matches the regular expression pattern in raw_file_content. The pattern \b[A-Z][A-Za-z]+\b matches words that begin with an uppercase letter and are followed by one or more uppercase or lowercase letters.

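The result of re.findall() is a list of the substrings that match the pattern. In this case, the output will be ['Hi', 'This', 'SECRET', 'The', 'Secret']. The one-letter word I isn’t included because the pattern requires at least two letters.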
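Regular expressions can also express conditions that a plain in check can’t, such as matching only whole words or ignoring case. The patterns below are illustrative assumptions rather than the only way to write such a search:

```python
import re

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# \b marks a word boundary, so "secretly" is not matched here.
whole_words = re.findall(r"\bsecret\b", raw_file_content, flags=re.IGNORECASE)
print(whole_words)  # ['SECRET', 'secret', 'Secret']

# \w* allows any word characters after "secret", so "secretly" matches too.
with_suffix = re.findall(r"secret\w*", raw_file_content, flags=re.IGNORECASE)
print(with_suffix)  # ['SECRET', 'secret', 'Secret', 'secretly']

# re.search() returns the first match object, or None if nothing matches.
first_longer_word = re.search(r"secret\w+", raw_file_content)
print(first_longer_word.group())  # secretly
```

The last search is case sensitive and requires at least one extra word character after "secret", which is why it skips the earlier occurrences and lands on "secretly".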

Find a Substring in a pandas DataFrame Column

If you’re working with tabular data and need to search for substrings in specific columns, it can be more efficient and convenient to use pandas. pandas is a powerful library for data manipulation and analysis in Python, and it provides many useful functions for working with strings in tabular data.

Here’s an example that demonstrates how to find a substring in a pandas DataFrame column:

import pandas as pd
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com', 'dave@example.com']
}
df = pd.DataFrame(data)
substring = 'example'
df['contains_substring'] = df['email'].str.contains(substring, case=False)
print(df)

In this example, df['email'].str.contains(substring, case=False) returns a Boolean Series, and assigning it to df['contains_substring'] stores it as a new column in the DataFrame. The values in this column are True if the substring ‘example’ is present in the corresponding email address, and False otherwise. Passing case=False makes the check case-insensitive.

This allows you to easily filter, sort, or perform other operations based on the presence of a specific substring in a DataFrame column.
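As a sketch of the filtering step, you can use the Boolean result as a mask to keep only the matching rows. The data here is hypothetical and differs from the example above so that some rows don’t match. The sketch also assumes two optional arguments of str.contains(): regex=False, which treats the search term as literal text rather than a regular expression, and na=False, which treats missing values as non-matches:

```python
import pandas as pd

# Hypothetical data, including a missing value and mixed case.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dave"],
    "email": ["alice@example.com", "bob@example.org", None, "dave@EXAMPLE.com"],
})

# regex=False keeps the dot literal; na=False maps missing emails to False.
mask = df["email"].str.contains("example.com", case=False, regex=False, na=False)

# Indexing with the Boolean mask keeps only the rows where it is True.
matches = df[mask]
print(matches["name"].tolist())  # ['Alice', 'Dave']
```

Without regex=False, the dot in the search term would be interpreted as a regular expression wildcard that matches any character.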

Key Takeaways

Checking whether a Python string contains a substring is a common task in many Python programs. The most Pythonic way to perform this check is to use the membership operator in. By default, in is case sensitive, but you can easily perform a case-insensitive check by converting both the string and the substring to either lowercase or uppercase.

If you need to find the index position of the first occurrence of a substring in a string, you can use the str.index() method. If you want to avoid raising a ValueError exception when the substring is not found, you can use the str.find() method, which returns -1 in that case.

Regular expressions provide a powerful tool for working with substrings that match specific patterns. You can use the re module in Python to search for substrings with specific conditions.

If you’re working with tabular data, it may be more efficient to use pandas to search for substrings in specific columns. pandas provides many convenient functions for working with strings in tabular data.

By understanding the different methods and approaches available, you can confidently handle substring checks in your Python programs and manipulate strings in a variety of ways.
