
Effortlessly Check if Python str Contains a Substring


How to Check if a Python String Contains a Substring

If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python. Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.

In this tutorial, you’ll focus on the most Pythonic way to tackle this task: the in membership operator. You’ll also learn which string methods fit related, but different, use cases, and how regular expressions help when you need to match a pattern rather than a fixed string. Finally, you’ll learn how to find substrings in pandas columns. This is helpful if you need to search through data from a CSV file. You could use the in operator on each value yourself, but if you’re working with tabular data, it’s best to load the data into a pandas DataFrame and search for substrings in pandas.

How to Confirm That a Python String Contains Another String

If you need to check whether a string contains a substring, use Python’s membership operator in. In Python, this is the recommended way to confirm the existence of a substring in a string:

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""
"secret" in raw_file_content  # True

The in membership operator gives you a quick and readable way to check whether a substring is present in a string. You may notice that the line of code almost reads like English.

Note: If you want to check whether the substring is not in the string, then you can use not in:

"secret" not in raw_file_content

Because the substring "secret" is present in raw_file_content, the not in operator returns False.

When you use in, the expression returns a Boolean value:

  • True if Python found the substring
  • False if Python didn’t find the substring
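If you print the result of a membership test, you can see that it's a plain bool and that not in always gives the opposite answer. The short sample string below is a hypothetical excerpt, not the full raw_file_content:

```python
# A shortened, hypothetical sample text for illustration.
text = "This is a special hidden file with a SECRET secret."

found = "secret" in text
missing = "secret" not in text

print(type(found))  # <class 'bool'>
print(found)        # True
print(missing)      # False

# `not in` always returns the opposite of `in`.
print(missing == (not found))  # True

# The check is case-sensitive: "Secret" with a capital S doesn't occur here.
print("Secret" in text)  # False
```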

You can use this intuitive syntax in conditional statements to make decisions in your code:

if "secret" in raw_file_content:
    print("Found!")

In this code snippet, you use the membership operator to check whether "secret" is a substring of raw_file_content. If it is, then you’ll print a message to the terminal.
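Because in works in any conditional, you can also combine it with a loop, for example to report which lines of a text mention a substring. This is a minimal sketch; the helper name lines_containing is hypothetical and not part of Python:

```python
raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""


def lines_containing(text, substring):
    """Return the 1-based numbers of the lines that contain substring."""
    matches = []
    for number, line in enumerate(text.splitlines(), start=1):
        if substring in line:
            matches.append(number)
    return matches


print(lines_containing(raw_file_content, "secret"))  # [2, 4]
```

Line 3 isn't reported because it only contains "Secret" with a capital S, while line 4 matches because "secretly" contains "secret". The in operator compares characters, not whole words.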

Generalize Your Check by Removing Case Sensitivity

Sometimes you want to perform a substring check regardless of the letter case in the string. For example, you might want to search for "secret" in a string, regardless of whether the substring is capitalized, all lowercase, or a mix of both. To achieve this, you can convert both the string and the substring to the same letter case before performing the check.

In Python, you have the methods .lower() and .upper() available for strings to convert all characters to lowercase or uppercase, respectively. You can use these methods together with the membership operator to perform a case-insensitive substring check.

Here’s an example:

file_content = raw_file_content.lower()
if "secret" in file_content:
    print("Found!")

In this example, you convert raw_file_content to lowercase with the .lower() method and assign the result to the variable file_content. Because the search term “secret” is also lowercase, the in operator now finds it regardless of whether the original text wrote it as “secret”, “Secret”, or “SECRET”.

The same approach works with the .upper() method if you prefer to write the search term in uppercase:

file_content = raw_file_content.upper()
if "SECRET" in file_content:
    print("Found!")

In this case, you convert raw_file_content to uppercase with the .upper() method and assign the result to the variable file_content. The in operator itself is still case-sensitive, but because both the text and the search term “SECRET” are now uppercase, the check effectively ignores the letter case of the original string.
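In these examples, only the text is converted, so the search term must already be written in the matching case. A small helper that normalizes both sides avoids that pitfall. This sketch uses str.casefold(), a built-in string method meant for caseless matching that also handles a few characters .lower() doesn't, such as the German “ß”. The function name contains_ignore_case is hypothetical:

```python
def contains_ignore_case(text, substring):
    """Check whether substring occurs in text, ignoring letter case on both sides."""
    # casefold() is a more aggressive form of lower() intended for caseless matching.
    return substring.casefold() in text.casefold()


raw_file_content = "I don't want to tell you The Secret"

# Lowercasing only the text fails when the search term contains capitals.
print("Secret" in raw_file_content.lower())  # False

print(contains_ignore_case(raw_file_content, "Secret"))  # True
print(contains_ignore_case(raw_file_content, "SECRET"))  # True

# casefold() maps "ß" to "ss", whereas lower() leaves it unchanged.
print(contains_ignore_case("STRASSE", "straße"))  # True
print("straße".lower() in "STRASSE".lower())      # False
```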

Learn More About the Substring

In some cases, you may want to extract or manipulate the substring that you’re looking for. Python provides several string methods that can help you with this task. Here are a few commonly used methods:

  • .index(): Returns the index of the first occurrence of a substring in a string, or raises a ValueError if the substring is not found.
  • .find(): Returns the index of the first occurrence of a substring in a string, or -1 if the substring is not found.
  • .count(): Returns the number of non-overlapping occurrences of a substring in a string.
  • .replace(): Returns a new string in which all occurrences of a substring are replaced with another string.
  • .split(): Splits a string into a list at each occurrence of a separator substring.

Let’s see some examples of how these methods work:

file_content = "The secret code is 42 and the secret answer is 21."
# Index of first occurrence
print(file_content.index("secret")) # Output: 4
# Index of first occurrence or -1 if not found
print(file_content.find("secret")) # Output: 4
print(file_content.find("unknown")) # Output: -1
# Count occurrences
print(file_content.count("secret")) # Output: 2
# Replace all occurrences
print(file_content.replace("secret", "hidden")) # Output: The hidden code is 42 and the hidden answer is 21.
# Split the string
print(file_content.split(" ")) # Output: ['The', 'secret', 'code', 'is', '42', 'and', 'the', 'secret', 'answer', 'is', '21.']

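These methods go beyond a yes-or-no check: use .find() when a missing substring is an expected outcome, .index() when a missing substring should be treated as an error, and .count() when you need to know how often the substring occurs.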
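Both .find() and .index() also accept an optional start position, which lets you collect every occurrence instead of only the first. The following is a minimal sketch; find_all is a hypothetical helper name, not a built-in method:

```python
file_content = "The secret code is 42 and the secret answer is 21."


def find_all(text, substring):
    """Return the start index of every non-overlapping occurrence of substring."""
    if not substring:
        # An empty substring matches at every position and would never advance.
        raise ValueError("substring must not be empty")
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Continue searching just past the end of the current match.
        start = text.find(substring, start + len(substring))
    return positions


print(find_all(file_content, "secret"))  # [4, 30]

# .index() raises an exception instead of returning -1.
try:
    file_content.index("unknown")
except ValueError as error:
    print(f"Not found: {error}")  # Not found: substring not found
```

Skipping ahead by len(substring) keeps the results non-overlapping, which matches how .count() counts occurrences.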

Find a Substring With Conditions Using Regex

Python’s membership operator in and the string methods covered so far are very useful for basic substring checks. However, if you need to search for substrings that meet certain conditions or have specific patterns, regular expressions (regex) provide a more flexible and powerful solution.

The re module in Python provides functions for working with regular expressions. To search for substrings using regex, you can use the re.search() function, which returns a Match object if a match is found, or None if no match is found.

Here’s an example of how to use re.search() to find a substring that starts with “secret” and ends with a digit:

import re
file_content = "The secret code is 42 and the secret answer is 21."
match = re.search(r"secret.*\d", file_content)
if match:
    print("Found!")

In this code, the regular expression r"secret.*\d" defines the search pattern. It matches “secret”, followed by any number of characters other than a newline (.*), followed by a digit (\d). If the pattern occurs anywhere in file_content, re.search() returns a Match object, which is truthy, so the if statement prints “Found!”.
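Because .* is greedy, the match in the example above runs from the first “secret” all the way to the last digit in the string, which you can confirm with match.group(). The sketch below contrasts that with a lazy quantifier (.*?), uses re.findall() with a capturing group to collect the number after each “secret”, and shows the re.IGNORECASE flag as a regex alternative to calling .lower():

```python
import re

file_content = "The secret code is 42 and the secret answer is 21."

# Greedy: .* grabs as much as possible, so the match ends at the last digit.
greedy = re.search(r"secret.*\d", file_content)
print(greedy.group())  # secret code is 42 and the secret answer is 21

# Lazy: .*? stops as early as possible, then \d+ takes the whole number.
lazy = re.search(r"secret.*?\d+", file_content)
print(lazy.group())  # secret code is 42

# With one capturing group, findall() returns that group's text for every match.
print(re.findall(r"secret\D*(\d+)", file_content))  # ['42', '21']

# re.IGNORECASE makes the pattern match regardless of letter case.
print(bool(re.search(r"SECRET", file_content, re.IGNORECASE)))  # True

# search() returns None when nothing matches.
print(re.search(r"secret\d", file_content))  # None
```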

Regular expressions provide a powerful way to search for substrings with complex patterns. You can define patterns for specific characters, digits, whitespace, and much more. However, regular expressions are beyond the scope of this tutorial, so if you’re interested in learning more, refer to the official Python documentation or additional resources on regular expressions.

Find a Substring in a pandas DataFrame Column

If you’re working with tabular data and need to search for substrings in specific columns, pandas provides easy-to-use methods that can save you time and effort. Let’s say you have a pandas DataFrame with a column called “text” containing text data, and you want to find all rows that contain a certain substring.

Here’s an example:

import pandas as pd
data = {
    "text": [
        "This is the first example",
        "The second example",
        "Yet another example"
    ]
}
df = pd.DataFrame(data)
# Filter DataFrame based on substring
sub_df = df[df["text"].str.contains("example")]
print(sub_df)

In this code, the .str.contains() method checks whether each value in the “text” column contains the substring “example”. It returns a Boolean Series, often called a mask, with one True or False value per row. Indexing df with this mask keeps only the rows marked True and stores them in the sub_df DataFrame. Because all three sample rows contain “example”, sub_df here holds every row of df.

You can swap in a different substring or column name for your own data. Because .str.contains() checks every value in the column with a single call, it’s a concise alternative to looping over the rows yourself, which is especially convenient with large datasets.
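Series.str.contains() also accepts parameters that mirror the earlier sections: case=False ignores letter case, na=False treats missing values as non-matches, and regex=False searches for the literal text. The last one matters because the pattern is interpreted as a regular expression by default, so a character like . matches anything. Here's a sketch with small, made-up sample data:

```python
import pandas as pd

# Hypothetical sample data with mixed case and a missing value.
df = pd.DataFrame(
    {
        "text": [
            "This is the first example",
            "The second EXAMPLE",
            "No match here",
            None,
        ]
    }
)

# Case-insensitive match; the missing value counts as "no match".
mask = df["text"].str.contains("example", case=False, na=False)
print(df[mask])

# By default the pattern is a regex, so "." matches any character.
versions = pd.Series(["version 1.0", "version 100"])
print(versions.str.contains("1.0").tolist())               # [True, True]
print(versions.str.contains("1.0", regex=False).tolist())  # [True, False]
```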

Key Takeaways

  • You can use the membership operator in to check whether a string contains a substring. This provides a quick and readable way to confirm the existence of a substring in a string.
  • Python also provides useful string methods like .index(), .find(), .count(), .replace(), and .split() to work with substrings and perform various operations.
  • If you need to search for substrings with specific conditions or patterns, regular expressions provide a more flexible solution. The re module in Python allows you to work with regular expressions.
  • When working with tabular data in pandas, you can use the .str.contains() method to filter rows based on substrings in specific columns. This can save you time and effort when searching for substrings in large datasets.

Now that you know the different ways to check if a Python string contains a substring, you can confidently handle tasks that require substring identification and manipulation. Whether you’re analyzing text data, processing user input, or searching through large datasets, these techniques will help you efficiently work with substrings in Python.