How to Categorize Keywords By Search Intent Using Python: A Complete Guide [2024]
Table of Contents
Setting Up Your Python Environment
Support The Author: If you’ve found value in this article, please leave me a positive Google My Business review!
A positive review would really help go a long way! It’s super easy and only takes a few seconds. Everything is much appreciate, especially the positive feedback and how this content may have helped you. For any negative feedback or questions about this article, please reach out to me at hello@raredog.co and I’d be happy to help.
( Bonus: Check out our latest guide on how to scrape web content in bulk using python for just the main areas of the site! )
Python is a versatile and powerful programming language that can be used for a wide range of tasks, including data analysis and manipulation. In this guide, I'll walk through how you can use Python to categorize a list of keywords according to search intent, using a pre-prepared Excel file.
Setting Up Your Python Environment
Before we get started, you'll need to have Python installed on your computer. If you don't have Python installed yet, follow these steps:
Go to the official Python website.
Download the latest version of Python.
Open the downloaded file and follow the instructions to install Python.
After installing Python, we'll need to install a few Python libraries that we'll be using in this project: Pandas, NLTK, and Openpyxl. Here's how you can install them:
Open Command Prompt (on Windows) or Terminal (on macOS or Linux).
Type the following commands and press Enter after each one:
pip install pandas pip install nltk pip install openpyxl
These commands will install the necessary Python libraries using pip, which is a package manager for Python.
Pandas: This is a powerful data manipulation library that provides data structures and functions needed to manipulate structured data. It provides tools for reading and writing data, handling missing data, merging datasets, reshaping data, and much more.
NLTK (Natural Language Toolkit): This is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
Openpyxl: This is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. It's a comprehensive tool to create, modify and read Excel documents while maintaining all of the Excel file’s properties.
Preparing Your Data
For this project, you'll need two Excel files:
An Excel file with your list of keywords. We'll refer to this file as
kws.xlsx
in this guide.The structure here is important! The first (and only) column should be titled “Keyword”, with your list of keywords underneath it. Also the name of the file is important as well if you want to copy and paste the python script below as is (otherwise you’ll have to modify the python script example I give you here to match the name). File extension is important as well, needs to be an xlsx (or you can update the script to use csv’s but I use xlsx here)
An Excel file with your search intent categories and corresponding keyword modifiers.
This file should have two columns: one for the search intent category and one for the keyword modifiers that correspond to that category. The first column should be named “Search Intent Type” and the second column should be named “Keyword Modifiers”. We'll refer to this file as
kw-intent.xlsx
in this guide (name of file and file type need to match this).
Place these two Excel files in a new folder on your computer. For this guide, we'll use the folder path (update to your path as needed) “C:\Users\YourUsername\Documents\PythonScripts\KeywordCategorization\
”
Creating the Python Scripts
Next, we'll create two Python scripts: one for preprocessing the data and one for categorizing the intents of the keywords.
Preprocessing Script
import nltk from nltk.corpus import stopwords from nltk.stem import WordNetLemmatizer import pandas as pd import re # Initialize lemmatizer and stopwords lemmatizer = WordNetLemmatizer() stopwords = set(stopwords.words('english')) def preprocess(text): # Convert to string text = str(text) # Convert to lowercase text = text.lower() # Remove special characters text = re.sub(r'\W', ' ', text) # Remove single characters text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text) # Substitute multiple spaces with single space text = re.sub(r'\s+', ' ', text, flags=re.I) # Lemmatize and remove stopwords text = [lemmatizer.lemmatize(word) for word in text.split() if word not in stopwords] return ' '.join(text) # Apply preprocess function to a list of words def preprocess_list(word_list): return [preprocess(word) for word in word_list] # Load data keywords_df = pd.read_excel('kws.xlsx') intent_df = pd.read_excel('kw-intent.xlsx') # Keep a copy of the original keywords keywords_df['Original Keyword'] = keywords_df['Keyword'] # Preprocess keywords keywords_df['Keyword'] = keywords_df['Keyword'].apply(preprocess) # Split the "Keyword Modifiers" column into lists of words intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].str.split(", ") # Preprocess each word in the "Keyword Modifiers" column intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].apply(preprocess_list) # Save the processed data to Excel files keywords_df.to_excel('preprocessed_kws.xlsx', index=False) intent_df.to_excel('preprocessed_kw-intent.xlsx', index=False)
This script loads your keywords and search intent data, preprocesses the data (converts it to lowercase, removes special characters, lemmatizes the words, and removes English stopwords), and then saves the preprocessed data back to Excel files.
Intent Categorization Script
import pandas as pd # Load the preprocessed data keywords_df = pd.read_excel('preprocessed_kws.xlsx') intent_df = pd.read_excel('preprocessed_kw-intent.xlsx') # Convert the Keyword Modifiers column into lists of words # Since the "Keyword Modifiers" column has been preprocessed and saved as a string, we need to evaluate the string to get the list back intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].apply(eval) # Initialize a new column in keywords_df to store the intents keywords_df["Intent"] = "Uncategorized" # Loop through each keyword for i, keyword in keywords_df["Keyword"].items(): # Initialize a list to store the intents of the current keyword intents = [] # Convert keyword to string and split it into individual words keyword_words = set(str(keyword).split()) # Loop through each intent for _, row in intent_df.iterrows(): # Check if any word from the current intent appears in the keyword_words set if any(word in keyword_words for word in row["Keyword Modifiers"]): # If it does, append the intent to the list intents.append(row["Search Intent Type"]) # If the list of intents is not empty, assign it to the Intent column of the current keyword if intents: keywords_df.at[i, "Intent"] = ", ".join(intents) # Save the results to a new Excel file keywords_df[['Original Keyword', 'Intent']].to_excel('keywords_with_intents.xlsx', index=False)
This script loads the preprocessed data, checks each keyword to see if it contains any of the keyword modifiers associated with each search intent, and then assigns the corresponding search intent(s) to each keyword. The results are saved to a new Excel file.
Running the Python Scripts
Now that you have your Python scripts ready, you can run them to categorize your keywords.
Open Command Prompt or Terminal.
Navigate to the folder where you saved your Python scripts. You can do this by typing
cd
followed by the folder path. For this guide, the command would be:
cd C:\Users\YourUsername\Documents\PythonScripts\KeywordCategorization\
Run the preprocessing script by typing
python preprocess.py
and pressing Enter.Once the preprocessing script has finished running, run the intent categorization script by typing
python categorize.py
and pressing Enter.
The intent categorization script will create a new Excel file (keywords_with_intents.xlsx
) in the same folder. This file will contain your list of keywords along with their categorized intents.
Troubleshooting Common Errors
If you encounter any errors while running the Python scripts, here are a few common issues and how you can resolve them:
ModuleNotFoundError
: If you see an error message likeModuleNotFoundError: No module named 'pandas'
, it means that the Python library (in this case, Pandas) is not installed. You can install it by typingpip install pandas
in Command Prompt or Terminal and pressing Enter.FileNotFoundError
: If you see an error message likeFileNotFoundError: [Errno 2] No such file or directory: 'kws.xlsx'
, it means that the Python script cannot find the Excel file. Make sure that your Excel files are in the same folder as your Python scripts and that the filenames in your Python scripts match the actual filenames.AttributeError
: If you see an error message likeAttributeError: 'float' object has no attribute 'split'
, it means that one or more of your keywords are recognized as a float (numeric) type, not a string. This can happen if your data contains numeric values or NaN (Not a Number) values, which are default interpreted as float by pandas. The Python scripts in this guide are designed to handle this issue by converting keywords to strings before splitting them.
Round Up
In this guide, we've walked through how you can use Python to categorize a list of keywords according to search intent. With a basic understanding of Python and a pre-prepared Excel file, you can automate this categorization process and gain valuable insights from your keyword data. Happy coding!
Feeling lost in all this technical SEO jargon and meta-marketing?
We can help. Reach out to us and we’ll help you stand out from the pack by optimizng your site and company’s SEO presence.
Support The Author: If you’ve found value in this article, please leave me a positive Google My Business review!
A positive review would really help go a long way! It’s super easy and only takes a few seconds. Everything is much appreciate, especially the positive feedback and how this content may have helped you. For any negative feedback or questions about this article, please reach out to me at hello@raredog.co and I’d be happy to help.