
Mastering Keyword Density: An Automated Python Approach for Modern SEOs

  • Writer: Nadav Harari
  • Oct 19, 2023
  • 5 min read

Updated: Oct 31, 2023


Ever wondered how to effortlessly analyze keyword density across multiple articles without manually sifting through each one? What if I told you that you could do it all in one go, right from the comfort of Google Colab? Intrigued? You should be. In this article, I'll walk you through a Python script that's not just a game-changer—it's your new secret weapon. Simply upload an Excel file with a list of URLs and your target keyword, and let the magic happen. This is the exact method I've used to optimize content strategies for top-tier clients, and now I'm sharing it with you. Ready to elevate your SEO game? Let's dive in.


What is Keyword Density?

Keyword density is the number of times a specific keyword or phrase appears in a web page's content, expressed as a percentage of the page's total word count. In simpler terms, it's a metric that shows how often a keyword is used relative to everything else on the page.
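To make the definition concrete, here is a minimal sketch of the calculation. It is an illustration rather than an excerpt from the script, and the function name keyword_density is my own placeholder. It assumes density is simply occurrences divided by total words.

```python
def keyword_density(occurrences: int, total_words: int) -> float:
    """Return keyword density as a percentage of the total word count."""
    if total_words <= 0:
        return 0.0  # an empty page has no meaningful density
    return 100.0 * occurrences / total_words


# A keyword that appears 12 times in an 800-word article:
print(keyword_density(12, 800))  # 1.5
```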


Why does this matter for SEO?

Search engines, in their quest to deliver the most relevant content to users, analyze various elements of web pages, and keyword density is one of them. A keyword's optimal presence signals to search engines that the content is relevant to that particular keyword, potentially boosting the page's ranking for that term.

By understanding how frequently a keyword appears in your content, you can gauge whether you're adequately addressing a particular topic or query.

On the other hand, over-optimization, or keyword stuffing, occurs when a keyword is used so often that the content sounds unnatural, which can hurt a website's rankings on search engine results pages.


As a rule of thumb, a keyword density of 2% or more of the page's total word count can be considered 'keyword stuffing'.
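If you want to flag pages automatically, the rule of thumb translates into a one-line check. This is a sketch under my own naming (STUFFING_THRESHOLD and is_keyword_stuffing are illustrative, not taken from the script), and the 2% cutoff is a guideline rather than a hard limit.

```python
STUFFING_THRESHOLD = 2.0  # percent; the rule-of-thumb cutoff, adjust to taste


def is_keyword_stuffing(density_percent: float,
                        threshold: float = STUFFING_THRESHOLD) -> bool:
    """Return True when the density meets or exceeds the threshold."""
    return density_percent >= threshold


print(is_keyword_stuffing(1.5))  # False
print(is_keyword_stuffing(2.0))  # True
```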



Quality Content vs. Keyword Stuffing

The SEO landscape has evolved significantly over the years. Gone are the days when cramming keywords into content could guarantee high search rankings. Modern search algorithms prioritize user experience, rewarding content that is genuinely valuable, engaging, and relevant. Keyword stuffing, a tactic once widely used, is now penalized as it degrades user experience. Today's best practice is to focus on creating high-quality content that naturally integrates keywords, ensuring it meets users' needs while also being search engine-friendly.



Introducing the Python Script

My Keyword Density Calculator automates the task of calculating keyword density for any given web page. Simply provide the URLs and target keywords, and the script will fetch the content, analyze it, and provide you with detailed insights on keyword occurrences, total word count, and, of course, the keyword density.



Additional Features: User Agent, Cloudscraper and Content Filtering

User Agent: Websites often employ mechanisms to differentiate between regular users and automated bots. By using a user agent, my script mimics the behavior of a real browser, ensuring that it accesses web content without drawing undue attention or being flagged. Additionally, you can whitelist your own user agent in Cloudflare for use with this script.

Simply add your own user agent string within the quotation marks: user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
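For readers curious about what the user agent actually does, here is a small sketch of attaching it to a request as an HTTP header. To keep it runnable anywhere, it uses Python's built-in urllib rather than cloudscraper, and build_request is a hypothetical helper name. No network call is made.

```python
from urllib.request import Request

user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)


def build_request(url: str, agent: str = user_agent) -> Request:
    """Create a request that identifies itself with the given user agent."""
    return Request(url, headers={"User-Agent": agent})


request = build_request("https://example.com/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

With a requests-style session such as the one cloudscraper provides, the same idea would typically be expressed by passing headers={"User-Agent": user_agent} when fetching the page.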

Cloudscraper Integration: With the rise of protective measures like Cloudflare, it's become increasingly common for bots or scrapers to be blocked. Cloudscraper helps bypass such anti-bot measures, ensuring that the script retrieves the required content seamlessly.


Focused Content Analysis: Not all parts of a webpage are relevant when analyzing content for SEO. Headers, footers, and navigation elements often contain repetitive phrases or links that can skew keyword density calculations. My script is designed to ignore these sections, homing in on the main content body for a more accurate and relevant keyword density analysis. This is the part of the script that accomplishes this: for tag in soup.find_all(['header', 'footer', 'nav']):


If your site wraps its boilerplate in other HTML tags (for example, aside), simply add those tag names to the list inside the square brackets.
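To show the filtering idea end to end, here is a sketch that skips everything inside header, footer, and nav elements. It deliberately uses only Python's built-in html.parser so it runs without extra installs. It is not the BeautifulSoup code from the script, and MainTextExtractor and extract_main_text are names I made up for the example.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"header", "footer", "nav"}  # add more tag names here, e.g. "aside"


class MainTextExtractor(HTMLParser):
    """Collect page text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><body><nav>SEO tips | SEO tools</nav>"
    "<p>Keyword density is one SEO metric.</p>"
    "<footer>SEO agency</footer></body></html>"
)
print(extract_main_text(sample))  # Keyword density is one SEO metric.
```

In BeautifulSoup, the loop quoted above would typically call tag.decompose() on each match to drop those elements before the text is extracted.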


Tools and Libraries Used


This script harnesses the power of several Python libraries to perform its magic:


  • pandas: A powerful data manipulation library, pandas helps in organizing and processing the data efficiently.

  • BeautifulSoup: This library is essential for parsing HTML content, allowing the script to extract and analyze text from web pages.

  • cloudscraper: In the age of advanced bot detection mechanisms like Cloudflare, cloudscraper steps in to ensure smooth content retrieval without being blocked.

  • nltk: The Natural Language Toolkit assists in tokenizing the content, breaking it down into individual words and sentences for accurate analysis.
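To give a feel for what tokenizing means in practice, here is a rough stand-in that splits text into words with a regular expression. It is an assumption-laden sketch, not the nltk code the script relies on: nltk treats punctuation and contractions differently, so its word counts can differ slightly. tokenize_words is a name invented for this example.

```python
import re


def tokenize_words(text: str) -> list[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    # Letters and digits, optionally joined by apostrophes (e.g. "isn't").
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


words = tokenize_words("Keyword density isn't everything. Density matters, though!")
print(words)
print(len(words))  # 7
```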


Step-by-Step Guide to Using the Script


For those new to this, here's a detailed guide to get you started:


Pre-requisites:


Google Colab: An online platform that allows you to run Python scripts in the cloud. No setup required, and it's completely free.

Input Excel File: This should contain the URLs you want to analyze in one column and the corresponding keywords in another. Ensure this file is saved in Excel format.


Walkthrough:


Installing Necessary Libraries:


  1. Launch Google Colab and create a new Python notebook.

  2. Copy and paste the script into the first cell.

  3. Run the cell. This will install all the required libraries such as pandas, BeautifulSoup, nltk, and cloudscraper.


Uploading the Input File:


  1. After the libraries are installed, the script will prompt you to upload your input Excel file.

  2. Click on the 'Choose Files' button and select your Excel file from your computer. The file will then be uploaded to the Colab environment.


Keyword density input XLSX file with 2 columns


Running the Script:


  1. Once the file is uploaded, the script will automatically continue its execution.

  2. It will fetch content from each URL, calculate keyword densities, and display some debug information like word counts and keyword occurrences.

Interpreting the Output CSV File:


  1. After the script finishes running, it will generate a CSV file named 'keyword_density.csv'.

  2. This file will automatically be downloaded to your computer.

  3. Open it using any spreadsheet software (like Microsoft Excel or Google Sheets). The file will contain columns for the URL, targeted keyword, occurrences of the keyword, total word count of the page, and the keyword density presented as a percentage.

  4. Use this data to evaluate your content's SEO optimization and make any necessary adjustments.
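If you'd like to see how a five-column file like this can be produced, here is a sketch using Python's built-in csv module. The column headers in FIELDNAMES and the sample row are hypothetical, since the exact header text in the script's output may differ, and the script itself may well write the file another way.

```python
import csv
import io

# Hypothetical header names for the five columns described above.
FIELDNAMES = ["URL", "Keyword", "Occurrences", "Total Words", "Keyword Density (%)"]


def results_to_csv(results: list[dict]) -> str:
    """Serialize result rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


rows = [{
    "URL": "https://example.com/post",
    "Keyword": "keyword density",
    "Occurrences": 12,
    "Total Words": 800,
    "Keyword Density (%)": 1.5,
}]
csv_text = results_to_csv(rows)
print(csv_text)
```

To save the result to disk instead, write csv_text to a file such as 'keyword_density.csv'.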


Keyword density output CSV file with 5 columns


By following these steps, you'll be able to harness the power of my Python script and gain valuable insights into your content's keyword density, helping you refine and perfect your SEO strategy.


Conclusion


My Keyword Density Calculator stands as a testament to the power of automation in modern SEO practices. Gone are the days when manual calculations and guesswork sufficed. Today, precision, speed, and efficiency reign supreme, and tools like this pave the way for such advancements.


But like all tools, its true potential is unlocked when adapted to specific needs. Whether you're an individual blogger, an SEO professional at an agency, or a business owner, I encourage you to mold this script, refine it, and make it your own. SEO is as much an art as it is a science, and having the right tools in your arsenal is half the battle won.



FAQ

Does the script distinguish between lowercase and uppercase keywords?

No. Before analyzing the content or keywords, the script converts both the text from the webpage and the target keywords to lowercase. This ensures that the keyword density calculation is case-insensitive and will count occurrences of a keyword regardless of how it's capitalized in the content or the input file.
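Here is a small sketch of what case-insensitive counting looks like. It lowercases both the page text and the keyword, then counts whole-word matches, including multi-word phrases. The function name count_keyword and the regex-based word splitting are my own simplifications for illustration, not the script's exact logic.

```python
import re


def count_keyword(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of a keyword or phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", keyword.lower())
    if not target:
        return 0
    span = len(target)
    # Slide a window of the phrase's length across the page's words.
    return sum(
        1 for i in range(len(words) - span + 1)
        if words[i:i + span] == target
    )


text = "Keyword Density matters. Track keyword density, not KEYWORD DENSITY myths."
print(count_keyword(text, "Keyword density"))  # 3
```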

Do I need to know how to code in order to use this script?

Do I need to install anything on my computer to run the script?

Why does the script take so long to run when I upload an xlsx file with a lot of URLs?




About the Author

I am Nadav Harari, an SEO specialist with a passion for data analysis and digital marketing. Feel free to contact me at Nadav@hararidigital.com or follow me on LinkedIn.


