Mastering Entity Gap Analysis with Google NLP and Python
- Nadav Harari
- Nov 20, 2023
Struggling with on-page SEO and not seeing the expected results? This is a familiar problem, particularly in niche markets where quality content is key. Enter the Google NLP API, a tool that goes beyond old-school keyword tactics by giving you direct access to Google's Natural Language Processing models, so you can base your SEO approach on how Google's technology actually reads text.
This article introduces a custom Python script that runs your pages and your competitors' pages through the Google NLP API, so you can see how Google's language models read each one. The aim is content that engages your audience and also covers the entities and topics Google's models pick up in competing pages. Ready to elevate your SEO game with the Google NLP API? This script is designed for SEO pros who want to make a real impact.
Key Points
1. Learn how to access Google Cloud, enable the NLP API, and download your service account credentials as a JSON file.
2. Try the script with a simple input Excel file containing a list of URLs for analysis.
3. Utilize the script's output file to review comprehensive analysis results, including sentiment, NLP entities, and content categories for each URL.
4. Employ Excel's conditional formatting to pinpoint NLP entity gaps in your content compared to competitors'.
Get ready to outsmart the competition! 🤓
What is Google NLP API and How Can You Access It?
The Google NLP API (officially the Google Cloud Natural Language API) applies Google's machine-learning models to text and returns structured information about it, such as how positive or negative it is, which entities it mentions, and which topics it covers.
It can perform several tasks that are pivotal for SEOs and marketers, including:
Sentiment Analysis: Determining the overall sentiment of a text, whether it's positive, negative, or neutral. This feature can help you uncover the prevailing sentiment around your brand, product, or service, and strategize accordingly.
“Score of the sentiment ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.” (Natural Language API Basics)
Entity Recognition: Identifying and categorizing entities like people, organizations, and locations mentioned in the text. This analysis is crucial for fine-tuning your content to emphasize certain entities or to grasp the focal points in competitors' content.
“Entity Analysis provides information about entities in the text, which generally refer to named ‘things’ such as famous individuals, landmarks, common objects, etc.” (Natural Language API Basics)
Content Classification: Categorizing content into a range of topics, which helps in understanding the broader context of the text. You can compare your content category with the one for your competitor to see if there is any discrepancy. These categories range in specificity, from broad categories like /Computers & Electronics to highly specific categories such as /Computers & Electronics/Programming/Java (Programming Language).
“The Natural Language API filters the categories returned by the classifyText method to include only the most relevant categories for a request. For instance, if /Science and /Science/Astronomy both apply to a document, then only the /Science/Astronomy category is returned, as it is the more specific result.” (Classifying Text)
How to Access Google NLP API
Gaining access to Google's NLP API involves a few steps but is quite straightforward:
1. Google Cloud Account: First, you'll need a Google Cloud account. If you don't have one, you can easily sign up for it. Google often offers new users a free trial with credit to start using its cloud services.
2. Creating a Project: Once you have your Google Cloud account, create a new project in the Google Cloud Console. This project will be where you'll use the NLP API.
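3. Enable the API: In your Google Cloud project, navigate to the 'APIs & Services' dashboard and enable the Cloud Natural Language API. This step is crucial because it grants your project the ability to use the API. Google also requires billing to be enabled on the project before the API accepts requests, even if you stay within the free monthly quota.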

4. Set up Authentication: For the script to access the API, you'll need to set up authentication by creating a service account and downloading its credentials as a JSON key file. This file contains the private key and account details that client libraries use to access the API programmatically, so store it somewhere private.
a. Creating a Service Account:
In your Google Cloud Console, navigate to the 'IAM & Admin' section and select 'Service Accounts'.

Click on 'Create Service Account' and enter a name that easily identifies the purpose of this account (e.g., 'nlp-api-service-account').
Grant the necessary roles to the service account. For using the NLP API, roles like 'Owner' or 'Editor' can be assigned, but it's generally recommended to follow the principle of least privilege—assigning only the permissions necessary to perform its intended functions.
b. Generating the Service Account Key:
Once the service account is created, you need to generate the keys for it.
In the service account details, click on 'Add Key' and then select 'Create new key'.
Choose 'JSON' as the key type. This will trigger the download of a JSON file to your computer.
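Once the key is downloaded, a script can load it directly. The sketch below is illustrative (the helper names are hypothetical): it sanity-checks the parsed JSON for the fields a service account key carries, then builds a client with `LanguageServiceClient.from_service_account_file`, a constructor provided by Google's Python client libraries.

```python
import json

# Fields every service account key file contains (a subset is enough for a sanity check).
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_info(info):
    """Validate a parsed key file and return its client_email."""
    missing = REQUIRED_FIELDS - set(info)
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type={info['type']!r}")
    return info["client_email"]


def make_client(key_path):
    """Validate the downloaded JSON key and build a Natural Language client."""
    from google.cloud import language_v1  # pip install google-cloud-language

    with open(key_path, encoding="utf-8") as fh:
        check_service_account_info(json.load(fh))
    return language_v1.LanguageServiceClient.from_service_account_file(key_path)
```

Setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the key file's path works too: a plain `LanguageServiceClient()` then picks the credentials up automatically.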

Introducing the Script: A Revolution in Content Analysis
Now that you've set up the Google NLP API and understand its benefits for analyzing sentiment, entities, and content categories, allow me to introduce my script. It's tailored to perform these analyses on a list of URLs you provide. After the script runs, you will receive an output file containing:
Sentiment analysis for each URL, including both yours and your competitors'.
A list of all NLP entities for each URL, sorted from the highest to lowest salience scores (plus, I'll show you how to conduct an entity gap analysis using an Excel formula).
Content categorization for each URL, along with the confidence score for each category.
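These three analyses correspond to three methods of Google's official `google-cloud-language` Python client: `analyze_sentiment`, `analyze_entities` and `classify_text`. The sketch below shows one way to call them; it is not the author's script, it assumes an already authenticated `language_v1.LanguageServiceClient`, and the helper names, tuple shapes and the ±0.25 neutral band in `sentiment_label` are illustrative choices rather than anything the API defines.

```python
# pip install google-cloud-language
# The google.cloud import lives inside the API helpers, so the pure helpers at
# the bottom can be used without the library installed.


def _plain_text_document(text):
    """Wrap raw page text in a PLAIN_TEXT document for the API."""
    from google.cloud import language_v1

    return language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )


def analyze_sentiment(client, text):
    """Return (score, magnitude) for the whole document."""
    response = client.analyze_sentiment(
        request={"document": _plain_text_document(text)}
    )
    sentiment = response.document_sentiment
    return sentiment.score, sentiment.magnitude


def analyze_entities(client, text):
    """Return (name, type, salience) tuples, most salient first."""
    from google.cloud import language_v1

    response = client.analyze_entities(
        request={
            "document": _plain_text_document(text),
            "encoding_type": language_v1.EncodingType.UTF8,
        }
    )
    rows = [
        (entity.name, language_v1.Entity.Type(entity.type_).name, entity.salience)
        for entity in response.entities
    ]
    return sort_by_salience(rows)


def classify(client, text):
    """Return (category name, confidence) pairs."""
    response = client.classify_text(request={"document": _plain_text_document(text)})
    return [(category.name, category.confidence) for category in response.categories]


def sentiment_label(score, threshold=0.25):
    """Bucket a score in [-1.0, 1.0]; the 0.25 band is an illustrative choice."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sort_by_salience(rows):
    """Sort (name, type, salience) tuples from highest to lowest salience."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def top_level(category_name):
    """Reduce '/A/B/C' to its broadest level, '/A'."""
    parts = [part for part in category_name.split("/") if part]
    return "/" + parts[0] if parts else ""
```

Besides `score`, the sentiment response carries a `magnitude` (the overall strength of emotion, from 0 upward), which helps tell a genuinely neutral page from a mixed one that scores near zero. `top_level` is handy when you and a competitor land in sibling subcategories of the same broad category.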
Input file
Upload an Excel file (.xlsx) containing a list of URLs, starting from cell A2. This list can include URLs from your website and competitors' sites that you wish to analyze for sentiment, entities, and categories using Google's NLP API.
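The script's own reader isn't shown, but reading this layout takes only a few lines. In the sketch below (helper names hypothetical), `openpyxl` reads column A of the first sheet from row 2 down, and `clean_urls` drops blanks, duplicates and values without an `http`/`https` scheme, so a URL typed without `https://` is skipped rather than guessed at.

```python
from urllib.parse import urlparse


def clean_urls(values):
    """Keep non-empty http(s) URLs, stripped and de-duplicated in input order."""
    seen = set()
    urls = []
    for value in values:
        if value is None:
            continue
        url = str(value).strip()
        # Skip blanks, text without an http/https scheme, and repeats.
        if urlparse(url).scheme not in ("http", "https") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def read_urls(path):
    """Read column A of the first sheet from row 2 down (row 1 is a header)."""
    from openpyxl import load_workbook  # pip install openpyxl

    workbook = load_workbook(path, read_only=True)
    try:
        rows = workbook.active.iter_rows(min_row=2, max_col=1, values_only=True)
        values = [row[0] if row else None for row in rows]
    finally:
        workbook.close()
    return clean_urls(values)
```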

Executing the Script
Now that you have prepared the input file, let's walk through running the script. It runs in a notebook, so it only takes a few clicks:
Run the Script: Press the play button on each cell to execute the script. This will automatically install necessary libraries and run the script's code.
Upload the Input Excel File: Click on the folder icon in the sidebar. Use the upload button to upload your Excel file with URLs.
Upload JSON Credentials: In the same sidebar, use the upload button to upload the JSON file with your Google Cloud credentials.
Wait for the script to finish the analysis.
Download the Results: Right-click the output file in the sidebar and choose 'Download'.
That’s it. You now have the output file!
Let's analyze the results.
Output file
The script generates an Excel file with three distinct tabs, each providing specific insights derived from the analysis. Here's what each tab contains:
Tab 1: Analysis Results
This tab contains:
Column A - URL: Lists each URL that was analyzed.
Column B - Content: Shows a summary or the entire content fetched from the URL.
Column C - Sentiment Score: Displays the sentiment analysis result, indicating the overall emotional tone of the content (positive, negative, or neutral). The score ranges from -1.0 (negative) to 1.0 (positive).
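The article doesn't show how the script fetches each page for column B. One standard-library approach is sketched below: download the HTML, then keep the visible text while skipping `<script>`, `<style>` and similar blocks. `VisibleTextParser`, `html_to_text` and `fetch_text` are hypothetical names, and the real script may extract content differently (for example, only the main article body).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags whose contents are never visible page text.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside script/style-like tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url, timeout=15):
    """Download a page and return its visible text (raises on network errors)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (entity-gap-sketch)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Because `fetch_text` raises on timeouts and HTTP errors, a caller looping over many URLs would typically wrap each call in `try`/`except` and record the failure instead of stopping the whole run.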

Tab 2: Entity Details
This tab contains:
Column A - URL: The URL of the content where the entities were found.
Column B - Entity Name: Names of the entities (like people, organizations, places) identified in the content by Google's NLP API.
Column C - Entity Type: The type the API assigns to each entity (for example, PERSON, ORGANIZATION, LOCATION, or CONSUMER_GOOD).
Column D - Salience Score: A number between 0 and 1.0 indicating how important or prominent each entity is within the content. For each URL, the script sorts the entities by salience score in descending order to make analysis easier.

Tab 3: Category Details
This tab contains:
Column A - URL: The URL of the content being classified.
Column B - Category Name: The name of the category or categories that Google's NLP API has assigned to the content.
Column C - Confidence Score: A score representing the confidence level of the API in its categorization.
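If you want to reproduce this three-tab layout in your own code, pandas can write several sheets into one workbook through `pd.ExcelWriter`. This is a sketch under assumptions: the row dictionaries and helper names are hypothetical, the column headers simply mirror the tab descriptions above, and writing `.xlsx` files requires `openpyxl`.

```python
def build_entity_rows(entities_by_url):
    """Flatten {url: [(name, type, salience), ...]} into Entity Details rows.

    Entities are sorted by salience, highest first, within each URL.
    """
    rows = []
    for url, entities in entities_by_url.items():
        for name, entity_type, salience in sorted(
            entities, key=lambda entity: entity[2], reverse=True
        ):
            rows.append(
                {
                    "URL": url,
                    "Entity Name": name,
                    "Entity Type": entity_type,
                    "Salience Score": salience,
                }
            )
    return rows


def write_report(path, analysis_rows, entity_rows, category_rows):
    """Write the three tabs (lists of row dicts) into one .xlsx workbook."""
    import pandas as pd  # pip install pandas openpyxl

    sheets = {
        "Analysis Results": analysis_rows,
        "Entity Details": entity_rows,
        "Category Details": category_rows,
    }
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, rows in sheets.items():
            pd.DataFrame(rows).to_excel(writer, sheet_name=sheet_name, index=False)
```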

How to Analyze the Output File Like a Pro
Let’s say you're an affiliate for 'Ahrefs', an SEO software company, and you aim to boost the Google rankings of your page, example.com/ahrefs-review/. You add your page and the top competing URLs for the keyword "Ahrefs review" to the input file and run the script. Once the output file is ready, here's how to interpret the results:
1. Sentiment Analysis (First Tab): You'll see the sentiment results for each URL. For instance, the URL on Reddit shows a negative sentiment score of -0.4. Review this URL to understand the cons of the product and consider incorporating this feedback into your content. If you're the owner of Ahrefs, this is actionable data to address these reviews.

2. Entity Analysis (Second Tab): This tab helps you identify the top entities for each URL by salience score, sorted in descending order. You can easily compare the dominant entities in your URL with those of competing URLs. But what if there are hundreds of NLP entities for each URL? How can you perform an entity gap analysis efficiently? Follow these steps:
Select the Range in Column B: Start from B2 and go down to the end of the actual data.
Apply Conditional Formatting:
Go to the Home tab, click on Conditional Formatting, and select 'New Rule'.
Choose 'Use a formula to determine which cells to format'.
Use this formula:
Replace "YOUR URL HERE" with your website's URL.
Adjust the range in the formula to match your data (e.g., change 1000 to 1500 if your data ends at B1500).
Click ‘Format’, choose ‘Fill’, and select a color to highlight the missing entities for your URL.

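Now every entity that appears on a competitor's page but not on yours is highlighted. Because each URL's entities are sorted by salience, the most prominent gaps appear at the top of each competitor's list.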

3. Category Details (Third Tab): This tab presents the categories assigned by Google's NLP API to your content versus your competitors'. It's particularly useful when you notice significant discrepancies between your content and that of your competitors.
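If you'd rather script the three checks above than do them by hand in Excel, the helpers below are a minimal sketch. They assume the tabs have already been loaded into plain Python structures (for example with `pandas.read_excel`); the function names and input shapes are hypothetical, and `entity_gaps` is a scripted alternative to the conditional-formatting rule, not the article's formula. Like Excel's text comparisons, it ignores case, and it also reports how many competitor URLs mention each missing entity.

```python
def most_negative(sentiment_by_url, threshold=0.0):
    """Return (url, score) pairs scoring below `threshold`, most negative first."""
    flagged = [(url, score) for url, score in sentiment_by_url.items() if score < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


def entity_gaps(entity_rows, your_url):
    """List entities that competitor URLs mention but `your_url` does not.

    entity_rows: (url, entity_name, salience) tuples, like the Entity Details tab.
    Returns (entity_name, best_salience, competitor_count), highest salience first.
    """
    rows = list(entity_rows)
    yours = {name.casefold() for url, name, _ in rows if url == your_url}
    gaps = {}  # casefolded name -> [display name, best salience, competitor URLs]
    for url, name, salience in rows:
        key = name.casefold()
        if url == your_url or key in yours:
            continue
        entry = gaps.setdefault(key, [name, salience, set()])
        entry[1] = max(entry[1], salience)
        entry[2].add(url)
    ranked = [(name, best, len(urls)) for name, best, urls in gaps.values()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def category_discrepancies(categories_by_url, your_url):
    """Map each competitor URL to the categories it has that `your_url` lacks."""
    yours = set(categories_by_url.get(your_url, []))
    result = {}
    for url, categories in categories_by_url.items():
        missing = sorted(set(categories) - yours)
        if url != your_url and missing:
            result[url] = missing
    return result
```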
Wrapping up
Combined with my custom Python script, the Google NLP API moves your SEO work beyond traditional keyword targeting toward a deeper understanding of your content and your competitors'.
This tool's capabilities in sentiment analysis, entity recognition, and content classification provide invaluable insights for optimizing not just for search engines, but for real user engagement.
The script's detailed outputs, from identifying entity gaps to analyzing emotional tones, equip you with a powerful, AI-driven approach to refine your SEO strategy. With the incredible combination of Google's NLP technology and the analytical power of Python, you're all set to take your strategies to a whole new level. Dive in and see the magic happen!
FAQ
What's the cost of the Google NLP API?
The costs are listed on Google Cloud's pricing page for the Natural Language API. The rates are very affordable, and each month includes a free usage quota.
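To size a large URL list before running it, note that Google's pricing rules count units rather than requests: each document is at least one unit, a document longer than 1,000 Unicode characters counts as one unit per 1,000 characters, and each feature (sentiment, entities, classification) is billed separately. The sketch below applies that rule to the page texts you plan to send; the helper names are hypothetical, and it estimates units only, so check the pricing page for current rates and free-tier limits.

```python
import math


def billable_units(text, chars_per_unit=1000):
    """Units for one document: one per started block of 1,000 characters, minimum 1."""
    return max(1, math.ceil(len(text) / chars_per_unit))


def estimate_units(texts, features=("sentiment", "entities", "classification")):
    """Total units per feature if every text is sent once for each feature."""
    total = sum(billable_units(text) for text in texts)
    return {feature: total for feature in features}
```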
Can I use the script without setting up the NLP API in Google Cloud?
What if the script cannot fetch the content for some of the URLs in the input file?
About the Author
I am Nadav Harari, an SEO specialist with a passion for data analysis and digital marketing. Feel free to contact me at Nadav@hararidigital.com or follow me on LinkedIn.