
Revamping a Failed Redirect Map: A Case Study on Site Migration for a Packaging Machine Producer

  • Writer: Nadav Harari
  • Oct 17, 2023
  • 6 min read

Updated: Oct 23, 2023

Picture this: You've just migrated to a new CMS, and instead of a smooth transition, you're hit with a plummet in organic traffic. Sound like a nightmare? It doesn't have to be. In this case study, I am not just giving you a lifeline—I am handing you a full-fledged toolkit. I've got three Python scripts that will be your new best friends:


1. A script that automates redirects based on the actual content of each URL.

2. A script that fetches snapshots of your old URLs using the Wayback Machine and the CDX Server API.

3. And a script that crafts redirects based on the concatenated title and meta-descriptions of each page.

Forget generic redirects that dump users onto your homepage; this is about creating a laser-focused redirect map that ensures every old URL points to its most relevant new counterpart. This is the recovery blueprint you wish you had before, and it's proven to reclaim lost traffic. Ready to turn that migration fiasco into a win? Let's get started.


Background


Audion is a Dutch industrial packaging machine producer that has been in business since 1947. Despite their long-standing reputation, they recently faced a significant challenge: they approached me when their organic traffic began to decline in April of this year. They had migrated to a new CMS and changed all the URLs on their site, but instead of creating a proper redirect map to guide users to the new URLs, they redirected every old page to the homepage. As a result, product pages, category pages, and all other page types now point to the homepage rather than to their respective new URLs.



The Problem with the Existing Redirect Map


The first thing I did was to verify whether the URLs were actually being 301-redirected to their respective new URLs. I launched Screaming Frog and pasted the URLs into 'List mode.'

Indeed, they were 301-redirected, but not to their respective URLs on the new site. The majority redirected to the homepage, which is far from ideal.




Screaming Frog - Current redirect map with all pages redirecting to the homepage

Why This is an Issue


The main issue with these redirects is twofold: misallocation of link authority and compromised search engine rankings. Each old page on your site has built up some level of authority and search engine rankings. Redirecting these to irrelevant pages means you're funneling that authority to the wrong place. This negatively impacts the performance of the new, relevant pages on search engine results pages, as both traffic and authority are being diverted.


The best practice is to ensure that each old page you redirect points to a new page with similar content. By aligning the old pages with new ones that meet the searcher's intent, you're more likely to preserve your search rankings post-site migration.



Tools and Approaches Considered


Creating a proper redirect map where thousands of URLs redirect to their respective versions on the new site is not a simple task. It's a complex endeavor that requires careful planning and execution. The manual option is out of the question. So, what could be a viable solution?


My initial approach

I decided to try an ‘Automate a redirect map’ script I found on SearchEngineLand.com. I created a modified version of the script that you can test in Google Colab. This version uses Python to scrape content from two TXT files of source and target URLs. It then compares the similarity of the content using PolyFuzz and finally exports the matched pairs of URLs to a CSV file. However, I hit a snag: the old URLs were already 301-redirected, making it impossible for me to scrape their content.
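For clarity, here's a minimal sketch of that kind of content-matching script, assuming two plain-text files of URLs (source_urls.txt and target_urls.txt, one URL per line) and PolyFuzz's default TF-IDF matcher. The Search Engine Land original and my Colab version differ in their details, and, as noted above, this only works while the old URLs still serve their original content.

```python
import requests
from bs4 import BeautifulSoup
from polyfuzz import PolyFuzz

def read_urls(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def scrape_paragraphs(url):
    # Fetch a page and keep only the visible <p> text
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))

source_urls = read_urls("source_urls.txt")   # old site
target_urls = read_urls("target_urls.txt")   # new site
source_text = [scrape_paragraphs(u) for u in source_urls]
target_text = [scrape_paragraphs(u) for u in target_urls]

# Match each old page's content to the most similar new page's content
model = PolyFuzz("TF-IDF").match(source_text, target_text)
matches = model.get_matches()                # columns: From, To, Similarity

# Translate the matched text back to URLs and export the redirect map
text_to_url = dict(zip(target_text, target_urls))
matches["Source URL"] = source_urls
matches["Target URL"] = matches["To"].map(text_to_url)
matches[["Source URL", "Target URL", "Similarity"]].to_csv("redirect_map.csv", index=False)
```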

So, what was I to do?


The Wayback Machine Dilemma


What is the Wayback Machine?


The Wayback Machine is a digital archive that lets you see what websites looked like in the past. Think of it as a time machine for the internet. You enter a website's URL, and it shows you snapshots of that website from different dates. It's useful for researching old content, checking changes made to websites over time, or recovering information that has been removed or updated.


How Did I Plan to Use It?

I decided to write a Python script that fetches the most recent snapshots of URLs from archive.org/web/. My thinking was that once I had these archived versions of the old URLs, I could scrape their content with the ‘Automate a redirect map’ script from SearchEngineLand.com. However, that plan fell apart when the Web Archive API started blocking my requests after processing only a handful of URLs.
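For reference, the simplest programmatic route is the Wayback Machine's availability endpoint. The sketch below (not my exact script, and old_urls.txt is just an illustrative filename) shows the idea, but at volume it runs into exactly the throttling described above.

```python
import time
import requests

def closest_snapshot(url):
    # Ask the Wayback Machine availability API for the closest archived capture of a URL
    resp = requests.get("https://archive.org/wayback/available",
                        params={"url": url}, timeout=10)
    closest = resp.json().get("archived_snapshots", {}).get("closest", {})
    return closest.get("url")  # e.g. https://web.archive.org/web/20230915000000/https://example.com/

with open("old_urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        print(url, closest_snapshot(url))
        time.sleep(1)  # pausing between requests helps, but large batches still get blocked
```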



Discovering the CDX Server API

The CDX API (Capture Index API) is a specialized web service designed to provide programmatic access to web archive data. It's often used in conjunction with web archiving services like the Wayback Machine. Unlike some other APIs that limit the number of requests, the CDX API is generally more accommodating when it comes to handling a large volume of queries, making it ideal for extensive projects.


With this API, I was able to fetch all of the historical snapshots of URLs along with their most recent dates. Here is a link to the script I used in Google Colab. It takes a TXT input file with URLs and returns an output file with the most recent Wayback Machine URLs corresponding to the original URLs.
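For those curious about the core of it, here's a condensed sketch of that lookup against the public CDX endpoint; my Colab script adds batching and error handling, and the file names here are just placeholders.

```python
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def latest_wayback_url(url):
    # Ask the CDX Server API for the most recent successful capture of a URL
    params = {
        "url": url,
        "output": "json",            # JSON array; the first row is the field header
        "filter": "statuscode:200",  # only captures that returned 200 OK
        "fl": "timestamp,original",
        "limit": "-1",               # a negative limit returns the last (most recent) capture
    }
    rows = requests.get(CDX_ENDPOINT, params=params, timeout=30).json()
    if len(rows) < 2:                # header only (or nothing): no capture found
        return None
    timestamp, original = rows[-1]
    return f"https://web.archive.org/web/{timestamp}/{original}"

with open("old_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("wayback_urls.txt", "w") as out:
    for url in urls:
        out.write(f"{url}\t{latest_wayback_url(url) or 'NOT FOUND'}\n")
```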



Other Challenges and How I Dealt with Them


1. Dealing with More Than One Locale

The old URLs, as well as the URLs on the new website, target multiple languages like Dutch and German. I needed to find a way to create redirect maps for each locale.

The Solution:

I created three redirect maps, one for each locale. I ensured that specific languages were matched with their corresponding versions. For example, /nl/ URLs were matched against /nl/ URLs on the new website. The same was done for /de/ URLs. For other languages that are not supported on the new website, I redirected them to their /en/ version.
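As a rough illustration of that grouping logic (the locale prefixes and the /en/ fallback mirror this project; adjust them to your own URL structure):

```python
from urllib.parse import urlparse

SUPPORTED_LOCALES = {"nl", "de", "en"}

def locale_of(url):
    # Treat the first path segment (/nl/, /de/, ...) as the locale; anything else falls back to /en/
    first_segment = urlparse(url).path.strip("/").split("/")[0]
    return first_segment if first_segment in SUPPORTED_LOCALES else "en"

def group_by_locale(urls):
    groups = {}
    for url in urls:
        groups.setdefault(locale_of(url), []).append(url)
    return groups

# Each locale group then gets its own source/target lists and its own redirect map,
# with unsupported languages folded into the /en/ group.
```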



2. Thin Content


I wanted to use the original ‘Automate a redirect map’ script from SearchEngineLand.com that I shared earlier. I placed the URLs in Screaming Frog and extracted all the content placed within the HTML <p> tags using the custom extraction feature.


Screaming Frog Extraction Settings


Here is the final outcome, found in the ‘Custom Extraction’ tab.

Screaming Frog - Content Extraction Output

However, many of the old pages lacked substantial content. In some cases, the same content appeared across multiple pages, such as the address and contact information. This led to issues of duplicate content when running the original 'Automate a redirect map' script, resulting in false positives. I needed to find another way to create a redirect map.


The Solution:


Switching to Title Tags and Meta Descriptions.

I opted to compare the concatenated title tags and meta descriptions of each page, using them as the primary content for comparison. The reason is that this meta content was more descriptive and unique to each page.


Here's How I Did It:

a. I used this tool to scrape the titles and meta descriptions from both the old URLs (Web Archive snapshots) and the new articles.


b. I created an XLSX file with 4 columns:

Column A - Web Archive URLs (Source URLs)

Column B - Concatenated title and meta description of each Web Archive URL (Source URLs Content)

Column C - New URLs (Target URLs)

Column D - Concatenated title and meta description of each new URL (Target URLs Content)

I used the CONCAT formula to join each title with its meta description.


Input XLSX file with 4 columns

c. I created a different version of the script that takes the input XLSX file from the previous step and applies the same PolyFuzz Python library to create sentence-pair matches.

Here is the script - Link.
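For reference, here's a simplified sketch of what that script does, assuming an input.xlsx laid out with the four columns described above; the linked Colab version adds validation and handles edge cases.

```python
import pandas as pd
from polyfuzz import PolyFuzz

# Columns as described above: A/B = source URLs and their concatenated
# title + meta description, C/D = the same for the target URLs
df = pd.read_excel("input.xlsx")
sources = df.iloc[:, [0, 1]].dropna()
targets = df.iloc[:, [2, 3]].dropna()
source_urls, source_text = sources.iloc[:, 0].tolist(), sources.iloc[:, 1].tolist()
target_urls, target_text = targets.iloc[:, 0].tolist(), targets.iloc[:, 1].tolist()

# Match each source snippet to the most similar target snippet
model = PolyFuzz("TF-IDF").match(source_text, target_text)
matches = model.get_matches()            # columns: From, To, Similarity

# Translate the matched snippets back to URLs and keep the similarity score
text_to_url = dict(zip(target_text, target_urls))
pd.DataFrame({
    "Source URL": source_urls,
    "Target URL": matches["To"].map(text_to_url),
    "Similarity": matches["Similarity"],
}).to_excel("redirect_map_nl.xlsx", index=False)
```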


Here is a sample of the output XLSX file for the /nl/ locale.

Final output file of the Python script for the /nl/ subdirectory.

Key Takeaways


Preparing a redirect map is crucial for maintaining organic traffic levels and rankings.

If you're facing a future migration, here are some key takeaways:

  • Create a redirect map that pairs each old URL with a closely related new URL. If there is no decent alternative, redirect to the relevant category page; if that's not an option either, redirect to the homepage.

  • If there are too many URLs to assess manually, use an automated approach to create the redirect map. Here is my free Python script, based on the script from SearchEngineLand.com.

  • If you need to create a redirect map after a bad implementation that you cannot revert, use the Web Archive CDX API to fetch the most recent snapshots of URLs. This will allow you to use their content for the new redirect map. Here is the link to my Python script in Google Colab for fetching the Wayback Machine URL snapshots using CDX API.

  • If the content of old URLs is not optimized, or if it's duplicated or thin, consider using titles and meta-descriptions as your primary content instead.

  • Use this script to create a redirect map. The input should be an XLSX file containing the source and target URLs along with their corresponding content. The output will be an XLSX file with matched source and target URLs together with their similarity scores.

Conclusion


Navigating the complexities of a website migration, especially one that involves multiple GEOs and thousands of URLs, is no small feat. This case study has shown that even when faced with a poorly executed initial redirect map, there are innovative solutions available. By leveraging powerful tools like Python scripts, the Wayback Machine, and the CDX Server API, we were able to rectify the issue and create a more effective redirect map. The key lies in being adaptable and willing to pivot your strategy when faced with unexpected challenges. Whether you're preparing for a future migration or trying to clean up after one, the methods and tools discussed here offer a robust framework for maintaining your site's SEO health.


