The Problem With Traffic Growth
At HubSpot, we grow our organic traffic by making two trips up from the laundry room instead of one.
The first trip is with new content, targeting new keywords we don't rank for yet.
The second trip is with updated content, dedicating a portion of our editorial calendar to finding which content is losing the most traffic (and leads) and reinforcing it with new content and SEO-minded maneuvers that better serve certain keywords. It's a concept we (and many marketers) have come to call "historical optimization."
But there's a problem with this growth strategy.
As our website's traffic grows, tracking every single page becomes an unwieldy process. Selecting the right pages to update is even tougher.
Last year, we wondered if there was a way to find blog posts whose organic traffic is merely "at risk" of declining, to diversify our update choices and perhaps make traffic more stable as our blog gets bigger.
Restoring Traffic vs. Protecting Traffic
Before we talk about the absurdity of trying to restore traffic we haven't lost yet, let's look at the benefits.
When viewing the performance of one page, declining traffic is easy to spot. For most growth-minded marketers, the downward-pointing traffic trendline is hard to ignore, and there's nothing quite as satisfying as seeing that trend recover.
But all traffic recovery comes at a cost: Because you can't know where you're losing traffic until you've lost it, the time between the traffic's decline and its recovery is a sacrifice of leads, demos, free users, subscribers, or some similar metric of growth that comes from your most interested visitors.
You can see that visualized in the organic trend graph below, for an individual blog post. Even with traffic saved, you've missed out on opportunities to support your sales efforts downstream.

If you had a way to find and protect (or even increase) the page's traffic before it needs to be restored, you wouldn't have to make the sacrifice shown in the image above. The question is: how do we do that?
How to Predict Falling Traffic
To our delight, we didn't need a crystal ball to predict traffic attrition. What we did need was SEO data suggesting that particular blog posts would lose traffic if their current ranking trends continued. (We also needed to write a script that could extract this data for the whole website; more on that in a minute.)
High keyword rankings are what generate organic traffic for a website. Not only that, but the lion's share of traffic goes to websites fortunate enough to rank on the first page. That traffic reward is all the greater for keywords that receive a particularly high number of searches per month.
If a blog post were to slip off Google's first page for that high-volume keyword, it's toast.
Keeping in mind the relationship between keywords, keyword search volume, ranking position, and organic traffic, we knew this was where we'd see the prelude to a traffic loss.
And luckily, the SEO tools at our disposal can show us that ranking slippage over time:

The image above shows a table of keywords for which one single blog post is ranking.
For one of those keywords, this blog post ranks in position 14 (page 1 of Google consists of positions 1-10). The red boxes show that ranking position, as well as the heavy volume of 40,000 monthly searches for this keyword.
Even sadder than this article's position-14 ranking is how it got there.
As you can see in the teal trendline above, this blog post was once a high-ranking result, but its position dropped steadily over a few weeks. The post's traffic corroborated what we saw: a noticeable dip in organic page views shortly after this post dropped off of page 1 for this keyword.
You can see where this is going … we wanted to detect these ranking drops when they're on the verge of leaving page 1, and in doing so, restore traffic we were "at risk" of losing. And we wanted to do this automatically, for dozens of blog posts at a time.
The "At Risk" Traffic Tool
The way the At Risk Tool works is actually somewhat simple. We thought of it in three parts:
- Where do we get our input data?
- How do we clean it?
- What are the outputs of that data that allow us to make better decisions when optimizing content?
First, where do we get the data?
1. Keyword Data from SEMrush
What we wanted was keyword research data at the property level: all of the keywords that hubspot.com ranks for, particularly blog.hubspot.com, along with the data associated with each of those keywords.
The fields most valuable to us are our current search engine ranking, our past search engine ranking, the keyword's monthly search volume, and, potentially, the keyword's value (estimated with keyword difficulty or CPC).
To get this data, we used the SEMrush API (specifically, their "Domain Organic Search Keywords" report):

Using R, a popular programming language for statisticians and analysts as well as marketers (specifically, we use the "httr" library to work with APIs), we then pulled the top 10,000 keywords that drive traffic to blog.hubspot.com (as well as our Spanish, German, French, and Portuguese properties). We currently do this once per quarter.
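A minimal sketch of that pull with httr might look something like the following. The environment variable, database, sort order, and export columns here are our assumptions rather than the exact production call, so check SEMrush's API documentation for the fields you need:

```r
# Sketch: pull the top organic keywords for a domain from SEMrush's
# "Domain Organic Search Keywords" (domain_organic) report via httr.
library(httr)

semrush_key <- Sys.getenv("SEMRUSH_API_KEY")  # hypothetical env var holding the API key

response <- GET(
  url = "https://api.semrush.com/",
  query = list(
    type           = "domain_organic",     # the Domain Organic Search Keywords report
    key            = semrush_key,
    domain         = "blog.hubspot.com",
    database       = "us",                 # one call per country/language property
    display_limit  = 10000,                # top 10,000 keywords
    display_sort   = "tr_desc",            # sort by traffic share, descending
    export_columns = "Ph,Po,Pp,Nq,Cp,Ur"   # keyword, position, previous position,
                                           # search volume, CPC, ranking URL
  )
)

stop_for_status(response)                  # fail loudly on HTTP errors
raw_keywords <- content(response, as = "text", encoding = "UTF-8")
```

The report comes back as delimited text rather than JSON, which is why the cleaning step below starts by parsing that raw response.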
This is a lot of raw data, and it's useless by itself. So we have to clean the data and shape it into a format that is useful for us.
Next, how do we actually clean the data and build formulas to give us some answers as to what content to update?
2. Cleaning the Data and Building the Formulas
We do most of the data cleaning in our R script as well. So before our data ever hits another storage destination (whether that's Sheets or a database table), it is, for the most part, cleaned and formatted how we want it.
We do this with a few short lines of code:

What we're doing in the code above, after pulling 10,000 rows of keyword data, is parsing it from the API so it's readable and then building it into a data table. We then subtract the current ranking from the past ranking to get the difference in ranking (so if we used to rank in position 4, and we now rank 9, the difference in ranking is -5).
We then filter so we only surface keywords with a negative difference in ranking (keywords we've lost positions for, not ones we gained or that stayed the same).
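Here's a rough sketch of what that cleaning step can look like, assuming the semicolon-delimited response from the pull above; the column names and the dplyr approach are a simplification for illustration, not the production script:

```r
# Sketch: parse the SEMrush response, compute the ranking difference,
# and keep only the keywords that have lost ground.
library(dplyr)

keywords <- read.table(
  text = raw_keywords, sep = ";", header = TRUE,
  quote = "", stringsAsFactors = FALSE, check.names = FALSE
)

# Rename to match the six export_columns requested above (assumed order).
names(keywords) <- c("keyword", "current_position", "previous_position",
                     "search_volume", "cpc", "url")

keywords_clean <- keywords %>%
  mutate(
    # previous minus current: a drop from position 4 to 9 yields -5
    ranking_difference = previous_position - current_position
  ) %>%
  filter(ranking_difference < 0) %>%                  # only keywords that lost positions
  arrange(ranking_difference, desc(search_volume))    # worst drops and biggest keywords first
```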
We then send this cleaned and filtered data table to Google Sheets where we apply tons of custom formulas and conditional formatting.
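One simple way to push that table into a spreadsheet from the same R script is the googlesheets4 package; a quick sketch, with a hypothetical spreadsheet ID and tab name:

```r
# Sketch: write the cleaned keyword table to a Google Sheet.
library(googlesheets4)

gs4_auth()  # interactive OAuth the first time; cache the token for scheduled runs

sheet_id <- "YOUR_SPREADSHEET_ID"  # hypothetical placeholder
sheet_write(keywords_clean, ss = sheet_id, sheet = "raw_keyword_data")
```

For quarterly runs on a schedule, you'd want a cached token or a service account so the script can authenticate non-interactively.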
Finally, we needed to know: what are the outputs and how do we actually make decisions when optimizing content?
3. At Risk Content Tool Outputs: How We Make Decisions
Given the input columns (keyword, current position, historical position, the difference in position, and the monthly search volume) and the formulas applied in the sheet, we compute a categorical variable as an output.
A URL/row can be one of the following:
- "AT RISK"
- "VOLATILE"
- Blank (no value)

Blank outputs, or those rows with no value, mean that we can essentially ignore those URLs for now. They haven't lost a significant amount of ranking, or they were already on page 2 of Google.
"Volatile" means the page is dropping in rank, but the blog post isn't old enough to warrant any action yet. New web pages jump around in rankings all the time as they get older. At a certain point, they generate enough "topic authority" to stay put for a while, generally speaking. For content supporting a product launch, or an otherwise critical marketing campaign, we might give these posts some TLC while they're still maturing, so it is worth flagging them.
"At Risk" is mainly what we're after: blog posts that were published more than six months ago, dropped in ranking, and are now ranking between positions 8 and 10 for a high-volume keyword. We see this as the "red zone" for failing content, where it's three or fewer positions away from dropping from page 1 to page 2 of Google.
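To make that decision rule concrete before looking at the spreadsheet version, here's a rough R sketch of the same tagging logic. The six-month and position 8-10 thresholds follow the description above; the 10,000-searches-per-month cutoff for "high-volume" and the publish_date column (assumed to be joined in from the blog's CMS) are illustrative assumptions:

```r
# Sketch: tag each keyword/URL row as "AT RISK", "VOLATILE", or blank.
# Assumes keywords_clean has been joined with each ranking URL's
# publish date (e.g., from the blog CMS) into a publish_date column.
library(dplyr)

high_volume_cutoff <- 10000   # hypothetical definition of "high-volume"

tagged_keywords <- keywords_clean %>%
  mutate(
    post_age_days = as.numeric(Sys.Date() - as.Date(publish_date)),
    status = case_when(
      # older post, lost ground, now clinging to the bottom of page 1
      ranking_difference < 0 &
        current_position >= 8 & current_position <= 10 &
        search_volume >= high_volume_cutoff &
        post_age_days > 180 ~ "AT RISK",
      # still dropping, but too new to act on yet
      ranking_difference < 0 & post_age_days <= 180 ~ "VOLATILE",
      # everything else: ignore for now
      TRUE ~ ""
    )
  )
```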
The spreadsheet formula for these three tags is below: basically a compound IF statement to find page-1 rankings, a negative ranking difference, and the publish date's distance from the current day.

What We Learned
In short, it works! The tool described above has been a regular, if not frequent, addition to our workflow. However, not all predictive updates save traffic right on time. In the example below, we saw a blog post fall off of page 1 after an update was made, then later return to a higher position.

And thatâs okay.
We don't have control over when, and how often, Google decides to recrawl a page and re-rank it.
Of course, you can re-submit the URL to Google and ask them to recrawl it (for critical or time-sensitive content, it may be worth this extra step). But the objective is to minimize the amount of time this content underperforms and stop the bleeding, even if that means leaving the speed of recovery to chance.
Although you'll never truly know how many page views, leads, signups, or subscriptions you stand to lose on each page, the precautions you take now will save time you'd otherwise spend trying to pinpoint why your website's total traffic took a dive last week.