Screaming Frog is a website crawling tool used by SEO professionals. While it is beloved by technical SEOs, it is often overlooked by content-focused SEOs. Today I want to share how this tool can be an asset for content teams, especially when doing content audits.
Getting started with Screaming Frog
There are two versions of Screaming Frog: a free version and a paid one.
If you only need to crawl basic elements such as broken links, page titles, and meta descriptions, or to get a site visualization, and the website has fewer than 500 URLs, then the free version will be sufficient.
If you are seeking deeper insights, API connections, the ability to crawl unlimited URLs, and custom extractions, then the paid version is most appropriate. The license is paid once per year and is priced reasonably for those using this tool regularly.
Unfortunately, this article is not a tutorial for absolute beginners, but we won’t leave you hanging. Here are some of our favorite tutorials for getting started with Screaming Frog.
Pre-Crawl Screaming Frog Tips for Content Audits
It is nearly impossible to make SEO suggestions for a website without doing a content audit. Our team has collectively done over 100 content audits and learned a few tricks along the way. We are going to share our favorite custom settings for Screaming Frog content audits, which we believe will save you loads of time and allow you to focus more on strategy.
The custom extraction feature is by far the most powerful and allows you to scrape specific elements from pages of a website that are not captured by basic crawling. This can include page elements such as publishing and modified date, blog categories and authors, or a count of H2s.
Screaming Frog has a great tutorial on how to use the custom extraction feature. Once you’ve made your way through the basics and successfully pulled a custom extraction, work your way through Uproar’s guide, which shares great examples of XPath and Regex extractions.
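To make the idea of an XPath extraction concrete, here is a minimal Python sketch that pulls the same kinds of fields (publish date, modified date, author, count of H2s) from a page. The sample markup, the meta tag names, and the `extract_fields` helper are assumptions for illustration only; real sites mark up dates and authors in many different ways, so inspect the page source before writing your own expressions. In Screaming Frog itself you would enter an XPath such as `//meta[@property='article:published_time']/@content` rather than run any code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup used only for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <meta property="article:published_time" content="2019-03-14T09:00:00+00:00"/>
    <meta property="article:modified_time" content="2020-01-05T12:30:00+00:00"/>
    <meta name="author" content="Jane Doe"/>
  </head>
  <body>
    <h1>Post title</h1>
    <h2>First section</h2>
    <h2>Second section</h2>
  </body>
</html>
"""


def extract_fields(markup):
    """Return publish date, modified date, author, and H2 count for one page."""
    root = ET.fromstring(markup.strip())

    def meta_content(attr, value):
        # ElementTree supports a subset of XPath, including attribute predicates.
        node = root.find(f".//meta[@{attr}='{value}']")
        return node.get("content") if node is not None else None

    return {
        "published": meta_content("property", "article:published_time"),
        "modified": meta_content("property", "article:modified_time"),
        "author": meta_content("name", "author"),
        "h2_count": len(root.findall(".//h2")),
    }


fields = extract_fields(SAMPLE_PAGE)
print(fields)
```

The standard library parser only accepts well-formed markup, which is why the sample is written as clean XML. It is meant to show what each expression selects, not to replace the crawler.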
Here are a few examples of how we use custom extractions to elevate our content audits and presentations for clients.
Google loves fresh content, but it also appreciates consistency. Extracting the publishing and last modified dates for blog posts lets you analyze how often, and how recently, content has been published and updated.
Creating a pivot table to uncover the count of published posts per year can then be translated into a column chart. Now we are able to share a visualization of the publishing frequency of blog posts. This is helpful when sharing results with clients as opposed to listing off a number of posts per year.
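The same posts-per-year pivot can be sketched outside a spreadsheet. The rows and the `published` field name below are made up for illustration; an actual crawl export will use whatever column name you gave the custom extraction.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows standing in for a crawl export with an extracted publish date.
rows = [
    {"url": "/blog/a", "published": "2018-05-02"},
    {"url": "/blog/b", "published": "2019-01-15"},
    {"url": "/blog/c", "published": "2019-07-30"},
    {"url": "/blog/d", "published": "2019-11-08"},
    {"url": "/about", "published": ""},  # non-blog page with no extracted date
]


def posts_per_year(rows, date_field="published"):
    """Count rows per publishing year, skipping rows without a date."""
    years = (
        datetime.strptime(row[date_field][:10], "%Y-%m-%d").year
        for row in rows
        if row.get(date_field)
    )
    return dict(sorted(Counter(years).items()))


counts = posts_per_year(rows)
for year, total in counts.items():
    # A crude text version of the column chart.
    print(year, "#" * total)
```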
This is also a great way to identify outliers, which may bring up new information. Below, for example, Flow SEO decoded the SEO successes of Drift, which show a significant increase in publishing in 2019 compared to the previous year. This could be attributed to a few reasons, such as an acquisition whose site was migrated into theirs or a budget increase for content creation.
If you have never done an SEO writer analysis for your website, you might not have even considered that some of your authors bring in significantly more organic traffic and leads than others. A custom extraction in Screaming Frog can pull author names from blog posts.
That data can then be used in a pivot table combined with other data such as keywords and Google Analytics to gain better insights into each writer’s performance and value. In the screenshot below, you may notice that the authors highlighted in green are the top performers, but if you had to choose just one – it would be difficult. As individuals, we have our own personal strengths.
This is a great opportunity to create a collaborative space for your writers to connect and share their expertise (ranking for keywords, getting conversions, etc) with each other.
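As a rough sketch of the author pivot described above, the snippet below totals posts, sessions, and conversions per author. The author names, numbers, and field names are hypothetical; in practice the author column would come from the custom extraction and the metrics from your analytics data.

```python
from collections import defaultdict

# Hypothetical blog rows: author from a custom extraction, metrics from analytics.
posts = [
    {"url": "/blog/a", "author": "Alex", "sessions": 1200, "conversions": 4},
    {"url": "/blog/b", "author": "Sam", "sessions": 300, "conversions": 9},
    {"url": "/blog/c", "author": "Alex", "sessions": 800, "conversions": 1},
]


def summarize_by_author(posts):
    """Total posts, sessions, and conversions for each author."""
    totals = defaultdict(lambda: {"posts": 0, "sessions": 0, "conversions": 0})
    for post in posts:
        entry = totals[post["author"]]
        entry["posts"] += 1
        entry["sessions"] += post["sessions"]
        entry["conversions"] += post["conversions"]
    return dict(totals)


summary = summarize_by_author(posts)
for author, stats in summary.items():
    print(author, stats)
```

In this made-up data one author leads on traffic and the other on conversions, which is the kind of split that makes picking a single "top performer" difficult.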
Blogs are often the most archaic part of any website; they hold the skeletons and ghosts of product updates past. If you never had a blog strategy or considered what was posted there, then I highly recommend including this in your content audit.
Extracting blog categories is great for understanding the balance of topics within posts. With a simple pivot table and pie chart, you have a great visualization for yourself and the client. This can be further blended with data such as organic sessions or conversions to see which topics resonate with your audience the most.
Blog posts, as top-of-funnel content pieces, are often the first point of contact with your audience. Consider this when auditing your website as you’ll want to make a great first impression.
Connecting APIs for More Data
Our previous content audit workflow involved a lot of VLOOKUPs to stitch together a handful of reports from different platforms. Whenever any of those platforms updated said reports, our template would break.
If you have Google Analytics and Google Search Console access to a domain, then you have the ability to connect the corresponding APIs to Screaming Frog to integrate more data into your crawls.
Depending on your Ahrefs or Moz account level, it is also possible to include data from those platforms, which is helpful when auditing your own sites or peeking in on a competitor.
Integrating these APIs into our content audits has elevated them, allowing us to provide clients with deeper insights while spending less time putting together spreadsheets.
Post-Crawl Screaming Frog Tips for Content Audits
Now you might find yourself with a huge spreadsheet with a ton of information. Some of it is incredibly valuable, and some of it is not so much. As this depends on the website and the purpose of your content audit, here are a few use cases we implemented at Flow SEO.
Conditional Formatting For Visual Insights
When combing over 1000+ rows and 30+ columns, it can be overwhelming and difficult to catch errors. Conditional formatting is a useful way to quickly identify URLs that are lacking key on-page elements, such as page titles or meta descriptions, or to highlight pages that have not been updated in the past year.
If you integrate Google Search Console into your crawl, conditional formatting can help identify pages that have not been crawled in the previous 30 days, URLs that are currently not on Google, or URLs that contain issues.
Additionally, simple formulas such as =LEN to get title and meta description lengths can be paired with conditional formatting to identify on-page optimizations that don’t follow SEO best practices.
Using =MATCH to compare user-selected with Google-selected canonicals can identify any URLs that might be misunderstood by Google or may need adjustments made to their canonicals.
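For readers who prefer a script to spreadsheet formulas, the two checks above (length via =LEN, canonical comparison via =MATCH) can be approximated as follows. The field names, the 60 and 155 character limits, and the trailing-slash normalization are assumptions chosen for illustration, not values taken from this article or from Screaming Frog.

```python
def audit_row(row, title_max=60, meta_max=155):
    """Return a list of on-page issues for one crawled URL.

    The default limits are illustrative assumptions; adjust them to the
    guidelines your team follows.
    """
    issues = []
    for field, limit in (("title", title_max), ("meta_description", meta_max)):
        text = (row.get(field) or "").strip()
        if not text:
            issues.append(f"{field}: missing")
        elif len(text) > limit:
            issues.append(f"{field}: {len(text)} chars (limit {limit})")

    # Compare the declared canonical with the one Google selected,
    # ignoring a trailing slash (an assumption about what counts as equal).
    user = (row.get("user_canonical") or "").rstrip("/")
    google = (row.get("google_canonical") or "").rstrip("/")
    if user and google and user != google:
        issues.append("canonical mismatch")
    return issues


# Hypothetical rows for illustration.
row_ok_canonical = {
    "title": "Content audits with Screaming Frog",
    "meta_description": "",
    "user_canonical": "https://example.com/blog/audit/",
    "google_canonical": "https://example.com/blog/audit",
}
row_mismatch = {
    "title": "x" * 70,
    "meta_description": "A short description.",
    "user_canonical": "https://example.com/a",
    "google_canonical": "https://example.com/b",
}

print(audit_row(row_ok_canonical))
print(audit_row(row_mismatch))
```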
| Column | Conditional Formatting Settings | Reason |
|---|---|---|
| Title and Meta Description | Cell is empty | No one likes an empty title or meta description. |
| Inlinks | Value is less than or equal to 10 | Inlinks refer to internal linking within the domain, which is useful for identifying orphan pages. |
| Crawl Depth | Value is greater than or equal to 3 | Crawl depth is roughly click depth and best practices advise maxing out between 3-5. |
| Last Modified Date | Date is earlier than one year ago | Google likes fresh content that is up to date. |
| Summary (from Google Search Console URL Inspection) | Text contains “not” or “issues” | Highlight URLs that Google does not include in search results. |
| Coverage (from Google Search Console URL Inspection) | Text contains “duplicate”, “not”, or “unknown”; text contains “discovered”, “redirect”, “alternate” | Alert any issues while Google is crawling new pages. |
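The same rules can be expressed as a small function if you would rather flag rows in a script than in a spreadsheet. This is only a sketch: the field names and sample rows are hypothetical, while the thresholds (inlinks of 10 or fewer, crawl depth of 3 or more, not modified within a year, summary text containing "not" or "issues") mirror the settings listed above.

```python
from datetime import date, timedelta


def flag_row(row, today):
    """Apply the conditional formatting rules to one row and return its flags."""
    flags = []
    if row["inlinks"] <= 10:
        flags.append("few inlinks")
    if row["crawl_depth"] >= 3:
        flags.append("deep page")
    if row["last_modified"] < today - timedelta(days=365):
        flags.append("stale content")
    # Plain substring match, like a "text contains" rule in a spreadsheet.
    summary = row["gsc_summary"].lower()
    if "not" in summary or "issues" in summary:
        flags.append("not on Google or has issues")
    return flags


# Hypothetical rows; the date is fixed so the example is reproducible.
today = date(2022, 6, 1)
neglected = {
    "inlinks": 2,
    "crawl_depth": 4,
    "last_modified": date(2020, 1, 1),
    "gsc_summary": "URL is not on Google",
}
healthy = {
    "inlinks": 25,
    "crawl_depth": 1,
    "last_modified": date(2022, 3, 1),
    "gsc_summary": "URL is on Google",
}

print(flag_row(neglected, today))
print(flag_row(healthy, today))
```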
Site Architecture Visualizations
Before closing out your Screaming Frog crawl, take a look at the site architecture visualizations provided under the Visualizations menu. To avoid information overload, let’s talk about just two: the crawl tree graph and the force-directed crawl diagram. Both provide a visual representation of how Screaming Frog crawled a website, taking the shortest path to each page.
The force-directed crawl diagram shows the shortest crawl path to each page, starting with the darkest green node. In most cases, including this one, that node is the homepage. The lighter the color of the node, the deeper the URL. Green nodes are indexable URLs, while red nodes are not indexable.
The crawl tree graph presents exactly the same information in a much easier-to-read way. Starting with the homepage on the far left and working to the right, each node is one crawl depth further into the site, and the lines represent the shortest path from the homepage.
These visualizations can help identify a range of technical issues a site may be experiencing. For example, you can easily spot URLs that are not indexable but should be, which may be causing search visibility issues.
They can also help identify opportunities for internal linking or cleaner site architecture. This is often the case with blogs that have their categories set to noindex, nofollow.
Another useful analysis is identifying on-page links to old URLs that have since been redirected, removed, or changed, so that those links can be updated. This provides a much cleaner crawling experience for search engines, which pays off provided the content is solid.
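The "shortest path to each page" idea behind both visualizations can be illustrated with a breadth-first search over internal links. This is a conceptual sketch on a made-up link graph, not a description of how Screaming Frog is implemented; the `links` dictionary and the `crawl_depths` helper are hypothetical.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/pricing": ["/"],
    "/blog/post-1": ["/blog/post-2", "/pricing"],
    "/blog/post-2": [],
    "/orphan": ["/"],  # links out, but nothing links to it
}


def crawl_depths(links, start="/"):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


depths = crawl_depths(links)
unreached = sorted(set(links) - set(depths))
print(depths)
print("Not reachable from the homepage:", unreached)
```

Pages that never receive a depth are the ones no internal link path reaches, which is one way to think about orphan pages.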
Take your content audits to the next level
Content audits should be the cornerstone of your SEO strategy. They provide information about how a website is structured and performing to allow you to make more informed decisions.
Simple content audits that examine only organic traffic and keywords will miss valuable information such as publishing frequency, content length, best-performing page types, key pages accidentally set to noindex, or improper canonical usage.
These original analyses provide value to the client beyond what is already available at their fingertips. They also provide an easy-to-follow guide of actionable steps and recommendations to improve site visibility and performance.
Rather than placing a band-aid on underperforming page titles, identify these page types, when they were published, and their authors to understand where there may be knowledge gaps and address the root of the problem.
A content audit with deeper analysis and visualizations can help get executive buy-in for proposed SEO strategies. It does this by using a shared language: rather than speaking about 404s and canonicals, you are now speaking in terms of resources, investments, and strategies.