Crawl budget is the number of pages on a website that a search engine crawler will crawl. After crawling, the pages the bot considers relevant to user queries are indexed, and only then does the site appear in organic search results.
Crawl budget does not directly affect ranking, but it still matters for promotion. If a site has many pages, there is no guarantee the bot will index all of them: it may spend the budget on unimportant pages and leave no time for the important ones.
The crawl budget falls short in several cases:
- when there are technical errors on the site; these must be corrected to free up crawl budget;
- when the robot spends its crawl volume on unnecessary sections and skips important ones. An important page then remains uncrawled and will not even reach the hundredth position in search results;
- when the site contains abandoned, irrelevant content. Because the information is never updated, bots visit the site less often and the crawl budget shrinks.
Both large and small sites can face a budget shortage.
Search engines allocate a different budget to each site depending on its size. A small site – up to 1,000 pages – will most likely receive a smaller budget than a large one.
Most often, the minimum budget allocated initially is enough for a small site to be indexed well: all the necessary pages are indexed and appear in search. A large site may well need a bigger budget.
For established sites, the crawl budget is adjusted with each daily crawl. PageRank, behavioral factors, and the age of the resource all influence its growth.
In this article we will show you how to determine your crawl budget level. This will help you assess whether the budget is sufficient, conclude whether errors are present, and move on to analyzing and fixing them. Let's go!
Step 1. Determine the number of pages that should be in the index
The pages that should be in the index can be counted in Screaming Frog. The crawler reports which pages it was able to reach – that count is the indicator we need.
If Screaming Frog found these pages, the search engine bot will also find them and spend crawl budget on them.
Step 2. Determine the number of times the robot hits the site
There are two ways to find out the number of times a robot hits a site: one is simpler, the other is more difficult. Let’s consider both.
Method 1 – using Google or Yandex services to calculate an approximate budget
Google Search Console and Yandex.Webmaster help determine the number of pages crawled by bots. In Google Search Console, you can find out information for a period of 90 days, while in Yandex you can choose any period.
The crawl budget we derive from Google Search Console and Yandex.Webmaster data will be approximate. This is because these services count a robot's visits to any given page only once per day.
That is, a robot may hit the wrong page, find nothing there, and then return several more times trying to find content. We will not see all these attempts in the reports – only one hit is recorded, although in reality there may have been 4-5.
The method, although approximate, is ideal for sites with up to 50,000 pages. In our experience, such sites usually have a clear structure and bots index all the necessary pages, so there is no need for complex analysis. If you need more detail, you need to analyze the logs, which we discuss in the next section.
To find out the crawl budget in Google Search Console, go to Settings → Crawl Statistics → Open Report.
In the report, we look at the overall crawl statistics and specifically at the “Total Crawl Requests” metric – we click on it to see the exact value.
You can also view other reports here. They are grouped by response code, by file type, by Googlebot type, and by crawl purpose. Here you can see exactly where the bot goes, what proportion of each response the site returns, and how many bot visits land on correct pages versus erroneous ones.
To find out the crawl budget using Yandex.Webmaster, open the “Indexing” → “Crawl Statistics” section. The service shows the number of hits to the site for the current day, but does not show a total for an arbitrary period.
Visual graphs and analysis of reports will help you identify possible problems during crawling, as well as track changes or problems in the crawling budget.
We recommend running this analysis in the Google and Yandex services about once a week. It is a quick and easy way to understand the overall picture of the site and spot errors.
Method 2 – by analyzing server logs for accurate budget calculation
This method is more complicated than the option with the Google and Yandex services, and it is a real must-have for sites with more than 70,000 pages – the larger the site, the more errors occur and, as a rule, the harder they are to find.
Log analysis allows you not only to calculate the exact crawl budget but also to look at deeper data: whether bots follow the same path or different ones, when the bot visited which page, how deep it went, and how many times it requested the same page.
Logs are files containing information about the operation of a server or computer. Inside these files, the following data is collected:
- The IP address from which the visit took place;
- page address;
- the request method (for example, GET, which means retrieving data);
- pagination markers such as page/2, when the second page was requested;
- server response code;
- the size of the transmitted information;
- general data about users (operating system, region, etc.).
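As a quick illustration of what can be pulled from those fields, here is a minimal Python sketch that parses access-log lines in the common combined log format (an assumption – your server may log in a different format) and counts how many times a given crawler requested each page:

```python
import re
from collections import Counter

# Regex for the combined access log format (an assumption --
# adjust it to match your server's actual log configuration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

def count_bot_hits(lines, bot_marker="Googlebot"):
    """Count requests per page path made by a given crawler."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and bot_marker in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Two Googlebot requests and one ordinary visitor, for demonstration.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /catalog/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /catalog/page/2 HTTP/1.1" '
    '200 4800 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '93.184.216.34 - - [10/Oct/2023:13:56:01 +0000] "GET /catalog/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

print(count_bot_hits(sample))  # only the two Googlebot requests are counted
```

Note that user agents can be spoofed, so a production check would also verify the crawler's IP; the dedicated tools below handle this for you.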
To analyze logs, we recommend Screaming Frog Log File Analyser, LogViewer, or Loggly. Screaming Frog Log File Analyser conveniently displays graphs with the number of hits from various robots.
We recommend checking the logs at least once every six months if the site has 50,000–70,000 pages, and once every 3 months for sites with more than 70,000 pages. This is much more difficult than checking in Google and Yandex and requires specialized knowledge, but it helps to identify serious flaws in the technical part of the site.
For example, errors with 3xx, 404 and 500 response codes, or incorrect site structure, due to which pages may take a long time to be indexed, fail to appear in search, and cost you conversions.
Step 3. Apply the formula and determine your budget
Now you should have all the data on hand:
- the number of pages that should be in the index;
- the number of robots visiting the site.
Now you can run the calculation for your site.
First we need to figure out the average hits per day:
For example: Google Search Console shows that over 90 days, search engine bots accessed our site 6,051 times.
We calculate the average number of robots accessing the site per day:
6,051 / 90 ≈ 67 hits per day
Then it remains to calculate the crawl budget level:
The Screaming Frog crawl showed that the site should have 150 pages in the index. Now we divide the number of pages that should be in the index by the average number of robot hits per day: 150 / 67.23 ≈ 2.23.
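Plugging the article's example numbers into this formula as a quick sanity check:

```python
# Figures from the example: total bot hits over 90 days (from
# Google Search Console) and the number of pages that should be
# in the index (from the Screaming Frog crawl).
total_hits = 6051
days = 90
pages_in_index = 150

average_hits_per_day = total_hits / days          # ~67 hits per day
budget_level = pages_in_index / average_hits_per_day

print(round(average_hits_per_day))  # 67
print(round(budget_level, 2))       # 2.23
```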
How to interpret the results:
- ≤ 3 — the crawl budget is sufficient for the site;
- 3–10 — an average result;
- > 10 — the budget needs to be increased.
If your crawl budget level is 3 or less, you have a good crawl budget. If it is more than 3 but no more than 10, you need to work on eliminating errors. In this case, we recommend auditing the site to understand whether the budget is being distributed correctly.
If the indicator is more than 10, the site has vulnerabilities. Such a shortage of crawl budget indicates that the site likely has technical errors. The bot may also dislike the content on your site – for example, it does not answer user queries.
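The verdict thresholds above can be wrapped in a small helper. The cutoffs are the rules of thumb from this article, not an official search engine metric:

```python
def interpret_budget_level(level):
    """Map a crawl budget level to a verdict using this article's thresholds."""
    if level <= 3:
        return "sufficient"
    if level <= 10:
        return "average - look for crawl errors"
    return "insufficient - the budget needs to be increased"

print(interpret_budget_level(2.23))  # sufficient
```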
In the case of our site, at the time of the audit the crawl budget level was 2.23, which is considered a good result – there is no cause for concern.
In the next part of the article, we will analyze in detail what actions to take to fix technical errors on the site and increase the crawl budget.