This article is for those who want to fix technical flaws on their site but don't know where to start. Follow our advice, and search robots will see on your site exactly what they should see, and nothing they shouldn't.
In an ideal world, the number of pages that should be in the index equals the number of pages on the site itself. In reality that rarely happens: far more often, the crawl budget is spent on old, unwanted pages, while the more important ones go unnoticed by robots and never reach the search results.
Crawl budget optimization is about not wasting that budget: drawing crawl bots to the important, necessary sections and pages and keeping junk out of the index.
In the first part of this article, we talked about how to calculate the crawl budget. In this part, we focus on tips that help prevent or eliminate technical errors on the site. This optimizes your crawl budget and has a positive effect on your rankings.
Grow your backlink profile
Backlinks pointing to your site from other resources help establish trust with search engines and strengthen the authority of individual pages, which in turn raises the authority of the site as a whole.
This is one of the most important SEO factors. Backlinks to a page show the search engine that the site can be trusted, so the crawler visits such pages more often and the crawl budget grows.
There are many ways to get links from other sites. Here are the most popular:
- buy links on exchanges;
- develop crowd marketing – post links on forums and blogs.
In crowd marketing, links should only be placed on trusted sites that search engines themselves trust, and they should look as natural as possible, meaning without anchor text. Even if an anchored link seems more natural to the user, search engines think otherwise: they value non-anchor links more.
Increase site speed to speed up page crawling by bots
The faster the site loads, the faster the bot crawls it, and the more URLs it manages to process. Put simply, the less time the crawler spends per page, the larger the crawl budget effectively becomes.
Site speed can be checked in Google PageSpeed Insights. The service suggests specific actions you can take to increase loading speed.
On one of our projects, we found that the bot spent 6 seconds checking a single page. That is quite a lot: remember that users tend to close a page after about 3 seconds. We aim to bring this time down to 2 seconds.
In Google PageSpeed Insights, the results were poor for 2 out of 6 indicators: the service flagged the large size of the images and an unsuitable format, JPEG instead of WebP or AVIF.
We dug into the issue and found that some of the product cards were indexed while others were not, because each card had at least five images. The number of photos, their size and their format made it hard for bots to crawl the pages.
To increase loading speed, we implemented accelerated pages. For the images, we compressed and converted them, saved copies at 0.25x, 0.5x, 0.75x and 1x of the original size, and then set up tags so that the version matching the device's screen size is served.
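As an illustration, here is a minimal Python sketch of the compress-and-convert step, assuming the Pillow library and purely hypothetical folder names (AVIF output would additionally need an AVIF plugin for Pillow). The generated files are then referenced from srcset/picture tags so the browser picks the version matching the screen size.

```python
from pathlib import Path

from PIL import Image  # pip install Pillow

# Scale factors from the example above: 0.25x, 0.5x, 0.75x and full size.
SCALES = (0.25, 0.5, 0.75, 1.0)

def convert_product_images(src_dir: str, dst_dir: str, quality: int = 80) -> None:
    """Resize every JPEG in src_dir and save WebP copies for use in srcset."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for jpeg_path in Path(src_dir).glob("*.jpg"):
        with Image.open(jpeg_path) as img:
            for scale in SCALES:
                size = (int(img.width * scale), int(img.height * scale))
                resized = img.resize(size, Image.LANCZOS)
                resized.save(out / f"{jpeg_path.stem}_{scale}x.webp", "WEBP", quality=quality)

if __name__ == "__main__":
    # Hypothetical folders: adjust to your project structure.
    convert_product_images("images/original", "images/webp")
```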
A month and a half later, all product cards were being indexed. The bot now spent 2 seconds per page. And since the site already had plenty of useful content that matched user queries, this, combined with the faster loading, tripled the crawl budget.
Get rid of problematic response codes so that bots don’t waste time checking them
If the content is present on the page, the response code will be 200 (“OK”). If a redirect to another page is required, the code will be 301 (“Go here instead”). These codes are considered ideal as they lead the bot to useful content.
But there are also problematic codes that do not show content, but simply waste the bot’s time. These include:
- 302 redirect: the content is temporarily unavailable but is expected to return. For example, in summer you removed the section with fur coats and down jackets, and the robot will keep knocking on those pages all summer, waiting for the content. If your site has pages returning 302, review them and, where the move is actually permanent, switch them to 301 redirects. Over time the redirecting pages will drop out of the crawl in favour of their target URLs, and the bot will stop re-checking the page while it waits for content.
- 404 code: page not found. In practice these are broken links that you need to get rid of. The bot will keep revisiting such pages periodically, because a 404 signals that the page might reappear later.
- 500 code: the server is unavailable. This is a clear sign of a low-quality resource, and the robot will come back to crawl it less and less often.
Such codes waste the crawler's time on working out the response, so the bot runs out of time to visit the pages that actually matter. A quick way to find them is to check the response codes of your URLs, as in the sketch below.
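Here is a minimal sketch (Python with the requests library) that checks a hand-picked list of URLs and flags problematic codes; the URLs are placeholders, and in practice you would feed it your sitemap or a crawl export.

```python
import requests  # pip install requests

# Placeholder URLs: in practice, take them from your sitemap or crawl export.
URLS = [
    "https://example.com/",
    "https://example.com/old-category/",
    "https://example.com/missing-page/",
]

def audit_status_codes(urls):
    """Print every URL whose response code wastes crawl budget (302, 404, 5xx and so on)."""
    for url in urls:
        try:
            response = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException as exc:
            print(f"{url} -> request failed: {exc}")
            continue
        if response.status_code in (200, 301):
            continue  # fine: content is there, or a permanent redirect points to it
        print(f"{url} -> {response.status_code}: fix with a 301 redirect or remove the link")

if __name__ == "__main__":
    audit_status_codes(URLS)
```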
We also recommend removing low-value pages from the site. You can check for them in Yandex.Webmaster. The service considers a page low-value if it is a duplicate, contains no content visible to the robot, or offers content that is simply not in demand.
Go to “Indexing” → “Pages in search” → “Excluded”.
Then find the "Low-value or low-demand page" indicator in the list. On a project promoting an online clothing and footwear store, it came to 3.77%. That is not a bad result, and it consisted mainly of broken links (code 404), which we subsequently removed.
If your figure is 20% or higher, it is time to sound the alarm: on top of broken links, you are likely dealing with duplicate pages and hidden content as well. When a quarter of a site consists of low-value pages, bots can lose trust in it. Identify the causes as soon as possible and eliminate them so you don't lose out on indexing.
Update robots.txt and the sitemap so that the robot understands the structure of the site
Robots.txt and sitemap.xml are service files that help search bots crawl the site correctly and better understand its structure:
- Junk pages, duplicates and pagination pages are closed off in the robots.txt file. Robots understand there is no need to spend time checking them, which saves crawl budget.
- The sitemap file lists all the pages of the site that should be indexed. It is advisory in nature and helps the robot understand the crawling priorities (a rough sketch of both files follows this list).
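In practice these files are usually generated by the CMS or an SEO plugin, but as a rough sketch, here is what producing both might look like in Python; the disallow rules, page list and domain are hypothetical examples only.

```python
from datetime import date
from xml.etree.ElementTree import Element, SubElement, ElementTree

# Hypothetical junk sections to close off and pages to expose for indexing.
DISALLOW_RULES = ["/search/", "/cart/", "/*?sort=", "/*?page="]
INDEXABLE_PAGES = [
    "https://example.com/",
    "https://example.com/catalog/",
    "https://example.com/delivery/",
]

def write_robots(path: str = "robots.txt") -> None:
    """Write a robots.txt that hides junk pages and points to the sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {rule}" for rule in DISALLOW_RULES]
    lines.append("Sitemap: https://example.com/sitemap.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")

def write_sitemap(path: str = "sitemap.xml") -> None:
    """Write a sitemap.xml listing the pages that should be indexed."""
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in INDEXABLE_PAGES:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = page
        SubElement(url, "lastmod").text = date.today().isoformat()
    ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    write_robots()
    write_sitemap()
```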
On one of the projects that came to us for an audit, robots.txt and the sitemap were missing entirely. Crawling and indexing of the site were very slow and inefficient because of a move from one domain to another and a large number of redirects. Users didn't notice any of this, but the robots tried to crawl every page and spent the budget on it.
After we added robots.txt and the sitemap, the number of robot visits to the site grew from 100 to 300. The crawl budget tripled, which improved crawling of the site as a whole.
Remove redirect chains
Redirects in themselves remain a useful solution to real problems:
- When there are duplicate pages, a redirect makes it clear to the robot which page to look at. Where the bot used to index two pages for one query, it now knows there is a single traffic page and that all the others have moved to it.
- A redirect sends a bot or user to a live page. This is useful when we delete a broken page (code 404) that had good behavioral metrics and was still being visited by bots: the redirect keeps that traffic on the page we need.
However, when the robot runs into a chain of 301 redirects, it has to walk through every URL in the chain, using up your crawl budget. To explain: the bot sees the first link, discovers the second only after following the first, and only after that reaches the page with the correct URL.
A chain of redirects confuses the robot and keeps it from reaching the target page straight away. To be clear, we are talking about several 301 pages in a row, not about a single redirect.
Now imagine many such chains: the user doesn't notice them, but the robot is forced to hop from link to link to find the page it needs.
Over time, the redirecting pages drop out of the search engines' view and only the final URLs remain indexed, which saves your crawl budget. You can spot chains by following the redirects hop by hop, as in the sketch below.
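Below is a minimal Python sketch using the requests library that follows redirects one hop at a time; the starting URL is hypothetical.

```python
import requests  # pip install requests

def trace_redirect_chain(url: str, max_hops: int = 10) -> list[str]:
    """Follow redirects one hop at a time and return the chain the bot would walk."""
    chain = [url]
    for _ in range(max_hops):
        response = requests.head(url, allow_redirects=False, timeout=10)
        if response.status_code not in (301, 302, 303, 307, 308):
            break
        url = requests.compat.urljoin(url, response.headers["Location"])
        chain.append(url)
    return chain

if __name__ == "__main__":
    # Hypothetical URL that might sit at the start of a chain.
    chain = trace_redirect_chain("https://example.com/old-url/")
    if len(chain) > 2:
        print("Chain found; point the first URL straight at the last one:")
        print(" -> ".join(chain))
```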
Set up internal linking to important pages of the site
When crawling and indexing a site, the bot most often favours pages that carry weight. To build that weight, you need to set up internal linking between pages.
We use unique, varied anchors with keywords and add links to relevant pages. In our experience, the optimal number of internal links per page starts at around 7.
A good structure raises the importance of pages by channelling link juice to the right sections through internal links. This helps crawlers find the right pages without wasting budget and helps users reach the right page quickly. It also improves the site's usability and behavioral metrics, which signals the search engine to increase the crawl budget.
Use the Last-Modified header to tell crawlers about changes to the page
The Last-Modified header tells the user's browser or a web crawler the date and time the current page was last modified.
Here’s what it looks like:
Last-Modified pattern: <weekday>, <day> <month name> <year> <hour>:<minute>:<second> GMT
Example Last-Modified: Thu, 19 May 2022 13:23:00 GMT
The crawl budget benefits here because the bot knows up front which pages were recently added or edited, so instead of re-crawling the entire site it indexes pages selectively. This is especially important for sites with a large number of pages.
Adding the header also speeds up page loading and reduces the load on the server, which in turn significantly speeds up page indexing.
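As a rough sketch, here is how a page handler might set Last-Modified and honour If-Modified-Since, using Flask purely as an example framework; the route and the stored modification date are hypothetical.

```python
from datetime import datetime, timezone

from flask import Flask, make_response, request  # pip install flask

app = Flask(__name__)

HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"
# Hypothetical: in a real project this date comes from the page's database record.
PAGE_LAST_MODIFIED = datetime(2022, 5, 19, 13, 23, 0, tzinfo=timezone.utc)

@app.route("/catalog/")
def catalog():
    # If the crawler already has the current version, answer 304 without a body.
    since = request.headers.get("If-Modified-Since")
    if since:
        try:
            cached = datetime.strptime(since, HTTP_DATE).replace(tzinfo=timezone.utc)
            if cached >= PAGE_LAST_MODIFIED:
                return "", 304
        except ValueError:
            pass  # malformed header: fall through and return the full page
    response = make_response("<html>...catalog page...</html>")
    response.headers["Last-Modified"] = PAGE_LAST_MODIFIED.strftime(HTTP_DATE)
    return response
```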
Choose the main page and merge duplicates so that the robot crawls the high-traffic page
During the crawl, the bot may find duplicate pages: the same page available at different URLs. Ultimately, it will give preference to one of them.
That might seem fine, but while the site is being crawled and indexed, the crawl budget is spent on the duplicates. If there are only a few such pages, it isn't critical, but on large sites duplicates can noticeably slow down indexing. On top of that, the bot may pick as the main version a page we have no need to promote.
Fix the remaining technical optimization flaws
Internal optimization flaws may not be critical for a small site, but as their number and the site's scale grow, they become a problem: they waste the search bot's time and worsen the site's indexing. They need to be eliminated.
Get rid of circular links so as not to mislead users
A circular link is a link on a page that points back to that same page.
For example, you are on a page with a white t-shirt → just below, you are offered a link to another white t-shirt → you click it and land back on the same page, with no other t-shirt in sight.
First, this misleads and annoys users, who waste their time searching. Second, it wastes link juice and crawl budget.
For large sites, this is a critical point, as it can significantly affect crawl speed and page indexing.
Most often, circular links appear in breadcrumbs, when the tail of the navigation chain ends with an active link to the current page. Don't do this: the circular link must be removed. A quick check like the sketch below can help you find such links.
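Here is a small Python sketch for checking a single page for links that point back to itself; the page URL is hypothetical and the URL normalization is deliberately simplistic.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

import requests  # pip install requests

class LinkCollector(HTMLParser):
    """Collect href values from all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def find_circular_links(page_url: str) -> list[str]:
    """Return links on the page that point back to the page itself."""
    parser = LinkCollector()
    parser.feed(requests.get(page_url, timeout=10).text)
    current = urldefrag(page_url)[0].rstrip("/")
    return [
        href for href in parser.hrefs
        if urldefrag(urljoin(page_url, href))[0].rstrip("/") == current
    ]

if __name__ == "__main__":
    # Hypothetical product page; breadcrumbs often link back to it like this.
    for href in find_circular_links("https://example.com/catalog/white-t-shirt/"):
        print("Circular link:", href)
```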
Link to lost pages so that users and bots can find them
Lost pages are pages that cannot be reached through internal links, so neither users nor bots can find them. One way to detect them is to compare your sitemap against the URLs your internal links actually reach; a sketch follows the list below.
If you find such pages, analyze them:
- if the lost page is needed, add links to it and thereby pass internal weight to it from the rest of the site;
- if the lost page is not needed, delete it.
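The sketch below shows one way to do the comparison mentioned above; it assumes a hypothetical sitemap URL and a linked_urls.txt export (one URL per line) from whatever crawler you use.

```python
from xml.etree import ElementTree

import requests  # pip install requests

SITEMAP_URL = "https://example.com/sitemap.xml"   # hypothetical
LINKED_URLS_FILE = "linked_urls.txt"              # crawler export, one URL per line

def sitemap_urls(sitemap_url: str) -> set[str]:
    """Collect every <loc> entry from the sitemap."""
    xml = requests.get(sitemap_url, timeout=10).text
    root = ElementTree.fromstring(xml)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return {loc.text.strip() for loc in root.findall(".//sm:loc", ns)}

def lost_pages() -> set[str]:
    """Pages listed in the sitemap that no internal link points to."""
    with open(LINKED_URLS_FILE, encoding="utf-8") as fh:
        linked = {line.strip() for line in fh if line.strip()}
    return sitemap_urls(SITEMAP_URL) - linked

if __name__ == "__main__":
    for url in sorted(lost_pages()):
        print("Lost page:", url)
```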
Remove dangling nodes so you don’t lose page link juice
A dangling node is a page with no outgoing links: it receives link weight but passes it nowhere. Imagine seeing a list of articles on various topics on the site, clicking through, and ending up on a page from which you can't go anywhere else.
For the user this is simply inconvenient: to return to the previous page, they have to press the "back" button or go back to the search. The robot, meanwhile, hits a dead end, because it has nowhere to go from the page and cannot press "back".
Most often, dangling nodes are not a serious problem, but you need to analyze the nature of such a page and, if possible, make adjustments:
- if the page holds content that didn't fit on the main page, remove the dangling node by moving that content entirely to the main page;
- if the page contains content worth keeping, add outgoing links from it to important sections of the site.
Get ready for regular optimization
Maintaining the technical optimization of a site is a never-ending process, so you need to be ready to constantly make changes and track improvements.