Setting up CloudFlare to filter bots from your website

In this article, we walk through a detailed configuration of CloudFlare (CF below) that protects a site from all kinds of robots, bots, and parsers.

Who are we filtering?

A caveat up front: it makes no sense to enable every rule. Together they are extremely effective, but they also make life difficult for ordinary users, so choose your filters carefully.

  • Users from unneeded countries – suitable if you have a small regional commercial project that should not receive foreign traffic. I doubt that concrete from Balashikha is of much interest to Brazilians;
  • Everyone who comes over IPv6 or without HTTPS has to solve a captcha;
  • Analyzer services, monitoring tools and parsers are blocked by a list of user agents;
  • Everyone who connects with a protocol below HTTP/2, and every direct visit, is sent to a JS check. Many spiders such as Screaming Frog work over HTTP/1 protocols, so this stops most site parsing at the CF level.

Getting Started

I will not describe how to add a site to CF; we will assume you have already done that.

Go to the Security – WAF section.

Click the Create firewall rule button.

In total, 5 rules are available to us, but a single rule can combine several filters as long as they share the same action. There are 5 actions:

  • Block – deny access.
  • Challenge (CAPTCHA) – require the visitor to solve a captcha.
  • JS Challenge – show an interstitial page, like the one displayed when I’m Under Attack mode is enabled.
  • Bypass – skip the checks.
  • Allow – full access.

Next, we start creating filters. Note that the order of creation is important: the earlier a rule is created, the higher its priority. The priority can be changed afterwards by dragging rules in the list.
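Why the order matters can be seen in a small simulation. This is only a rough sketch that assumes the first matching rule decides the outcome; the Rule class, the evaluate function and the sample request are hypothetical and merely mimic the rules from this article, not CF's actual engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Request = dict  # hypothetical request shape: {"country": ..., "known_bot": ...}


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str


def evaluate(rules: List[Rule], request: Request) -> Optional[str]:
    """Return the action of the first matching rule (assumed first-match-wins)."""
    for rule in rules:
        if rule.matches(request):
            return rule.action
    return None  # no rule matched, the request passes through


allow_good_bots = Rule("good bots", lambda r: r["known_bot"], "allow")
block_foreign = Rule("geo", lambda r: r["country"] not in {"RU", "BY", "KZ"}, "block")

# A verified search bot crawling from a US address.
search_bot = {"country": "US", "known_bot": True}

print(evaluate([allow_good_bots, block_foreign], search_bot))  # allow
print(evaluate([block_foreign, allow_good_bots], search_bot))  # block
```

With the geo rule on top, the search bot never reaches the rule that would have let it in, which is why the rule for good bots is created first.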

Additional terms

And – the rule will work when both conditions are met.

Or – the rule will work when at least one condition is met.

An easy way to add rules

Conditions can be entered manually in the form, or pasted as a whole into the “Edit expression” field.

We open access to good bots

First of all, we open access to good bots. Please note that CF does not count Mail.ru, like many others, as a good bot, so it has to be added as a separate condition.

Rule: (cf.client.bot) or (http.user_agent contains "Mail.RU_Bot")

Action: Allow

We block access from all countries except …

Useful if you are sure that you get no traffic from other countries. You can check this in Yandex.Metrica or Google Analytics, in the Audience – Geography section.

The filter logic is as follows: if the country is not, for example, Russia, then block.

Rule: (ip.geoip.country ne "RU" and ip.geoip.country ne "BY" and ip.geoip.country ne "KZ")

Action: Block
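If the list of allowed countries grows, writing the chain of "ne" comparisons by hand becomes error-prone. Below is a minimal sketch that generates the expression from a list of country codes; the function name country_block_expression is made up for this example, and the output follows the same pattern as the rule above.

```python
from typing import List


def country_block_expression(allowed: List[str]) -> str:
    """Build a CF expression that matches every country NOT in `allowed`."""
    if not allowed:
        raise ValueError("at least one allowed country code is required")
    # One "ne" comparison per allowed country, all joined with "and".
    parts = [f'ip.geoip.country ne "{code.upper()}"' for code in allowed]
    return "(" + " and ".join(parts) + ")"


print(country_block_expression(["ru", "by", "kz"]))
```

The printed string can be pasted into the “Edit expression” field.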

We show a captcha to everyone who comes in over IPv6 or plain HTTP

Important! You need to enable forced redirect from HTTP to HTTPS:

SSL/TLS – Edge Certificates – Always Use HTTPS

Rule: (ip.src in {::/0}) or (not ssl)

Action: LEGACY CAPTCHA
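The "ip.src in {::/0}" part may look odd: ::/0 is the network that contains every IPv6 address, so the condition is true for any IPv6 visitor and false for any IPv4 visitor. Python's standard ipaddress module applies the same containment logic and can illustrate it. This is only a local illustration, assuming CF's "in" behaves analogously; the function names are hypothetical.

```python
import ipaddress

ALL_IPV6 = ipaddress.ip_network("::/0")


def is_ipv6_visitor(ip: str) -> bool:
    """True when the address falls inside ::/0, i.e. it is any IPv6 address."""
    return ipaddress.ip_address(ip) in ALL_IPV6


def needs_captcha(ip: str, ssl: bool) -> bool:
    """Local mirror of the rule: (ip.src in {::/0}) or (not ssl)."""
    return is_ipv6_visitor(ip) or not ssl


print(needs_captcha("2001:db8::1", ssl=True))   # True: IPv6
print(needs_captcha("203.0.113.7", ssl=True))   # False: IPv4 over HTTPS
print(needs_captcha("203.0.113.7", ssl=False))  # True: plain HTTP
```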

JS validation for direct hits and hits with a protocol below HTTP/2

In this rule, we filter everyone who uses the HTTP/1.0 and HTTP/1.1 protocols; most often these are bots. The same rule also catches behavioral-factor (PF) bots that warm up their profiles through direct visits to various sites.

The start of warming up profiles through your site looks something like this:

We deny access to everyone who is not on HTTP/2: junk bots, DDoS bots and similar nuisances mostly come over HTTP/1.0 and HTTP/1.1. https://wmsn.biz/m.php?p=143697

Rule: (not http.request.version in {"HTTP/2" "HTTP/3" "SPDY/3.1"}) or (http.referer eq "")

Action: Block or JS challenge.
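To see which visitors this rule catches, its logic can be mirrored locally. This is a sketch with a hypothetical needs_js_check function; the version strings are the ones used in the rule above.

```python
MODERN_VERSIONS = {"HTTP/2", "HTTP/3", "SPDY/3.1"}


def needs_js_check(http_version: str, referer: str) -> bool:
    """Local mirror of: (not http.request.version in {...}) or (http.referer eq "")."""
    return http_version not in MODERN_VERSIONS or referer == ""


# Typical crawler: HTTP/1.1, even though it sends a referer.
print(needs_js_check("HTTP/1.1", "https://example.com/"))   # True
# Regular browser arriving from a search engine.
print(needs_js_check("HTTP/2", "https://www.google.com/"))  # False
# Regular browser, address typed by hand or opened from a bookmark.
print(needs_js_check("HTTP/2", ""))                         # True
```

The last case is the price of this rule: real users who type the address or use a bookmark are checked too, so JS challenge is the gentler of the two actions here.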

Blocking crawlers

Crawlers, parsers and other checkers create a heavy load on large resources, and they can be cut off before they reach the site. The current user agents are taken from here and adapted for CF. Optionally, you can also keep the site out of the web archive by blocking the ia_archiver user agent.

You will need two separate rules, because the full list of user agents does not fit into one, so the configuration code comes in two parts.

Rule 1:

(http.user_agent contains "Abonti") or (http.user_agent contains "AspiegelBot") or (http.user_agent contains "aggregator") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Aport") or (http.user_agent contains "asterias") or (http.user_agent contains "Baiduspider") or (http.user_agent contains "BDCbot") or (http.user_agent contains "bidswitchbot") or (http.user_agent contains "Birubot") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "BUbiNG") or (http.user_agent contains "BuiltBotTough") or (http.user_agent contains "Bullseye") or (http.user_agent contains "BunnySlippers") or (http.user_agent contains "Butterfly") or (http.user_agent contains "ca-crawler") or (http.user_agent contains "CamontSpider") or (http.user_agent contains "CCBot") or (http.user_agent contains "Cegbfeieh") or (http.user_agent contains "CheeseBot") or (http.user_agent contains "CherryPicker") or (http.user_agent contains "coccoc") or (http.user_agent contains "CopyRightCheck") or (http.user_agent contains "cosmos") or (http.user_agent contains "crawler") or (http.user_agent contains "Crescent") or (http.user_agent contains "CyotekWebCopy/1.7") or (http.user_agent contains "CyotekHTTP/2.0") or (http.user_agent contains "DataForSeoBot") or (http.user_agent contains "DeuSu") or (http.user_agent contains "discobot") or (http.user_agent contains "DittoSpyder") or (http.user_agent contains "DnyzBot") or (http.user_agent contains "DomainCrawler") or (http.user_agent contains "DotBot") or (http.user_agent contains "Download Ninja") or (http.user_agent contains "EasouSpider") or (http.user_agent contains "EmailCollector") or (http.user_agent contains "EmailSiphon") or (http.user_agent contains "EmailWolf") or (http.user_agent contains "EroCrawler") or (http.user_agent contains "Exabot") or (http.user_agent contains "ExtractorPro") or (http.user_agent contains "Ezooms") or (http.user_agent contains "FairShare") or (http.user_agent contains "Fasterfox") or (http.user_agent contains "FeedBooster") or (http.user_agent contains "Foobot") or (http.user_agent contains "Genieo") or (http.user_agent contains "GetIntent Crawler") or (http.user_agent contains "Gigabot") or (http.user_agent contains "gold crawler") or (http.user_agent contains "GrapeshotCrawler") or (http.user_agent contains "grub-client") or (http.user_agent contains "Harvest") or (http.user_agent contains "hloader") or (http.user_agent contains "httplib") or (http.user_agent contains "HTTrack") or (http.user_agent contains "humanlinks") or (http.user_agent contains "HybridBot") or (http.user_agent contains "ia_archiver") or (http.user_agent contains "ieautodiscovery") or (http.user_agent contains "Incutio") or (http.user_agent contains "InfoNaviRobot") or (http.user_agent contains "InternetSeer") or (http.user_agent contains "IstellaBot") or (http.user_agent contains "Java") or (http.user_agent contains "Java/1.") or (http.user_agent contains "JamesBOT") or (http.user_agent contains "JennyBot") or (http.user_agent contains "JS-Kit") or (http.user_agent contains "k2spider") or (http.user_agent contains "Kenjin Spider") or (http.user_agent contains "Keyword Density/0.9") or (http.user_agent contains "kmSearchBot") or (http.user_agent contains "larbin") or (http.user_agent contains "LexiBot") or (http.user_agent contains "libWeb") or (http.user_agent contains "libwww") or (http.user_agent contains "Linguee") or (http.user_agent contains "LinkExchanger") or (http.user_agent contains "LinkextractorPro") or (http.user_agent contains "linko") or (http.user_agent contains "LinkScan/8.1a Unix") or (http.user_agent contains "LinkWalker") or (http.user_agent contains "LinkpadBot") or (http.user_agent contains "lmspider") or (http.user_agent contains "LNSpiderguy") or (http.user_agent contains "ltx71") or (http.user_agent contains "lwp-trivial") or (http.user_agent contains "magpie") or (http.user_agent contains "Mata Hari") or (http.user_agent contains "MaxPointCrawler") or (http.user_agent contains "MegaIndex")

Rule 2:

(http.user_agent contains "memoryBot") or (http.user_agent contains "Microsoft URL Control") or (http.user_agent contains "MIIxpc") or (http.user_agent contains "Mippin") or (http.user_agent contains "Missigua Locator") or (http.user_agent contains "Mister PiX") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "MLBot") or (http.user_agent contains "moget") or (http.user_agent contains "MSIECrawler") or (http.user_agent contains "msnbot") or (http.user_agent contains "msnbot-media") or (http.user_agent contains "NetAnts") or (http.user_agent contains "NICErsPRO") or (http.user_agent contains "Niki-Bot") or (http.user_agent contains "NjuiceBot") or (http.user_agent contains "NPBot") or (http.user_agent contains "Nutch") or (http.user_agent contains "Offline Explorer") or (http.user_agent contains "OLEcrawler") or (http.user_agent contains "Openfind") or (http.user_agent contains "panscient.com") or (http.user_agent contains "PostRank") or (http.user_agent contains "ProPowerBot/2.14") or (http.user_agent contains "PetalBot") or (http.user_agent contains "ProWebWalker") or (http.user_agent contains "ptd-crawler") or (http.user_agent contains "Purebot") or (http.user_agent contains "PycURL") or (http.user_agent contains "python-requests") or (http.user_agent contains "Python-urllib") or (http.user_agent contains "QueryN Metasearch") or (http.user_agent contains "RepoMonkey") or (http.user_agent contains "Riddler") or (http.user_agent contains "RMA") or (http.user_agent contains "Scrapy") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "serf") or (http.user_agent contains "SeznamBot") or (http.user_agent contains "SISTRIX") or (http.user_agent contains "SiteBot") or (http.user_agent contains "sitecheck.Internetseer.com") or (http.user_agent contains "SiteSnagger") or (http.user_agent contains "Serpstat") or (http.user_agent contains "Slurp") or (http.user_agent contains "SnapPreviewBot") or (http.user_agent contains "Sogou") or (http.user_agent contains "Soup") or (http.user_agent contains "SpankBot") or (http.user_agent contains "spanner") or (http.user_agent contains "spbot") or (http.user_agent contains "Spinn3r") or (http.user_agent contains "SpyFu") or (http.user_agent contains "suggybot") or (http.user_agent contains "SurveyBot") or (http.user_agent contains "suzuran") or (http.user_agent contains "SWeb") or (http.user_agent contains "Szukacz/1.4") or (http.user_agent contains "Teleport") or (http.user_agent contains "Telesoft") or (http.user_agent contains "The Intraformant") or (http.user_agent contains "TheNomad") or (http.user_agent contains "TightTwatBot") or (http.user_agent contains "Titan") or (http.user_agent contains "toCrawl/UrlDispatcher") or (http.user_agent contains "True_Robot") or (http.user_agent contains "ttCrawler") or (http.user_agent contains "turingos") or (http.user_agent contains "TurnitinBot") or (http.user_agent contains "UbiCrawler") or (http.user_agent contains "UnisterBot") or (http.user_agent contains "Unknown") or (http.user_agent contains "uptime files") or (http.user_agent contains "URLy Warning") or (http.user_agent contains "User-Agent") or (http.user_agent contains "VCI") or (http.user_agent contains "Vedma") or (http.user_agent contains "Voyager") or (http.user_agent contains "WBSearchBot") or (http.user_agent contains "Web Downloader/6.9") or (http.user_agent contains "Web Image Collector") or (http.user_agent contains "WebAuto") or (http.user_agent contains "WebBandit") or (http.user_agent contains "WebCopier") or (http.user_agent contains "WebEnhancer") or (http.user_agent contains "WebmasterWorldForumBot") or (http.user_agent contains "WebReaper") or (http.user_agent contains "WebSauger") or (http.user_agent contains "Website Quester") or (http.user_agent contains "Webster Pro") or (http.user_agent contains "WebStripper") or (http.user_agent contains "WebZip") or (http.user_agent contains "Wotbox") or (http.user_agent contains "wsr-agent") or (http.user_agent contains "WWW-Collector-E") or (http.user_agent contains "Yeti")

Action: Block
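Maintaining such lists by hand is how duplicates and broken quotes creep in. One possible approach is to keep the user agents in a plain list and generate the expressions, starting a new rule whenever the current one would exceed a length limit. The 4096-character default below is an assumption (check the actual limit in your CF account), and the function name build_ua_rules is hypothetical.

```python
from typing import Iterable, List


def build_ua_rules(user_agents: Iterable[str], max_len: int = 4096) -> List[str]:
    """Split user agents into OR-joined expressions no longer than max_len."""
    rules: List[str] = []
    current = ""
    seen = set()
    for ua in user_agents:
        if ua in seen:  # skip duplicate entries
            continue
        seen.add(ua)
        # Assumes the name itself contains no double quotes.
        clause = f'(http.user_agent contains "{ua}")'
        if len(clause) > max_len:
            raise ValueError(f"clause for {ua!r} exceeds max_len")
        candidate = clause if not current else f"{current} or {clause}"
        if len(candidate) > max_len:
            rules.append(current)  # close the current rule
            current = clause       # and start the next one
        else:
            current = candidate
    if current:
        rules.append(current)
    return rules


# A tiny limit is used here only to demonstrate the split.
agents = ["AhrefsBot", "SemrushBot", "MJ12bot", "AhrefsBot"]
for number, expression in enumerate(build_ua_rules(agents, max_len=80), start=1):
    print(f"Rule {number}: {expression}")
```

Each printed expression goes into its own rule with the Block action.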

Checking the correctness of the rules

There are two ways to check the correctness of the rules.

First way

  • Watch the CF firewall report;
  • Monitor your analytics systems and look for deviations.

Second way

  • Change your country with a VPN and try to open the site. If the block works, you will not get in;
  • If the filter for direct traffic is enabled, simply open the site by typing its address directly.
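The user-agent rules can also be checked without a browser, by sending a request that pretends to be a blocked bot and looking at the status code. Below is a sketch that uses only the standard library. Treating 403 as "blocked" is an assumption about how the Block action answers, and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET request that pretends to be the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code of the answer."""
    try:
        with urllib.request.urlopen(build_probe(url, user_agent), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx answers arrive as exceptions


def verdict(status: int) -> str:
    """Interpret the status code; 403 is assumed to mean the rule fired."""
    return "blocked" if status == 403 else f"not blocked (HTTP {status})"
```

For example, verdict(probe_status("https://example.com/", "AhrefsBot")), with the placeholder address replaced by your own site, should report "blocked" once the crawler rules are active. Keep in mind that urllib speaks HTTP/1.1, so if the protocol rule is enabled the request will be caught by it as well.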
