Crawler Configuration & User Agents
To get the most accurate SEO data, you need to crawl like a search engine. 42crawl gives you full control over how our engine interacts with your site, allowing you to simulate different bots, bypass firewalls, and manage your crawl budget. Correct configuration is the first step toward optimizing your site's crawlability and mastering technical SEO.
Choosing the Right User Agent
The User Agent is like a digital ID card: it tells the server who is asking. Some sites serve different content to bots than they do to human browsers, so the identity you choose can change what your audit sees. The same logic makes it a critical factor for generative engine optimization, since AI crawlers are often gated or served differently.
Built-in Presets
- 42crawl (Standard): Our default ID for standard SEO auditing.
- Googlebot: Mimics the official Google crawler. Use this to see exactly what Google sees.
- Chrome (Desktop/Mobile): Appears as a human browser. Essential for bypassing "Bot Management" filters that might block SEO crawlers.
- Safari (Mac): Simulates a macOS user experience.
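To see why this choice matters, fetch the same URL under two identities and compare the responses. Below is a minimal Python sketch, assuming the third-party `requests` library; the URL and User-Agent strings are representative examples, not the exact strings 42crawl sends:

```python
# Fetch one URL under two User-Agent identities and compare the responses.
# URL and UA strings are illustrative; substitute your own site and agents.
import requests

URL = "https://example.com/"

USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Chrome (Desktop)": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
}

for name, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    # A sharp difference in status or body size hints at bot-specific content.
    print(f"{name}: status={resp.status_code}, bytes={len(resp.content)}")
```

If the two responses differ sharply, the site is gating or cloaking content, and your choice of preset will directly affect what the audit reports.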
Custom User Agents
Pro users can enter any custom string. This is perfect for:
- Identifying your own crawl traffic in server logs.
- Testing how your site responds to niche AI crawlers as part of your GEO strategy.
- Bypassing specialized security rules on staging environments.
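For instance, if your custom string includes a unique token, isolating your crawler's traffic in a server log takes only a few lines. A minimal sketch, assuming a hypothetical token `42crawl-audit-2024` and a standard combined-format `access.log` (adjust both to your setup):

```python
# Count and sample the access-log lines produced by a custom User-Agent.
# The token and log path below are hypothetical placeholders.
CUSTOM_UA_TOKEN = "42crawl-audit-2024"

with open("access.log", encoding="utf-8", errors="replace") as log:
    hits = [line.rstrip() for line in log if CUSTOM_UA_TOKEN in line]

print(f"{len(hits)} requests carried the custom User-Agent")
for line in hits[:5]:  # print a small sample for inspection
    print(line)
```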
Advanced Audit Settings
You can find these settings in the Settings panel or the Advanced section of the Crawl Form.
1. Respect robots.txt
By default, 42crawl follows the rules in your robots.txt file.
- Pro Tip: Disable this if you want to audit pages that are currently "Disallowed" (like a staging section).
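In practice, respecting robots.txt means asking the parsed file for permission before every fetch. Here is a minimal sketch of that check using Python's standard-library parser; the URL and agent name are illustrative:

```python
# Check whether a given URL may be fetched under a given User-Agent,
# according to the site's robots.txt. URL and agent are illustrative.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # download and parse the rules

# With "Respect robots.txt" ON, a compliant crawler skips any URL
# for which this returns False.
print(rp.can_fetch("42crawl", "https://example.com/staging/page.html"))
```

Disabling the setting simply skips this check, which is how Disallowed sections become auditable.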
2. Crawl Depth
This determines how many link "hops" from your homepage the bot will follow.
- 1 Level: Just the homepage.
- 2-3 Levels: Covers most small to medium sites.
- 4-5 Levels (Pro): Necessary for massive e-commerce stores.
3. Maximum Pages
Limit the total number of pages crawled to get a quick "pulse check" of a large site. The sketch below shows how this cap combines with crawl depth.
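To make the two limits concrete, here is a minimal sketch of a breadth-first crawl that stops at a depth limit or a page cap, whichever comes first. It assumes the third-party `requests` library, and the regex-based link extraction is deliberately naive, for illustration only:

```python
# Breadth-first crawl bounded by crawl depth and a maximum page count.
# START, MAX_DEPTH, and MAX_PAGES are illustrative settings.
import re
from collections import deque
from urllib.parse import urljoin, urlparse

import requests

START = "https://example.com/"
MAX_DEPTH = 2    # depth 1 = just the homepage, 2 = homepage + one hop
MAX_PAGES = 50   # quick "pulse check" cap on total fetches

seen = {START}
queue = deque([(START, 1)])  # (url, depth)
pages_fetched = 0

while queue and pages_fetched < MAX_PAGES:
    url, depth = queue.popleft()
    resp = requests.get(url, timeout=10)
    pages_fetched += 1
    print(f"[depth {depth}] {resp.status_code} {url}")
    if depth >= MAX_DEPTH:
        continue  # at the depth limit: record the page, but don't expand it
    for href in re.findall(r'href="([^"#]+)"', resp.text):
        link = urljoin(url, href)
        # Stay on the same host and skip URLs already queued.
        if urlparse(link).netloc == urlparse(START).netloc and link not in seen:
            seen.add(link)
            queue.append((link, depth + 1))
```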
4. Follow External Links
If enabled, 42crawl will check the health of every link pointing to other websites.
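A basic version of that health check sends a lightweight request to each outbound URL and flags failures. A minimal sketch, assuming the `requests` library and an illustrative link list:

```python
# Report the HTTP status of each external link; flag network errors.
# The link list is illustrative; a real run would use links found in a crawl.
import requests

external_links = [
    "https://example.org/partner-page",
    "https://example.net/does-not-exist",
]

for link in external_links:
    try:
        # HEAD is cheap; some servers reject it, so fall back to GET on 405.
        resp = requests.head(link, allow_redirects=True, timeout=10)
        if resp.status_code == 405:
            resp = requests.get(link, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error ({exc.__class__.__name__})"
    print(f"{status}  {link}")
```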
Best Practices for Accurate Audits
- Mobile-First is Mandatory: Periodically run a crawl with a mobile User Agent to match Google’s mobile-first indexing (see the sketch after this list).
- Audit Your Staging Site: Crawl your staging environment before you push to production to catch regressions before they can hurt your Core Web Vitals.
- Start Shallow: For very large sites, start with a depth of 2. Because templates repeat across pages, a shallow crawl will surface the vast majority of your template-level errors.
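For the mobile-first point, a quick parity test is to fetch a page once with a Googlebot Smartphone-style User-Agent and once with a desktop one, then compare what each sees. A minimal sketch; the URL is illustrative and the UA strings are representative, not 42crawl's own:

```python
# Compare the <title> a page serves to a mobile bot vs. a desktop browser.
import re

import requests

URL = "https://example.com/"
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

def fetch_title(ua: str) -> str:
    html = requests.get(URL, headers={"User-Agent": ua}, timeout=10).text
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
    return match.group(1).strip() if match else "(no title found)"

print("mobile :", fetch_title(MOBILE_UA))
print("desktop:", fetch_title(DESKTOP_UA))
```

A mismatch here is a strong signal that mobile users (and Google's indexer) see something different from your desktop audit.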
By mastering these settings, you turn 42crawl into a powerful tool for both technical SEO and generative engine optimization.