How do you get around collecting links on websites that want to block you from crawling their site?
We do our best to crawl sites comprehensively, but in some rare cases we don't crawl a site. These include:
- websites that specifically block our bot in robots.txt (a sketch of how this works follows below)
- CDNs that block all bots except Google from crawling.
Again, these are very rare cases compared to the number of domains we crawl.
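
For context, a site blocks a specific bot by naming its user agent in its robots.txt file. The minimal sketch below uses Python's standard `urllib.robotparser` to show how a crawler that honors robots.txt decides to skip such a site; the bot name "ExampleBot" and the URLs are placeholders, not the actual crawler's identifiers.

```python
# Minimal sketch: how a well-behaved crawler checks robots.txt before fetching.
# "ExampleBot" and example.com are placeholders for illustration only.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # download and parse the site's robots.txt

# A robots.txt containing:
#   User-agent: ExampleBot
#   Disallow: /
# would make can_fetch() return False, so the crawler skips the whole site.
if parser.can_fetch("ExampleBot", "https://example.com/some-page"):
    print("Allowed to crawl this page")
else:
    print("Blocked by robots.txt - the site is skipped")
```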
Updated on: 14/08/2019