Although a crawling frequency can be specified, the optimal frequency is hard to determine. Sites may not update as often as they are crawled. The result is suboptimal crawling, redundant data, and a negative impact on the target site from frequent, unproductive crawls.
The solution is intelligent adaptive crawling, in which the crawler uses machine learning to identify the pages that are updated more frequently. As a result, crawls run more often on frequently updated pages than on dormant ones. The crawlers adjust themselves automatically to establish optimal frequencies based on site behavior and changes. They also refine the list of URLs to process and extend the archive with semantic information about the extracted content.
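The idea of crawling active pages more often than dormant ones can be illustrated with a minimal sketch. It does not use machine learning; it substitutes a simple heuristic that halves a page's revisit interval when its content hash changes and doubles it when nothing has changed. The names `PageState` and `record_crawl`, the interval bounds, and the default interval are all assumptions made for illustration, not part of any particular crawler.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds, chosen only for this illustration.
MIN_INTERVAL = 3600.0          # never crawl more often than hourly
MAX_INTERVAL = 30 * 86400.0    # never wait longer than 30 days


@dataclass
class PageState:
    interval: float = 86400.0           # seconds between crawls (assumed default: 1 day)
    content_hash: Optional[str] = None  # hash of the content seen on the last crawl
    next_crawl: float = 0.0             # timestamp at which the page is due again


def record_crawl(state: PageState, content: bytes, now: float) -> bool:
    """Update the schedule after a crawl; return True if the content changed."""
    new_hash = hashlib.sha256(content).hexdigest()
    first_visit = state.content_hash is None
    changed = not first_visit and new_hash != state.content_hash

    if changed:
        # Active page: revisit sooner.
        state.interval = max(MIN_INTERVAL, state.interval / 2)
    elif not first_visit:
        # Dormant page: back off.
        state.interval = min(MAX_INTERVAL, state.interval * 2)

    state.content_hash = new_hash
    state.next_crawl = now + state.interval
    return changed


# Simulate four crawls of an active page and of a dormant page.
active, dormant = PageState(), PageState()
for i in range(4):
    record_crawl(active, f"post {i}".encode(), now=0.0)
    record_crawl(dormant, b"closed thread", now=0.0)

print(active.interval)   # 10800.0  (3 hours)
print(dormant.interval)  # 691200.0 (8 days)
```

After four crawls the active page is revisited every three hours while the dormant page has backed off to eight days, so unproductive requests to the target site drop without a frequency being configured by hand. A learned model could replace the halving and doubling rule while keeping the same per-page state.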
Adaptive focused crawling is particularly beneficial for extracting data from forum-based sites, where certain threads are far more active than those that are closed or latent.