Websites that link back to the original source of a piece of information face an inherent problem: keeping those links fresh.
Take the example of a digital classified ad listing company that aggregates ads from multiple sources on the web and links each ad back to its source for the second half of the purchase cycle. Three different things can happen if a particular ad no longer exists on the linked page:
on those links. The statuses returned for each check are decided in consultation with the client. On a daily basis (or as frequently as the client desires), the client uploads its master list of links to be checked for freshness, either to PromptCloud’s API or to its own FTP server. PromptCloud’s crawlers then fetch the pages at those URLs and interpret the fetched data using the rules in place. An appropriate status message is returned for each link and uploaded back to the API in the format specified by the client. Simple as that!
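To make the workflow concrete, here is a minimal sketch of a freshness check in Python. It is not PromptCloud's implementation: the status labels (`ACTIVE`, `EXPIRED`, `UNREACHABLE`), the marker phrases, and the function names `fetch_page`, `classify`, and `check_links` are all assumptions made for illustration, since the real statuses and rules are agreed with each client.

```python
import urllib.error
import urllib.request

# Hypothetical status labels; real ones are decided with the client.
ACTIVE = "ACTIVE"
EXPIRED = "EXPIRED"
UNREACHABLE = "UNREACHABLE"

# Hypothetical rule: phrases whose presence on a page means the ad is gone.
EXPIRED_MARKERS = ("no longer available", "listing has expired")


def fetch_page(url, timeout=10.0):
    """Return (http_status, body_text); http_status is None on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 503.
        return err.code, ""
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None, ""


def classify(http_status, body):
    """Map one fetched page to a status label using the rules above."""
    if http_status is None:
        return UNREACHABLE
    if http_status in (404, 410):
        return EXPIRED
    if http_status >= 400:
        return UNREACHABLE
    text = body.lower()
    if any(marker in text for marker in EXPIRED_MARKERS):
        return EXPIRED
    return ACTIVE


def check_links(urls, fetch=fetch_page):
    """Check every URL in the master list and return {url: status}."""
    return {url: classify(*fetch(url)) for url in urls}
```

Passing `fetch` as a parameter keeps the rules separate from the network layer, so the classification can be exercised with canned pages before it is pointed at live URLs. A real deployment would also need rate limiting, retries, and whatever output format the client specifies.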