Our previous update eased the data delivery process by providing an option to download data directly from CrawlBoard. This update is another step toward better data accessibility: you can now merge the data available in the API into fewer files, deduplicate the records, and download the files directly in zipped or unzipped format.
We also understand the importance of recency in the highly dynamic, fast-paced web data ecosystem. Hence, this update also comes with a same-day crawl scheduling capability, which you can easily trigger from your end.
As mentioned above, this feature allows you to merge the extracted data available for different sites over a certain time period and have the merged files uploaded. If you consume data via the API, the merged data will be available in the folder ‘post_upload_merged_files_ignore_billing’, which can be accessed by adding the following parameter:
`&folder=post_upload_merged_files_ignore_billing`. However, if you have opted for file upload to your FTP server or any of the cloud storage solutions, the merged data will be available on your server.
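If you build your API request URLs in a script, the folder parameter can be appended programmatically. Here is a minimal Python sketch, assuming you already have a working API URL. The endpoint `https://api.example.com/data` and the `api_key` parameter are placeholders for illustration, not the actual API. Only the `folder` parameter and its value come from this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Folder name taken from this post.
MERGED_FOLDER = "post_upload_merged_files_ignore_billing"


def with_merged_folder(api_url: str) -> str:
    """Return api_url with the merged-files folder set in its query string."""
    parts = urlsplit(api_url)
    # Keep the existing parameters, dropping any earlier 'folder' value
    # so the parameter is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "folder"
    ]
    params.append(("folder", MERGED_FOLDER))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical endpoint and key, for illustration only.
    print(with_merged_folder("https://api.example.com/data?api_key=YOUR_KEY"))
```

Parsing the query string rather than concatenating text avoids a stray `&` when the URL has no existing parameters, and avoids a duplicate `folder` entry when one is already present.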
As you can see, accessing this section is quite straightforward – click on “Data Download” in the left sidebar and open up the “Merge and Upload” tab. Now, follow the step given below:
Note that, depending on the data volume, the whole process takes approximately 5 to 30 minutes.
Although we already had a crawl scheduling option (accessible from the “Sites” link in the left sidebar), it was not possible to initiate a crawl on the same day. With this update, you can initiate same-day crawls, and they will be picked up by our system within an hour or two.
We’re super excited to launch these improvements — try them out and send us your feedback and suggestions.
Happy data crunching!