A popular job portal based in the USA wanted to automate its job listings by crawling jobs from job boards and company job listings.
The client wanted job listings extracted from 20 job sites, such as Indeed, CareerBuilder and Monster. The data points that the client needed were job postings, including job titles, location, wages, company profiles, job descriptions and candidate resumes.
The client provided the list of source websites and the data points. They wanted the data extracted daily, which meant fresh data had to be delivered every day. We set up crawlers to extract the required data fields from each website on the client's list. This requirement falls under our site crawling service, since a crawler has to be set up specifically for each site in the list. The client wanted the data in CSV format, uploaded to their Dropbox account. Once the initial setup was done, our crawlers started delivering data directly into the client's Dropbox. We delivered close to 2 million job listings in the first crawl and about 200K records of clean, structured data every day thereafter.
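To illustrate the CSV delivery step described above, here is a minimal Python sketch of how records coming from different site-specific crawlers might be mapped onto one shared set of columns and written out as a date-stamped CSV file. It is not the actual pipeline used in this project: the column names, the `normalize`, `daily_filename` and `write_csv` helpers, and the sample record are all hypothetical, and the crawl itself and the Dropbox upload are left out.

```python
import csv
import io
from datetime import date

# Hypothetical column names, loosely based on the data points listed above.
FIELDS = ["job_title", "location", "wages", "company", "description", "source_site"]


def normalize(record, source_site):
    """Map one raw per-site record onto the shared columns, trimming whitespace."""
    row = {field: str(record.get(field, "")).strip() for field in FIELDS}
    row["source_site"] = source_site  # remember which crawler produced the row
    return row


def daily_filename(day):
    """Build a date-stamped file name, e.g. jobs_2024-01-31.csv."""
    return f"jobs_{day.isoformat()}.csv"


def write_csv(rows, out):
    """Write normalized rows to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample record standing in for one crawler's output.
    raw = [{"job_title": " Data Engineer ", "location": "Austin, TX", "company": "Example Corp"}]
    rows = [normalize(r, "example-job-board") for r in raw]

    buffer = io.StringIO()  # in-memory stand-in for the daily file
    write_csv(rows, buffer)
    print(daily_filename(date.today()))
    print(buffer.getvalue())
```

Keeping a single column list for every source means each day's file has the same layout regardless of which site a listing came from, so a missing field simply becomes an empty cell rather than a shifted column.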