The client wanted job listings extracted from 20 job sites, including Indeed, CareerBuilder and Monster. The required data points were job titles, locations, wages, company profiles, job descriptions and candidate resumes.
The client provided the list of source websites and the data points. They wanted the data extracted daily, so fresh data had to be delivered every day. We set up crawlers to scrape the job sites and extract the required data fields. This requirement falls under the site crawling service, since a crawler has to be set up specifically for each site in the list.
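To illustrate why each site needs its own setup, here is a minimal sketch of per-site field mapping into one common record shape. It is not the crawler code used in this project. The site names (`site_a`, `site_b`), the raw keys and the `normalize` helper are all hypothetical, and the output fields cover only some of the data points listed above.

```python
# Common output fields, based on the data points named in this case study.
FIELDS = ["job_title", "location", "wages", "company_profile", "job_description"]

# Hypothetical per-site mappings from each site's raw keys to the common fields.
# Every real site would need its own mapping, which is why setup is per site.
SITE_FIELD_MAPS = {
    "site_a": {
        "title": "job_title",
        "city": "location",
        "salary": "wages",
        "company": "company_profile",
        "body": "job_description",
    },
    "site_b": {
        "position": "job_title",
        "place": "location",
        "pay": "wages",
        "employer": "company_profile",
        "description": "job_description",
    },
}


def normalize(site, raw):
    """Map one raw scraped item from `site` onto the common record shape."""
    mapping = SITE_FIELD_MAPS[site]
    # Start with every field empty so all records share the same keys.
    record = {field: "" for field in FIELDS}
    for raw_key, value in raw.items():
        field = mapping.get(raw_key)
        if field is not None:  # keys the mapping does not know are dropped
            record[field] = str(value).strip()
    return record
```

With this shape, adding a new source site means adding one mapping entry rather than changing the downstream delivery step.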
The client wanted the data delivered in CSV format and uploaded to their Dropbox account. Once the initial setup was done, our crawlers started delivering data directly into the client’s Dropbox. We delivered close to 2 million job listings during the first crawl and about 200K records of clean, structured data daily thereafter.
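The CSV delivery step can be sketched as follows. This is an illustration under assumptions, not the pipeline used for this client: the column names in `CSV_FIELDS` and the `records_to_csv` helper are hypothetical, and the upload to Dropbox is left out.

```python
import csv
import io

# Hypothetical column order for the delivered file.
CSV_FIELDS = ["job_title", "location", "wages", "company_profile", "job_description"]


def records_to_csv(records):
    """Serialize a list of job-record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CSV_FIELDS,
        extrasaction="ignore",  # silently skip keys that are not in CSV_FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells so every row has the same columns.
        writer.writerow({field: record.get(field, "") for field in CSV_FIELDS})
    return buffer.getvalue()
```

A fixed column order keeps each daily file compatible with the previous one, and the `csv` module quotes values that contain commas, such as "Austin, TX".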