The Noise Reduction Project
Andrew Jefferson, Chief of Information Technology at a leading data analytics and research firm, had been facing a growing number of problems at work. His job was to maintain a continuous flow of data into the company's database for evaluation and analysis, while also keeping the company's servers running so that the firm's operations were never disrupted. There was a golden time when the only problem he had to deal with was proper data structuring: the internet was still new, few ecommerce sites existed, and, best of all, hardly any unwanted ads, products or content disturbed his database-building work. In the last couple of years, however, the data had become unmanageable for him and his growing team. Every day brought a fresh set of problems: server failures, structural clashes in the data and, above all, the unwanted noise that arrived with every crawled feed. With the firm's business growing, the demand for structured, clean data was at an all-time high. Andrew knew it was time to seek help outside the team to keep the firm running properly.
How did PromptCloud Help?
- PromptCloud added a data layer to the firm's existing setup, delivering continuous feeds free of "noise" so that the team could focus solely on interesting approaches to analytics.
- A crawler was set up to automatically extract product prices and specifications for predefined categories on a daily basis.
- Based on the schema provided by the client, the final data was delivered daily in XML format via the Data API, without any manual intervention from either side.
- Each record within a dataset contained all the details: product name, product price, availability status, short and long descriptions, all image URLs, SKU, dimensions, category, brand, source and the source URL from which it was fetched.
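To make the delivery format concrete, the record described above can be sketched as a small XML payload and parsed with Python's standard library. The tag names and sample values below are assumptions for illustration only; the actual schema was defined by the client.

```python
# Illustrative sketch only: the real field names and schema were
# client-defined, so every tag and value below is an assumption
# based on the fields listed in this case study.
import xml.etree.ElementTree as ET

SAMPLE_RECORD = """
<record>
  <product_name>Example Widget</product_name>
  <product_price currency="USD">19.99</product_price>
  <availability>in_stock</availability>
  <short_description>A compact widget.</short_description>
  <long_description>A compact widget for everyday use.</long_description>
  <image_urls>
    <url>https://example.com/img/widget-front.jpg</url>
    <url>https://example.com/img/widget-side.jpg</url>
  </image_urls>
  <sku>WID-001</sku>
  <dimensions>10x5x2 cm</dimensions>
  <category>Widgets</category>
  <brand>ExampleBrand</brand>
  <source>example.com</source>
  <source_url>https://example.com/products/widget</source_url>
</record>
"""

def parse_record(xml_text):
    """Parse one product record into a plain dict."""
    root = ET.fromstring(xml_text)
    # Flat fields become simple key/value pairs ...
    record = {child.tag: child.text for child in root if child.tag != "image_urls"}
    # ... while the image URLs are collected into a list.
    record["image_urls"] = [u.text for u in root.findall("./image_urls/url")]
    return record

record = parse_record(SAMPLE_RECORD)
print(record["sku"])              # WID-001
print(len(record["image_urls"]))  # 2
```

Since the feed arrived daily with a fixed schema, a parser like this could be written once and run unattended on each delivery.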
Benefits to the client
- Building the crawl and extraction process set up a continuous stream of data
- Any changes within the source sites were handled, and the client was shielded from such issues
- Schema changes were made as requested
- Other categories could be added as requirements changed
- Productivity increased since the data team could work on other projects, and the client expanded into other verticals
- Low turnaround time on data improved the client's ability to market its services and capabilities
- Value addition from the project was 50 times the spend
- Data quality rose dramatically without any time investment from the team
With the new data layer, Andrew got a cleaner database, minus the noise, and moved on to further improving his company's technological platforms.
Looking to outsource your crawling and data extraction requirements? Reach out to us at firstname.lastname@example.org or just Get Started