Dev build works. Team ships fast.
The scraper handles the happy path. Volume is low and the dev environment closely mirrors production. Everything looks good on the dashboard.
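At this stage the scraper is usually a straight-line script like the sketch below: fetch, parse, return, with no retries, block detection, or schema checks. This is a minimal illustration assuming requests and BeautifulSoup; the URL, the `.product-card` selectors, and the helper names are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

def text_or_none(node) -> str | None:
    # Swallows selector misses: the exact habit that later turns
    # a site redesign into silent nulls.
    return node.get_text(strip=True) if node is not None else None

def scrape_products(url: str) -> list[dict]:
    # Happy path only: the request is assumed to succeed,
    # and the DOM is assumed to never change.
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    return [
        {
            "name": text_or_none(card.select_one(".title")),
            "price": text_or_none(card.select_one(".price")),
        }
        for card in soup.select(".product-card")
    ]
```

Every assumption baked into this version (the site responds, the selectors match) becomes a failure mode in the stages that follow.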
First anti-bot blocks hit. Proxy rotation added.
The site starts returning 403s. The team buys proxy packages and implements basic rotation. It works again, for a while. The first maintenance cycle begins.
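The first rotation layer tends to look like the sketch below: round-robin over a static pool with a retry loop. The proxy URLs are placeholders, and treating only 403 as a block is a simplification; real anti-bot systems also answer with 429s, CAPTCHAs, and poisoned 200s.

```python
import itertools
import requests

# Hypothetical pool; in practice these come from a proxy provider.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]
_rotation = itertools.cycle(PROXIES)

def fetch(url: str, max_attempts: int = 3) -> requests.Response:
    """Round-robin through the pool, retrying when a proxy is blocked."""
    last_exc = None
    for _ in range(max_attempts):
        proxy = next(_rotation)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=15,
            )
            if resp.status_code != 403:  # simplistic block check
                return resp
        except requests.RequestException as exc:
            last_exc = exc  # network failure: try the next proxy
    raise RuntimeError(f"all {max_attempts} attempts blocked for {url}") from last_exc
```

It holds up until the pool's IPs get flagged, which is what kicks off the maintenance cycle.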
Site redesign breaks selectors. Silent data loss begins.
A redesign or A/B test changes the DOM. Selectors fail silently, returning null instead of raising errors. The pipeline fills with missing values for two weeks before anyone notices.
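The durable fix is to make required fields fail loudly and to monitor extraction rates, so a redesign surfaces as an alert within hours rather than a null-filled table two weeks later. A minimal sketch, assuming BeautifulSoup; `SelectorMiss`, `extract_required`, and `null_rate` are names invented for illustration:

```python
from bs4 import BeautifulSoup

class SelectorMiss(Exception):
    """A required selector stopped matching; almost always a DOM change."""

def extract_required(soup: BeautifulSoup, selector: str) -> str:
    """Return the field's text, or raise instead of passing None downstream."""
    node = soup.select_one(selector)
    if node is None:
        raise SelectorMiss(f"required selector matched nothing: {selector!r}")
    return node.get_text(strip=True)

def null_rate(records: list[dict], field: str) -> float:
    """Share of records missing a field; alert when this jumps batch-over-batch."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)
```

The design choice is to treat a selector miss as an incident, not a data point.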
Scale requirements increase. Architecture cracks.
The business wants more sources and higher frequency. The scraper, designed for one use case, starts failing under load. A full rewrite is scoped.
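The rewrite usually amounts to decomposing the single script into independent stages (fetch, parse, store) connected by queues, so each can scale and fail on its own. The sketch below uses in-process queues and throwaway worker logic for brevity; a production version would swap in a broker such as Redis, SQS, or Kafka, but the decomposition is the same.

```python
import queue
import threading
import requests
from bs4 import BeautifulSoup

# In-process stand-ins for what becomes a real message broker.
fetch_q: "queue.Queue[str]" = queue.Queue()
parse_q: "queue.Queue[str]" = queue.Queue()

def fetch_worker() -> None:
    # I/O-bound stage: scaled by adding workers (and proxies).
    while True:
        url = fetch_q.get()
        try:
            parse_q.put(requests.get(url, timeout=15).text)
        except requests.RequestException:
            pass  # a real pipeline would retry or dead-letter here
        finally:
            fetch_q.task_done()

def parse_worker() -> None:
    # CPU-bound, site-specific stage: scaled independently of fetching.
    while True:
        html = parse_q.get()
        try:
            rows = [h.get_text(strip=True)
                    for h in BeautifulSoup(html, "html.parser").select("h2")]
            print(rows)  # stand-in for a write to storage
        finally:
            parse_q.task_done()

# Fetch and parse concurrency no longer have to move together.
for _ in range(8):
    threading.Thread(target=fetch_worker, daemon=True).start()
for _ in range(2):
    threading.Thread(target=parse_worker, daemon=True).start()

fetch_q.put("https://example.com")  # hypothetical source
fetch_q.join()
parse_q.join()
```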
Maintenance consumes 40%+ of eng bandwidth.
The "scraping project" now has a full-time shadow owner. New features are blocked. The team is constantly firefighting. The TCO becomes hard to ignore.
Switch to managed infrastructure.
Most teams arrive here. The opportunity cost of continued DIY (in eng time, data reliability, and business risk) outweighs the cost of managed scraping infrastructure.