Stairway to heaven, if you’re in the business of web scraping, that is.
It is legal to scrape publicly available data. There is a massive amount of data available in the public domain of the web, yet until recently little had been done to put it to use. Today, service companies provide data as a service or build solutions that are backed by data. Say you want to know the prices of 20,000 items across 5 different websites: some services can help you with that. Whether you are hiring recruits or deciding on the right price to list your house at, web scraping can help. However, even though web scraping usually involves collecting data from the open Internet, many companies are opposed to it. Why? They claim their users’ data as their own, and maintain that they are the only ones who have any right to it. A big win for free and open access to public data came recently in the hiQ vs LinkedIn case.
Scraping data proved daunting for hiQ Labs – a data analytics company that had been scraping publicly accessible data from LinkedIn. The latter chose to invoke the Computer Fraud And Abuse Act (CFAA) and accused hiQ of accessing the information “without authorization”. However, in a landmark move, the U.S. Ninth Circuit Court of Appeals ruled in favour of hiQ Labs, thus paving the way for the “open internet”.
The CFAA is a federal cyber-security law that was created to prevent the hacking of government security systems, that is, access to them “without authorization”. But the vagueness of the term “authorization” meant that companies could mould it to fit their own needs whenever necessary, as in the hiQ vs. LinkedIn case. What hiQ did was simple: it used scraped data to create HR-related analytics products. For instance, Keeper identified flighty employees, while Skill Mapper would assess employees and find gaps in the workforce. But then LinkedIn launched a similar set of products in 2017, and that is when things started going south.
While the ruling is a major win for data analytics, it also sheds light on an earlier Ninth Circuit case that has blurred the reach of the CFAA: Facebook v. Power Ventures, a ruling that was also cited in LinkedIn’s cease and desist letter.
Power Ventures was a company that allowed an individual to manage all their social media accounts from one place. Unlike hiQ, which scraped data that was publicly available on LinkedIn, Power Ventures would ask the user for consent. Therefore, it was the user who granted Power Ventures access to the data, not Facebook. Hence, though the company was “within authorization” in a way, it was still found to violate the CFAA.
Therein lies the trouble with the CFAA. While in theory it should prevent hacking, it has become little more than a tool for major corporations. Every large enterprise interprets the law in its own way and uses it to its advantage. Power Ventures was just an add-on feature that users chose for themselves; hiQ created analytical products in a space that LinkedIn had set its eyes on. Since the bigger companies wanted these third parties off their turf, they called on the mighty CFAA.
While the court has put a lock on invoking the CFAA whenever a company sees fit, it has still not shut the door completely. The more recent Stackla v. Facebook case shows yet another platform drawn into controversy over web scraping.
With new cases popping up every now and then, it will eventually fall to the courts to clarify the CFAA and terms like “without authorization”. Data is present everywhere, and drawing a distinction between legal and illegal use is of prime importance. A monopoly on data would be dangerous for innovation, and in the fast-paced world of the Internet, innovation is everything.
With the win in its bag, hiQ has cleared the path for the application of open web data. Web crawling and extraction is the cheapest way to gather data, and for far too long it has been viewed with scepticism. One must understand that the only way small and big companies can compete on a level playing field is if the Internet and the data present on it remain free for all to use.
Can Google claim that the data it shows for a search result is its own? Can Wikipedia stop us from learning from its pages? After all, most of the information available in the public domain of the internet belongs to individuals or the market, and no company can claim to have a monopoly over it. What companies can compete on instead is how well they use the data and what services they create from it. These services can digest the open data and produce valuable output that businesses can use.