“Web scraping,” in quite literal terms, means scraping data from the web. In the hands of a search engine, web scraping is the activity that generates search results by assessing millions of websites for information relevant to search queries. In the hands of businesses using scraping tools, on the other hand, its legality becomes questionable.
The Computer Fraud and Abuse Act (CFAA) prohibits unauthorized access to computers and the information on them, which can include web scraping. However, the scope of this prohibition remains unclear. Recently, in Van Buren v. United States, the US Supreme Court ruled in favor of Van Buren, holding that accessing data you are permitted to access, even for an unauthorized or prohibited purpose, is not a violation of the CFAA.
The grey area around the legality of scraped data cannot be cleared up without taking a close look at the ecosystem of web scraping, what it entails, and what makes it legal or illegal.
Is Scraping a Website Legal?
Many factors determine how legal it is to scrape web data. Because it is so widespread, web scraping may fall under the ambit of trespass to chattels laws, where unauthorized use of a person’s information could become a legal issue.
Additionally, a multitude of other laws, acts, and regulations have been put in place to protect consumer privacy and prevent information theft. You may have heard of the General Data Protection Regulation (GDPR), the Children’s Online Privacy Protection Act (COPPA), and the Health Insurance Portability and Accountability Act (HIPAA). All of these protection measures exist to prevent unchecked abuse of private consumer data.
However, with the ruling in Van Buren v. United States, it would seem that web scraping, under certain circumstances, may be permissible.
In the Ninth Circuit Court of Appeals ruling in hiQ Labs v. LinkedIn, the court held that scraping information from public profiles was permissible, since this activity was not covered under the ambit of the CFAA (because the scraped data was publicly available). It did, however, cause LinkedIn to restrict user profiles from being accessed publicly: the viewer is now required to log in.
Once a website requires you to log into your user account to view its information, everything you do from that point on is governed by the website’s terms and conditions. These terms and conditions may have clauses that deter or prohibit web scraping. If you still engage in extracting data, you may get into a legal mess.
This is precisely why LinkedIn mandated logins to view user profiles: to restrict scraping of its users’ information.
With that said, the grey area remains wide. So… is web scraping illegal? It largely depends on the kind of data you are trying to scrape and the nature of that data:
Public Data
Most of the data that you encounter on the internet is public data. Unless you are required to log in to an account, agree to terms of data use, or authenticate your identity or credentials to access certain data, it is generally considered legal to scrape.
The only deterrent to web harvesting here would be the measures that these websites put in place to deflect your web scrapers (to protect their information, of course).
Personal Data/Private Data
It is illegal to scrape an individual’s personal information. Personal information could be anything – name, address, financial details, health details, date of birth, any other contact information, etc. Anything that gives away an individual’s personal identity (Personally Identifiable Information, or PII) is a red flag for web scraping. It is a strict no-no.
If you must, though, it is mandatory to seek that individual’s consent first. Additionally, if there is a legal basis for scraping PII, it must be disclosed.
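Where a page mixes public content with personal details, one cautious approach is to discard the personal fields before anything is stored. The following is a minimal Python sketch of that idea, not a legal safeguard in itself. The field names in `PII_FIELDS`, the `strip_pii` helper, and the simple email pattern are all assumptions made for illustration; what counts as PII in practice depends on the data and the laws that apply to it.

```python
import re

# Hypothetical field names treated as PII in this sketch.
PII_FIELDS = {"name", "address", "email", "phone", "date_of_birth"}

# Deliberately simple pattern for email-like strings; it is not a full validator.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def strip_pii(record: dict) -> dict:
    """Return a copy of `record` without PII fields and with
    email-like strings masked inside the remaining text values."""
    cleaned = {}
    for key, value in record.items():
        # Drop any field whose name is on the PII list (case-insensitive).
        if key.lower() in PII_FIELDS:
            continue
        # Mask email addresses that appear inside free-text fields.
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[redacted]", value)
        cleaned[key] = value
    return cleaned


scraped = {
    "name": "Jane Doe",
    "title": "Engineer",
    "bio": "Reach me at jane@example.com",
}
print(strip_pii(scraped))
```

A name-based filter like this only catches fields you already know about, so it complements rather than replaces the consent requirement described above.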
Copyrighted Data
Any data on the internet that is the intellectual property of its publisher is illegal to scrape. If you must use this data, its copyrights notwithstanding, you must credit the source of that information wherever you use it.
Terms of Service
Much like LinkedIn’s mandatory account logins for accessing user profiles, logging in almost always means consenting to the website’s terms and conditions. These terms and conditions may contain clauses on data scraping. If you still release your scraper bots after logging in, you are risking a ban or even legal action.
How to Legally Scrape Data
To ensure that there are no legal actions taken against you, thoroughly understand the following aspects before you proceed with web scraping:
- Is the data publicly available?
- Does it reveal the PII of any individual?
- Does the website mention any prohibitions regarding scraping?
- Are there any laws, acts, policies, or regulations that control what information you can scrape and use?
Carefully weighing the answers to all of these questions helps determine how deep into the grey area your web scraping activity falls.
In essence, “Is it legal to scrape a website?” is not the question. The real question is, “How legal is website scraping?”
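One part of the third question can be checked programmatically. Many websites publish a robots.txt file stating which paths automated clients are asked to avoid. The Python sketch below uses the standard library's `urllib.robotparser` to test a URL against such rules. The `is_path_allowed` helper, the `my-bot` user agent, and the example rules and URLs are assumptions for illustration. A robots.txt file is not the same thing as a website's terms of service, so a passing check here does not settle the legal question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download this
# from the target site's /robots.txt before requesting any other page.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /profiles/
"""


def is_path_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.com/blog/post", "https://example.com/profiles/123"):
    verdict = "allowed" if is_path_allowed(EXAMPLE_ROBOTS, "my-bot", url) else "disallowed"
    print(url, verdict)
```

The other questions in the list (public availability, PII, applicable laws) still need a human answer. This check only covers prohibitions that a website has expressed in machine-readable form.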
It is best to ensure that web scraping fetches only the data that is publicly available and not protected by any legally actionable clauses. You can also outsource web scraping to professional agencies like PromptCloud that know what they are doing.