The issues of legality and ethics surrounding web scraping are a massive grey area. While some may be in favor of web scraping, others might not share the same enthusiasm. This is what makes the subject so controversial.
Those in favor argue that web data has the potential to make the world better and that scraping is critical for data analysis and management done right. Critics, on the other hand, object that web scraping gives scrapers an unfair advantage.
The fact is that web scraping isn't bad as long as it's done properly. It can be beneficial for research purposes whether you want to promote your business or excel at academic projects.
In this post, we'll talk about which types of web scraping may be illegal and how different authorities have ruled on its legality.
What Types Of Data Are Illegal To Scrape?
Unfortunately, many users are unaware that the final use case of the data has a significant influence on whether scraping is legal. The scraping of a website may be perfectly legal in some cases, but what you intend to do with the information makes it illegal in others.
There are two main types of data we must be concerned about:
Personal Data: Data that can be used directly or indirectly to identify an individual is personal data or personally identifiable information (PII). This includes medical or health records, bank information, date of birth, address, email, and name.
Copyrighted Data: This type of data is owned by businesses or people who have precise control over how it can be copied or captured. This is the same as using copyrighted images and songs. If you take the owner's data without permission, you could be breaking the law. Examples include articles and blogs, pictures, videos, music, and other creative property.
Web Scraping In The Eyes Of The Law
Before you start web scraping, reflect on how far you can legitimately go to extract the data you need.
Currently, no legislation addresses web scraping directly, but several legal frameworks and broad principles have been applied in court over the use of scraped web data.
These court cases address illegal access to web data, copyright issues, trade secrets, and breach of contract issues.
Researchers and marketers must be aware of the possible ethical consequences of web scraping.
EU Laws
The GDPR applies across the entire European Economic Area (EEA). It sets rules for protecting PII when data controllers collect it and then pass it on to data processors.
The GDPR requires that if there is a data breach, consumers and data protection agencies must be told about it. If a company collects the PII of an EEA resident, it must follow the GDPR, no matter where it is in the world. There's no way around it.
Under Article 6 of the GDPR, the lawful bases that can apply to scraping personal data are:
Consent: You are good to go if you have the consent of the people whose personal data you are scraping
Contract: This is when you are required by contract to scrape and process a website's data
Legal obligation: If scraping and processing web data help you fulfill a legal obligation, go ahead
Vital interests: If your scraping efforts are necessary to protect someone's life, they can be lawful
Public tasks: It is perfectly legal when scraping is in the public interest or helps you do your duties as an official
Legitimate interest: As long as your web scraping doesn't override the rights or interests of people, you can argue that it is in your legitimate interest
US Laws
While the U.S. doesn't have a single federal privacy law, it has a vast patchwork of state and sector-specific laws. That makes the legality of web scraping murky waters to navigate.
Examples include the California Consumer Privacy Act (CCPA), a state law, and the Computer Fraud and Abuse Act (CFAA), a federal one. Moreover, the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act of 1999 (GLBA) are consumer-oriented federal laws.
CCPA: This is a California state data privacy law that regulates how businesses all over the country handle the personal information of California residents. It was the first comprehensive state data privacy law in the country
CFAA: This federal law is concerned with unauthorized access to computers. In data scraping cases, courts have weighed authorization using norms borrowed from real property law
HIPAA: This is a health insurance and accountability act that has set guidelines regarding patient privacy. A violation of these guidelines could result in federal prosecution
GLBA: This protects consumers' private information. To be GLBA compliant, financial firms need to inform customers of their right to opt out if they don't want their personal information shared
The CFAA and similar state laws are the leading legal basis for claims in web scraping disputes. Under the CFAA, access to a website can become unauthorized once the website owner sends a cease and desist letter to whoever is crawling or scraping it. This is what happened in Craigslist Inc. v. 3Taps Inc. in 2013 and Facebook, Inc. v. Power Ventures, Inc. in 2016.
3Taps is a firm committed to collecting and distributing public data, and it was partnered with PadMapper. Craigslist sent 3Taps a cease and desist letter in response to PadMapper using its listings. After the data distribution startup refused to comply, Craigslist filed a complaint in the U.S. District Court for the Northern District of California.
However, the letter alone may not be enough to hold the web scraper responsible under the CFAA, as in Ticketmaster LLC v. Prestige Entertainment, Inc. in 2018. Ticketmaster took Prestige Entertainment to court over alleged violations of the CFAA and similar state laws. Prestige, however, was able to push back on the claims by arguing that it had acquired tickets through the Ticketmaster website, something that's permitted in its Terms of Use.
Comparing U.S., E.U., and Latin American Laws
It's a little challenging to compare E.U. and U.S. laws.
Both let people opt out of having their data processed, and both let people view or delete their information.
In Europe, data protection rules are set by the GDPR, but there has never been a federal user privacy law in the U.S. Each state has tried to fill the gap as it sees fit. The CCPA is an example of this, but other states haven't shown the same amount of resolve. Another difference is that the CCPA requires privacy policies on all websites, whereas the GDPR requires clear and specific user consent.
Data privacy is becoming more of an issue not only in the U.S. and Europe but also in Latin America. In fact, Brazil is leading the way with a new data privacy law that consolidates over 40 different regulations. The Lei Geral de Proteção de Dados (LGPD) took effect in 2020 and puts significant compliance obligations on companies that process data.
How Can You Keep Your Scrapers Ethical?
Don't just pay lip service to ethical web scraping; make it an integral part of your data harvesting efforts.
The only mantra of ethical web scraping is: do no harm.
You have a lot of power as a web scraper because you'll likely come across loads of private data and personal information about a website's users. That's why it is vital to have a moral code to guide your scraping efforts.
First off, make sure that you have a strict policy about not profiting off private data. Here's what you need to do next:
Use APIs
Some websites offer built-in APIs for scrapers. Make sure you use them and follow the rules. You could also use a dedicated web scraping API, like the one from Ujeebu.
The robots.txt file, defined by the Robots Exclusion Standard, will tell you which parts of a site you are allowed to visit with your web-crawling software and which parts are off limits.
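Checking robots.txt before you crawl can be automated. Here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the example.com URLs, and the bot name example-research-bot are all made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical name for our crawler.
AGENT = "example-research-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser we can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether our agent may fetch each URL under the parsed rules.
allowed = parser.can_fetch(AGENT, "https://example.com/listings/1")
blocked = parser.can_fetch(AGENT, "https://example.com/private/profile")

# Crawl-delay is optional; crawl_delay() returns None when it is absent.
delay = parser.crawl_delay(AGENT)

print(allowed, blocked, delay)
```

Against a live site, you would call parser.set_url() with the address of the site's robots.txt and then parser.read(), instead of parsing an inline string.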
Read The Terms And Conditions
This is where you find the rules for using and scraping data from a website. Sure, you could always click 'I agree' without reading and do what you want to do. But it is essential to understand that the terms and conditions are there for a reason. So take your time to figure out how they affect you and what you are trying to do.
Be Kind
Scraping is harsh on web servers. So make sure you begin when there is little to no traffic on the website and be gentle when gathering data. Also, space out the requests so it doesn't look like you are trying to DDoS the servers.
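Spacing out requests can be as simple as pausing between them. The sketch below assumes a fixed delay, and the 2-second default is an arbitrary illustration rather than a recommended value. The fetch_politely function and its parameters are hypothetical names; fetch stands in for whatever function you use to download a page.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Usage with a stand-in fetch function, so no network traffic is generated.
pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b"],
    fetch=lambda url: f"<html>{url}</html>",
    delay_seconds=0.1,
)
print(len(pages))
```

If the site's robots.txt declares a crawl delay, it makes sense to use that value instead of your own.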
Say Hi
The website admin will likely notice some unusual traffic when you start scraping. It'd be good to introduce yourself, tell them what you plan to do, and leave your contact info.
In fact, go a step further and courteously ask for permission. This will not only make you look like a nice person but also relieve some of the legal burdens. Besides, the data really doesn't belong to you, so it'd be the right thing to do.
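One common way to leave your contact info is to put it in the request headers, so the admin sees it in the server logs. This is a sketch of that idea, not the only way to introduce yourself. The bot name, info URL, and email address are placeholders, and build_identified_request is a hypothetical helper.

```python
from urllib.request import Request


def build_identified_request(
    url: str, bot_name: str, info_url: str, contact_email: str
) -> Request:
    """Build a request whose headers say who is scraping and how to reach them."""
    # A descriptive User-Agent with a link to a page explaining the project.
    user_agent = f"{bot_name}/1.0 (+{info_url}; {contact_email})"
    headers = {
        "User-Agent": user_agent,
        # "From" is a standard HTTP header carrying a contact email address.
        "From": contact_email,
    }
    return Request(url, headers=headers)


request = build_identified_request(
    "https://example.com/listings",
    bot_name="example-research-bot",
    info_url="https://example.org/about-our-bot",
    contact_email="scraper-admin@example.org",
)
print(request.get_header("User-agent"))
```

The request can then be sent with urllib.request.urlopen(request). Headers like these complement asking for permission directly; they don't replace it.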
The Bottom Line: Practice Ethical Scraping
The issue of legality boils down to what you scrape and how you go about it. Before embarking on your web scraping mission, be sure to give yourself a little ethics check. Ask yourself if you're about to scrape personal data or copyrighted data, or if you're trying to gather data that usually sits behind a login.
It only takes good manners and a bit of due diligence to keep your web scraping efforts within ethical and legal confines.
Happy scraping!
This article first appeared here: https://ujeebu.com/blog/is-web-scraping-legal/