dnasedkina for SOAX

Posted on • Originally published at soax.com

Responsible Web Scraping: Challenges and Approaches

“Web scraping is neither legal nor illegal. It's how you use it and what you scrape.” - Ondra Urban, COO @ Apify

If you’re new to the web scraping field or have been hurt in some way by web scraping, it’s difficult to see it in a positive light. Trust me; you’re not alone. Back in 2018, when web scraping and data collection were emerging as an industry, the practice felt inherently illegal. However, with the recent rise in web scraping companies adopting ethical standards, web scraping has largely morphed into a force for good.

On the inaugural episode of the podcast Ethical Data, Explained, Ondra Urban (COO at Apify, though he prefers the title “Chief Debugging Officer”) discusses how web scraping companies can maintain ethical standards. He explains how debugging and resolving problems can create a more accessible and programmable web. As a bonus, Ondra gives his thoughts on the implications of the HiQ vs. LinkedIn case for the web scraping industry and on possible innovations in the field.

Insights on ethical data collection for businesses:

  • Web scraping is merely a tool; it’s you, as a company, that decides what to do with it. Therefore it could be a force for both good and evil, depending on who wields the tool.
  • No US federal laws currently ban web scraping, and while this does not imply that the US government has officially declared web scraping legal, it is a step in the right direction. However, it also means that the best way that web scraping companies can thrive in the field is to do their data harvesting ethically.
  • “Safe harbor” is important for a web scraping company. Safe harbor protects you from the consequences of your users' actions. For instance, if a user logs onto your web scraping platform and does something illegal, safe harbor protects you from liability for that action – unless you were aware of it beforehand.
  • Look to provide value, and you’ll build trust and a positive image for your brand.

Possible Innovations in the Web Scraping Industry:

  • Speaking on the HiQ vs. LinkedIn case, Ondra highlights that while it affirmed that scraping public data is not a criminal offense, it still doesn’t entirely validate web scraping. Nonetheless, it is a good start that may shape the future of the web scraping space.
  • AI and web scraping: Ondra posits that AI won’t replace programmers in the web scraping field. While AI brings an innovative touch to any field it enters, it won’t significantly transform web scraping: "AI can take you 80%, 85% there, except for specific use cases," says Ondra. In other words, depending on the context, you might find AI-collected data useful much of the time, but not every piece of information gathered by AI ends up being useful.

Listen to this episode of Ethical Data, Explained to learn more about how web scraping companies can maintain ethical standards and for insights on potential changes that could transform the industry.

You can find "Ethical Data, Explained" on

Apple Podcasts
Spotify
Google Podcasts
YouTube

For more information about the podcast, its episodes, and its guests, visit the official podcast webpage.

If you know someone whose knowledge, experience, and expertise would make them an interesting guest to discuss data and data collection, let me know in the comments! I am happy to reach out and invite them for a talk 🙂
