It was just like any other day: I found myself thinking about interesting projects to do in the tech space. Then I thought it would be fun to try bypassing the bot detection systems present on the internet.
I thought about building on prior work done by someone else to bypass reCAPTCHA. After searching for a bit, I came across an interesting paper here. This paper primarily talks about Google reCAPTCHA.
Google reCAPTCHA is widely used to detect bots on the internet. When you click the checkbox, it can behave in one of three ways. It can act as a simple checkbox, letting you perform the action (such as submitting a form) without anything further required. If it thinks you are a bot, it may instead give you pictures to sort into groups, which are pretty hard for any algorithm to solve, machine learning included (the pictures are mostly of street signs or storefronts; I am guessing they feed their self-driving car project). It might also give you distorted text obscured by lines drawn here and there, although this variant has apparently been discontinued because there are good machine learning algorithms out there that can solve it.
In this paper, the authors describe the techniques they used to bypass reCAPTCHA with over 70 percent accuracy.
They created a bot that would first browse random websites to look like a legitimate user, so that it would obtain a token or cookie that appears to come from a valid user. They let it browse these websites for about 10 days.
After this trial period, they switched into attacking mode against a website protected by reCAPTCHA. When they attacked with this aged token, reCAPTCHA did not ask for the harder challenges; instead, it let the bot through without any extra hassle.
Just think about it: anyone could have generated hundreds or thousands of such tokens without any significant effort and used them to get past the bot detection on multiple websites.
They also tried fresh tokens, but reCAPTCHA, being smart, detected the fraudulent activity and asked them to solve the hard AI challenges discussed above. Even so, the authors found a pattern and were able to exploit it with more than 70 percent accuracy (you can read the paper linked above for the details of exactly how they accomplished this). By comparison, the best paid professional service for bypassing bot detection has only about 50 percent accuracy.
Being responsible researchers, they forwarded their findings to Google so that it could improve its infrastructure.
I have only scratched the surface of what the paper covers. You can check out the link above to read more.
As for me, after reading it, I have decided to explore other interesting topics in the tech space that I have been missing out on (this is my first time reading a paper like this). If you have any cool papers or suggestions in mind, please drop a comment below about it and why you found it interesting.