In the past (before Elon Musk…), you could easily and freely apply for a developer account, get your own tokens, and start using the Twitter API without any struggle. One of the strengths of the developer account, besides building bots and tweeting via the API, was the search API: you could grab almost all the tweets you wanted. But after Elon Musk's takeover, unfortunately, you have to pay for it!
Tiers will start at $500,000 a year for access to 0.3 percent of the company's tweets. Researchers say that's too much for too little data. [source]
There is one solution that almost always works: Selenium! (It's also good to know that a great alternative to Selenium in JS is Puppeteer.)
It lets you scrape almost everything on the surface of the web; you just have to write a script for your use case with the Selenium library.
How
The algorithm for scraping tweets is simple. These are the steps (a minimal sketch follows the list):
- Open Twitter search with an advanced search query.
- Scrape specific tags to get the values.
- Scroll down.
- Repeat until you have scraped the number of tweets you need.
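Here is a minimal sketch of that loop with raw Selenium, assuming a Chrome driver. The search URL and the tweetText CSS selector are illustrative; Twitter's markup changes often, so verify them against the live page before relying on them.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

TARGET = 50  # how many tweets we want
tweets = set()

driver = webdriver.Chrome()
# Step 1: open Twitter search with an advanced search query
driver.get("https://twitter.com/search?q=from%3Aelonmusk&f=live")
time.sleep(5)  # crude wait for the first results to render

for _ in range(30):  # cap the scrolls so we never loop forever
    # Step 2: scrape the tweet-text nodes currently on screen
    # (the data-testid selector is an assumption about the current markup)
    for el in driver.find_elements(By.CSS_SELECTOR, '[data-testid="tweetText"]'):
        tweets.add(el.text)
    if len(tweets) >= TARGET:
        break
    # Step 3: scroll to trigger loading of the next batch
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the new tweets time to load

driver.quit()
print(f"Scraped {len(tweets)} tweets")

In practice you would swap the fixed sleeps for explicit waits, but the loop above is the whole idea: scrape, scroll, repeat.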
Code
You can write the script yourself or use a library such as twitter_scraper_selenium.
It's available on PyPI and GitHub.
pip install twitter_scraper_selenium
(Note: for saving as CSV and working with data frames, we must install pandas and its dependencies too.)
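For example:
pip install pandas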
Then you can write your own wrapper function like this:
from twitter_scraper_selenium import scrape_keyword
import json
import pandas as pd

def scrape_profile_tweets_since_2023(username: str):
    kword = "from:" + username
    path = './users/' + username  # the ./users directory must already exist
    file_path = path + '.csv'
    # With output_format="csv", the function saves the tweets to disk
    # rather than returning them
    scrape_keyword(
        headless=True,
        keyword=kword,
        browser="chrome",
        tweets_count=2,  # just the last 2 tweets
        filename=path,
        output_format="csv",
        since="2023-01-01",
        # until="2025-03-02",  # omitted: defaults to right now
    )
    # Read the CSV back and return the tweets as a list of records
    data = pd.read_csv(file_path)
    data = json.loads(data.to_json(orient='records'))
    return data
You can call this function for multiple accounts at the same time, like this:
from multiprocessing import Pool

# Just one account:
# scrape_profile_tweets_since_2023('elonmusk')

# Run in parallel: map the wrapper over a list of usernames
number_of_workers = 5

if __name__ == "__main__":
    with Pool(number_of_workers) as p:
        results = p.map(scrape_profile_tweets_since_2023,
                        ['elonmusk', 'BarackObama', 'cathiedwood'])
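Keep in mind that each worker process launches its own headless browser, so keep the pool size modest: five workers can mean up to five Chrome instances running at once.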
Result
Your result will be something like this:
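The exact columns depend on the twitter_scraper_selenium version, but each record is roughly of this shape (values below are placeholders):

[
    {
        "tweet_id": "...",
        "username": "elonmusk",
        "content": "...",
        "posted_time": "...",
        "tweet_url": "https://twitter.com/elonmusk/status/..."
    }
]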
In the next post, we are going to scrape mentions/replies as well.
If you liked the post, please clap or follow me on GitHub and LinkedIn!
Github.com/iw4p