DEV Community

Nima Akbarzadeh


Scraping tweets without Twitter API for FREE


In the past (before Elon Musk…), you could easily apply for a free developer account, get your own tokens, and start using the Twitter API without any struggle. Besides building bots and tweeting via the API, one of the main strengths of the developer account was the search API: you could grab almost any tweet you wanted. Unfortunately, since Elon Musk took over, you have to pay for it!

Tiers will start at $500,000 a year for access to 0.3 percent of the company's tweets. Researchers say that's too much for too little data. [source]

There is one solution that almost always works: Selenium! (It's also good to know that a great alternative to Selenium in JavaScript is Puppeteer.)

It allows you to scrape almost anything on the surface of the web; you just have to write a script for your use case with the Selenium library.

How

The algorithm for scraping tweets is simple.
These are the steps:

  1. Open Twitter search with an advanced search query.
  2. Scrape specific HTML tags to get their values.
  3. Scroll down to load more tweets.
  4. Repeat steps 2 and 3 until you have scraped the number of tweets you need.
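Here is a minimal sketch of these four steps in Python. It assumes that `https://twitter.com/search` with the `q` parameter (plus `f=live` for the "Latest" tab) still opens the search results, and that the `since:`/`until:` operators work inside the query; `build_search_url`, `collect_tweets` and `FakePage` are hypothetical names. To keep it runnable without a browser, the Selenium parts are passed in as two functions: `scrape_visible` (return the tweets currently on screen, each with an `id`) and `scroll`. With a real Selenium driver, `scroll` could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")`, while the tags `scrape_visible` reads depend on Twitter's current markup.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Assumption: Twitter's web search page; f=live selects the "Latest" tab
SEARCH_URL = "https://twitter.com/search"


def build_search_url(query: str, since: str = "", until: str = "") -> str:
    """Step 1: turn an advanced-search query into a search URL."""
    parts = [query]
    if since:
        parts.append(f"since:{since}")  # dates as YYYY-MM-DD
    if until:
        parts.append(f"until:{until}")
    return SEARCH_URL + "?" + urlencode({"q": " ".join(parts), "f": "live"})


def collect_tweets(
    scrape_visible: Callable[[], List[Dict[str, str]]],
    scroll: Callable[[], None],
    wanted: int,
    max_idle_scrolls: int = 3,
) -> List[Dict[str, str]]:
    """Steps 2-4: scrape what is on screen, scroll, repeat until enough tweets."""
    seen: Dict[str, Dict[str, str]] = {}  # tweet id -> tweet, keeps first-seen order
    idle = 0  # consecutive rounds that produced no new tweet
    while len(seen) < wanted and idle < max_idle_scrolls:
        before = len(seen)
        for tweet in scrape_visible():  # step 2: read the tweets currently rendered
            seen.setdefault(tweet["id"], tweet)  # skip tweets already collected
        idle = idle + 1 if len(seen) == before else 0
        if len(seen) < wanted:
            scroll()  # step 3: load the next batch
    return list(seen.values())[:wanted]  # step 4 ends once we have enough


class FakePage:
    """Hypothetical stand-in for a browser: 3 tweets visible, each scroll moves 2 down."""

    def __init__(self, n: int) -> None:
        self.feed = [{"id": str(i), "text": f"tweet {i}"} for i in range(n)]
        self.top = 0

    def visible(self) -> List[Dict[str, str]]:
        return self.feed[self.top:self.top + 3]

    def scroll(self) -> None:
        self.top += 2


if __name__ == "__main__":
    print(build_search_url("from:elonmusk", since="2023-01-01"))
    page = FakePage(10)
    tweets = collect_tweets(page.visible, page.scroll, wanted=5)
    print([t["id"] for t in tweets])
```

Deduplicating by `id` matters because a scroll usually leaves some of the previous tweets on screen, and `max_idle_scrolls` stops the loop when scrolling no longer loads anything new (for example, when the search has no more results).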

Code

You can write the scraper yourself or use an existing library such as twitter_scraper_selenium.
It's available on PyPI and GitHub.



pip install twitter_scraper_selenium



(Note: to save the results as CSV and work with them as DataFrames, you also need to install pandas and the other dependencies.)
Then you can write your own wrapper function like this:

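import json
import os

import pandas as pd
from twitter_scraper_selenium import scrape_keyword


def scrape_profile_tweets_since_2023(username: str):
    kword = "from:" + username             # advanced-search operator: tweets by this account
    path = './users/' + username           # output file name, without extension
    file_path = path + '.csv'              # the CSV file the scraper writes
    os.makedirs('./users', exist_ok=True)  # make sure the output folder exists
    scrape_keyword(
        headless=True,        # run Chrome without opening a window
        keyword=kword,
        browser="chrome",
        tweets_count=2,       # just the last 2 tweets
        filename=path,
        output_format="csv",
        since="2023-01-01",
        # until="2025-03-02",  # leave out to scrape up to right now
    )
    # Read the CSV back and turn it into a list of dicts, one per tweet
    data = pd.read_csv(file_path)
    data = json.loads(data.to_json(orient='records'))
    return data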


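If you only need the rows as dictionaries, you may not need pandas at all. Here is a minimal sketch using only Python's standard `csv` module; `load_tweets_csv` is a hypothetical helper, it assumes the CSV is UTF-8, and the `username`/`content` columns in the demo are made up, since the real columns depend on what twitter_scraper_selenium writes.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def load_tweets_csv(file_path: str) -> List[Dict[str, str]]:
    """Read a scraped CSV into a list of dicts, one per row, without pandas."""
    # Assumes the scraper wrote the file as UTF-8
    with Path(file_path).open(newline="", encoding="utf-8") as f:
        # DictReader uses the header row as keys; every value stays a string
        return list(csv.DictReader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "elonmusk.csv"
        # Hypothetical columns, only for this demo
        sample.write_text("username,content\nelonmusk,hello\nelonmusk,world\n", encoding="utf-8")
        print(load_tweets_csv(str(sample)))
```

One difference to keep in mind: `csv.DictReader` keeps every value as a string (empty cells become `''`), while the pandas `to_json` round trip converts numbers and turns missing values into `None`.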

You can call this function for multiple accounts at the same time, like this:



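from multiprocessing import Pool

# scrape_profile_tweets_since_2023 must be defined above in the same file.

# Just one account:
# scrape_profile_tweets_since_2023('elonmusk')

# Several accounts in parallel, one worker process per account (up to num_processes)
num_processes = 5

if __name__ == "__main__":
    accounts = ['elonmusk', 'BarackObama', 'cathiedwood']
    with Pool(num_processes) as p:
        # map() waits for every account and returns the results in the same order
        results = p.map(scrape_profile_tweets_since_2023, accounts)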



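As an alternative to `multiprocessing.Pool`, you could use a thread pool from `concurrent.futures`. Each `scrape_keyword` call presumably drives its own browser, so the Python side mostly waits, and threads avoid pickling the function. This is a sketch under that assumption (it also assumes the scraper doesn't share state between concurrent calls); `scrape_many` and `fake_scrape` are hypothetical names. It keeps the results per account and records a failing account instead of losing the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def scrape_many(
    usernames: List[str],
    scrape_one: Callable[[str], list],
    max_workers: int = 5,
) -> Tuple[Dict[str, list], Dict[str, Exception]]:
    """Run scrape_one for every username in parallel and keep results per account."""
    results: Dict[str, list] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the username it was started for
        futures = {pool.submit(scrape_one, name): name for name in usernames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing account should not lose the others
                errors[name] = exc
    return results, errors


if __name__ == "__main__":
    # Hypothetical stand-in for scrape_profile_tweets_since_2023, so this runs without a browser
    def fake_scrape(username: str) -> list:
        if username == "missing_user":
            raise ValueError("no tweets found")
        return [{"username": username, "content": "hello"}]

    ok, failed = scrape_many(["elonmusk", "BarackObama", "missing_user"], fake_scrape)
    print(sorted(ok))
    print(failed)
```

In your own script, pass `scrape_profile_tweets_since_2023` instead of `fake_scrape`.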

Result
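Each account's tweets end up in a CSV file under ./users/ (for example ./users/elonmusk.csv), and the function also returns them as a list of dictionaries, one per tweet.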

In the next post, we are going to scrape mentions/replies as well.

If you like the post, please give it a clap or follow me on GitHub and LinkedIn!
Github.com/iw4p

https://www.linkedin.com/in/nimk/

Top comments (1)

Ambar Pathak (ambar3497)

1. I don't see the function to run in parallel being used.
2. It gives me an error: 'str' object has no attribute 'close'.

Can you please help?