kayYOLO

Posted on Oct 13, 2022

[Python]Facebook Scraper via Clicknium

#tutorial #python #programming #datascience

Facebook Scraper

This sample shows how to scrape Facebook posts using Clicknium.

Preparation

Python 3.7+
Windows 7 SP1+
Chrome browser
VS Code
Clicknium
Clicknium Chrome extension

Scrape Facebook posts

We will scrape the posts of the Facebook company page as an example.

Create a Python project

Create a Python file, for example, sample.py, under a project folder.

Then show the Locators panel in the VS Code Explorer.

Capture locator

A locator identifies the UI elements that the script should target.

Log in and open the Facebook company page: https://www.facebook.com/facebook
Click the Capture button in VS Code.

Click Similar elements. This feature lets you capture all the posts on the page that share the same structure.

Use Ctrl + Click to capture the text of the first post on Facebook.

Capture the second post in the same way.

You will see that five elements are matched. Click the Save button to finish.

Get the text via Locator:

To get the element that a locator targets, we can use the find_element function. In our scenario we need multiple posts, so we use the find_elements function, which returns a list of matching elements.

from clicknium import clicknium as cc, locator
uis = cc.find_elements(locator.facebook.posts)

In the Python code, we can type locator. to reference the locators we captured with Clicknium in this project. If the same locator is needed across projects, you can publish it to a cloud locator store and reference it from anywhere.

Once we have the UI elements, we need to find an element property that contains the text. Check the web element's properties: innertext is the one we need.

    uis = cc.find_elements(locator.facebook.posts)
    for ui in uis:
        # innertext holds the visible text of the post
        text = ui.get_property("innertext")
        print(text)

We can print the text to check whether it works.
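As a minimal sketch of that check, the helper below reads innertext from each element and returns a short one-line preview per post. FakeElement is a hypothetical stand-in so the snippet runs without a browser, and preview_texts is an illustrative name, not part of Clicknium. In a real script the list would come from cc.find_elements(locator.facebook.posts).

```python
class FakeElement:
    """Hypothetical stand-in for a captured web element (not part of Clicknium)."""

    def __init__(self, text):
        self._text = text

    def get_property(self, name):
        # Mirrors the get_property("innertext") call used in the article.
        return self._text if name == "innertext" else None


def preview_texts(elements, width=60):
    """Return one shortened, single-line preview per element that has text."""
    previews = []
    for element in elements:
        text = element.get_property("innertext") or ""
        # Collapse newlines and repeated spaces so each post fits on one line.
        flat = " ".join(text.split())
        if flat:
            previews.append(flat[:width])
    return previews


if __name__ == "__main__":
    sample = [FakeElement("First post\nwith two lines"), FakeElement("   ")]
    for line in preview_texts(sample):
        print(line)
```

An empty result here would suggest that the locator matched nothing or that the chosen property holds no text.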

If it doesn't work, open the locator page and tune its properties. The locator uses these properties to identify UI elements.

On the locator page, you can run quick validations and actions to check whether the locator works. You can also select and modify the attributes used to locate the UI elements.

Go to the next page

Facebook loads more content as you scroll down the page, so a single capture can't get all the posts; we have to capture again after each scroll. Mimicking the scroll action of the mouse would be hard to control, so a simpler choice is the Page Down key on the keyboard. The send_hotkey function does this easily; the key code for Page Down is {PGDN}.

cc.send_hotkey("{PGDN}")

We can use a while loop to get all the posts. Since repeated captures return the same post several times, we can store the posts in a dictionary and use ancestorid as the key.
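The loop described above might look like the following sketch. It assumes two hypothetical callables: get_visible_posts, which returns (ancestorid, text) pairs for the posts currently on screen, and page_down, which scrolls once. In a real script these would wrap cc.find_elements and cc.send_hotkey("{PGDN}"); how ancestorid is read from an element is not shown here, so that part is left to the caller. The stop rule (quit after a few scrolls that bring nothing new, or once max_posts is reached) is an assumption of this sketch, not part of the original sample.

```python
def collect_posts(get_visible_posts, page_down, max_posts=50, max_idle_scrolls=3):
    """Scroll until enough posts are collected or nothing new appears."""
    posts = {}  # ancestorid -> text; dictionary keys drop repeated captures
    idle_scrolls = 0
    while len(posts) < max_posts and idle_scrolls < max_idle_scrolls:
        before = len(posts)
        for ancestor_id, text in get_visible_posts():
            # setdefault keeps the first text seen for each id.
            posts.setdefault(ancestor_id, text)
            if len(posts) >= max_posts:
                break
        # Count consecutive scrolls that added nothing new.
        idle_scrolls = idle_scrolls + 1 if len(posts) == before else 0
        page_down()
    return posts


if __name__ == "__main__":
    # Fake feed: two posts are visible at a time and overlap between scrolls.
    feed = [("a", "post A"), ("b", "post B"), ("c", "post C"), ("d", "post D")]
    position = {"top": 0}

    def fake_visible():
        return feed[position["top"]:position["top"] + 2]

    def fake_page_down():
        position["top"] = min(position["top"] + 1, len(feed) - 1)

    print(collect_posts(fake_visible, fake_page_down))
```

Keying the dictionary by ancestorid means overlapping captures overwrite nothing and cost only a lookup, so the loop can capture after every scroll without tracking what was already seen.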

Source code

GitHub
