Facebook Scraper
This sample shows how to scrape Facebook posts using Clicknium.
Preparation
- Python 3.7+
- Windows 7 SP1+
- Chrome browser
- VS Code
- Clicknium
- Clicknium Chrome extension
Scrape Facebook posts
We will scrape the posts on the Facebook company page as an example.
Create a Python project
Create a Python file, for example, sample.py, under a project folder.
Show Locators is available under the VS Code Explorer.
Capture locator
A locator is a tool that targets the UI elements.
- Login and open the Facebook company page: https://www.facebook.com/facebook
- Click the Capture button in VS Code.
- Click Similar elements. This feature lets you get all the posts on the page that have the same structure.
- Use Ctrl + Click to capture the text of the first post on Facebook.
- Capture the second post in the same way.
You will see that five elements are matched. Click the Save button to finish.
Get the text via Locator:
To get the element that a locator targets, we can use the find_element function. In our scenario, we need multiple posts, so we use the find_elements function, which returns a list of matching elements.
uis = cc.find_elements(locator.facebook.posts)
In the Python code, the locator. prefix refers to the locators we captured with Clicknium in this project. If you need the same locator across projects, you can publish the locator store as a cloud locator store and reference it anywhere.
Once we have the UI elements, we need to find an element property that contains the text. Check the web element's properties: the innertext property is what we need.
from clicknium import clicknium as cc, locator

uis = cc.find_elements(locator.facebook.posts)
for ui in uis:
    text = ui.get_property("innertext")  # text content of the post
    print(text)
We can print the text to check whether it works.
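As a rough illustration of that check, the sketch below collects the text of each element and drops empty results, so an empty list signals that the locator or the property needs tuning. The helper extract_texts and the FakeElement class are hypothetical names introduced here; FakeElement only stands in for a Clicknium element so the snippet runs without a browser.

```python
from typing import Iterable, List


def extract_texts(elements: Iterable, prop: str = "innertext") -> List[str]:
    """Return the non-empty text of each element, with whitespace collapsed."""
    texts = []
    for element in elements:
        value = element.get_property(prop)  # same call as in the loop above
        if value and value.strip():
            texts.append(" ".join(value.split()))
    return texts


class FakeElement:
    """Hypothetical stand-in for a Clicknium element, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def get_property(self, name):
        # Only the property used in this article is simulated.
        return self._text if name == "innertext" else None


demo = [FakeElement("  Hello\n world "), FakeElement(""), FakeElement(None)]
print(extract_texts(demo))
```

With real elements, the same helper could be called as extract_texts(cc.find_elements(locator.facebook.posts)), assuming the locator captured earlier.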
If it doesn't work, we need to open the locator page and tune the properties. The locator uses these properties to identify UI elements.
On the locator page, you can run quick validations and actions to check whether the locator works. You can also select and modify which attributes are used to locate the UI elements.
Go to the next page
Facebook loads more content as you scroll down the page, so a single capture cannot get all the information; we have to capture again after each scroll. Mimicking the mouse scroll action would be hard to control, so a better choice is the PageDown key on the keyboard. The send_hotkey function can do this easily. The code for PageDown is {PGDN}.
cc.send_hotkey("{PGDN}")
We can use a while loop to get all the posts. Since repeated captures return the same post multiple times, we can store the posts in a dictionary and use ancestorid as the key.
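One possible shape for that loop is sketched below. It is not the original sample's code: collect_posts, read_visible, page_down, and the stopping parameters are hypothetical names chosen for illustration, and the loop stops after a few rounds that yield no new posts. The Clicknium wiring is shown only in comments, and reading ancestorid through get_property is an assumption.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def collect_posts(
    read_visible: Callable[[], Iterable[Tuple[str, str]]],
    page_down: Callable[[], None],
    max_idle_rounds: int = 3,
    max_rounds: int = 50,
    pause_seconds: float = 0.0,
) -> Dict[str, str]:
    """Capture, scroll, and repeat, keeping one entry per key."""
    posts: Dict[str, str] = {}
    idle_rounds = 0
    rounds = 0
    while idle_rounds < max_idle_rounds and rounds < max_rounds:
        before = len(posts)
        for key, text in read_visible():
            posts.setdefault(key, text)  # a repeated key is ignored
        # Count consecutive rounds that produced nothing new.
        idle_rounds = idle_rounds + 1 if len(posts) == before else 0
        page_down()
        if pause_seconds:
            time.sleep(pause_seconds)  # give the page time to load more posts
        rounds += 1
    return posts


# Hypothetical wiring with Clicknium (assumed, not executed here):
#   def read_visible():
#       for ui in cc.find_elements(locator.facebook.posts):
#           yield ui.get_property("ancestorid"), ui.get_property("innertext")
#   posts = collect_posts(read_visible, lambda: cc.send_hotkey("{PGDN}"),
#                         pause_seconds=1.0)

# Simulated pages so the sketch runs without a browser.
_pages = [[("a", "first post"), ("b", "second post")], [("b", "second post")]]
_state = {"index": 0}


def _demo_read():
    return _pages[min(_state["index"], len(_pages) - 1)]


def _demo_scroll():
    _state["index"] += 1


print(collect_posts(_demo_read, _demo_scroll, max_idle_rounds=2))
```

Passing the capture and scroll steps in as functions keeps the deduplication logic separate from the browser, so it can be checked with simulated data before running against the real page.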