Splash is a JavaScript rendering service. I don't know much about this service yet. All I know is that it is one of many tools that can help me scrape sites that need JavaScript enabled to run. Splash also works well alongside Scrapy, the web scraping framework I'm currently learning. And as always, if a service can be installed using Docker, I'll give the Docker way a try.
Pulling the Image
As instructed on the Docker registry page, we can pull the latest Splash image using this docker command (the image is quite large, so prepare your internet connection):
docker pull scrapinghub/splash
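If you prefer to pin a specific version instead of latest, the registry also publishes tagged releases. For example (3.5 was an available tag at the time of writing; check the registry page for the current list):
docker pull scrapinghub/splash:3.5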
And when we check the listed images using docker image ls, we can see that it is indeed huge:
scrapinghub/splash latest 9364575df985 12 months ago 1.89GB
Run as a Container Service
You can name the service anything you want, but here let's call it splash-test. We forward port 8050:8050 so we can access it from the browser. Here is the full command to create and run the container:
docker run --name splash-test -p 8050:8050 -d scrapinghub/splash
Once it is created, you can check whether the service is running or stopped using docker container ls:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6e49662c03a7 scrapinghub/splash "python3 /app/bin/sp…" 48 seconds ago Up 46 seconds 0.0.0.0:8050->8050/tcp, :::8050->8050/tcp splash-test
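Besides checking the container list, you can also verify that Splash itself is responding. According to the Splash HTTP API docs, the service exposes a /_ping endpoint that returns a small JSON body with "status": "ok" (treat this as an assumption if your version differs):
curl http://localhost:8050/_ping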
You can also check the resources used by the service with docker stats:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
6e49662c03a7 splash-test 0.08% 181.8MiB / 6.043GiB 2.94% 1.09MB / 987kB 0B / 0B 37
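Memory usage tends to grow over long rendering sessions. The Splash documentation mentions a --maxrss option that you can append after the image name to cap resident memory, something like this (the 3000 MB value is just an example, not a recommendation):
docker run --name splash-test -p 8050:8050 -d scrapinghub/splash --maxrss 3000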
Render a JavaScript-Required Site
You can access the service in your browser at http://localhost:8050/.
If you have successfully followed along to this point, you can start rendering any website that needs JavaScript enabled to view its pages. For example, try https://www.transfermarkt.com/, because I found that this site can't be viewed when JavaScript is disabled in the browser. Fill the URL form with it and hit the green Render me! button.
As the result, you will see a snapshot image of the site, some statistics, and more importantly the raw HTML document, ready for you to scrape.
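Of course, for actual scraping you won't be clicking a button in the browser. The same render is available over Splash's HTTP API; for example, the render.html endpoint returns the JavaScript-rendered HTML directly (wait is the number of seconds Splash gives the page's scripts before returning):
curl "http://localhost:8050/render.html?url=https://www.transfermarkt.com/&wait=2" -o rendered.html
There are also sibling endpoints like render.png and render.json if you want a screenshot or structured output instead.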
That's it, have fun scraping!