DEV Community

ZacharyP

Auto blog scraper in your GH profile using GH Actions.

Want to get right to it? Click here.

What

...is an auto blog scraper? A small tool that checks your blog of choice every so often and automatically updates your GitHub README with a nice list of links to the most recent posts, all hands-free. Here is mine.

...and a GitHub Action? A way to automate processes I don't want to do by hand anymore. This rabbit hole runs deep, but for the scope of this tutorial we are only going to scratch the surface.

This uses a GitHub Action called blog-post-workflow, a brilliant tool that does all of the heavy lifting behind the scenes to make this simple workflow possible.


Why

Because we can

How

I am assuming you already have a GitHub profile README since you have made it this far down the page, but if not, here is a bonus micro tutorial.

  • Create a new public repository with the same name as your GitHub username.
  • Create a README.md file in that repo and push it to GitHub.
  • Continue with your regularly scheduled tutorial.

1: Open the README repo in your code editor of choice.


2: Create a folder named .github and a folder named workflows within the new .github folder.


3: Create a file within the workflows folder named blog-post-workflow.yml, so the full path is .github/workflows/blog-post-workflow.yml.


4: Within that new .yml file, paste the following code and replace
{ YOUR_FEED_HERE } with the RSS feed of whatever blog you want to scrape:

```yaml
name: Latest blog post workflow
on:
  schedule: # Sets the Action to trigger based on time
    - cron: "0 0 * * *" # Triggers the Action once a day at midnight (UTC)
  workflow_dispatch: # Allows you to run the Action manually

jobs:
  update-readme-with-blog:
    name: Update this repo's README with latest blog posts
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Pull in dev.to posts
        uses: gautamkrishnar/blog-post-workflow@master
        with:
          feed_list: "{ YOUR_FEED_HERE }" # Replace with your blog's RSS feed URL
```
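As a rough sketch of what the last step might look like once it is filled in, assuming your blog lives on dev.to: the username `your-username` is a made-up placeholder, and `max_post_count` is an optional input that, as far as I understand the blog-post-workflow README, caps how many links get listed. Double-check the input names against the README before relying on them.

```yaml
# Sketch only: "your-username" is a placeholder, not a real account.
- name: Pull in dev.to posts
  uses: gautamkrishnar/blog-post-workflow@master
  with:
    # dev.to serves each author's RSS feed at /feed/<username>
    feed_list: "https://dev.to/feed/your-username"
    # Assumed optional input: limits how many posts are listed
    max_post_count: 5
```

This fragment only replaces the final step of the workflow; the name, on, and jobs sections stay as they are.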

Some things to note:

  • cron: is a fancy way to tell time, and it is how we set a timer on our bot, in this case once every day at midnight (GitHub runs schedules in UTC).
  • workflow_dispatch: is what lets you run the Action manually from the Actions tab instead of waiting for the timer.
  • jobs: is what the bot runs through, step by step. If an error occurs, the run log will show which step in the job is causing the issue.
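To make the cron part a little less mysterious, here is a small sketch of the same trigger block with a couple of alternative schedules commented out. The alternatives are only my examples and are not part of the original workflow; the five fields are minute, hour, day of month, month, and day of week, and GitHub evaluates them in UTC.

```yaml
on:
  schedule:
    # Fields: minute hour day-of-month month day-of-week (all in UTC)
    - cron: "0 0 * * *"   # every day at 00:00 UTC (what this tutorial uses)
    # - cron: "0 * * * *" # alternative: every hour, on the hour
    # - cron: "0 9 * * 1" # alternative: every Monday at 09:00 UTC
  workflow_dispatch: # keeps the manual run button available
```

Unless you publish several times a day, the daily schedule is probably plenty, and scheduled runs can start a little late when GitHub is busy.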

5: In the README.md file, paste the following code where you want the list to show up. The last line adds a link to the main page of that blog. Just replace {BLOGS_URL} with the actual URL to your blog.

```md 

 <!-- BLOG-POST-LIST:START -->
 <!-- BLOG-POST-LIST:END -->

 ➡️ [more blog posts...]({BLOGS_URL})

```

6: Push changes to GH and pat yourself on the back because you have finished! From here the Action will fire once a day until told otherwise. All you need to do now is go write the blog post you've been putting off for a while. 🎉🎉🎉


💡 Side-Note: You can manually run this action by...

First: Go to the Actions tab in the README repo on GitHub.

Next: Click on the workflow, then click the Run workflow button to run the Action manually.



TLDR

Use GH Actions to automatically update your GH profile with any new blog posts using a tool called blog-post-workflow. ...you know this is a tutorial, right??


Me writing all this fun and engaging content for everyone:

Bruce Almighty typewriter scene gif


Writing this tutorial was a hands-on way for me to better understand the subject and to practice technical writing. Thank you for reading 👋🏼 Follow me on Twitter to watch me teach myself how to code by teaching everyone else.
