My Workflow
I spent a few nights wrapping Google's analyzeSentiment API into a GitHub Action. The Action runs sentiment analysis over the content of HTML files and gives an overview of the overall emotion across all (or only the selected) pages in your project.
The API returns values from -1 to 1, indicating how strong a certain emotion, positive or negative, is. After the Action runs, a table with the score for each page is printed in its logs. Read more about Interpreting sentiment analysis values.
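To make the score range concrete, here is a minimal sketch of how per-page scores could be turned into labels and a small text table. The `neutral_band` cutoff of 0.25, the function names, and the table layout are assumptions made for illustration; they are not how the Action or Google's API classifies scores.

```python
def label_sentiment(score: float, neutral_band: float = 0.25) -> str:
    """Map a score in [-1, 1] to a coarse label.

    neutral_band is an assumed cutoff, not a value defined by the API.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


def format_table(scores: dict[str, float]) -> str:
    """Render one row per page, sorted by page name."""
    width = max([len("page")] + [len(page) for page in scores])
    lines = [f"{'page'.ljust(width)}  score  label"]
    for page, score in sorted(scores.items()):
        lines.append(f"{page.ljust(width)}  {score:+.2f}  {label_sentiment(score)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical scores, only for demonstration.
    print(format_table({"index.html": 0.8, "about.html": -0.5, "faq.html": 0.1}))
```

Scores close to zero end up as "neutral" here, which is a simplification: a low score can also come from mixed positive and negative text.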
Along with other content analysis tools, it might come in handy to maintainers who want to understand the text that is pushed to the project every day. See it in action.
I got really excited about this and will keep developing it, along with other automation ideas that I have. Since this is an early release of the Action, please take a look at the roadmap and submit issues describing the features you would like to see in future releases.
Submission Category:
Maintainer Must-Haves
Yaml File or Link to Code
Here is an example of how to use the Action on public .html files.
name: Sentiment analysis on public
on: push
jobs:
  analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2 # Be sure you check out the files beforehand
      - name: Run sentiment analysis on HTML files
        uses: bogdaaamn/copy-sentiment-analysis@v0.6.1
        with:
          gcp_key: ${{ secrets.GCP_KEY }} # Google Cloud Platform API key. Read the README for instructions
Along with the code, more examples, requirements, and a known-issues roadmap are available in the bogdaaamn/copy-sentiment-analysis repository (view it on Marketplace).
bogdaaamn / copy-sentiment-analysis
Run sentiment analysis over the text of your website using Google API.
Copy Sentiment Analysis
This GitHub Action runs Sentiment Analysis over the built text of your GitHub project. It uses Google's analyzeSentiment API, evaluating the overall emotion score (from positive to negative) of a page. The Action provides an overview of the scores of all the pages from your project (more on interpreting the scores).
Usage
This is a workflow example of using the Action on plain .html files from the public folder (by default).
name: Sentiment analysis on public
on: push
jobs:
  analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2 # Be sure you check out the files beforehand
      - name: Run sentiment analysis on HTML files
        uses: bogdaaamn/copy-sentiment-analysis@v0.6.1
        with:
          gcp_key: ${{ secrets.GCP_KEY }} # Google Cloud Platform API key. Read the README for instructions
However, if your project needs to be built beforehand, be sure you place…
Additional Resources / Info
Open source use cases
At the moment, the overview table is printed in the Actions tab after the workflow has run. But that seems counter-intuitive and adds too much friction.
I am curious about what the community thinks: How would you like this Action to report the results? A comment on the PR? A table in the Action's log? Failing if there are too many negative results?
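As one way to picture the "fail on too many negative results" option, here is a small sketch. The thresholds (`negative_below`, `max_negative_share`) and the function name are hypothetical; nothing like this exists in the Action today.

```python
import sys


def should_fail(
    scores: dict[str, float],
    negative_below: float = -0.25,   # assumed cutoff for a "negative" page
    max_negative_share: float = 0.5,  # assumed tolerated share of negative pages
) -> bool:
    """Return True when the share of negative pages exceeds the tolerated share."""
    if not scores:
        return False
    negative = [page for page, score in scores.items() if score < negative_below]
    return len(negative) / len(scores) > max_negative_share


if __name__ == "__main__":
    # Hypothetical scores, only for demonstration.
    demo = {"index.html": -0.9, "about.html": -0.8, "faq.html": 0.5}
    if should_fail(demo):
        # A non-zero exit code is what marks a workflow step as failed.
        sys.exit(1)
```

Making both thresholds configurable would let each project decide how strict the check should be, or switch it off entirely.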
GCP's bias in sentiment analysis
A few years ago, Google's API was criticized in the media for producing results biased by race, gender, and religion. So I had mixed feelings about using a pre-trained model in this Action.
Now, it is hard to understand what is going on with Google's proprietary algorithm and how they fight unwanted bias, but more recent research (see charlescearl, 2019) concluded that GCP seems to be less sensitive to the race or gender of participants than other competitor platforms. The same article recommends that users should proceed with caution and conduct evaluations on their own. The tests that I've done had neutral results, but I am ready to expand my research and pull the plug if needed.
Moreover, there is extraordinary research being done on identifying, analyzing, and reducing bias in data (see Dixon et al., 2018 from Google Research or Caliskan et al., 2016 and May et al., 2019), and all I hope is that Google (or really any cloud provider out there) keeps getting better at it. I believe it is really important to enable and support bias and fairness research, especially now, after the recent rise of GPT-3 (see Burus, 2020), when society is exposed to technology more and more.
Sources
Engadget: Google's sentiment analysis API is just as biased as humans
Techleer: Google Sentiment Analysis API gives a biased output
Measuring and Mitigating Unintended Bias in Text Classification
Semantics derived automatically from language corpora contain human-like biases
The (Un)ethical Story of GPT-3: OpenAI's Million Dollar Model
Top comments (3)
An option to choose whether to do the sentiment analysis per file or per "block" could be handy.
I recently worked with sentiment analysis on sentences written in an Excel file. I could easily convert that Excel file into one HTML file, separate the sentences with a div or a CSS class, and then get the results using your Action.
Regarding your results display question, I advise keeping something visual, like what you did with the table in the output. It would definitely be great to have it both as an Action comment (as shown above) and as a PR comment (for history and easier/faster visualization).
I'd also advise to generate a JSON artifact that's included with the "run".
That's what I do with my E2E tests: I store the screenshots and videos as artifacts. See github.com/UnlyEd/next-right-now/r...
Having such an artifact would allow the owner to use those results programmatically quite easily (the table is great for visualization).
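A rough sketch of what such a JSON result could look like is below. The `pages` / `page` / `score` layout and the function name are assumptions for illustration; the Action does not define an output format like this.

```python
import json


def results_to_json(scores: dict[str, float]) -> str:
    """Serialize per-page scores into a JSON document (hypothetical layout)."""
    pages = [{"page": page, "score": score} for page, score in sorted(scores.items())]
    return json.dumps({"pages": pages}, indent=2)


if __name__ == "__main__":
    # Hypothetical scores, only for demonstration.
    print(results_to_json({"index.html": 0.8, "about.html": -0.5}))
```

Writing that string to a file during the run and uploading it with an artifact step would be a separate part of the workflow.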
Thanks a lot @vadorequest for the input! Really appreciate it.
That is a great idea, I actually thought about that myself. From my experience, when we usually run sentiment analysis on actual documents, we do it for the whole document. But I see how there is a content issue there. While papers are meant to be written about the same topic, on a website you have different sections that might be totally unrelated and have a different tone, vibe, or language. So I think an option to switch between those would be really helpful.
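To sketch what "per block" could mean in practice, the snippet below collects the text of each top-level div so that every block could be scored on its own. It only uses Python's standard `html.parser`; the class and function names are hypothetical, and treating top-level div elements as the block boundary is an assumption, not something the Action does.

```python
from html.parser import HTMLParser


class BlockSplitter(HTMLParser):
    """Collect the text of each top-level <div> as a separate block."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0            # current nesting level of <div> elements
        self.blocks: list[str] = []
        self._current: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self._current = []  # a new top-level block starts here
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and normalize whitespace.
                text = " ".join("".join(self._current).split())
                if text:
                    self.blocks.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self._current.append(data)


def split_blocks(html: str) -> list[str]:
    """Return the text of every top-level <div>; text outside divs is ignored."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Each returned string could then be sent for analysis separately, while the per-file mode would keep sending the whole page text at once.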
Thank you for that, I totally agree. I am currently trying to sort out @actions/github and then I will jump on the PR comment.
This is such a great idea, I never thought about how I should take this programmatically to the next steps.
Thanks for sharing UnlyEd/next-right-now, you have really nice pipelines in there.
Thanks! Glad it helped :)