I'm currently working on a GitHub bot that uses a machine learning model to read info from a pull request (via GraphQL) and automatically tag PRs with labels like author_confidence (high/low), pr_complexity (high/low), and stackoverflow_tag (python, aws, etc.).
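Roughly, the read side looks like this (a minimal sketch; the query fields come from GitHub's GraphQL API, but the fetch_pr helper, token handling, and example repo are just placeholders):

```python
import requests

GITHUB_TOKEN = "..."  # placeholder: a personal access token

# Standard GitHub GraphQL query for the PR fields the model cares about.
QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      title
      body
      additions
      deletions
      changedFiles
      author { login }
      files(first: 50) { nodes { path } }
    }
  }
}
"""

def fetch_pr(owner, name, number):
    """Fetch basic PR info from GitHub's GraphQL endpoint."""
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY,
              "variables": {"owner": owner, "name": name, "number": number}},
        headers={"Authorization": f"bearer {GITHUB_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["pullRequest"]

# Hypothetical usage:
# pr = fetch_pr("octocat", "hello-world", 42)
# print(pr["title"], pr["changedFiles"])
```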
What other types of label information would add value to a PR?
Top comments (2)
Not sure I like the idea of the author_confidence one... How would ML predict how confident I am in my PR? Or is it confidence in myself overall? If so, this raises all kinds of ethical questions about algorithms scoring people and inherently biasing a system against them.
A tag that denotes what the PR is doing, maybe? Bugfix, feature, documentation, tests, etc.?
Or whether the PR is a duplicate of a similar PR, maybe?
The author confidence score is calculated from the author's interaction with the community,
i.e., whether they post many comments, have submitted other PRs, have been asked to review PRs, etc.
High confidence = I trust this author to make a PR that makes sense, because of their community interaction.
Low confidence = I'm wary of this PR because the author is not known to the community.
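Conceptually it's something like this (just a sketch; the signals, weights, and 0.5 cutoff here are made up for illustration, not what the model actually uses):

```python
# Toy version of the community-interaction heuristic described above.
# All weights and the normalization constant are illustrative assumptions.

def author_confidence(comment_count, merged_pr_count, review_request_count):
    """Score an author by how much they've interacted with the community."""
    # Weight each signal, then squash into [0, 1) so it never exceeds 1.
    raw = 1.0 * comment_count + 3.0 * merged_pr_count + 5.0 * review_request_count
    score = raw / (raw + 20.0)  # saturating normalization; 20.0 is arbitrary
    return "high" if score >= 0.5 else "low"

print(author_confidence(comment_count=40, merged_pr_count=5, review_request_count=2))  # high
print(author_confidence(comment_count=1, merged_pr_count=0, review_request_count=0))   # low
```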
I like the idea of a tag for what the "PR is doing"!
Our Stack Overflow tags simulate that at a more granular level: the model matches the code in the PR and predicts what tag it would fall under if we pasted the code into Stack Overflow.
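As a rough picture of how that tagging works (a toy sketch; the real model is trained on actual Stack Overflow data, and the few snippets and TF-IDF + logistic regression pipeline here are stand-ins):

```python
# Illustrative sketch: treat the PR's diff text like a Stack Overflow post
# body and classify it into a tag. Training data below is purely made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

snippets = [
    "import boto3\ns3 = boto3.client('s3')",       # aws-flavored code
    "def handler(event, context): return event",   # python-flavored code
    "aws lambda deploy --region us-east-1",        # aws
    "for i in range(10): print(i)",                # python
]
tags = ["aws", "python", "aws", "python"]

# Vectorize the code as text, then fit a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(snippets, tags)

diff_text = "client = boto3.resource('dynamodb')"
print(model.predict([diff_text])[0])  # likely "aws"
```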