Samuel-Zacharie FAURE

Posted on Apr 28, 2023

The robot invasion of Dev.to

#meta #ai #beginners #writing

In a previous article, I complained about the sheer lack of quality on Dev.to, or at the very least about the incredibly low signal-to-noise ratio.

Authors Vs Readers

As I was explaining, the incentives of the platform and readers are at odds with the incentives of (most of) the authors.

The readers on Dev.to want good quality content which can easily be found.

The authors on Dev.to want recognition, personal branding, and sometimes just good old link building (linking to their personal product/pages for SEO).

Sidenote: Not all authors. Some do seem to enjoy writing and sharing knowledge. But looking at what's published, this seems to be an extreme minority.

Now, I personally think it's great that authors get personal benefits for sharing their knowledge online. I love writing, but I also love being read, and if a future recruiter loves what I wrote, that's a good motivation booster.

But there are two ways to go for that. You either spend a long time carefully crafting an interesting, insightful article... or you just spam whatever.

Since my last article complaining about the state of affairs (published on July 6, 2021), we have lived through an AI revolution. And now, I would say that the robot invasion has begun, but that's not true. The robots have already invaded, and they won. Dev.to belongs to the Robots.

The ChatGPT problem on Dev.to

When I search for interesting articles in the "relevant" section, a tremendous number of them turn out to be GPT-generated. Almost all of them come from ChatGPT: you can recognize the subtle flavor inherent to ChatGPT's training, such as its tendency to reply in 5 to 10 bullet points.

We do live in an age where you can't be sure that something was AI-generated, or is just poorly-written human content. Thankfully, we have tools such as zeroGPT to detect if something is AI-generated.

I even suspect the process to be automated. It would be trivial to write a list of 20 prompts somewhere, then write a script to regularly copy-paste directly from ChatGPT to Dev.to. There you go, 20 "interesting" articles for your personal branding.

There are (almost) no AI-assisted articles on the platform

It's all generated. Sometimes it is clear the author took less than a minute modifying a few words and adding titles or even an introduction. But the articles are completely AI-generated.

Very few posts are actually "chat-assisted", meaning GPT was used to write parts of the article, but a human wrote big pieces of the article themselves. Most of them are entirely, completely, copy-pasted from ChatGPT.

I checked more than 100 AI-generated articles, and only 1 or 2 were detected as "AI-assisted". I'm not a fan of AI-assisted writing, but this isn't the problem here. The problem is AI generation and extreme laziness leading to terrible content.

Obviously, this is very bad.

Our AI technology is impressive. Revolutionary, even. But we still haven't breached the AGI milestone where an AI can actually produce new, interesting, insightful content.

So this is only amplifying the biggest issue on Dev.to: the lack of good, interesting content, and the drowning of such content in a sea of mediocrity.

What are Dev.to guidelines on this?

This is my personal interpretation of Dev.to's Guidelines for AI-assisted content.

I mention this is a personal interpretation of the guidelines, because I personally find them unclear and a tad confusing.

It seems that Dev.to authorizes AI-generated content, on the condition that the post discloses that the content was AI-generated.

Of the 100+ posts I detected as AI-generated (not AI-assisted, but 100% generated), absolutely none disclosed that information.

The guidelines also mention that AI-assisted and AI-generated articles should not:

Be published with the main purpose of building a personal brand, building a social media presence, or gaining clout.

While this rule is great in spirit, it is absolutely unenforceable and therefore useless. Without the ability to read minds, you cannot know the main intentions of an author.

What can be done?

Here are some actionable suggestions:

Hardening & clarifying the guidelines: My personal opinion is that AI-generated content, and maybe even AI-enhanced content, have absolutely no place on Dev.to or any publishing website that strives for a modicum of quality. Any breaking of this rule should be cause for a ban.
Hiring extra moderators to enforce those rules. As much as I love to report guidelines-breaking articles in an effort to increase Dev.to's overall quality, it feels not just like removing a drop of water from the ocean, but like passing that drop of water to another person who might or might not decide it belongs in the ocean.

I did not find any way to become a moderator myself, just a "trusted user", which doesn't seem to confer extra moderation power.

The lowest hanging fruit: Implementing auto-detection for GPT-generated text on each new publication. The technology exists and is already reliable and robust.

Conclusion

I love writing on Dev.to. I love publishing on a platform full of enthusiastic developers. I love the sense of community, and I love the efforts from the team to curate content. My last two articles made it to 'Top 7 weekly', and I'm infinitely grateful to the team for choosing me.

But I don't love reading on Dev.to. It's just so hard to find insightful, interesting content in the sea of robots all yelling for your attention.

I do hope it gets better in the future, but it won't happen by itself. We need to do something about the robots.

Top comments (16)

Ben Halpern • Apr 28 '23

This is a really interesting analysis. We have made major improvements to this from the UX perspective in the last year or so.

In general we put the greatest emphasis on what is elevated for users in their feeds and other forms of recommended content vs what comprises the whole library of user-generated content.

This post already offers some hints as to how to conduct an analysis of how much content actually gets viewed by people, which is what matters most to us.

Any additional thoughts, definitely listening!

Samuel-Zacharie FAURE • Apr 28 '23

Sidenote: in the 15 minutes since this article was published, about 12 other articles have been published, and about 1 out of 3 is GPT-generated.

Randall • Apr 28 '23

I'm skeptical of how useful this metric is. Most of my posts are scoring 30%+. One of mine from July 2022 gets 65%.

Accreditly • Apr 28 '23

Yeah I agree, one of mine is 55% likely to be AI generated. I spent 2 hours writing it from scratch myself. I'm not disagreeing with the article's premise about the quality of the articles on dev.to being low lately though.

Personally I think the fix here is better investment into the 'feed'. Quality articles get traction. My feed is littered with posts with 0 comments and 0 reactions. A quality article, whether it's AI generated, assisted, or human-only would naturally get comments and reactions, so we should be utilising that in showcasing the best posts.

Samuel-Zacharie FAURE • Apr 28 '23

My feed is littered with posts with 0 comments and 0 reactions. A quality article, whether it's AI generated, assisted, or human-only would naturally get comments and reactions

Not necessarily, because comments and reactions can only happen first in the "latest" feed before the "relevant" feed, but that "latest" feed is just too spammed.

Accreditly • Apr 28 '23 • Edited

I get what you're saying but there are ways to implement it to get around that issue. It's what the majority of large social networks do to combat poor quality posts. There are numerous strategies:

Giving individual accounts an internal (usually hidden) score based on their previous posts performance. This is what Facebook/Meta used to do, and I assume still does. Higher scoring accounts get more visibility than low scoring accounts. This just naturally promotes better content.
Implement a Reddit style upvote/downvote system. Effectively crowdsourcing poor quality and good quality post moderation.
ML to score posts based on recognised patterns that are probably AI produced. Although given that ~a lot of~ most tools in this space give lots of false positives, it would need to be used in conjunction with something else.
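To make the first two strategies concrete, here is a minimal sketch of how a feed could combine a hidden per-author score with net votes. Everything in it is an assumption for illustration: the `Post` fields, the `vote_weight` value, and the additive formula are hypothetical and do not describe how DEV, Facebook/Meta, or Reddit actually rank content.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    author_score: float  # hypothetical hidden per-author score in [0, 1]
    upvotes: int
    downvotes: int


def rank_key(post: Post, vote_weight: float = 0.1) -> float:
    """Combine the author's hidden score with the post's net votes.

    The additive formula and the 0.1 weight are illustrative assumptions.
    """
    net_votes = post.upvotes - post.downvotes
    return post.author_score + vote_weight * net_votes


def rank_feed(posts: list[Post]) -> list[Post]:
    """Return posts ordered from most to least visible."""
    return sorted(posts, key=rank_key, reverse=True)


feed = rank_feed([
    Post("Different types of rugs", author_score=0.2, upvotes=0, downvotes=3),
    Post("A careful deep dive", author_score=0.8, upvotes=5, downvotes=0),
    Post("First post by a newcomer", author_score=0.5, upvotes=1, downvotes=0),
])
for post in feed:
    print(post.title)
```

In this sketch a downvoted post from a low-scoring account sinks to the bottom, while a newcomer with a neutral score still sits above it.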

To be clear though, I agree with your post. I've literally just reported a post about different types of rugs...

Rense Bakker • Apr 28 '23

Tbh I think this problem existed before chatgpt. It's the nature of this platform that they allow all content without redaction. Unfortunately that results in a lot of low quality content. I'm not sure if any amount of moderation will be able to fix that. On the other hand... I agree they could try harder. But there are other issues I have with their community guidelines. They're subject to interpretation by the moderators and mostly rely on the fear factor (fear to be banned), because moderators only catch a fraction of what goes on in the comments section of articles. People can freely disregard all the rules and any form of civil conversation, as long as they don't get caught... A lot of online platforms follow this same pattern of moderation and imho it's evil... If they used more objective tools, like zerogpt, that scan ALL the content on the platform, it would be better.

Brian Kirkpatrick • May 3

I do love reading on dev.to but--and this is important--only as a means of following up on content from people I already have in network. As a means to discover, it has very little utility. And frankly, I'm surprised that's still the case even for less junior-javascripty-topics like C.

cloutierjo • Apr 29 '23

Totally agree with your vision here, I'm scrolling through the dev feed almost every day to find the "same" article. Usually finding 2 or 3 interesting articles a day.

My GPT eye is not quite up to date: I only realised about an hour ago that I was reading a so-so article, until it said "my knowledge ends in 2021 so..." and I just stopped there, thinking "what the ...., oh really!"

I was about to start looking for a GPT detector, so thanks for providing one anyway.

As for a solution, I would go for an author quality score. You start with a relatively high score; then if you post crap, people flag your posts, or you never get any interaction, your score goes down and your visibility with it. Otherwise, good authors get better scores and better visibility. (Good commenters should be taken into account too! AI accounts are probably not commenting, or are spamming in the comments.)

But to avoid Stack Overflow-style gatekeeping, that relatively high initial visibility for newcomers is imo important.
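The author quality score described above could be sketched roughly as follows. The starting score, the penalties, and the bonus are all made-up numbers chosen for illustration, and `Author` and `update_score` are hypothetical names, not part of any existing DEV feature.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not real platform values.
STARTING_SCORE = 0.8    # newcomers start relatively high to stay welcoming
FLAG_PENALTY = 0.10     # applied when a post gets flagged
IGNORED_PENALTY = 0.02  # applied when a post gets no interaction at all
ENGAGED_BONUS = 0.05    # applied when a post draws comments or reactions


@dataclass
class Author:
    name: str
    score: float = STARTING_SCORE


def update_score(author: Author, flagged: bool, interactions: int) -> float:
    """Adjust an author's score after one post and return the new score."""
    if flagged:
        delta = -FLAG_PENALTY
    elif interactions == 0:
        delta = -IGNORED_PENALTY
    else:
        delta = ENGAGED_BONUS
    # Keep the score inside [0, 1] so it can be used directly as a weight.
    author.score = min(1.0, max(0.0, author.score + delta))
    return author.score


spammer = Author("spammer")
for _ in range(3):
    update_score(spammer, flagged=True, interactions=0)
print(round(spammer.score, 2))  # three flagged posts lower the score
```

Starting high and only moving down on flags or silence is the design choice that keeps new authors visible while repeated spam steadily loses reach.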

Samuel-Zacharie FAURE • Apr 29 '23

Great idea

Samuel-Zacharie FAURE • Apr 28 '23

Call to the readers: what are your thoughts on this?

Do you have suggestions to correct the issue?

Ingo Steinke, web developer • Apr 28 '23 • Edited

Thanks for your post! I wasn't aware of zeroGPT, I must check that out!

As you mentioned, we have been discussing quality and signal-to-noise ratio on DEV for a long time. I think these problems are not unique to DEV, but amplified by its inclusive and beginner-friendly guidelines.

But what could be an alternative?

How do other sites / platforms compare?

StackOverflow's "elitist meritocracy", also perceived as "toxic gatekeeping" risks turning away aspiring beginners and deleting valuable content. They have a zero-tolerance policy against anything generated by the popular machine learning based "AI" systems. They also allow casting downvotes for most users with a certain reputation. DEV has added its own, less obvious and less destructive, down voting option, but still keeps focusing on either adding or not adding positive reactions.

Twitter (Blue Sky Mastodon Fediverse whatever): "social" media was where DEV started before building the current forem-based platform. While some people, especially developers in the USA seem to experience a "tech twitter" at least in the past, where people shared valuable information, Twitter users keep complaining about signal vs noise ratio as well, plus the increasing hate speech, spam, and political discussions. I feel Twitter to be "toxic" on a level far beyond StackOverflow, but that might depend on who you follow.

Medium, Substack, Hashnode, Tealfeed whatever: publishing platforms. I haven't tested all of those, but I didn't like medium for various reasons, most of all having to pay for reading mostly worthless content, and I feel that those platforms are even more self-promoting self-publishing write-only sites for SEO optimization purposes. Sometimes I find a valuable technical tutorial there, but often it turns out to be too outdated or opinionated anyway.

Low Quality / Chatbot Dilemma

ChatGPT, GitHub Copilot, tabnine etc. machine learning or algorithm based assistance systems or similar "artificial intelligence" chatbot systems in general: I have been testing most of those and discussed them with fellow developers. ChatGPT seems to be helpful to generate boilerplate code that you would otherwise have copied from StackOverflow, tutorial posts, or outsourced to junior developers. I found it more irritating than helpful, especially due to the low quality: I always felt it was sure to fail in some untested edge case sooner or later.

Tuning my own DEV algorithm

Summing up, I still find DEV to be the lesser evil, so to say. My personal strategies for improving perceived content quality, much like on Twitter: follow people, block other people, follow tags, block other tags (which is done, unintuitively, by following them with a sub zero priority of -1 or lower). And, like at StackOverflow, I try to engage in discussion and moderation to some extent, like posts, flag posts as high or low quality, and report spam posts to the admins.

After seeing so many things come and go, like USENET (nntp-based "news groups"), IRC, mySpace and Geocities pages, bulletin boards, experts exchange, StackOverflow, Twitter, YouTube, DEV, I remain hopeful that there will always be some resource to go to.

But what should DEV do to improve quality and address the chatbot dilemma?

Strategies against low-quality content

I second your claim that DEV guidelines should be more clear and more effectively enforced against 100% chatbot generated articles. Personally, I would rather cut at 50% or even when there is any detectable "AI" ingredient. If people used it as an assistance, they should surely edit the output and adapt everything to their personal style, so how could that be bot content? Enter the chance of false positives. A friend of mine had one of their hand-written texts blocked as seemingly AI-generated by YouTube recently. So let's say an 80% bot origin probability should flag a post for deletion or hide it at the bottom of any recommendation algorithm.
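As a rough sketch of that 80% cut-off: the function below takes a bot-origin probability, assumed to come from some external detector that is not modelled here, and decides whether the post is flagged. The `triage` function and `Action` names are hypothetical; only the 80% threshold comes from the paragraph above.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"
    FLAG = "flag_for_review"  # e.g. queue for deletion or demote in the feed


def triage(bot_probability: float, threshold: float = 0.80) -> Action:
    """Decide what to do with a post given a detector's bot-origin probability.

    bot_probability is assumed to be in [0, 1] and produced by an external
    detector; how that detector works is outside the scope of this sketch.
    """
    if not 0.0 <= bot_probability <= 1.0:
        raise ValueError("bot_probability must be between 0 and 1")
    return Action.FLAG if bot_probability >= threshold else Action.PUBLISH


# Scores reported in this thread for human-written posts stay below the cut-off.
for probability in (0.30, 0.55, 0.65, 0.85):
    print(probability, triage(probability).value)
```

With the threshold at 80%, the 55% and 65% scores that commenters here reported for their own hand-written posts would still be published, which is the point of choosing a high cut-off given the false positives.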

Last but not least, let's keep engaging and discussing as a community! Which means, take our time to actually read other people's posts, leave comments, and link to other people's posts where it feels appropriate. This should at least lead to an interlinked network of related quality content written by actual people.

Apart from that, let's not rely on any platform, not even DEV. As IndieWeb claims, we must own our content. So let's keep updating and adding content to our own websites and weblogs. This might sound old-fashioned, but it's the only guarantee to have a site without spam, hate speech, and half-knowledge generated by bots and thickheads.

Samuel-Zacharie FAURE • Apr 28 '23

Very insightful comment, thank you for this.

Ingo Steinke, web developer • Apr 28 '23 • Edited

Thanks! I have edited my comment to highlight my points using bold text and sub headlines, to make it easier for quick readers to browse. I also edited the paragraph about down voting options on DEV to avoid misunderstanding.