Ana Klarić for Devot

Originally published at devot.team

Top 5 things you should be aware of when dealing with AI models

1. Privacy breaches – Who are you sharing your personal information with?

You probably already know that training AI requires large amounts of data, which often includes private information. As a rule, the more data a model is trained on, the more accurate it becomes.

The largest models are trained on huge amounts of data, and their creators very often do not disclose what data the models were trained or tested on. When ChatGPT was first launched, Samsung ran into significant problems because its employees used ChatGPT at work. It later turned out that this data was used to train the model and became publicly available.

Also, many of us use Slack, right? You may not be aware of this, but by default Slack uses your messages for model training, and the only way to opt out is to email Slack stating that you do not want your data used for training.

This raises important concerns about data collection practices and the need to ensure that personal and sensitive information is handled responsibly.

2. Discrimination and bias in models – Did you check your facts?

The data used to train AI models can inherently contain biases, which may lead to discriminatory outcomes in AI decision-making. The algorithm itself can also be designed to give answers that its owners consider more "suitable".

The best-known example is Google Gemini, which caused a huge uproar when users who asked for a picture of the Pope received images of a Black or female Pope.

Also, if you asked for a picture of a German soldier from 1945, you would receive images of Black and Asian soldiers. Although this may seem like an amusing slip-up with pictures, the problem runs much deeper. Pushing certain viewpoints can be dangerous because people already use chatbots to look up information and then repeat the answers as facts without verification.

The developers creating these models certainly possess the technical expertise, but addressing bias is an ethical issue. And who holds the responsibility for dealing with it?

3. Data manipulation – Have you heard about the Nazi chatbot?

The quality of an AI model directly depends on the quality of the data it is trained on. And as we said, data can be manipulated.

Unlike the female Pope, the problem here is not the code itself or the developers who wrote it, but the training data. The first case I heard about, some 5-6 years ago, was Microsoft's chatbot "Tay".

It was supposed to be an AI that would communicate with users on Twitter, but within just 24 hours, it started making statements like "Hitler was right" and "I hate Jews." Users found an exploit and convinced the poor bot to become a Nazi.

Recently, there was a similar incident with Reddit. When it became known that Reddit data would be used to train AI models, users deliberately posted offensive and inaccurate content to influence the training.

"Garbage in—garbage out" (GIGO) is a long-known term in analytics, but with the popularity of AI, it has become even more emphasized. What does it mean? The idea is that the outputs are only as good as the quality of the inputs.

In healthcare, this can be a problem if, for example, diagnostic data is poorly labeled or not labeled at all, leading the AI model to draw incorrect conclusions or provide incorrect diagnoses. The healthcare industry, in particular, exemplifies the critical need for accurate data practices to ensure patient safety and effective treatments.
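To make GIGO concrete, here is a minimal sketch (not from the original article) that trains the same classifier twice: once on clean labels and once on labels where 30% have been deliberately flipped to simulate sloppy or missing labeling. The dataset and model are illustrative assumptions only; the point is how directly output quality tracks input quality.

```python
# A minimal "garbage in, garbage out" sketch: the same model trained on clean
# labels vs. labels where a share has been deliberately flipped to mimic poor
# labeling. Dataset and model choices are illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    """Fit a simple classifier on the given training labels and report test accuracy."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, labels)
    return accuracy_score(y_test, model.predict(X_test))

# "Garbage" labels: flip 30% of the training labels to simulate bad annotation.
rng = np.random.default_rng(0)
noisy_y = y_train.copy()
flip = rng.choice(len(noisy_y), size=int(0.3 * len(noisy_y)), replace=False)
noisy_y[flip] = 1 - noisy_y[flip]

print(f"Accuracy with clean labels:     {train_and_score(y_train):.3f}")
print(f"Accuracy with corrupted labels: {train_and_score(noisy_y):.3f}")
```

Running this, you should see the accuracy on the held-out test set drop for the corrupted run; in a diagnostic setting, that gap is the difference between a useful tool and one that produces incorrect conclusions.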

4. Data leaks and data breaches – Is your company prepared for AI technologies?

Despite increased prioritization and budget allocation for AI system security (94% of IT leaders have set aside a dedicated AI security budget for 2024), 77% of companies have still experienced security breaches related to AI.

Even with significant investments, only 61% of IT leaders believe their budgets are adequate to prevent potential cyberattacks.

5. Deepfakes – Is the Nigerian prince trying to contact you?

This may not directly affect those of us who develop and work with AI, but I believe it will become a very big problem for everyone who uses modern technology and the internet.

On one hand, we will be flooded with fake news and other false content, and it will become increasingly difficult to find accurate and original material.

On the other hand, I am convinced that the "Nigerian prince" will soon become very convincing and will try to scam as many people as possible. Personally, I am worried that it will become very easy to manipulate someone's voice and soon their video as well, which could be really unpleasant and dangerous.

What impressed me the most was the podcast between Joe Rogan and Steve Jobs that never happened. A model trained on all the available material about the two of them generated the conversation itself, and with a highly realistic text-to-speech model it sounded like they were truly talking.

It's difficult to say what awaits us in the future. Every so often, a newer and better model with incredible capabilities emerges, and we joke internally that ChatGPT 5 will probably have "out of the box" solutions for our clients.

And in a year or two or three, who knows, maybe Skynet from Terminator will actually come to life.

This article is part of a larger blog post in which we cover privacy and data security in the age of AI.
