You often hear the term “embedding” when talking about RAG applications. In fact, you hear it constantly.
Simply put, embedding means converting text into a vector.
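To make the text-to-vector idea concrete, here is a toy sketch. The `toy_embed` function below is made up for illustration; it is not any provider’s API. It just hashes words into a fixed number of buckets and counts them, whereas real embedding models learn dense vectors from data.

```python
import zlib

def toy_embed(text, dim=8):
    """Toy 'embedding': hash each word into one of `dim` buckets
    and count occurrences. Illustrates the text -> vector shape
    of the operation, nothing more."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 gives a stable hash across runs, unlike built-in hash()
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    return vec

print(toy_embed("I am going to school."))
```

The output is always a list of `dim` numbers, regardless of how long the input text is. That fixed-size, lossy summary is exactly why the rest of this article matters.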
But do you know how to convert vectors back to text?
In theory, it is impossible to convert a vector back to the original text.
Since this is my first Medium article, I’m going to explain why.
It’s very simple once you know the embedding process.
Why is it impossible?
You are probably aware of the embedding models offered by NLP providers such as OpenAI or Hugging Face, right?
Their performance and output dimensionality vary, but the underlying embedding process is largely the same.
Broadly, the embedding pipeline includes tokenization, lemmatization, stemming, and stop word removal.
Here, a stop word is a word that contributes little to the meaning of the sentence as a whole: “a”, “the”, “that”, “which”, and so on.
The exact criteria vary depending on the embedding model, but this preprocessing step is common.
For example, let’s say “am” and “to” are treated as stop words, and the suffix “-ing” is stripped during stemming.
Now consider the following two sentences.
“I am going to school.”
“I go to school.”
After stop word removal and stemming, the two sentences become identical:
“I go school”.
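The collapse above can be sketched in a few lines. This is a minimal toy preprocessor using the stop words assumed in this example (“am”, “to”) and a crude “-ing” stripper; real embedding models use their own tokenizers and rules.

```python
# Toy preprocessing: lowercase, tokenize, strip "-ing" (crude stemming),
# then drop the stop words assumed in the example above.
STOP_WORDS = {"am", "to"}

def preprocess(sentence):
    tokens = sentence.lower().rstrip(".").split()
    # crude stemming: strip a trailing "-ing" suffix
    tokens = [t[:-3] if t.endswith("ing") else t for t in tokens]
    # stop word removal
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("I am going to school."))  # ['i', 'go', 'school']
print(preprocess("I go to school."))        # ['i', 'go', 'school']
```

Both sentences reduce to the same token list, so anything computed from that list, including the embedding vector, can no longer tell them apart.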
In other words, the two vectors are identical even though the original texts are different.
This is why it is impossible to revert a vector to its original text.