In today's global market, it's essential to internationalize (i18n) products. By supporting multiple languages, products can attract and serve a global user base, enhance user experience, increase loyalty, and gain market share. However, internationalization support often comes with high costs and complex maintenance:
- Setting up language support requires professional teams and selecting suitable partners, which takes time and effort.
- Traditional translation processes are slow and require manual intervention, making it challenging to keep up with product iterations.
In this article, we will share the Logto team's experience of efficiently supporting internationalization for our products using ChatGPT and its API while minimizing costs.
How we approached internationalization
Before integrating the ChatGPT API, we followed a traditional translation process to support internationalization for our products. The process was as follows:
- For each language, we created a translation folder containing several TypeScript files. Each file consisted of a TypeScript object with key-value pairs representing phrase keys and their translations.
- To add a new phrase, we manually added a new key-value pair to the translation file for each language.
- In the root directory of each language, we created an index.ts file that exported all the translation files in that language.
Here's an example of the index.ts file for the zh-HK language:
import api_resources from './api_resources';
import users from './users';

const zhHK = {
  common: {
    hi: '你好',
    nice_to_meet_you: '很高興見到你。',
  },
  home: {
    welcome: '歡迎',
  },
  api_resources,
  users,
};

// Export the merged language object (a default export is assumed here).
export default zhHK;
To display a phrase in the UI, we used a function, t, that retrieved the translation for the user's current language. Nested keys were addressed using dot notation.
For example, if the user was using an English browser:
t('common.nice_to_meet_you'); // Nice to meet you.
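The lookup behind such a function can be sketched as follows. This is a minimal illustration, not Logto's actual implementation (in practice an i18n library usually provides it); the resolvePhrase name, the PhraseTree type, and the en object are hypothetical.

```typescript
// Hypothetical phrase tree; in practice this would be the exported language object.
type PhraseTree = { [key: string]: string | PhraseTree };

const en: PhraseTree = {
  common: {
    hi: 'Hi',
    nice_to_meet_you: 'Nice to meet you.',
  },
  home: {
    welcome: 'Welcome',
  },
};

// Walk the tree one path segment at a time; fall back to the key itself
// when the path does not resolve to a string.
function resolvePhrase(phrases: PhraseTree, key: string): string {
  let node: string | PhraseTree | undefined = phrases;

  for (const segment of key.split('.')) {
    if (typeof node !== 'object') {
      return key;
    }
    node = node[segment];
  }

  return typeof node === 'string' ? node : key;
}
```

Returning the key for a missing phrase is one possible design choice: it keeps the UI rendering and makes gaps easy to spot.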
Challenges faced
We encountered two major challenges with this approach:
- As the number of languages and phrases increased, maintaining the translation files became challenging. Adding new phrases to each language manually was time-consuming and prone to errors.
- The translation process was unreliable. Without a dedicated translation team, we relied on community users for most translations. They did a great job, but the translations inevitably became incomplete and inconsistent over time.
Introducing ChatGPT
As ChatGPT gained popularity, we discovered an interesting use case by accident: we pasted a TypeScript phrase object into ChatGPT and found that it could translate the code snippet while preserving the code structure. This discovery excited us.
For example:
Please translate the code snippet into `zh-HK`:
export const translation = {
  hi: 'hi',
  nice_to_meet_you: 'Nice to meet you.',
};
And the result:
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高興見到你。',
};
This was truly remarkable. In real-world cases, we also had historical files that included some untranslated phrases marked with // UNTRANSLATED:
const zhHK = {
  common: {
    hi: '你好',
    nice_to_meet_you: '很高興見到你。',
  },
  home: {
    welcome: 'Welcome', // UNTRANSLATED
  },
};
ChatGPT was even able to "partially" translate the code snippet, eliminating the last obstacle. However, it was still far from perfect. Let's see the issues we encountered.
Prompt engineering
The random response issue
One well-known fact about ChatGPT is its unpredictable nature. Sometimes, the results are accurate, while other times, they can be strange or not what we intended. While this is acceptable in a chatbot context, it becomes problematic when generating translation code snippets.
Let's revisit the zh-HK example:
Please translate the code snippet into `zh-HK`:
export const translation = {
  hi: 'hi',
  nice_to_meet_you: 'Nice to meet you.',
};
The expected translation for "Nice to meet you." in zh-HK is "很高興見到你。" However, there are instances where the snippet is translated into zh-CN:
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高兴见到你。',
};
"很高兴见到你。" (Simplified Chinese) is different from "很高興見到你。" (Traditional Chinese).
Sometimes, the prompt "Please translate the..." is also translated:
請把這個代碼片段翻譯成 `zh-HK`:
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高興見到你。',
};
Occasionally, the results contain unnecessary content:
This is the translation:
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高興見到你。',
};
The first line "This is the translation:" is unnecessary for our needs.
Based on the above results, we can identify several gaps between what we asked for and what ChatGPT returned:
- Sometimes, only the language is inferred (zh), without considering regional differences (zh-HK).
- It fails to accurately identify which content needs to be translated.
- The response format doesn't match our expectations.
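The format gap (extra lines before or after the snippet) can also be handled after the fact. The sketch below is a minimal illustration under the assumption that every snippet starts with an export statement and ends with a closing brace and semicolon; extractSnippet is a hypothetical helper, not part of Logto's CLI, and it does nothing about a wrong regional variant.

```typescript
// Keep only the part of a reply that looks like the exported object.
// Assumes the snippet starts with `export ` and ends with `};`.
function extractSnippet(reply: string): string | undefined {
  const start = reply.indexOf('export ');
  const end = reply.lastIndexOf('};');

  // No recognizable snippet: let the caller decide whether to retry.
  if (start === -1 || end === -1 || end < start) {
    return undefined;
  }

  // `end` points at the closing brace, so include the two characters of `};`.
  return reply.slice(start, end + 2);
}
```

Post-processing like this only hides the symptom, which is why the next step is to improve the prompt itself.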
Write clear and specific prompts
To address the issues mentioned above, we refined our prompt:
Please infer the regional language corresponding to the language code ${languageCode},
and translate the code snippet delimited by triple backticks into the regional language inferred.
Ensure the output is a TypeScript code in the original format, ready to be directly used in the codebase.
```
${code}
```
This new prompt proved to be effective. We instructed ChatGPT to:
- Infer the regional language based on the language code provided.
- Translate the code snippet delimited by triple backticks.
- Ensure the output maintains the original format without any additional content.
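Filling in the two placeholders can be sketched with a small helper. This is an illustration only: buildPrompt is a hypothetical name, and the delimiter is built with repeat() simply so that it stands out in the code.

```typescript
// Three backticks, built programmatically so the delimiter is easy to spot.
const fence = '`'.repeat(3);

// Hypothetical helper: fills the refined prompt with a language code and a snippet.
function buildPrompt(languageCode: string, code: string): string {
  return [
    `Please infer the regional language corresponding to the language code ${languageCode},`,
    'and translate the code snippet delimited by triple backticks into the regional language inferred.',
    'Ensure the output is a TypeScript code in the original format, ready to be directly used in the codebase.',
    fence,
    code,
    fence,
  ].join('\n');
}
```

Keeping the snippet between explicit delimiters is what lets the model tell the instructions apart from the content to translate.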
Give the model time to think
As mentioned earlier, for existing phrases, we needed to translate those marked with // UNTRANSLATED. This introduced another layer of complexity to the prompt. For example:
Please infer the regional language corresponding to the language code `zh-HK`,
and translate the code snippet delimited by triple backticks into the regional language inferred.
Ensure the output is a TypeScript code in the original format, ready to be directly used in the codebase.
Remember not to translate the values without an `// UNTRANSLATED` comment.
```
export const translation = {
  hi: '你好',
  nice_to_meet_you: 'Nice to meet you.', // UNTRANSLATED
  success: '你成功了!',
};
```
One of the results we obtained:
```
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高興見到你。', // TRANSLATED
  success: '你做到了!', // TRANSLATED
};
```
It also translated (or transformed) the phrase success and added two // TRANSLATED comments, which were both unexpected.
Although the previous prompt was clear and specific, it was quite lengthy and challenging to follow, even for humans. To improve this, we gave the model more time to think. We provided a step-by-step prompt:
You will receive a language code and a TypeScript code snippet delimited by triple backticks, which contains an object.
Please follow the steps below to process the code snippet:
Step 1: Infer the regional language corresponding to the language code you received.
Step 2: Collect object field keys only when their value is followed by a " // UNTRANSLATED" comment from the code snippet.
Step 3: Translate only the values associated with the keys collected in Step 2 into the regional language inferred in Step 1.
Step 4: Remove all comments in the code.
Ensure the output is a TypeScript code file in the original format, ready to be used directly in the codebase.
Language code: ${languageCode}
Code snippet:
```
${code}
```
The results significantly improved. However, sometimes ChatGPT would display the "thinking process" since we explicitly provided the steps.
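Because the output can still drift, a cheap guard is to check that every line without an // UNTRANSLATED marker survives unchanged. The sketch below is a hypothetical check, not something the original workflow is known to include, and it assumes each phrase sits on its own line.

```typescript
const marker = '// UNTRANSLATED';

// Returns the original lines that should have been left alone
// but no longer appear verbatim in the model's output.
function findUnexpectedChanges(original: string, translated: string): string[] {
  const outputLines = new Set(translated.split('\n').map((line) => line.trim()));

  return original
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.includes(marker))
    .filter((line) => !outputLines.has(line));
}
```

An empty result means the untouched phrases were preserved; any returned line points at a value the model changed or decorated with a comment, as in the success example earlier.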
Roles in conversations
Although the API has no chat interface, we can still send a multi-turn conversation in a single request by assigning a "role" to each message. The role can be one of the following:
- "system": Responsible for guiding the flow and behavior of the conversation to ensure continuity and goal orientation.
- "assistant": Represents the ChatGPT model itself, providing answers and responses.
- "user": Represents the user asking questions and interacting with the model.
By using roles, we can create a more engaging conversation with ChatGPT. Each message plays a specific role in guiding the conversation.
Here, we can use the "system" role to guide ChatGPT on how to accomplish our task:
You are a translation assistant for a TypeScript engineer.
You will receive a language code and a TypeScript code snippet delimited by triple backticks, which contains an object.
Please follow the steps below to process the code snippet:
Step 1: Infer the regional language corresponding to the language code you received.
Step 2: Collect object field keys only when their value is followed by a " // UNTRANSLATED" comment from the code snippet.
Step 3: Translate only the values associated with the keys collected in Step 2 into the regional language inferred in Step 1.
Step 4: Remove all comments in the code.
Ensure the output is a TypeScript code file in the original format, ready to be used directly in the codebase.
Use the "user" role to ask ChatGPT to translate the code snippet:
Language code: zh-HK
Code snippet:
```
export const translation = {
  hi: '你好',
  nice_to_meet_you: 'Nice to meet you.', // UNTRANSLATED
  success: '你成功了!',
};
```
Use the "assistant" role to indicate the "standard answer":
export const translation = {
  hi: '你好',
  nice_to_meet_you: '很高興見到你。',
  success: '你成功了!',
};
Finally, combine the three messages above with the code snippet we want to translate, and send a single request to the Chat Completion API. Here's an example:
const messages = [
  {
    role: 'system',
    content: 'You are a translation assistant for a TypeScript engineer.\nYou will receive...',
  },
  {
    role: 'user',
    content: 'Language code: zh-HK \nCode snippet: \n...',
  },
  {
    role: 'assistant',
    content: 'export const ...',
  },
  {
    role: 'user',
    content: `Language code: ${languageCode} \nCode snippet: \n \`\`\`${code}\`\`\``,
  },
];
We replaced languageCode with the target language code and code with the code snippet to translate, and saw the exact result we expected. Bravo!
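Putting the pieces together, building the message list and sending it to the Chat Completion endpoint might look like the sketch below. The helper names (toUserMessage, buildMessages, translateSnippet) are hypothetical, and the model name, the temperature of 0, and the way the API key is passed in are assumptions rather than details from our setup.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const codeFence = '`'.repeat(3);

// Format a user message the same way for the worked example and the real request.
function toUserMessage(languageCode: string, code: string): ChatMessage {
  return {
    role: 'user',
    content: `Language code: ${languageCode}\nCode snippet:\n${codeFence}\n${code}\n${codeFence}`,
  };
}

// System prompt, one worked example (user + assistant), then the actual request.
function buildMessages(
  systemPrompt: string,
  example: { languageCode: string; code: string; answer: string },
  languageCode: string,
  code: string
): ChatMessage[] {
  return [
    { role: 'system', content: systemPrompt },
    toUserMessage(example.languageCode, example.code),
    { role: 'assistant', content: example.answer },
    toUserMessage(languageCode, code),
  ];
}

// Assumptions: the model name and temperature are placeholders.
async function translateSnippet(apiKey: string, messages: ChatMessage[]): Promise<string> {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: 'gpt-3.5-turbo', messages, temperature: 0 }),
  });

  if (!response.ok) {
    throw new Error(`Chat Completion request failed with status ${response.status}`);
  }

  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  const content = data.choices[0]?.message.content;

  if (content === undefined) {
    throw new Error('Chat Completion response contained no choices');
  }

  return content;
}
```

Separating message construction from the network call keeps the prompt logic easy to test without an API key.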
Integrate with our CLI
While the results were correct and stable, using the ChatGPT API this way was still not very convenient. We had to manually copy the code snippet, paste it into the request, and replace the languageCode and code variables. Considering the need to cover more than 10 languages and 100 code snippets, this was not an ideal experience.
To address this, we decided to integrate the ChatGPT API into our Command-Line Interface (CLI). We added two commands to our CLI:
- logto translate sync: This command collects code snippets across all languages that contain // UNTRANSLATED comments, translates them, and replaces the original code snippets with the translated versions.
- logto translate create [languageCode]: This command creates a new language with the specified [languageCode] and translates all code snippets from the default language (English) to the new language.
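The collection step of the sync command can be sketched as a pure function. This is not the actual CLI code: collectUntranslated and PhraseFiles are hypothetical, the files are modeled as an in-memory map instead of being read from disk, and the paths are assumed to start with the language folder.

```typescript
const untranslatedMark = '// UNTRANSLATED';

// Hypothetical in-memory view of the phrase files: path -> file content.
type PhraseFiles = Record<string, string>;

// Pick the files that still contain at least one untranslated phrase,
// together with the language code taken from the first path segment.
function collectUntranslated(files: PhraseFiles): { languageCode: string; path: string }[] {
  return Object.entries(files)
    .filter(([, content]) => content.includes(untranslatedMark))
    .map(([path]) => ({ languageCode: path.split('/')[0] ?? '', path }));
}
```

Each returned entry supplies the two inputs the prompt needs: the language code and the file whose content becomes the code snippet.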
Although the translations may not be perfect, this is still a huge improvement. We can now focus on validating and improving the translation results instead of spending time on the translation process itself.
Conclusion
Integrating the ChatGPT API into a product internationalization workflow can be powerful, but it requires careful planning and clear instructions to get the best results.
If you're developing AI applications, user authentication is essential. Logto provides a simple and secure way to authenticate your users, offers a unified user identity system across all your applications, and supports ChatGPT plugins out of the box. For more information, check out our blog post: