This week I spent some time refactoring my project, codeshift. I'd been meaning to do this for a while, but I held back from working on it in my own time so that I'd have something to do when I needed to work on it for an assignment.
uday-rana / codeshift
A command-line tool that translates source code files into a chosen programming language.
codeshift
Codeshift is a command-line tool to translate and transform source code files between programming languages.
Features
- Select output language to convert source code into
- Support for multiple input files
- Output results to a file or stream directly to stdout
- Customize model and provider selection for optimal performance
- Supports leading AI providers
Requirements
- Node.js (Requires Node.js 20.17.0+)
- An API key from any of the following providers:
  - OpenAI
  - OpenRouter
  - Groq
  - any other AI provider compatible with OpenAI's chat completions API endpoint
Installation
- Clone the repository with Git:
    git clone https://github.com/uday-rana/codeshift.git
  - Alternatively, download the repository as a .zip from the GitHub page and extract it
- In the repository's root directory (where package.json is located), run npm install:
    cd codeshift/
    npm install
- To be able to run the program without prefixing node, run npm install -g . or npm link within the project directory:
    npm install -g
…
The first thing I wanted to do was split my gigantic index.js file into smaller modules. I'd wanted to do this for some time because it was getting hard to find the different sections of code in a single source file. It took a while because I had to test my program after splitting each section off, but once I was done I felt like a huge weight had been lifted off my shoulders - my program was so much easier to understand. I can't overstate how much of a difference this made. Splitting the program into logical components lets you easily identify and work on the part of the logic you need to, without having to worry about the rest of the program.
I also took the opportunity to clean up the logic for assigning a default model, and added a default provider for when the user doesn't provide a base URL. (I just realized that would only work if they have an API key for that specific provider, so I think I'm going to roll that change back.)
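To make the problem with a default provider concrete, here is a minimal sketch of how the configuration could be resolved. It is not codeshift's actual code: the function name, the lookup table, and the placeholder model names are all assumptions made for illustration. It falls back to a default model when the provider is known, but refuses to guess a provider, since an API key is only valid for the provider that issued it.

```javascript
// Hypothetical table of default models per provider base URL.
// The model names are placeholders, not real model identifiers.
const DEFAULT_MODEL_BY_BASE_URL = {
  "https://api.openai.com/v1": "example-openai-model",
  "https://provider.example/v1": "example-other-model",
};

// Resolve the settings needed to talk to a provider (hypothetical helper).
function resolveProviderConfig({ apiKey, baseUrl, model } = {}) {
  if (!apiKey) {
    throw new Error("missing API key");
  }
  // No default provider: a key from one provider will not work with another.
  if (!baseUrl) {
    throw new Error("missing base URL");
  }
  // A default model is safe, because it depends only on the chosen provider.
  const resolvedModel = model ?? DEFAULT_MODEL_BY_BASE_URL[baseUrl];
  if (!resolvedModel) {
    throw new Error(`no default model known for ${baseUrl}; pass one explicitly`);
  }
  return { apiKey, baseUrl, model: resolvedModel };
}
```

Failing early with a clear message seems friendlier than sending a key to the wrong provider and getting an authentication error back.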
Another big chunk of my effort went towards the completion output logic, which had a lot of duplicate code. I had separate loops for different conditions because I thought placing a condition inside the loop would be inefficient, but that made the code a mess to read and maintain. A big part of efficient code is human efficiency: how easy it is to work with. A few extra CPU cycles spent on conditional checks inside a loop won't slow a computer down much, but having to work with 4 separate for loops that all do nearly the same thing will definitely slow down a human. I decided I'd rather prioritize maintainability, so I coalesced them into a single loop and extracted it into a function.
Before:
// Ugly code warning!
let completion;
try {
  // Send request to AI provider
  completion = await getAIChatStream(prompt, model);
} catch (error) {
  console.error(`error getting response from provider: ${error}`);
  process.exit(22);
}
let promptTokensUsed = 0;
let completionTokensUsed = 0;
let totalTokensUsed = 0;
try {
  // Write to either output file or stdout
  if (outputFilePath) {
    let response = "";
    // Read response stream chunk by chunk
    for await (const chunk of completion) {
      // Concatenate chunk to response
      response += chunk.choices[0]?.delta?.content || "";
      if (chunk?.usage) {
        promptTokensUsed = chunk.usage.prompt_tokens;
        completionTokensUsed = chunk.usage.completion_tokens;
        totalTokensUsed = chunk.usage.total_tokens;
      }
      if (chunk?.x_groq?.usage) {
        promptTokensUsed = chunk.x_groq.usage.prompt_tokens;
        completionTokensUsed = chunk.x_groq.usage.completion_tokens;
        totalTokensUsed = chunk.x_groq.usage.total_tokens;
      }
    }
    fs.writeFile(outputFilePath, `${response}`);
  } else {
    // Read response stream chunk by chunk
    for await (const chunk of completion) {
      // Write chunk to stdout
      process.stdout.write(chunk.choices[0]?.delta?.content || "");
      if (chunk?.usage) {
        promptTokensUsed = chunk.usage.prompt_tokens;
        completionTokensUsed = chunk.usage.completion_tokens;
        totalTokensUsed = chunk.usage.total_tokens;
      }
      if (chunk?.x_groq?.usage) {
        promptTokensUsed = chunk.x_groq.usage.prompt_tokens;
        completionTokensUsed = chunk.x_groq.usage.completion_tokens;
        totalTokensUsed = chunk.x_groq.usage.total_tokens;
      }
    }
    process.stdout.write("\n");
  }
} catch (error) {
  console.error(`error reading response stream: ${error}`);
  process.exit(23);
}
if (tokenUsageRequested) {
  if (
    promptTokensUsed == 0 &&
    completionTokensUsed == 0 &&
    totalTokensUsed == 0
  ) {
    console.error(`\n No Token Usage returned by model.`);
  }
  console.error(
    `\nToken Usage Report:\n`,
    `Prompt tokens: ${promptTokensUsed}\n`,
    `Completion tokens: ${completionTokensUsed}\n`,
    `Total tokens: ${totalTokensUsed}`
  );
}
});
After:
// Recently learned this is called dependency injection!
const writeFunction = outputFilePath
  ? async (completionChunk) =>
      await fs.appendFile(
        outputFilePath,
        completionChunk.choices[0]?.delta?.content || "",
      )
  : (completionChunk) => {
      process.stdout.write(completionChunk.choices[0]?.delta?.content || "");
    };
try {
  for await (const chunk of completion) {
    await writeFunction(chunk);
    if (tokenUsageRequested) {
      const usage = chunk?.x_groq?.usage ?? chunk?.usage;
      if (usage) {
        tokenUsage.prompt_tokens += usage.prompt_tokens || 0;
        tokenUsage.completion_tokens += usage.completion_tokens || 0;
        tokenUsage.total_tokens += usage.total_tokens || 0;
      }
    }
  }
  if (outputFilePath) {
    await fs.appendFile(outputFilePath, "\n");
  } else {
    process.stdout.write("\n");
  }
} catch (error) {
  console.error(`error reading response stream: ${error}`);
  process.exit(23);
}
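A nice side effect of passing the writer in is that the loop can be exercised without a real provider or file. The sketch below is a self-contained variation on the same idea, not the code from the repository: consumeStream, fakeCompletion, and demo are hypothetical names, and the fake stream only imitates the chunk shape used above (choices[0].delta.content plus an optional usage object).

```javascript
// Consume a chat-completion stream, handing each text delta to an injected writer.
async function consumeStream(completion, write) {
  const tokenUsage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
  for await (const chunk of completion) {
    await write(chunk.choices[0]?.delta?.content || "");
    // Some providers nest usage under x_groq; others put it on the chunk itself.
    const usage = chunk?.x_groq?.usage ?? chunk?.usage;
    if (usage) {
      tokenUsage.prompt_tokens += usage.prompt_tokens || 0;
      tokenUsage.completion_tokens += usage.completion_tokens || 0;
      tokenUsage.total_tokens += usage.total_tokens || 0;
    }
  }
  await write("\n");
  return tokenUsage;
}

// Stand-in for a provider stream (assumed chunk shape, made-up numbers).
async function* fakeCompletion() {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: { content: ", world" } }] };
  yield { choices: [], usage: { prompt_tokens: 3, completion_tokens: 2, total_tokens: 5 } };
}

// Collect the output in memory instead of writing to a file or stdout.
async function demo() {
  const parts = [];
  const usage = await consumeStream(fakeCompletion(), (text) => {
    parts.push(text);
  });
  return { output: parts.join(""), usage };
}
```

In the real tool the injected writer could be something like (text) => fs.appendFile(outputFilePath, text) or (text) => process.stdout.write(text), and the loop itself would not need to change.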
At one point during refactoring I broke my program when I tried moving the program variable definition into another file and importing it in my start file. I didn't look into it too much, but the error said program.action() (which is the method used to run the program) was undefined, so I assume I made a mistake when exporting. Either way, it wasn't a lot of logic, so I was fine leaving it in the start file.
After refactoring my code I was asked to squash all of my commits together. I've been squashing commits for a little while now, so I know what to expect, what to do, and especially what not to do. It went pretty smoothly - I squashed my commits, rebased my refactoring branch on main, and merged it into main, which led to a clean fast-forward merge (and a giant commit message).
I think having this level of control over the Git history is awesome. It lets you clean up your commits and makes the history so much easier to understand. And what's incredible about Git is that even if you royally screw up, it acts as a safety net: committed work can almost always be recovered (for example through git reflog), so you can play around with rebasing and squashing and get used to how they work without having to worry.