TL;DR
I created Repomix, a Node.js tool that combines your project files into a single document. It was originally made to help me work with AI coding assistants like Claude, but it might be useful for other purposes too. It handles file encoding, ignores binary files, and respects .gitignore patterns.
I had a positive experience coding with Claude, which inspired me to create this Node.js command. It aims to simplify the process of working with AI on larger codebases. Here's how it works:
Running the following command in the root of your repository will create a file called repomix-output.txt:
$ npx repomix
You can then send this file to an AI assistant with a prompt like:
This file contains all the files in the repository combined into one. I want to refactor the code, so please review it first.
The AI can then understand the overall content and potentially assist with tasks like refactoring.
When you propose specific changes, the AI might be able to generate code accordingly. With features like Claude's Artifacts, you could potentially output multiple files, allowing for the generation of multiple interdependent pieces of code.
Features
This tool combines files under a specified folder into a single file.
https://github.com/yamadashy/repomix
While it handles details like skipping files matched by .gitignore and binary files, and includes phrases to help AI understand, what it does is simple: it outputs a file like this:
================================================================
REPOMIX OUTPUT FILE
================================================================
// Phrases to help AI understand
================
File: src/index.js
================
// File contents
================
File: src/utils.js
================
// File contents
Claude 3.5 Sonnet can understand this simple structure well, so it can easily handle refactoring across multiple files.
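As a rough illustration of the layout shown above, here is a minimal sketch that assembles the same structure from a list of files. The PackedFile type and the buildOutput function are hypothetical names made up for this example, not taken from the Repomix source; only the separator widths (64 and 16 "=" characters) and the "File:" labels come from the sample output.

```typescript
// Separator widths follow the sample output: 64 "=" for the banner, 16 "=" per file.
const BANNER_SEPARATOR = "=".repeat(64);
const FILE_SEPARATOR = "=".repeat(16);

// Hypothetical shape for one collected file (an assumption for this sketch).
interface PackedFile {
  path: string;
  content: string;
}

// Builds one text document: banner, explanatory header, then each file
// wrapped in separator lines with its path.
function buildOutput(header: string, files: PackedFile[]): string {
  const lines: string[] = [
    BANNER_SEPARATOR,
    "REPOMIX OUTPUT FILE",
    BANNER_SEPARATOR,
    header,
  ];
  for (const file of files) {
    lines.push(FILE_SEPARATOR, `File: ${file.path}`, FILE_SEPARATOR, file.content);
  }
  return lines.join("\n") + "\n";
}

const sampleOutput = buildOutput("// Phrases to help AI understand", [
  { path: "src/index.js", content: "console.log('hello');" },
  { path: "src/utils.js", content: "export const add = (a, b) => a + b;" },
]);
console.log(sampleOutput);
```

Keeping the format this plain means the output can be pasted into any AI service without special tooling.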
While the context length that AI services can process is limited, this should be sufficient for small-scale projects. (Please don't use it for work code!)
You can adjust ignore targets in the configuration file. Please check the GitHub repository for more details.
Background
I've been experimenting with Claude Pro recently, using the Projects feature to load multiple files and have it modify code. It generates code with about 60-95% accuracy, which I find quite efficient when used as a reference for rewriting code.
However, the Projects feature doesn't allow uploading folders or zip files directly. I considered using Cursor, but hesitated due to its pay-as-you-go API.
Then I saw this article and realized that Claude could understand even when everything was in a single file, which led me to create and publish this tool.
https://qiita.com/kunishou/items/ed097b46cd78030e0b29
About the Development
While it's not doing anything too complex, I hope it can be of some use 🙏
Output File Content
The output file begins with text explaining how to handle the file.
Initially, I used a simple structure with just file paths and contents, but Claude sometimes had trouble understanding it. By adding "File Purpose," "Format," "Handling Method," and "Other Notes" at the beginning, Claude could understand more accurately.
Token Optimization
For file separators, I referred to ChatGPT's Tokenizer and used runs of 16 and 64 = characters, each of which counts as 1 token.
https://platform.openai.com/tokenizer
The token count might be different for ChatGPT-4 or Claude, but I aimed for something similar.
CLI Implementation
I used the commander library to create the command-line tool. I could have used yargs, but I like commander's simple syntax and automatic help generation.
To get the tool's version, I read from package.json. To ensure compatibility with older Node.js versions, I use fs to read it directly instead of import. Also, since __dirname isn't available in ES modules and import.meta.dirname isn't supported in some versions, I use a slightly tricky method:
import * as url from "url";
const dirName = url.fileURLToPath(new URL(".", import.meta.url));
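To show how the directory trick and the fs read might fit together, here is a minimal sketch. The readPackageVersion function, its parameters, and the default "../package.json" location are assumptions for illustration; the actual layout of the tool's build output may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import * as url from "node:url";

// Reads the "version" field from a package.json located relative to a module.
// moduleUrl is expected to be the caller's import.meta.url (a file:// URL).
// relativePackagePath is a hypothetical parameter; the default assumes
// package.json sits one directory above the module (e.g. above a build folder).
function readPackageVersion(
  moduleUrl: string,
  relativePackagePath = "../package.json",
): string {
  // Same technique as above: resolve "." against the module URL to get its directory.
  const dirName = url.fileURLToPath(new URL(".", moduleUrl));
  const packageJsonPath = path.join(dirName, relativePackagePath);
  const packageJson = JSON.parse(fs.readFileSync(packageJsonPath, "utf-8")) as {
    version?: unknown;
  };
  if (typeof packageJson.version !== "string") {
    throw new Error(`No version field in ${packageJsonPath}`);
  }
  return packageJson.version;
}

// In an ES module, the call would look like:
//   const version = readPackageVersion(import.meta.url);
```

Passing the module URL in as an argument keeps the function easy to test, since a test can point it at a temporary directory.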
Publishing as an npm Package
To easily publish new versions to npm, I added these scripts to package.json:
"npm-publish": "npm run lint && npm run test-coverage && npm run build && npm publish",
"npm-release-patch": "npm version patch && npm run npm-publish",
"npm-release-minor": "npm version minor && npm run npm-publish",
"npm-release-prerelease": "npm version prerelease && npm run npm-publish"
The npm version <major|minor|patch> command is surprisingly handy.
File Encoding
Since this tool shouldn't read binary files, I use is-binary-path to ignore binaries, and jschardet and iconv-lite to handle encoding properly.
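The general idea can be sketched without any dependencies. Note that this is a simplified stand-in and not how the tool itself works: the extension list below is a tiny illustrative subset (is-binary-path ships a far longer one), and the encoding is passed in by the caller rather than detected with jschardet and converted with iconv-lite. The looksBinary and decodeText names are hypothetical.

```typescript
import * as path from "node:path";
import { TextDecoder } from "node:util";

// Tiny illustrative subset of binary extensions (an assumption for this sketch).
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".gif", ".zip", ".pdf", ".woff2"]);

// Decides by file extension alone, without opening the file.
function looksBinary(filePath: string): boolean {
  return BINARY_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}

// Decodes bytes using the given encoding label.
// Returns null when the bytes are not valid in that encoding, so the
// caller can skip the file instead of emitting garbled text.
function decodeText(bytes: Uint8Array, encoding = "utf-8"): string | null {
  try {
    return new TextDecoder(encoding, { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

console.log(looksBinary("assets/logo.png"));
console.log(decodeText(new Uint8Array([0x68, 0x69])));
```

In a real pipeline, the binary check would run first so that large binary files are never read into memory.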
File Ignoring
For ignoring specific files, the ignore package is useful. It makes it easy to implement .gitignore-like pattern filtering:
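import ignore from "ignore";

// Returns a predicate that is true for paths NOT matched by the ignore patterns.
function createIgnoreFilter(patterns: string[]): (path: string) => boolean {
  // ignore() creates a new matcher instance; patterns use .gitignore syntax.
  const ig = ignore().add(patterns);
  return (filePath: string) => !ig.ignores(filePath);
}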
Conclusion
This is a simple tool that I created because I needed it. While Claude will likely implement folder and zip upload features if we wait, I believe this tool can be used universally with the growing number of AI generation services.
I'd be happy to receive your feedback if you try it out!
Top comments (4)
Fantastic and thank you! I really like the project and your way of describing and documenting it here. Small note: I tried using it with line numbers enabled, but no line numbers seems to be added.
Thank you for your kind words and feedback, Stig! I'm glad you like the project.
I actually fixed the line numbers issue in version 0.1.27. If you're still not seeing them, try updating Repopack to the latest version.
Let me know if you need any further help!
man Im 33 and I dont know a lic of code, Ive been using claude and ive been trying to make something like this for a few weeks now :C fuck... Thank you though. Maybe my idea might progress farther now bc this right here, is THE key. On a 21:9 monitor ive had my Desktop fill up twice with the same project, different structures ranging from just Errors weird structures...just a bunch of BS honestly. Ive done construction my whole life with Computers being my Go to in solitude when im not working....so if Im right about what I think I know, what this is...then you Da man. PS. ive been having to give claude instructions "When re-writing code Include the full code for the file" "please revert to project knowledge when rewriting fullcode" "Stop burning so many goddamn token" :P Last thing. is there anyway i can ask just one question? im not sure how to message people on here or if that is even a thing. Its about one of the comments you had said not to do.
Love this idea. I’m going to be trying this on a current project and an ai automation workflow idea I have!