llm-prepare converts complex project directory structures and files into a single flat file or set of flat files, facilitating processing using In-Context Learning (ICL) prompts.
This Node.js tool recursively scans a project directory based on the arguments you provide (at minimum, a project directory). It generates a simplified layout view that includes all directories and matching files, and combines this layout with the aggregated text content of the entire project. By default, the aggregated content is stripped of comments and unnecessary whitespace. The tool supports output compression to reduce token use and can handle large projects by chunking the output. Example prompts are included for guidance.
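For illustration only, the flattened output for a small project might look something like the following. The layout characters and file delimiters shown here are assumptions and may differ between versions:

```
/my-project
├── src/
│   └── index.js
└── package.json

/my-project/src/index.js
const fs = require("fs-extra");
...

/my-project/package.json
{ "name": "my-project", "version": "1.0.0" }
```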
In-Context Learning (ICL) enables a model to perform tasks by interpreting context provided within a prompt, eliminating the need for additional training or fine-tuning.
Learn more about In-Context Learning (ICL)
Additional features include:

- A --config option to load a JSON configuration file containing pre-defined arguments and paths to include.
- Support for .ignore files to exclude specific files or directories.
- Example prompts, including one that generates a CREATE TABLE statement based on your provided CSV content.

llm-prepare is built on:

- fs-extra, which extends the built-in fs module, providing additional methods and promise support.
- ignore, which processes .ignore files similar to .gitignore.
- yargs, for command-line argument parsing.

Before installing, ensure you have Node.js and npm (Node Package Manager) installed on your system. You can download and install Node.js from the official Node.js website.
To install and use llm-prepare, follow these steps:
Clone the Repository: Begin by cloning the llm-prepare repository to your local machine.
git clone https://github.com/samestrin/llm-prepare/

Navigate to the project's root directory and run:
npm install

To make llm-prepare available from any location on your system, you need to install it globally. You can do this using npm.
Run the following command in your project directory:
npm link

This will create a global symlink to your script. Now, you can run the script using llm-prepare from anywhere in your terminal.
The provided installation steps should work as-is for both macOS and Linux platforms.
For Windows, ensure that Node.js is added to your PATH during the installation. The npm link command should also work in Windows PowerShell or Command Prompt, allowing you to run the script globally.
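On any platform, you can confirm that the global link works by checking the version (the --version flag is listed in the options below):

```bash
llm-prepare --version
```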
To run the script, you need to provide one mandatory argument: the path to the project directory (--path or -p).
This will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to your console.
The file pattern defaults to "*".
llm-prepare -p "/path/to/project"or
llm-prepare --path "/path/to/project"

This will process all JavaScript files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to your console.
llm-prepare -p "/path/to/project" -f "*.js"This will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to output.txt.
llm-prepare -p "/path/to/project" -o "output.txt"If you don't specific a filename, this will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to project.txt. The filename is auto-generated based on the top level directory in the path-name variable.
llm-prepare -p "/path/to/project" -oYou may optionally set the LLM_PREPARE_OUTPUT_DIR environment variable. If the LLM_PREPARE_OUTPUT_DIR environment variable is set, the output files are written to that directory.
If you don't want to include specific files or directories, you can specify the rules using --custom-ignore-string.
llm-prepare -p "/path/to/project" -o --custom-ignore-string "*docs*,*test*"If you don't want to include specific files or directories, you can specify the rules using an external and --custom-ignore-filename. Use .gitignore file formatting.
llm-prepare -p "/path/to/project" -o --custom-ignore-filename "/path/to/.ignorefile"If you don't want to include specific files or directories, you can specify the rules using an external and --custom-ignore-filename. Use .gitignore file formatting.
llm-prepare -p "/path/to/project" -o --custom-ignore-filename "/path/to/.ignorefile"You can use a JSON configuration file to predefine the arguments and paths to include in the processing.
You can use a JSON configuration file to predefine the arguments and paths to include in the processing. Example config.json file:
{
  "args": {
    "output-filename": "output.txt",
    "compress": true
  },
  "include": ["./src/", "./lib/"]
}

To run the script with a config file:
llm-prepare -c "config.json" --help Show help [boolean]
  -p, --path                  Path to the project directory [string] [required]
  -f, --file-pattern          Pattern of files to include, e.g., '.js$' or '*'
                              for all files             [string] [default: "*"]
  -o, --output-filename       Output filename                          [string]
  -i, --include-comments      Include comments? (Default: false)      [boolean]
  -c, --compress              Compress? (Default: false)              [boolean]
  --chunk-size                Maximum size (in kilobytes) of each file [number]
  -s, --suppress-layout       Suppress layout in output (Default: false)
                                                                      [boolean]
  --default-ignore            Use a custom default ignore file         [string]
  --ignore-gitignore          Ignore .gitignore file in the root of the
                              project directory                       [boolean]
  --show-default-ignore       Show default ignore file                [boolean]
  --show-prompts              Show example prompts in your browser    [boolean]
  --custom-ignore-string      Comma-separated list of ignore patterns  [string]
  --custom-ignore-filename    Path to a file containing ignore patterns
                                                                       [string]
  --config                    Path to the config file                  [string]
  -v, --version               Display the version number              [boolean]

"While finetuning with full datasets is still a powerful option if the data vastly exceeds the context length, our results suggest that long-context ICL is an effective alternative – trading finetuning-time cost for increased inference-time compute. As the effectiveness and efficiency of using very long model context lengths continues to increase, we believe long-context ICL will be a powerful tool for many tasks."
- Massive prompts can outperform fine-tuning for LLMs, researchers find
In-Context Learning (ICL) allows a Large Language Model (LLM) to perform tasks by interpreting the context provided within the prompt without additional training or fine-tuning. This approach differs significantly from previous methods where models were explicitly trained on a specific task using vast datasets. Instead, ICL leverages the model's pre-trained knowledge base—a comprehensive understanding accumulated during its initial extensive training phase.
As the token size—or the amount of data that an LLM can process and generate in a single instance—has dramatically increased, the value of ICL has become even more significant. This increase in token size allows LLMs to handle longer and more complex inputs and outputs, which enhances their ability to understand and generate sophisticated text.
In-Context Learning (ICL) prompts guide a large language model (LLM) in performing tasks by providing relevant context within the input prompt. These prompts typically include examples, instructions, or patterns that help the model understand how to generate appropriate responses. Example prompts are bundled with this repository; run llm-prepare --show-prompts to view them.
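As a simple illustration (this is not one of the bundled prompts, and the questions and answers are hypothetical), a few-shot ICL prompt wrapped around llm-prepare output might look like this:

```
You are a senior developer. Answer questions using ONLY the
project source below, following the format of the example.

Q: Which file defines the CLI entry point?
A: index.js, per the "bin" field in package.json.

Q: <your question>
A:

--- PROJECT SOURCE ---
<paste llm-prepare output here>
```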
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.
This project is licensed under the MIT License - see the LICENSE file for details.