
Felafax is a framework for continued training and fine-tuning open-source LLMs on the XLA runtime. We take care of the necessary runtime setup and provide a Jupyter notebook out of the box so you can get started right away.
Our goal at Felafax is to build infrastructure that makes it easier to run AI workloads on non-NVIDIA hardware (TPUs, AWS Trainium, AMD GPUs, and Intel GPUs).
Add your dataset, click "Run All", and you'll be running on free TPU resources on Google Colab!
| Felafax supports | Free Notebooks |
|---|---|
| Llama 3.1 (1B, 3B) | LLaMa-3.1 JAX Implementation, LLaMa-3/3.1 PyTorch XLA |
Get started with fine-tuning your models using the Felafax CLI in a few simple steps.
Start off by installing the CLI:

```bash
pip install pipx
pipx install felafax-cli
```

Then, generate an auth token.
Finally, authenticate your CLI session using your token:

```bash
felafax-cli auth login --token <your_token>
```

First, generate a default configuration file for fine-tuning. This command generates a `config.yml` file in the current directory with default hyperparameter values:
```bash
felafax-cli tune init-config
```

Second, update the config file with your hyperparameters:
HuggingFace knobs:

Dataset pipeline and training params:

- `batch_size` and `max_seq_length` to use for the fine-tuning dataset.
- `num_steps`: set to `null` if you want training to run through the entire dataset; if set to a number, training will stop after the specified number of steps.
- `learning_rate` and `lora_rank` to use for fine-tuning.
- `eval_interval`: the number of steps between evaluations.

Run the following command to see the list of base models you can fine-tune; we support all variants of LLaMA-3.1 as of now.
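For reference, a filled-in `config.yml` using the knobs above might look like the following. This is only a sketch with assumed field names and values; the authoritative schema is whatever `tune init-config` generates:

```yaml
# Hypothetical example values -- adjust to your dataset and hardware.
batch_size: 8
max_seq_length: 2048
num_steps: null        # null = run through the entire dataset
learning_rate: 1e-4
lora_rank: 8
eval_interval: 100
```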
```bash
felafax-cli tune start --help
```

Now, you can start the fine-tuning process with your selected model from the list above and a dataset name from HuggingFace (like `yahma/alpaca-cleaned`):
```bash
felafax-cli tune start --model <your_selected_model> --config ./config.yml --hf-dataset-id <your_hf_dataset_name>
```

Example command to get you started:
```bash
felafax-cli tune start --model llama3-2-1b --config ./config.yml --hf-dataset-id yahma/alpaca-cleaned
```

After you start the fine-tuning job, the Felafax CLI takes care of spinning up the TPUs, running the training, and uploading the fine-tuned model to the HuggingFace Hub.
You can stream real-time logs to monitor the progress of your fine-tuning job:

```bash
# Replace `<job_name>` with the job name that you get after starting the fine-tuning.
felafax-cli tune logs --job-id <job_name> -f
```

After fine-tuning is complete, you can list all your fine-tuned models:
```bash
felafax-cli model list
```

You can start an interactive terminal session to chat with your fine-tuned model:
```bash
# Replace `<model_id>` with the model ID from the `model list` command you ran above.
felafax-cli model chat --model-id <model_id>
```

The CLI is broken into three main command groups:
- `tune`: to start/stop fine-tuning jobs.
- `model`: to manage and interact with your fine-tuned models.
- `files`: to upload/view your dataset files.

Use the `--help` flag to discover more about any command group:
```bash
felafax-cli tune --help
```

We recently fine-tuned the Llama 3.1 405B model on 8x AMD MI300X GPUs using JAX instead of PyTorch. JAX's advanced sharding APIs allowed us to achieve great performance. Check out our blog post to learn about the setup and the sharding tricks we used.
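To give a flavor of the sharding APIs mentioned above (this is a minimal illustration, not the actual setup from the blog post), the sketch below shards a weight matrix column-wise across all available devices with `jax.sharding` and lets XLA insert the necessary collectives:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over all available devices (8 GPUs in our run; works on 1 CPU too).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))

# Shard the weight matrix column-wise across the "model" axis;
# replicate the activations on every device.
w = jnp.ones((128, 256), dtype=jnp.bfloat16)
x = jnp.ones((4, 128), dtype=jnp.bfloat16)
w_sharded = jax.device_put(w, NamedSharding(mesh, P(None, "model")))
x_repl = jax.device_put(x, NamedSharding(mesh, P()))

@jax.jit
def matmul(x, w):
    # XLA derives the parallel execution plan from the input shardings.
    return x @ w

y = matmul(x_repl, w_sharded)  # each device computes its slice of columns
```

The key point is that the model code stays an ordinary matmul; parallelism comes entirely from how the inputs are placed.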
We did LoRA fine-tuning with all model weights and LoRA parameters in bfloat16 precision, with a LoRA rank of 8 and a LoRA alpha of 16.
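The LoRA setup described above can be sketched as follows. This is an illustration of the parameterization (rank 8, alpha 16, bfloat16 throughout), not the actual Felafax training code; all names here are made up for the example:

```python
import jax
import jax.numpy as jnp

RANK, ALPHA = 8, 16  # the rank/alpha used in our 405B run

def init_lora(key, d_in, d_out):
    # A is small random; B starts at zero so the adapter is a no-op initially.
    k_a, _ = jax.random.split(key)
    a = (0.01 * jax.random.normal(k_a, (d_in, RANK))).astype(jnp.bfloat16)
    b = jnp.zeros((RANK, d_out), dtype=jnp.bfloat16)
    return a, b

def lora_linear(x, w, a, b):
    # Frozen base weight w plus the scaled low-rank update.
    return x @ w + (x @ a @ b) * (ALPHA / RANK)

key = jax.random.PRNGKey(0)
w = jnp.ones((64, 32), dtype=jnp.bfloat16)  # stand-in for a frozen base weight
a, b = init_lora(key, 64, 32)
x = jnp.ones((2, 64), dtype=jnp.bfloat16)
y = lora_linear(x, w, a, b)
```

During training only `a` and `b` receive gradients, which is what makes fine-tuning a 405B model tractable on a single 8-GPU node.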
The GPU utilization and VRAM utilization graphs can be found below; however, we still need to calculate the Model FLOPs Utilization (MFU). Note: we couldn't run the JIT-compiled version of the 405B model due to infrastructure and VRAM constraints (we need to investigate this further). The entire training run was executed in JAX eager mode, so there is significant potential for performance improvements.
If you have any questions, please contact us at [email protected].