A bro who codes with you



Why always Python? Let's fine-tune a code generation model for TypeScript only!
I created the TypeScript-Instruct 20K dataset: 20,000 {instruction, output} pairs that you can't find in any current code generation LLM dataset (or maybe you can).

For the output (thank you, HuggingFace), I took TypeScript code from The Stack project.
For the instruction (thank you, OpenAI), I made 20K API calls to generate an instruction and an explanation for each code sample.
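To make the {instruction, output} shape concrete, here is a minimal sketch of how one record could be typed, loaded, and turned into a training prompt. Only the two field names come from the description above; the JSON Lines storage format, the `formatExample` and `parseJsonl` helpers, and the prompt template are assumptions for illustration, not the actual training setup.

```typescript
// One dataset record. The two field names come from the description above.
interface InstructionPair {
  instruction: string; // natural-language description of the task
  output: string;      // TypeScript code taken from The Stack
}

// Hypothetical prompt template (assumption): the template actually used
// for fine-tuning is not stated in this README.
function formatExample(pair: InstructionPair): string {
  return `### Instruction:\n${pair.instruction}\n\n### Response:\n${pair.output}`;
}

// Parse JSON Lines text (one JSON object per line, an assumed storage format)
// into typed records. Blank lines are skipped; malformed records throw.
function parseJsonl(text: string): InstructionPair[] {
  const pairs: InstructionPair[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const record: unknown = JSON.parse(trimmed);
    if (
      typeof record === "object" &&
      record !== null &&
      typeof (record as Record<string, unknown>).instruction === "string" &&
      typeof (record as Record<string, unknown>).output === "string"
    ) {
      const r = record as InstructionPair;
      pairs.push({ instruction: r.instruction, output: r.output });
    } else {
      throw new Error(`Malformed record: ${trimmed}`);
    }
  }
  return pairs;
}
```

Validating each record at load time keeps a single bad API response from silently ending up in the training set.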
Everything else about training (parameters, logs, ...) can be found here (link the HuggingFace Training Metrics link later)
I evaluate with the MultiPL-E benchmark (Cassano et al., 2023), the same benchmark the base model, Code Llama, uses in its paper
(link the evaluation result table later)
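MultiPL-E scores are commonly reported as pass@k, so a small sketch of that metric may help when reading the result table. This assumes pass@k is the metric used here, which the text above does not confirm; the function name `passAtK` is hypothetical. It implements the standard unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of samples generated per problem and c the number that pass the tests.

```typescript
// Unbiased pass@k estimator for a single problem:
//   1 - C(n - c, k) / C(n, k)
// n: samples generated, c: samples that pass all tests, k: evaluation budget.
// Computed as a running product to avoid large binomial coefficients.
function passAtK(n: number, c: number, k: number): number {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError("expected 1 <= k <= n and 0 <= c <= n");
  }
  // Fewer than k failing samples: every draw of k contains a passing one.
  if (n - c < k) return 1.0;
  let failAll = 1.0; // probability that all k drawn samples fail
  for (let i = n - c + 1; i <= n; i++) {
    failAll *= 1 - k / i;
  }
  return 1 - failAll;
}

// The benchmark score is the mean over problems (hypothetical helper).
function meanPassAtK(results: { n: number; c: number }[], k: number): number {
  const total = results.reduce((sum, r) => sum + passAtK(r.n, r.c, k), 0);
  return total / results.length;
}
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass.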
You can find my work here:
Or contact me here: https://levuminhhuy.site/about