Roadmap / encodec.cpp / ggml
Inference of SunoAI's bark model in pure C/C++.
With bark.cpp, our goal is to bring real-time realistic multilingual text-to-speech generation to the community.
Models supported
Models we want to implement! Please open a PR :)
Demo on Google Colab (#95)
Here is a typical run using bark.cpp:
./main -p "This is an audio generated by bark.cpp"
   __               __
  / /_  ____ ______/ /__     _________  ____
 / __ \/ __ `/ ___/ //_/    / ___/ __ \/ __ \
/ /_/ / /_/ / /  / ,<   _  / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_| (_) \___/ .___/ .___/
                               /_/   /_/
bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222
Generating semantic tokens: 17%
bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms
Generating coarse tokens: 100%
bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms
Generating fine tokens: 100%
bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms
write_wav_on_disk: Number of frames written = 65600.
main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms

Here is a video of Bark running on the iPhone:
Here are the steps to use bark.cpp:
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive

In order to build bark.cpp you must use CMake:
mkdir build
cd build
# To enable NVIDIA GPU support, use the following option:
# cmake -DGGML_CUBLAS=ON ..
cmake ..
cmake --build . --config Release

# Install Python dependencies
python3 -m pip install -r requirements.txt
# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark
# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16
# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4

Weights can be quantized using any of the following formats: q4_0, q4_1, q5_0, q5_1, q8_0.
Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0

bark.cpp is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be a bug report: you may encounter a bug while using bark.cpp. Don't hesitate to report it on the issue section.