Bailing
Bailing is an open-source voice conversation assistant designed to hold natural spoken conversations with users. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech synthesis (TTS). It is a GPT-4o-like voice conversation bot built as an ASR + LLM + TTS pipeline, providing a high-quality voice conversation experience with an end-to-end latency of 800 ms. Bailing aims to achieve GPT-4o-like dialogue quality without requiring a GPU, which makes it suitable for edge devices and low-resource environments.

Project Features
- Efficient open-source models: Bailing uses multiple open-source models to deliver an efficient and reliable voice conversation experience.
- No GPU required: Optimized for local deployment, while still aiming for a GPT-4-like experience.
- Modular design: The ASR, VAD, LLM, and TTS modules are independent of each other and can be replaced or upgraded as needed.
- Memory support: Learns continuously and remembers users' preferences and conversation history to provide a personalized experience.
- Tool calls: External tools can be integrated flexibly, so users can request information or perform operations by voice, which makes the assistant more practical.
- Task management: Manages user tasks efficiently, tracking progress, setting reminders, and providing dynamic updates so that users do not miss anything important.
Project Introduction
Bailing implements voice dialogue with the following technical components:
- ASR: FunASR performs automatic speech recognition, converting the user's voice into text.
- VAD: silero-vad performs voice activity detection so that only valid speech segments are processed.
- LLM: deepseek serves as the large language model that processes user input and generates responses; it is very cost-effective.
- TTS: edge-tts, ChatTTS, or the macOS say command converts the generated text response into natural, fluent speech.
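The way these four components chain together can be sketched as follows. This is a minimal illustration, not Bailing's actual code: the `VoicePipeline` class, the `handle_chunk` method, and the four callable signatures are assumptions made for this example, and the stand-in lambdas replace the real FunASR, silero-vad, deepseek, and edge-tts calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures; Bailing's real module interfaces may differ.
VadFn = Callable[[bytes], bool]  # True when the chunk contains speech
AsrFn = Callable[[bytes], str]   # audio -> transcript
LlmFn = Callable[[str], str]     # transcript -> reply text
TtsFn = Callable[[str], bytes]   # reply text -> audio


@dataclass
class VoicePipeline:
    """Chains VAD -> ASR -> LLM -> TTS; each stage can be swapped independently."""
    vad: VadFn
    asr: AsrFn
    llm: LlmFn
    tts: TtsFn

    def handle_chunk(self, audio: bytes) -> Optional[bytes]:
        if not self.vad(audio):
            return None  # no speech detected: skip the expensive stages
        text = self.asr(audio)
        if not text.strip():
            return None  # nothing was recognized
        reply = self.llm(text)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any models installed.
pipeline = VoicePipeline(
    vad=lambda audio: len(audio) > 0,
    asr=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)
```

Because each stage is just a callable, replacing one engine (for example, a different TTS backend) only means passing a different function.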
Framework Description

The Robot component is responsible for task management and memory management. It handles user interruptions and coordinates the modules so that the interaction stays smooth. The table below shows how it reacts depending on whether audio is playing and whether the user is speaking.
| Player status | User speaking | Description |
|---|---|---|
| Playing | Not speaking | Normal |
| Playing | Speaking | Interruption scenario |
| Not playing | Not speaking | Normal |
| Not playing | Speaking | VAD detection, then ASR recognition |
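The four cases in the table can be expressed as a small decision function. This is only a sketch of the logic: the `Action` enum and the `decide` function are names invented for this example and do not come from the Bailing code base.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical action names, introduced only for this sketch."""
    KEEP_PLAYING = "keep_playing"  # playing, user silent
    INTERRUPT = "interrupt"        # playing, user speaks
    IDLE = "idle"                  # not playing, user silent
    LISTEN = "listen"              # not playing, user speaks: VAD then ASR


def decide(playing: bool, speaking: bool) -> Action:
    """Map (player status, user speaking) to the action from the table."""
    if playing:
        return Action.INTERRUPT if speaking else Action.KEEP_PLAYING
    return Action.LISTEN if speaking else Action.IDLE
```

For example, `decide(playing=True, speaking=True)` returns `Action.INTERRUPT`, which corresponds to the interruption scenario in the table.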
Demo
bailing audio dialog
Functional Features
- Voice input: Accurate speech recognition through FunASR.
- Voice activity detection: silero-vad filters out invalid audio to improve recognition efficiency.
- Intelligent dialogue generation: Relies on the language understanding provided by deepseek to generate natural text replies at very low cost.
- Voice output: edge-tts converts text into speech to give users realistic audio feedback.
- Interrupt support: Interrupt policies are configurable and can recognize both keyword and voice interruptions, giving users immediate feedback and control during a conversation and making the interaction more fluent.
- Memory support: Learns continuously and remembers users' preferences and conversation history to provide a personalized experience.
- Tool calls: External tools can be integrated flexibly, so users can request information or perform operations by voice, which makes the assistant more practical.
- Task management: Manages user tasks efficiently, tracking progress, setting reminders, and providing dynamic updates so that users do not miss anything important.
Project Advantages
- High-quality voice conversation: Integrates strong ASR, LLM, and TTS technologies to keep voice conversations fluent and accurate.
- Lightweight design: Runs without high-performance hardware and is suitable for resource-constrained environments.
- Fully open source: Bailing is fully open source and encourages community contributions and derivative development.
Installation and Running
Environment requirements
Make sure that the following tools and libraries are installed in your development environment:
- Python 3.8 or later
- pip package manager
- Required dependency libraries: FunASR, silero-vad, deepseek, edge-tts
Installation steps
Clone the project repository:
git clone https://github.com/wwbin2017/bailing.git
cd bailing
Install the required dependencies:
pip install -r requirements.txt
Configure environment variables:
- Open config/config.yaml and configure the ASR, LLM, and other related settings.
- Download SenseVoiceSmall into the directory models/SenseVoiceSmall (see the SenseVoiceSmall download address).
- Go to the deepseek official website to obtain an api_key and add it to the configuration. You can also configure other models instead, such as openai, qwen, gemini, 01yi, etc.
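Before running the project, it can help to confirm that the files mentioned in the steps above are in place. The following is a small pre-flight sketch, assuming it is run from the repository root; the `missing_paths` helper is hypothetical and not part of Bailing, and it only checks that the two paths named above exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List

# Paths taken from the configuration steps above, relative to the repository root.
REQUIRED_PATHS = ["config/config.yaml", "models/SenseVoiceSmall"]


def missing_paths(root: Path) -> List[str]:
    """Return the required paths that do not exist under `root`."""
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd())
    if missing:
        print("Missing before first run:", ", ".join(missing))
    else:
        print("Configuration file and model directory found.")
```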
Run the project:
cd server
python server.py # start the backend service; this step is optional
Instructions for use
- After starting the application, the system will wait for voice input.
- FunASR converts the user's voice to text.
- silero-vad performs voice activity detection to ensure that only valid voice is processed.
- deepseek processes text input and generates smart replies.
- edge-tts, ChatTTS, or the macOS say command converts the generated text into speech and plays it to the user.
Roadmap
Going forward, Bailing will evolve into a JARVIS-like personal assistant: an attentive advisor with strong long-term memory and forward-looking task management. Building on RAG and Agent technology, it will keep track of your tasks and knowledge and make complex things simple. A short spoken request, such as "Help me find recent news" or "Summarize the latest developments in large models", will be enough for Bailing to respond quickly, analyze the request, track progress in real time, and present the results clearly. The goal is not just an assistant but a smart partner that understands your needs and supports you at every important moment.
Supported tools
| Function name | Description | Functionality | Example |
|---|---|---|---|
| get_weather | Get weather information for a location | Given a location name, returns the weather conditions for that location | User says: "How is the weather in Hangzhou?" → zhejiang/hangzhou |
| ielts_speaking_practice | IELTS speaking practice | Generates IELTS speaking questions and dialogues to help users practice for the IELTS speaking test | - |
| get_day_of_week | Get the current day of the week or date | When the user asks for the current time, date, or day of the week, returns the corresponding information | User says: "What day of the week is it today?" → returns the current day of the week |
| schedule_task | Create a scheduled task | Users specify the execution time and content of the task, and are reminded on that schedule | User says: "Remind me to drink water every morning at 8 o'clock." → time: '08:00', content: 'Remind me to drink water' |
| open_application | Open a specified application on a Mac | Users specify the application name, and the script launches that application on the Mac | User says: "Open Safari." → application_name: 'Safari' |
| web_search | Search the web for specified keywords | Returns search results based on the query provided by the user | User says: "Search for the latest tech news." → query: 'latest tech news' |
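One common way to wire tools like these to an LLM is a name-to-function registry: the model returns a function name plus arguments, and the assistant looks the function up and calls it. The sketch below illustrates that idea with get_day_of_week only. The `tool` decorator, the `TOOLS` dictionary, `call_tool`, and the `today` parameter are assumptions made for this example and may not match how Bailing registers its tools.

```python
from datetime import date
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be called by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_day_of_week(today: Optional[date] = None) -> str:
    """Return the weekday name for `today` (defaults to the current date)."""
    current = today or date.today()
    return WEEKDAYS[current.weekday()]  # weekday(): Monday is 0


def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Dispatch a tool call of the form (name, arguments) produced by the LLM."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

With this shape, adding a tool such as web_search would only require defining a function and decorating it; the dispatch code stays unchanged.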
Contribution Guide
Contributions of any kind are welcome! If you have suggestions for improving Bailing or run into a problem, please open a GitHub Issue or submit a Pull Request.
License
The project is released under the MIT license. You are free to use, modify, and distribute it, provided that you retain the original license notice.
Contact information
If you have any questions or suggestions, please contact:
- GitHub Issues: Project Issue Tracking
Disclaimer
Bailing is an open source project designed for personal learning and research purposes. Please note the following disclaimer when using this project:
- Personal use: This project is for personal study and research only and is not suitable for commercial use or production environments.
- Risks and responsibility: Using Bailing may lead to data loss, system failure, or other problems. We are not responsible for any losses, damages, or problems arising from the use of this project.
- Support: This project does not provide any technical support or warranty. Users use this project at their own risk.
Before using this project, make sure you understand and accept these disclaimers. If you do not agree to these terms, please do not use this project.
Thank you for your understanding and support!