You may have heard of the famous Neuro-sama, or Mu Jimeng from China. Would you like an AI virtual character of your own to stream, chat, and play games with you? The open-source Zerolan Live Robot aims to make that dream a reality, and it only requires a consumer-grade graphics card!
Zerolan Live Robot is a multifunctional live streaming robot (AI VTuber). It can automatically read danmaku (chat messages) in a Bilibili live room, observe designated windows on the computer screen and understand their content, control game characters in Minecraft, and respond in voice chat with emotional speech.
Its associated projects are KonekoMinecraftBot, zerolan-core, zerolan-data, and zerolan-ui.
Note
This project is under continuous development; the current version is 2.0. You can follow the developer's Bilibili account, Akagawa Tsurumi_Channel, which is training an AI catgirl based on this project and broadcasts the latest progress from time to time.
The following briefly lists what this project supports:
| Supported item | Details |
|---|---|
| Live streaming platform | Bilibili, Twitch |
| Large language model | THUDM/GLM-4, THUDM/ChatGLM3, Qwen/Qwen-7B-Chat, 01ai/Yi-6B-Chat, augmxnt/shisa-7b-v1 |
| Automatic speech recognition model | iic/speech_paraformer_asr |
| Speech synthesis model | RVC-Boss/GPT-SoVITS |
| Image captioning model | Salesforce/blip-image-captioning-large |
| Optical character recognition model | paddlepaddle/PaddleOCR |
| Video captioning model | iic/multi-modal_hitea_video-captioning_base_en |
| External callable tools | Firefox browser, Baidu Baike, Moegirlpedia |
| Game plugin | Minecraft |
Caution
Zerolan Live Robot 2.0 is incompatible with the older 1.0 version, so you may need to reconfigure your environment and reinstall dependencies.
The Zerolan framework consists of Zerolan Live Robot, Zerolan Core, Zerolan Data, and Zerolan UI. The following table briefly describes the uses of each project:
| Project name | Purpose |
|---|---|
| Zerolan Live Robot | The control framework of the live streaming robot; it collects environmental data, analyzes it comprehensively, and responds with actions. |
| Zerolan Core | The core modules that provide AI inference services to the robot, such as service-oriented Web APIs for large language models. |
| Zerolan Data | Defines the data formats exchanged between services via network requests. |
| Zerolan UI | A GUI based on PyQt6, including topmost pop-up windows, prompt sounds, etc. |
Important
This step is required!
Please move here to complete the deployment of Zerolan Core; this project relies heavily on that core service.
Run the following commands to create and activate a virtual environment, then install the dependencies required by this project:

```shell
conda create --name ZerolanLiveRobot python=3.10
conda activate ZerolanLiveRobot
pip install -r requirements.txt
```

If you are on the dev development branch, you may need to install these dependencies manually:

```shell
pip install git+https://github.com/AkagawaTsurunaki/zerolan-ui.git@dev
pip install git+https://github.com/AkagawaTsurunaki/zerolan-data.git@dev
```

Find the resources/config.template.yaml configuration file, rename it to config.yaml, then modify it to the configuration you need according to the comments in the file.
In the pipeline configuration section, note that server_url must contain the protocol, IP, and port number, e.g. http://127.0.0.1:11001 or https://myserver.com:11451. This is the network address where you deployed Zerolan Core. Each type of model may use a different port.
Tip
Does your server expose only one port? Then try forwarding your requests with Nginx.
In the service configuration section, note that host should contain only the IP address, and port only the port number.
The game.platform field supports minecraft, and the live_stream field supports bilibili, twitch, and youtube.
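Putting the notes above together, a hypothetical config.yaml fragment might look like the sketch below. The key nesting is inferred from the field paths mentioned in this document and may differ from the real template, so always start from resources/config.template.yaml:

```yaml
# Hypothetical sketch only -- consult resources/config.template.yaml for the real structure.
pipeline:
  llm:
    server_url: http://127.0.0.1:11001   # protocol + IP + port of Zerolan Core
service:
  host: 127.0.0.1   # IP address only
  port: 11000       # port number only (example value)
game:
  platform: minecraft
live_stream:
  platform: bilibili   # or twitch / youtube
```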
Tip
Documentation that may help you obtain an API key for each live streaming platform:
- Bilibili: Get the information required for the Credential class
- Twitch: Twitch Developers - Authentication
- YouTube: Obtaining authorization credentials
The value of character.chat.filter.strategy can be default.
character.chat.filter.bad_words can be filled with a list of words to filter.
The character.chat.injected_history array must have an even number of elements; that is, the last message must be a response from the AI.
character.chat.max_history specifies the maximum number of messages retained, i.e., the size of the message window.
character.speech.prompts_dir indicates where your TTS reference audio files are stored. File names should follow the format [language][emotion tag]text content.wav, for example [zh][开心]哇!今天真是一个好天气.wav. The language tag only supports zh, en, and ja; the emotion tag is arbitrary, as long as the large language model can distinguish it; the text content is the transcript of the speech in that audio file.
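The character settings above can likewise be sketched as a hypothetical YAML fragment. The nesting and example values are assumptions based on the key paths in this document, not the project's actual schema:

```yaml
# Hypothetical sketch only -- verify against resources/config.template.yaml.
character:
  chat:
    filter:
      strategy: default
      bad_words: ["badword1", "badword2"]   # example placeholders
    injected_history:                # must contain an even number of messages,
      - "Hello!"                     # user message
      - "Hi! Nice to meet you."      # ...ending with an AI response
    max_history: 20                  # message window size (example value)
  speech:
    prompts_dir: resources/prompts   # e.g. contains [zh][开心]哇!今天真是一个好天气.wav
```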
Caution
Microsoft Edge may have a memory leak, so this project does not support it.
The only supported value of external_tool.browser.driver is firefox.
external_tool.browser.profile_dir ensures that your account logins and other session data are not lost while the browser is under Selenium's control. If left blank, the program will try to detect the profile location automatically (though it is not guaranteed to find it).
Tip
Before starting, it is recommended to use an API testing tool such as Postman to verify that the machine running this project can reach Zerolan Core. Zerolan Live Robot prints some advice when a pipeline connection error occurs, but you will still need to troubleshoot manually.
Use the following command to run the main program of Zerolan Live Robot:

```shell
python main.py
```

Note
This step is optional.
This project and KonekoMinecraftBot implement a set of interfaces that let this project control robots in Minecraft. If you need this feature, please move here for details.
The older Zerolan Live Robot 1.0 used simple per-second polling to read environment information from cache lists in each service module. Zerolan Live Robot 2.0 switched to an event-driven design.
In this project, the robot runs by sending and processing a series of events; in other words, without an event, the robot will not respond.
Each Event contains an event name, which is essentially a string. All event names used in this project are defined in common.enumerator.EventEnum, and you can extend it with your own event names. Take the handling of user speech input as an example: its event is called EventEnum.SERVICE_VAD_SPEECH_CHUNK.
emitter is a global object that handles event dispatch and listener execution. emitter always runs in the main thread; however, multiple threads run concurrently while the whole system is running, because each thread may have its own EventEmitter instance.
Use the decorator @emitter.on(EventEnum.SOME_EVENT) to register a listener quickly. A listener can be either a synchronous or an asynchronous function. To send an event, use the asynchronous method emitter.emit(EventEnum.SOME_EVENT, *args, **kwargs).
For example, when the system detects a human voice, the SERVICE_VAD_SPEECH_CHUNK event is sent, and all listeners registered for this event are called to perform some processing:
```python
@emitter.on(EventEnum.SERVICE_VAD_SPEECH_CHUNK)
async def on_service_vad_speech_chunk(speech: bytes, channels: int, sample_rate: int):
    response = ...  # Suppose the speech recognition result is obtained here
    await emitter.emit(EventEnum.PIPELINE_ASR, response)  # Emit the ASR event
```

The listener here is on_service_vad_speech_chunk, essentially a function that is called when SERVICE_VAD_SPEECH_CHUNK occurs and accepts several parameters. These parameters are specified entirely by the event sender.
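To make the pattern concrete, the emitter described above can be sketched as a minimal, self-contained EventEmitter. This is an illustrative reimplementation under stated assumptions, not the project's actual class, and the event names are a hypothetical subset for the demo:

```python
import asyncio
import inspect
from collections import defaultdict
from enum import Enum


class EventEnum(Enum):
    # Hypothetical subset of the project's event names
    SERVICE_VAD_SPEECH_CHUNK = "service/vad/speech-chunk"
    PIPELINE_ASR = "pipeline/asr"


class EventEmitter:
    """Minimal sketch: register listeners per event, await them on emit."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event):
        def decorator(func):
            self._listeners[event].append(func)
            return func
        return decorator

    async def emit(self, event, *args, **kwargs):
        for listener in self._listeners[event]:
            result = listener(*args, **kwargs)
            if inspect.isawaitable(result):  # supports sync and async listeners
                await result


emitter = EventEmitter()
received = []


@emitter.on(EventEnum.PIPELINE_ASR)
async def on_pipeline_asr(text: str):
    received.append(text)


asyncio.run(emitter.emit(EventEnum.PIPELINE_ASR, "hello"))
print(received)  # -> ['hello']
```

Supporting both synchronous and asynchronous listeners in one registry is what lets a decorator like @emitter.on accept either kind of function.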
Pipeline is the main means of communicating with Zerolan Core. Using a pipeline is very simple: pass in a configuration object to get a usable pipeline object, then call its predict or stream_predict method to use the AI models in Zerolan Core.
Taking the large language model as an example, specify the address of the target server (the open port of your Zerolan Core deployment) and pass an LLMPipelineConfig object to LLMPipeline to establish the pipeline.
```python
config = LLMPipelineConfig(server_url="...")
llm = LLMPipeline(config)
query = LLMQuery(text="Hello, what's your name?", history=[])
prediction = llm.predict(query)
print(prediction.response)
```

This should get a reply from the model.
If you want to know more implementation details, check the data definitions in Zerolan Data; you may also need to read them together with the pipeline implementations and the app.py file in Zerolan Core. Simply put, they are all HTTP-based.
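Since the pipelines are HTTP-based, a predict call conceptually serializes the query to JSON and POSTs it to a Zerolan Core endpoint. The sketch below only builds such a request: the LLMQuery fields mirror the example above, but this dataclass and the /llm/predict path are illustrative assumptions, not the project's documented schema or route:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LLMQuery:
    """Hypothetical mirror of the query structure in Zerolan Data."""
    text: str
    history: list = field(default_factory=list)


def build_predict_request(server_url: str, query: LLMQuery) -> tuple:
    """Build the (url, json_body) pair a pipeline might POST to Zerolan Core."""
    url = f"{server_url}/llm/predict"  # endpoint path is an assumption
    body = json.dumps(asdict(query))
    return url, body


url, body = build_predict_request("http://127.0.0.1:11001", LLMQuery(text="Hello"))
print(url)   # -> http://127.0.0.1:11001/llm/predict
print(body)  # -> {"text": "Hello", "history": []}
```

An actual pipeline would then send this body with an HTTP client and deserialize the JSON response into a prediction object.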
| Module | Purpose | Supported content |
|---|---|---|
| browser | Selenium-based browser control | Opening Firefox, searching, and closing the browser |
| device | Microphone, screenshot, and speaker control | Tested on Windows only |
| filter | Dialogue filter | Simple matching filter |
| game | Game control plugin | See KonekoMinecraftBot for details |
| live_stream | Danmaku reading from live streaming platforms | Bilibili, Twitch, YouTube |
| vad | Voice activity detection | Detection based on an energy threshold |
After startup, the log shows "In its context, the requested address is invalid."
Solution: check whether host is configured correctly in the configuration file. If you only want local access, specify 127.0.0.1.
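One quick way to check whether a host value is usable on the current machine is to try binding a listening socket to it. In the sketch below, 127.0.0.1 should always succeed, while 203.0.113.7 (a documentation-only address that is not assigned to your machine) fails with the same "requested address is not valid" class of error:

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a listening socket can be bound to (host, port)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        return False


print(can_bind("127.0.0.1"))    # loopback is always bindable -> True
print(can_bind("203.0.113.7"))  # non-local address: bind fails -> False
```

If can_bind returns False for the host in your config, the address does not belong to this machine's network interfaces.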
This project uses the MIT License. Please do not use this software for illegal purposes.
Feel free to enjoy open source!
MIT License
Copyright (c) 2024 AkagawaTsurunaki
Email : [email protected]
Github : AkagawaTsurunaki
Bilibili : Akagawa Tsurumi_Channel