Recently I used JavaCV's FFmpegFrameGrabber (which wraps the ffmpeg package) to capture audio and video frames and play them back in sync. The synchronization strategy is to sync video to audio.
Program and source code
The specific ideas are as follows:
(1) How FFmpegFrameGrabber captures the images and audio of a video file
FFmpegFrameGrabber fg = new FFmpegFrameGrabber("a video file path or a URL");
After obtaining the frame grabber, calling its grab() method returns a captured Frame object. The frame can be either a video frame or an audio frame, because audio and video frames come out ordered by their playback timestamps. The grabbed frames are already decoded and stored in java.nio.Buffer objects. For a video frame the Buffer holds the image's pixel data, such as RGB, and with
BufferedImage bi = (new Java2DFrameConverter()).getBufferedImage(f);
you obtain the picture, which can then be processed further or displayed directly on a Swing component. For an audio frame, the Buffer holds the decoded PCM data, which may be float or short samples; this PCM data can be written to the speaker with the SourceDataLine.write method from javax.sound.sampled.
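For concreteness, a minimal sketch of this grab loop might look like the following; the file path is a placeholder, and casting the audio samples to ShortBuffer assumes 16-bit signed PCM (some sources deliver a FloatBuffer instead):

import java.awt.image.BufferedImage;
import java.nio.ShortBuffer;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.Frame;
import org.bytedeco.javacv.Java2DFrameConverter;

public class GrabDemo {
    public static void main(String[] args) throws Exception {
        FFmpegFrameGrabber fg = new FFmpegFrameGrabber("input.mp4"); // placeholder path or URL
        fg.start();
        Java2DFrameConverter converter = new Java2DFrameConverter();
        Frame f;
        while ((f = fg.grab()) != null) {           // grab() returns null at end of stream
            if (f.image != null) {
                // video frame: decoded pixel data, convertible to a BufferedImage
                BufferedImage bi = converter.getBufferedImage(f);
                System.out.println("video frame, pts=" + f.timestamp + " us, " + bi.getWidth() + "x" + bi.getHeight());
            } else if (f.samples != null) {
                // audio frame: decoded PCM samples (assumed 16-bit signed, interleaved)
                ShortBuffer pcm = (ShortBuffer) f.samples[0];
                System.out.println("audio frame, pts=" + f.timestamp + " us, " + pcm.remaining() + " samples");
            }
        }
        fg.stop();
        fg.release();
    }
}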
(2) How to play the grabbed frames continuously. First, playing video alone:
while (true) {
    Frame f = fg.grab();
    if (f.image != null)
        label.setIcon(new ImageIcon((new Java2DFrameConverter()).getBufferedImage(f)));
    Thread.sleep(1000 / frameRate); // frameRate is the video frame rate, e.g. from fg.getFrameRate()
}
Playing audio alone works the same way: just write the PCM data to the sound card, as in the sketch below.
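A rough illustration of audio-only playback (not the article's original code), assuming interleaved 16-bit signed samples and big-endian byte order, with "input.mp4" standing in for the real path:

import java.nio.ByteBuffer;
import java.nio.ShortBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.Frame;

public class AudioPlayDemo {
    public static void main(String[] args) throws Exception {
        FFmpegFrameGrabber fg = new FFmpegFrameGrabber("input.mp4"); // placeholder path
        fg.start();
        // 16-bit signed big-endian PCM; sample rate and channel count come from the grabber
        AudioFormat af = new AudioFormat(fg.getSampleRate(), 16, fg.getAudioChannels(), true, true);
        SourceDataLine line = AudioSystem.getSourceDataLine(af);
        line.open(af);
        line.start();
        Frame f;
        while ((f = fg.grab()) != null) {
            if (f.samples == null) continue;             // skip video frames
            ShortBuffer sb = (ShortBuffer) f.samples[0]; // assumes interleaved short samples
            byte[] bytes = new byte[sb.remaining() * 2];
            ByteBuffer.wrap(bytes).asShortBuffer().put(sb); // big-endian by default, matching af
            line.write(bytes, 0, bytes.length);          // blocks until the sound card takes the data
        }
        line.drain();
        line.close();
        fg.stop();
        fg.release();
    }
}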
(3) Producer-consumer model.
The figure above illustrates how the program is structured. Each grabbed frame is inspected: video frames are pushed into a video FIFO, audio frames into an audio FIFO. The audio playback thread and the video playback thread then consume frames from their respective queues. The producer-consumer model is used because frames can be grabbed faster than they are consumed, so we prefer to grab ahead into a buffer (or preprocess the grabbed frames there), while the playback threads only have to display or play frames that are already prepared.
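A minimal sketch of the two FIFOs and the producer thread (queue capacities and names are arbitrary examples; Frame.clone() is used because recent JavaCV versions reuse the Frame object between grab() calls):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.Frame;

public class GrabProducer {
    // bounded FIFOs so the grabber cannot run arbitrarily far ahead of playback
    static final BlockingQueue<Frame> videoFifo = new ArrayBlockingQueue<>(60);
    static final BlockingQueue<Frame> audioFifo = new ArrayBlockingQueue<>(120);

    public static void main(String[] args) throws Exception {
        FFmpegFrameGrabber fg = new FFmpegFrameGrabber("input.mp4"); // placeholder path
        fg.start();
        Thread producer = new Thread(() -> {
            try {
                Frame f;
                while ((f = fg.grab()) != null) {
                    if (f.image != null) videoFifo.put(f.clone());        // video frame -> video FIFO
                    else if (f.samples != null) audioFifo.put(f.clone()); // audio frame -> audio FIFO
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        producer.start();
        // the audio and video playback threads consume from audioFifo / videoFifo here
    }
}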
(4) How audio-video synchronization is achieved: play all the video frames that fall between two consecutive audio frames.
To synchronize audio and video you need frame timestamps. The frames grabbed here carry only the presentation timestamp (PTS), not the decoding timestamp (DTS), so playback is scheduled purely from the PTS.
The program implements this as shown in the figure above. When the audio thread is about to play audio frame A1, it calls the video thread's setRun method, passing the timestamp curTime of the audio frame about to be played and the timestamp nextTime of the next audio frame A2 to the video thread, which is in the wait state. The video thread then wakes up, takes video frame G1 out of the video FIFO, and computes the time difference between G1 and A1 as the playback delay. After Thread.sleep(t1) it shows the picture on a Swing component, e.g. JLabel.setIcon(image). It then takes the next image frame G2 and compares G2's timestamp with A2's. If G2's timestamp is smaller than A2's, the video thread sleeps for t2 and shows G2; G3 is handled the same way, until G4 is taken out and its timestamp turns out to be greater than A2's, at which point the video thread goes back into the wait state and waits for the next wake-up. Meanwhile, after finishing A1, the audio thread takes the next audio frame A3 from its queue, passes the timestamps of A2 and A3 to the video thread, starts playing A2, and the blocked video thread resumes.
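As an illustrative sketch only (class, field, and method names here are mine, not the original program's), the video thread could be written roughly like this; timestamps are the microsecond PTS values that FFmpegFrameGrabber puts into Frame.timestamp:

import java.util.concurrent.BlockingQueue;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import org.bytedeco.javacv.Frame;
import org.bytedeco.javacv.Java2DFrameConverter;

class VideoPlayThread extends Thread {
    private final BlockingQueue<Frame> videoFifo;
    private final JLabel label;
    private final Java2DFrameConverter converter = new Java2DFrameConverter();
    private final Object lock = new Object();
    private long curTime;          // PTS of the audio frame the audio thread is about to play
    private long nextTime;         // PTS of the following audio frame
    private boolean running = false;

    VideoPlayThread(BlockingQueue<Frame> videoFifo, JLabel label) {
        this.videoFifo = videoFifo;
        this.label = label;
    }

    // called by the audio thread just before it plays each audio frame
    void setRun(long curTime, long nextTime) {
        synchronized (lock) {
            this.curTime = curTime;
            this.nextTime = nextTime;
            this.running = true;
            lock.notify();
        }
    }

    @Override
    public void run() {
        try {
            Frame pending = null;  // a frame taken out last round that belongs to the next interval
            while (true) {
                long base, limit;
                synchronized (lock) {
                    while (!running) lock.wait();   // blocked until the audio thread calls setRun
                    running = false;
                    base = curTime;
                    limit = nextTime;
                }
                while (true) {
                    Frame f = (pending != null) ? pending : videoFifo.take();
                    pending = null;
                    if (f.timestamp >= limit) {     // the G4 case: belongs after A2, stop and wait again
                        pending = f;
                        break;
                    }
                    Thread.sleep(Math.max(0, (f.timestamp - base) / 1000)); // delay t1, t2, ...
                    label.setIcon(new ImageIcon(converter.getBufferedImage(f)));
                    base = f.timestamp;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}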
(5) Dynamically adjust the delay time
Since a personal PC does not run a real-time operating system, Thread.sleep is imprecise, and playback pace is constrained by the sound card, the basic scheme above needs refinement. Java's SourceDataLine drains the data written by the audio thread from its internal buffer at a fixed rate; if that data runs out, audio playback stutters, but if too much audio data is written at once, audio and video can drift out of sync. The internal buffer of the SourceDataLine must therefore always hold some data, yet not too much. So we adjust the delay between playing G3 and playing A2. Because the delays are imprecise, the data written for frame A1 may be drained by the sound card before the scheduled moment arrives. Therefore, after the G3 image is shown, the thread checks the amount of data reported by sourceDataLine.available(); if the buffered data is about to run out, the delay t4 between G3 and A2 is shortened. This keeps the buffered amount from dropping to zero and causing audio stutter.
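A rough sketch of that check (the helper name and the 20 ms safety margin are illustrative assumptions, not taken from the original code) is to never sleep longer than the PCM already queued in the sound card can cover:

import javax.sound.sampled.SourceDataLine;

final class DelayAdjuster {
    // available() grows toward getBufferSize() as the line's internal buffer drains
    static long adjustDelayMs(SourceDataLine line, long plannedDelayMs) {
        int queuedBytes = line.getBufferSize() - line.available();        // bytes still to be played
        long queuedFrames = queuedBytes / line.getFormat().getFrameSize();
        long remainingMs = (long) (queuedFrames * 1000.0 / line.getFormat().getFrameRate());
        // if the queued audio is about to run out, cut the planned delay short
        return Math.min(plannedDelayMs, Math.max(0, remainingMs - 20));
    }
}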
(6) Results of testing the program on 64-bit Windows and Ubuntu 14: playback is fairly smooth and synchronization holds, but if code is being written in a Java-based IDE such as IDEA while playback is running, the player stutters. Since IDEA is itself developed in Java, its activity affects other Java programs, whereas other processes do not.