Groq launches whisper-large-v3 model with free speech transcription and translation

Author: Eve Cole | Updated: 2025-02-25 02:50:02

Groq now offers the Whisper Large-V3 model for speech transcription and translation through both its Playground and its API. The model can transcribe speech in many languages and translate speech in other languages into English. The Playground can be used online for free, and transcription is very fast: a 4-minute-30-second video is transcribed in just a few seconds. Groq also provides an OpenAI-compatible API, so developers can easily build the model into their own applications, such as intelligent assistants or automated translation systems.

Groq recently added the Whisper Large-V3 model to its platform. Users can try it in the Playground or call it through the API from their own projects to transcribe and translate speech. The model supports transcription in multiple languages, runs very fast, and can translate speech in other languages into English.

Playground link: https://console.groq.com/playground

Currently, users can try this feature for free in the Playground, and it is fast: transcribing a 4-minute-30-second video takes only about 3 seconds. Groq also provides an API that users can call from their own projects.

The Whisper API is designed to be compatible with OpenAI's API and exposes two core functions: speech-to-text (transcription) and speech translation. Developers can integrate these functions into their own applications with little effort, whether they are building intelligent assistants or automated translation systems.

Under the hood, the Whisper API uses the "whisper-large-v3" model, the largest-size model in OpenAI's Whisper family, for both speech-to-text and translation tasks.

The API also sets clear rules for audio input. It accepts common formats such as mp3, mp4, and wav, and each file must be no larger than 25 MB. Note that for files containing multiple audio tracks, the Whisper API processes only the first track, so users should pre-process such files appropriately before uploading.
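The format and size rules can be checked on the client before uploading. The sketch below is a minimal pre-check, not part of any Groq SDK: `check_upload` and its constants are hypothetical names, the format set contains only the three formats this article names (the service accepts more, as the "etc." suggests), and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: only the formats named in the article; the real list is longer.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".wav"}

# Assumption: "25 MB" is interpreted as 25 MiB (25 * 1024 * 1024 bytes).
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return the problems that would likely make the API reject this file.

    An empty list means the file passes these two client-side checks.
    Multi-track files are not detected here; the API would simply use
    the first audio track.
    """
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive: ".WAV" -> ".wav"
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.1f} MB, limit is 25 MB")
    return problems
```

For example, `check_upload("interview.mp3", 4_000_000)` returns an empty list, while a 60 MiB wav file is flagged as too large.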

The Whisper API downsamples all audio to 16,000 Hz mono on the server side, the sample rate the Whisper model works with. Groq recommends doing this pre-processing on the client instead: it shrinks the file, which lets longer recordings fit under the upload limit.
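A little arithmetic shows why client-side downsampling helps. Uncompressed 16-bit PCM takes sample rate * channels * 2 bytes per second, so 16 kHz mono needs 32,000 bytes/s, while CD-quality 44.1 kHz stereo needs 176,400 bytes/s. Under a 25 MB limit (again assumed to mean 25 MiB), that is roughly 13.7 minutes of WAV audio versus about 2.5 minutes. The sketch below uses hypothetical helper names to compute these figures and to build an ffmpeg command for the conversion; it assumes ffmpeg is installed and relies on its standard `-map`, `-ar`, `-ac` and `-c:a` options. Compressed formats such as mp3 follow different size rules.

```python
import subprocess

# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def pcm_bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int = 16) -> int:
    """Bytes of uncompressed PCM audio produced per second."""
    return sample_rate * channels * bits_per_sample // 8


def max_minutes_under_limit(sample_rate: int, channels: int,
                            limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Longest 16-bit PCM recording, in minutes, that fits under limit_bytes.

    Ignores the small WAV header, so the real figure is marginally lower.
    """
    return limit_bytes / pcm_bytes_per_second(sample_rate, channels) / 60


def ffmpeg_downsample_cmd(src: str, dst: str) -> list[str]:
    """Build ffmpeg arguments that convert src to 16 kHz mono WAV.

    16 kHz mono matches what the server downsamples to; choosing
    16-bit PCM WAV as the output container is an assumption.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0:a:0",      # keep only the first audio stream, like the API does
        "-ar", "16000",       # resample to 16,000 Hz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def downsample(src: str, dst: str) -> None:
    """Run the conversion; requires the ffmpeg binary on PATH."""
    subprocess.run(ffmpeg_downsample_cmd(src, dst), check=True)
```

Here `max_minutes_under_limit(16000, 1)` evaluates to about 13.65 and `max_minutes_under_limit(44100, 2)` to about 2.48, matching the estimate above.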

API endpoints:

Speech to text (transcription): https://api.groq.com/openai/v1/audio/transcriptions

Speech translation: https://api.groq.com/openai/v1/audio/translations
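Because these endpoints follow OpenAI's conventions, a call is a multipart/form-data POST carrying a `model` field and a `file` field, authenticated with a Bearer token. The sketch below builds such a request for either endpoint using only the Python standard library. The function name and the `GROQ_API_KEY` environment variable are assumptions, the code has not been checked against the live service, and the official OpenAI or Groq client libraries are the more usual route.

```python
import os
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.groq.com/openai/v1/audio"


def build_whisper_request(audio: bytes, filename: str, task: str = "transcriptions",
                          api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to one of the two audio endpoints.

    task is "transcriptions" (speech to text) or "translations" (to English).
    filename is assumed not to contain double quotes.
    """
    if task not in ("transcriptions", "translations"):
        raise ValueError("task must be 'transcriptions' or 'translations'")
    # Assumption: the key lives in a GROQ_API_KEY environment variable.
    key = api_key if api_key is not None else os.environ["GROQ_API_KEY"]
    boundary = uuid.uuid4().hex  # random multipart boundary
    body = b"".join([
        # Text field selecting the model.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         "whisper-large-v3\r\n").encode(),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        # Closing boundary.
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/{task}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` should return JSON; in OpenAI's API the result text is in a `text` field, and an OpenAI-compatible service is expected to respond the same way.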

Overall, Groq's hosted Whisper Large-V3 model and its API offer a fast, easy-to-integrate solution for speech transcription and translation. Developers can try it in the Playground and explore how it fits their own use cases.