Asked 1 month ago by MercurialScientist885
How can I stream and decode OpenAI TTS audio in React and Node over WebSockets?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I’m building a live audio streaming feature where the backend (Node.js) uses the OpenAI TTS model to generate audio, and the frontend (React.js) receives streaming audio via a WebSocket and plays it. However, I’m encountering an error when decoding the incoming audio data:
PLAINTEXT
Error decoding audio data: EncodingError: Unable to decode audio data
This is my current setup. On the backend (Node.js), I use the following code to call the OpenAI TTS API and stream the audio data over the WebSocket:
JAVASCRIPT
const audio_response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "nova",
  input,
  response_format: "mp3",
});

// Get audio chunks from the stream and send them via the WebSocket
const stream = audio_response.body;

// Pipe the audio stream to the WebSocket in small chunks
stream.on("data", (chunk) => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk); // Send audio data as binary chunks
  }
});
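(Not part of my current code, but for completeness: the Node stream also emits an end event when OpenAI has finished sending audio, so the backend could notify the client that the stream is complete. A rough sketch, where the "tts-end" control message is just an invented convention, not anything the OpenAI API defines:)

JAVASCRIPT
// Sketch only: signal end-of-stream so the client knows it has the whole file.
// The "tts-end" message shape is a made-up convention for this example.
stream.on("end", () => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "tts-end" }));
  }
});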
The frontend code looks like this:
JAVASCRIPT
const socket = new WebSocket(...);
socket.binaryType = "blob";

// Web Audio API setup
let audioContext;
let source;
let audioBufferQueue = []; // Queue for audio chunks

socket.addEventListener("message", async (event) => {
  const audioChunk = event.data;
  audioBufferQueue.push(audioChunk);

  // Start playing audio if not already playing
  if (!source) {
    await playAudioQueue();
  }
});

async function playAudioQueue() {
  if (!audioContext) {
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
  }

  while (audioBufferQueue.length > 0) {
    const audioChunk = audioBufferQueue.shift();

    // Decode audio data
    const arrayBuffer = await audioChunk.arrayBuffer();
    try {
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

      // Play the audio buffer
      source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);

      // Wait for the audio to finish playing
      await new Promise((resolve) => {
        source.onended = resolve;
        source.start();
      });
      source = null;
    } catch (err) {
      console.error("Error decoding audio data:", err);
    }
  }
}
I suspect the issue might be due to partial MP3 chunks being streamed, which may not contain complete frames or headers required for decoding. I’m looking for advice on how to buffer chunks or switch to a streaming-friendly API to resolve this error. Any insights or suggestions would be appreciated.
Comments: 0 • Answers: 2 • Views: 103
Answer by TychoAI Agent • 1 month ago
Below is an explanation of what’s likely happening along with some recommended approaches.
Partial Chunks
The OpenAI TTS API is sending streaming MP3 data in small chunks. These chunks may not be independently decodable because each chunk might not contain a complete frame or header that the Web Audio API’s decodeAudioData expects.
decodeAudioData Requirements
The Web Audio API’s decodeAudioData function expects a complete and valid audio file (or at least a complete audio frame) in order to decode it. When you try to decode incomplete data chunks, you can get errors like the EncodingError you’re seeing.
Buffer the Chunks:
Description:
Instead of decoding each incoming chunk individually, accumulate the chunks in a buffer until you’ve received a complete audio file. When the stream ends or you decide that you’ve gathered enough data, combine these chunks into one ArrayBuffer and then decode that.
Implementation Sketch:
JAVASCRIPT
let audioContext;
let audioBufferQueue = [];
let fullAudioData;

// Assuming ws.onmessage receives each chunk
socket.addEventListener("message", async (event) => {
  const audioChunk = event.data;
  audioBufferQueue.push(audioChunk);

  // If you determine you've received the full audio data
  if (isFinalChunk(audioChunk)) { // Define a way to know the stream is complete
    // Combine all chunks
    fullAudioData = await combineChunks(audioBufferQueue);
    playAudio(fullAudioData);
  }
});

async function combineChunks(chunks) {
  const buffers = await Promise.all(chunks.map((chunk) => chunk.arrayBuffer()));
  const totalLength = buffers.reduce((acc, buffer) => acc + buffer.byteLength, 0);
  const tempArray = new Uint8Array(totalLength);
  let offset = 0;
  for (const buffer of buffers) {
    tempArray.set(new Uint8Array(buffer), offset);
    offset += buffer.byteLength;
  }
  return tempArray.buffer;
}

async function playAudio(arrayBuffer) {
  if (!audioContext) {
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
  }
  try {
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    source.start();
  } catch (err) {
    console.error("Error decoding audio data:", err);
  }
}
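The isFinalChunk call above is only a placeholder. One way to implement the end-of-stream detection, as a sketch, is to have your backend send a JSON text frame such as {"type":"tts-end"} after the last binary audio chunk (that message format is an invented convention for this example, not something the OpenAI API sends) and treat text frames as control messages on the client:

JAVASCRIPT
// Sketch: distinguish binary audio frames from a text "end" control frame.
// Reuses combineChunks, playAudio, and audioBufferQueue from the code above.
socket.addEventListener("message", async (event) => {
  if (typeof event.data === "string") {
    const msg = JSON.parse(event.data);
    if (msg.type === "tts-end") {
      const fullAudioData = await combineChunks(audioBufferQueue);
      audioBufferQueue = [];
      playAudio(fullAudioData);
    }
    return;
  }
  // Binary frame: queue the audio chunk until the end marker arrives
  audioBufferQueue.push(event.data);
});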
Use a Streaming-Friendly API (MediaSource Extensions):
Description:
If you need to play audio as it streams rather than waiting for the complete file, consider using the MediaSource Extensions (MSE) API. MSE was designed for precisely this kind of scenario—progressively buffering media chunks into a media element for playback.
Implementation Considerations:
A SourceBuffer only accepts one appendBuffer call at a time, so you need to wait for its updateend event (or check sourceBuffer.updating) before appending the next chunk, and the MIME type passed to addSourceBuffer ('audio/mpeg' for MP3 here) must be one the browser's MSE implementation supports.
Example (High-Level):
JAVASCRIPT
const audio = document.createElement("audio");
document.body.appendChild(audio);
const mediaSource = new MediaSource();
audio.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer("audio/mpeg");
  const pendingChunks = [];

  // A SourceBuffer accepts only one append at a time; drain the queue
  // when the previous append has finished.
  sourceBuffer.addEventListener("updateend", () => {
    if (pendingChunks.length > 0) sourceBuffer.appendBuffer(pendingChunks.shift());
  });

  socket.addEventListener("message", async (event) => {
    // Append each chunk to the buffered media
    const chunk = await event.data.arrayBuffer();
    if (sourceBuffer.updating || pendingChunks.length > 0) {
      pendingChunks.push(chunk);
    } else {
      sourceBuffer.appendBuffer(chunk);
    }
  });
});

audio.play();
Using MSE can be a more robust solution if your application requires real-time playback of streaming audio data.
The key issue is that decoding partial MP3 chunks is causing errors because they aren’t valid frames by themselves. Either buffer the entire stream (or sufficient chunks) before decoding or use a streaming API like the MediaSource Extensions to handle progressive playback.
Implementing one of these solutions should resolve the "Unable to decode audio data" error.
No comments yet.
Answer by NeptunianWatcher008 • 2 months ago
I guess the first code block is actually the backend and the second one is the front-end.
Are you able to get a streaming response from OpenAI? Because in Python the function call looks like this:
PYTHON
def audioTextStream(text):
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
        response_format="pcm"
    ) as response:
        for chunk in response.iter_bytes(chunk_size=1024):
            yield chunk
And yes, I am facing the same issue on the front-end side: the chunks are streaming to the front end, but when I try to decode them and make them audible, the content type shows as "octet-stream". I am a backend dev and don't have enough front-end knowledge. Let me know if you know how to handle it.
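(For anyone hitting the same thing with response_format="pcm": raw PCM cannot be decoded with decodeAudioData either, but it can be turned into samples directly. A rough sketch, assuming the pcm output is 24 kHz, 16-bit signed little-endian mono and that each chunk arrives aligned to 2-byte samples; check the API docs to confirm the exact format:)

JAVASCRIPT
// Sketch: play one raw PCM chunk with the Web Audio API.
// Assumes 24 kHz, 16-bit signed little-endian, mono.
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

function playPcmChunk(arrayBuffer) {
  const int16 = new Int16Array(arrayBuffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // Convert 16-bit samples to the [-1, 1] range
  }

  const buffer = audioContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);

  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start();
}

For continuous playback without gaps you would still need to schedule successive chunks with source.start(startTime) instead of starting each one immediately.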
No comments yet.