The error
Could not deserialize speech context. websocket error code: 1007
usually indicates that the speech context (the JSON payload sent to the service) is malformed or contains properties the service does not recognize — WebSocket close code 1007 means "invalid payload data". With pronunciation assessment, a common pitfall is passing configuration values as plain strings rather than using the SDK's enumeration values.
Below are a few adjustments you should make:
1. Use Enumerations for Configuration
Instead of passing string values like "HundredMark" and "Phoneme", use the SDK-provided enums. For example:
var pronunciationAssessmentConfig = new window.SpeechSDK.PronunciationAssessmentConfig(
"My voice is my passport, verify me.",
window.SpeechSDK.PronunciationAssessmentGradingSystem.HundredMark,
window.SpeechSDK.PronunciationAssessmentGranularity.Phoneme
);
If you need to assess a different reference text, just change it in the first parameter.
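If you prefer to see (or build) the underlying JSON yourself, the SDK also accepts a JSON string via the static PronunciationAssessmentConfig.fromJSON helper. A minimal sketch of the equivalent configuration — the PascalCase field names follow the service's documented parameter casing, so verify them against your SDK version:

```javascript
// Equivalent configuration expressed as the JSON the service expects
// (field names per the pronunciation assessment docs).
const configJson = JSON.stringify({
  ReferenceText: "My voice is my passport, verify me.",
  GradingSystem: "HundredMark",
  Granularity: "Phoneme"
});

// Then, in the browser with the SDK loaded:
// var pronunciationAssessmentConfig =
//   window.SpeechSDK.PronunciationAssessmentConfig.fromJSON(configJson);
```

Building the JSON explicitly can make it easier to spot the kind of malformed property that triggers the 1007 close code.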
2. Use the Correct AudioConfig Input
The sample solution in the documentation uses a push stream rather than feeding a WAV file in directly (e.g. via AudioConfig.fromWavFileInput). A WAV file should work, but if the service expects streaming input for the pronunciation context, you may see errors. For testing, try switching to a push stream:
// Example: configuring a push stream input
var pushStream = window.SpeechSDK.AudioInputStream.createPushStream();
var audioConfig = window.SpeechSDK.AudioConfig.fromStreamInput(pushStream);
Then push your WAV file’s data into the stream.
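Writing the whole buffer in one call is fine for short clips; for larger files you can split the buffer into chunks before pushing. A minimal sketch — the toChunks helper below is illustrative, not part of the SDK:

```javascript
// Hypothetical helper: split an ArrayBuffer into fixed-size chunks so each
// pushStream.write() call stays small.
function toChunks(arrayBuffer, chunkSize = 4096) {
  const chunks = [];
  for (let offset = 0; offset < arrayBuffer.byteLength; offset += chunkSize) {
    chunks.push(arrayBuffer.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// Usage with the SDK (sketch):
// toChunks(wavArrayBuffer).forEach(chunk => pushStream.write(chunk));
// pushStream.close();
```

Remember to call pushStream.close() after the last write so the service knows the audio has ended.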
3. Apply the Pronunciation Assessment Config Once
Make sure you apply the pronunciation assessment configuration to your recognizer only once before starting recognition:
pronunciationAssessmentConfig.applyTo(speechRecognizer);
A redundant call may not cause issues in every case but can be confusing when troubleshooting.
Final Revised Code Example
Below is a sample implementation combining these suggestions:
assessPronunciation(fileUrl) {
  const speechConfig = window.SpeechSDK.SpeechConfig.fromSubscription("xxx", "westeurope");
  speechConfig.speechRecognitionLanguage = "en-GB";

  // Fetch the WAV file and create a push stream
  fetch(fileUrl)
    .then(response => response.arrayBuffer())
    .then(arrayBuffer => {
      // Create a push stream input and push the binary data into it.
      // Note: the default push-stream format is 16 kHz, 16-bit, mono PCM,
      // so the WAV file should match that format.
      var pushStream = window.SpeechSDK.AudioInputStream.createPushStream();
      pushStream.write(arrayBuffer);
      pushStream.close();
      var audioConfig = window.SpeechSDK.AudioConfig.fromStreamInput(pushStream);

      // Use enumerations for grading system and granularity.
      var pronunciationAssessmentConfig = new window.SpeechSDK.PronunciationAssessmentConfig(
        "My voice is my passport, verify me.",
        window.SpeechSDK.PronunciationAssessmentGradingSystem.HundredMark,
        window.SpeechSDK.PronunciationAssessmentGranularity.Phoneme
      );

      var speechRecognizer = new window.SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

      // Apply the pronunciation assessment configuration only once.
      pronunciationAssessmentConfig.applyTo(speechRecognizer);

      speechRecognizer.sessionStarted = (s, e) => {
        console.log(`SESSION ID: ${e.sessionId}`);
      };

      speechRecognizer.recognizeOnceAsync(
        function(speechRecognitionResult) {
          if (speechRecognitionResult.reason === window.SpeechSDK.ResultReason.RecognizedSpeech) {
            var pronunciationAssessmentResult = window.SpeechSDK.PronunciationAssessmentResult.fromResult(speechRecognitionResult);
            console.log("pronunciationAssessmentResult", pronunciationAssessmentResult);
            var pronunciationAssessmentResultJson = speechRecognitionResult.properties.getProperty(window.SpeechSDK.PropertyId.SpeechServiceResponse_JsonResult);
            console.log("pronunciationAssessmentResultJson", pronunciationAssessmentResultJson);
          } else {
            console.error("Speech not recognized. Reason:", speechRecognitionResult.reason);
          }
          speechRecognizer.close();
        },
        function(error) {
          // In the JavaScript SDK this callback receives a string describing the failure.
          console.error("Error during recognition:", error);
          speechRecognizer.close();
        }
      );
    })
    .catch(error => {
      console.error("Error fetching WAV file:", error);
    });
}
Summary
- Use SDK Enumerations: Replace string values with window.SpeechSDK.PronunciationAssessmentGradingSystem.HundredMark and window.SpeechSDK.PronunciationAssessmentGranularity.Phoneme.
- Audio Configuration: Consider using a push stream for the audio, as the sample solution does.
- Single Application of Config: Apply the pronunciation assessment configuration once before recognition.
These changes should help resolve the deserialization error. Happy coding!