Customizing a voice-based interface using surrounding factors
US-11114089-B2 · Sep 7, 2021 · US
US11501758B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11501758-B2 |
| Application number | US-202016988052-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2020 |
| Priority date | Sep 27, 2019 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Official abstract text for this publication.
An appliance can include a microphone transducer, a processor, and a memory storing instructions. The appliance is configured to receive an audio signal at the microphone transducer and to detect an utterance in the audio signal. The appliance is further configured to classify a speech mode based on the utterance. The appliance is further configured to determine conditions of an environment of the appliance. The appliance is further configured to select at least one of a playback volume or a speech output mode from a plurality of speech output modes based on the classification, and the conditions of the environment of the appliance. The appliance is further configured to adapt the playback volume and/or mode of played-back speech according to the speech output mode. The appliance may be configured to synthesize speech according to the speech output mode, or to modify synthesized speech according to the speech output mode.
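As an illustration of the abstract's statement that a playback volume can be selected from both the classified speech mode and the conditions of the environment, here is a minimal sketch. It is not the patented implementation: the `Environment` fields, the volume scale, the 65 dB noise threshold, and every numeric value are hypothetical choices made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical container for observed conditions (not from the patent)."""
    noise_db_spl: float  # assumed estimate of background noise level
    is_night: bool       # assumed non-acoustic condition (time of day)


def select_playback_volume(speech_mode: str, env: Environment) -> float:
    """Return an illustrative playback volume in the range [0.0, 1.0]."""
    # Start from a level that mirrors how the user spoke (assumed values).
    base = {"whisper": 0.2, "normal": 0.5, "lombard": 0.8}[speech_mode]
    # Environmental conditions then shift that level up or down.
    if env.noise_db_spl > 65.0:  # assumed threshold for a "noisy" room
        base += 0.2
    if env.is_night:
        base -= 0.1
    # Clamp to the valid range and round away floating-point residue.
    return round(min(1.0, max(0.0, base)), 2)
```

For example, a whispered request at night in a quiet room yields a low volume, while a Lombard-effect (raised-voice) request in a noisy room yields the maximum.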
Opening claim text (preview).
We currently claim:

1. An appliance comprising a microphone transducer, a processor, and a memory storing instructions that, when executed by the processor, cause the appliance to: receive an audio signal at the microphone transducer; detect an utterance in the audio signal; classify a speech mode based on the utterance; determine one or more cues, wherein each cue corresponds to a condition of an environment of the appliance; determine a speech output mode based on the classification, a first weight corresponding to the classification, the one or more cues, and a second weight corresponding to the one or more cues, the second weight and the first weight configured to weight the one or more cues more heavily than the classification; and output synthesized speech according to the determined speech output mode.
2. The appliance of claim 1, wherein instructions that cause the appliance to classify the speech mode comprise instructions that cause the appliance to classify the utterance according to at least one of: a pitch, number of formants, spectral tilt, speech rate, direction of arrival and energy content.
3. The appliance of claim 2, wherein the instructions cause the appliance to classify the speech mode of the utterance as a whisper mode, a normal mode, or a Lombard effect mode according to one or more characteristics of the utterance.
4. The appliance of claim 3, wherein the one or more characteristics of the utterance comprise at least one of: a pitch, an energy content, a number of formant, a spectral tilt, a speech rate, or a combination thereof.
5. The appliance of claim 1, further comprising instructions that, when executed by the processor, cause the appliance to select a playback volume according to the classified speech mode, the one or more cues, or a combination thereof, and to output the synthesized speech at the selected playback volume.
6. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select one or more speech synthesis parameters, the memory further comprising instructions that, when executed by the processor, cause the appliance to: generate synthesized speech according to a speech synthesis model and the selected one or more speech synthesis parameters.
7. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select a speech synthesis model from a plurality of speech synthesis models, the memory further comprising instructions that, when executed by the processor, cause the appliance to generate synthesized speech according to the selected speech synthesis model.
8. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select one or more speech modification parameters based on the classified speech mode, and the one or more cues; and the memory further comprising instructions that, when executed by the processor, cause the appliance to modify synthesized speech according to the one or more speech modification parameters and to output the modified synthesized speech.
9. The appliance of claim 1, wherein selected speech output mode corresponds to a different speech mode than the speech mode of the utterance.
10. The appliance of claim 1, wherein the instructions to determine one or more cues comprise instructions to determine one or more acoustic cues from the audio signal, the one or more acoustic cues comprising at least one of: background noise received by the microphone transducer, a direct to reverberant ratio, a signal to noise ratio, an echo coupling residual, a distance to a speaker of the utterance, a quality measure of the utterance, or a combination thereof.
11. The appliance of claim 1, wherein the instructions to determine one or more cues comprise instructions to determine one or more non-acoustic cues comprising at least one of: a time of day, a location type, an appliance mode, a user profile, a location layout, an acoustic profile of a location, or a combination thereof.
12. The appliance of claim 1, wherein the instructions that cause the appliance to output the synthesized speech comprise instructions to process synthesized speech for output by a loudspeaker.
13. The appliance of claim 1, further comprising instructions that, when executed by the processor, cause the appliance to receive the audio signal from the microphone transducer, process the audio signal, request speech recognition on the processed audio signal, and receive text, wherein the received text corresponds to a response generated based on recognized speech.
14. The appliance of claim 1, wherein the speech output mode includes at least one of a pitch, a rate, an energy, or a spectral tilt selected for intelligibility in at least one of the conditions of the environment of the appliance.
15. An audio appliance, comprising: an audio acquisition module, comprising a microphone transducer, configured to receive sound; a speech classifier configured to detect an utterance in the sound and to classify a speech mode based on the utterance; a decision component configured to determine one or more cues, each cue corresponding to an observed condition of an environment of the audio appliance, and to select a speech output mode from a plurality of speech output modes based on the classification of the speech mode, a first weight corresponding to the classification, the one or more cues, and a second weight corresponding to the one or more cues, the second weight and the first weight configured to weight the one or more cues more heavily than the classification; and an output component configured to output synthesized speech according to the speech output mode.
16. The audio appliance of claim 15, further comprising: a speech synthesizer comprising a speech synthesis model, and configured to receive text and to generate synthesized speech from the text with the speech synthesis model according to the speech output mode.
17. The audio appliance of claim 16, wherein the decision component is configured to select one or more speech synthesis parameters corresponding to the speech output mode, and wherein the speech synthesizer is configured to generate the synthesized speech according to the speech synthesis model and the selected one or more speech synthesis parameters.
18. The audio appliance of claim 16, wherein the decision component is configured to select a speech synthesis model from a plurality of speech synthesis models corresponding to the speech output mode, and wherein the speech synthesizer is configured to generate the synthesized speech according to the selected speech synthesis model.
19. The audio appliance of claim 15, wherein the decision component is configured to select a playback volume, one or more speech modification parameters, or both, corresponding to the speech output mode, the audio appliance further comprising a speech modification component configured to modify synthesized speech according to the playback volume, the selected one or more speech modification parameters, or both, prior to outputting the modified synthesized speech.
20. The audio appliance of claim 15, wherein the speech classifier is configured to classify the utterance as a whisper mode, a normal mode, or a Lombard-effect mode according to at least one of a pitch, a number of formants, spectral tilt, speech rate, energy content.
21. A method for improving intelligibility of synthesized speech comprising: recei
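The weighting language in claims 1 and 15 (a first weight for the classification, a second weight for the cues, with the cues weighted more heavily) can be read in several ways. The sketch below shows one possible reading, a weighted sum of per-mode scores. The score dictionaries, the default weights of 0.4 and 0.6, and the function name `select_output_mode` are all assumptions for illustration and do not come from the patent.

```python
MODES = ("whisper", "normal", "lombard")


def select_output_mode(class_scores, cue_scores, w_class=0.4, w_cue=0.6):
    """Pick a speech output mode from weighted classifier and cue scores.

    class_scores: hypothetical per-mode scores from the speech-mode classifier.
    cue_scores:   hypothetical per-mode scores derived from environmental cues.
    """
    # Enforce the claimed relationship: cues outweigh the classification.
    if not w_cue > w_class:
        raise ValueError("cues must be weighted more heavily than the classification")
    # Weighted sum per candidate mode; a missing score counts as zero.
    combined = {
        mode: w_class * class_scores.get(mode, 0.0) + w_cue * cue_scores.get(mode, 0.0)
        for mode in MODES
    }
    return max(MODES, key=lambda mode: combined[mode])


# The user whispers, but the cues (say, loud background noise) favor a
# Lombard-style output, so the selected mode differs from the input mode.
noisy_room = select_output_mode(
    {"whisper": 0.9, "normal": 0.1},
    {"lombard": 0.8, "normal": 0.2},
)
```

Under these assumed scores the cues override the whispered input, which is the kind of outcome claim 9 describes: an output mode that corresponds to a different speech mode than that of the utterance.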
for comparison or discrimination · CPC title
2D or 3D arrays of transducers · CPC title
for correcting frequency response · CPC title
Voice editing, e.g. manipulating the voice of the synthesiser · CPC title
for estimating an emotional state · CPC title