Environment aware voice-assistant devices, and related systems and methods

US11501758B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11501758-B2
Application number   US-202016988052-A
Country              US
Kind code            B2
Filing date          Aug 7, 2020
Priority date        Sep 27, 2019
Publication date     Nov 15, 2022
Grant date           Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An appliance can include a microphone transducer, a processor, and a memory storing instructions. The appliance is configured to receive an audio signal at the microphone transducer and to detect an utterance in the audio signal. The appliance is further configured to classify a speech mode based on the utterance. The appliance is further configured to determine conditions of an environment of the appliance. The appliance is further configured to select at least one of a playback volume or a speech output mode from a plurality of speech output modes based on the classification, and the conditions of the environment of the appliance. The appliance is further configured to adapt the playback volume and/or mode of played-back speech according to the speech output mode. The appliance may be configured to synthesize speech according to the speech output mode, or to modify synthesized speech according to the speech output mode.
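The abstract's first step, classifying the talker's speech mode, can be sketched as a rule-based classifier over features that the claims list (pitch, energy content, spectral tilt). This is a minimal sketch under stated assumptions. Feature extraction is assumed to happen upstream. The `UtteranceFeatures` type, the `classify_speech_mode` function, and every threshold value are hypothetical. The rules rely on commonly reported tendencies (whispered speech is unvoiced and quiet; Lombard speech is louder, higher-pitched, and has a flatter spectral tilt), not on anything the patent specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration; the patent does not publish values.
WHISPER_MAX_ENERGY_DB = -40.0
LOMBARD_MIN_ENERGY_DB = -20.0
LOMBARD_MIN_PITCH_HZ = 220.0
# Tilt at or above this value (i.e. flatter) suggests raised vocal effort.
LOMBARD_MIN_TILT_DB_PER_OCT = -6.0


@dataclass
class UtteranceFeatures:
    pitch_hz: Optional[float]         # None when no voicing is detected
    energy_db: float                  # mean utterance energy, dB re full scale
    spectral_tilt_db_per_oct: float   # slope of the long-term spectrum


def classify_speech_mode(f: UtteranceFeatures) -> str:
    """Return 'whisper', 'lombard', or 'normal' from simple threshold rules."""
    # Whispered speech: unvoiced (no pitch track) and quiet.
    if f.pitch_hz is None and f.energy_db <= WHISPER_MAX_ENERGY_DB:
        return "whisper"
    # Lombard speech: voiced, loud, raised pitch, and flatter spectral tilt.
    if (
        f.pitch_hz is not None
        and f.energy_db >= LOMBARD_MIN_ENERGY_DB
        and f.pitch_hz >= LOMBARD_MIN_PITCH_HZ
        and f.spectral_tilt_db_per_oct >= LOMBARD_MIN_TILT_DB_PER_OCT
    ):
        return "lombard"
    # Everything the rules above do not match defaults to normal.
    return "normal"


if __name__ == "__main__":
    quiet_unvoiced = UtteranceFeatures(
        pitch_hz=None, energy_db=-50.0, spectral_tilt_db_per_oct=-12.0
    )
    print(classify_speech_mode(quiet_unvoiced))
```

Fixed absolute thresholds ignore speaker differences: a pitch that signals raised effort for one talker is ordinary for another. A practical classifier would likely normalize against a per-user baseline or use a trained model instead.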

First claim

Opening claim text (preview).

We currently claim:

1. An appliance comprising a microphone transducer, a processor, and a memory storing instructions that, when executed by the processor, cause the appliance to: receive an audio signal at the microphone transducer; detect an utterance in the audio signal; classify a speech mode based on the utterance; determine one or more cues, wherein each cue corresponds to a condition of an environment of the appliance; determine a speech output mode based on the classification, a first weight corresponding to the classification, the one or more cues, and a second weight corresponding to the one or more cues, the second weight and the first weight configured to weight the one or more cues more heavily than the classification; and output synthesized speech according to the determined speech output mode.

2. The appliance of claim 1, wherein instructions that cause the appliance to classify the speech mode comprise instructions that cause the appliance to classify the utterance according to at least one of: a pitch, number of formants, spectral tilt, speech rate, direction of arrival and energy content.

3. The appliance of claim 2, wherein the instructions cause the appliance to classify the speech mode of the utterance as a whisper mode, a normal mode, or a Lombard effect mode according to one or more characteristics of the utterance.

4. The appliance of claim 3, wherein the one or more characteristics of the utterance comprise at least one of: a pitch, an energy content, a number of formant, a spectral tilt, a speech rate, or a combination thereof.

5. The appliance of claim 1, further comprising instructions that, when executed by the processor, cause the appliance to select a playback volume according to the classified speech mode, the one or more cues, or a combination thereof, and to output the synthesized speech at the selected playback volume.

6. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select one or more speech synthesis parameters, the memory further comprising instructions that, when executed by the processor, cause the appliance to: generate synthesized speech according to a speech synthesis model and the selected one or more speech synthesis parameters.

7. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select a speech synthesis model from a plurality of speech synthesis models, the memory further comprising instructions that, when executed by the processor, cause the appliance to generate synthesized speech according to the selected speech synthesis model.

8. The appliance of claim 1, wherein the instructions to select a speech output mode further comprise instructions to select one or more speech modification parameters based on the classified speech mode, and the one or more cues; and the memory further comprising instructions that, when executed by the processor, cause the appliance to modify synthesized speech according to the one or more speech modification parameters and to output the modified synthesized speech.

9. The appliance of claim 1, wherein selected speech output mode corresponds to a different speech mode than the speech mode of the utterance.

10. The appliance of claim 1, wherein the instructions to determine one or more cues comprise instructions to determine one or more acoustic cues from the audio signal, the one or more acoustic cues comprising at least one of: background noise received by the microphone transducer, a direct to reverberant ratio, a signal to noise ratio, an echo coupling residual, a distance to a speaker of the utterance, a quality measure of the utterance, or a combination thereof.

11. The appliance of claim 1, wherein the instructions to determine one or more cues comprise instructions to determine one or more non-acoustic cues comprising at least one of: a time of day, a location type, an appliance mode, a user profile, a location layout, an acoustic profile of a location, or a combination thereof.

12. The appliance of claim 1, wherein the instructions that cause the appliance to output the synthesized speech comprise instructions to process synthesized speech for output by a loudspeaker.

13. The appliance of claim 1, further comprising instructions that, when executed by the processor, cause the appliance to receive the audio signal from the microphone transducer, process the audio signal, request speech recognition on the processed audio signal, and receive text, wherein the received text corresponds to a response generated based on recognized speech.

14. The appliance of claim 1, wherein the speech output mode includes at least one of a pitch, a rate, an energy, or a spectral tilt selected for intelligibility in at least one of the conditions of the environment of the appliance.

15. An audio appliance, comprising: an audio acquisition module, comprising a microphone transducer, configured to receive sound; a speech classifier configured to detect an utterance in the sound and to classify a speech mode based on the utterance; a decision component configured to determine one or more cues, each cue corresponding to an observed condition of an environment of the audio appliance, and to select a speech output mode from a plurality of speech output modes based on the classification of the speech mode, a first weight corresponding to the classification, the one or more cues, and a second weight corresponding to the one or more cues, the second weight and the first weight configured to weight the one or more cues more heavily than the classification; and an output component configured to output synthesized speech according to the speech output mode.

16. The audio appliance of claim 15, further comprising: a speech synthesizer comprising a speech synthesis model, and configured to receive text and to generate synthesized speech from the text with the speech synthesis model according to the speech output mode.

17. The audio appliance of claim 16, wherein the decision component is configured to select one or more speech synthesis parameters corresponding to the speech output mode, and wherein the speech synthesizer is configured to generate the synthesized speech according to the speech synthesis model and the selected one or more speech synthesis parameters.

18. The audio appliance of claim 16, wherein the decision component is configured to select a speech synthesis model from a plurality of speech synthesis models corresponding to the speech output mode, and wherein the speech synthesizer is configured to generate the synthesized speech according to the selected speech synthesis model.

19. The audio appliance of claim 15, wherein the decision component is configured to select a playback volume, one or more speech modification parameters, or both, corresponding to the speech output mode, the audio appliance further comprising a speech modification component configured to modify synthesized speech according to the playback volume, the selected one or more speech modification parameters, or both, prior to outputting the modified synthesized speech.

20. The audio appliance of claim 15, wherein the speech classifier is configured to classify the utterance as a whisper mode, a normal mode, or a Lombard-effect mode according to at least one of a pitch, a number of formants, spectral tilt, speech rate, energy content.

21. A method for improving intelligibility of synthesized speech comprising: recei
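The weighting in claims 1 and 15 (a classification weight and a cue weight, with cues weighted more heavily) can be illustrated with a small score-fusion sketch. Everything here is hypothetical. The claims require only that the cue weight exceed the classification weight, so the weights 0.3 and 0.7, the SNR and time-of-day rules in `cue_scores`, and the per-mode gains in `PLAYBACK_GAIN_DB` (standing in for the playback volume of claims 5 and 19) are illustrative choices, not values from the source. The function names are likewise invented for this sketch.

```python
import math
from typing import Dict

MODES = ("whisper", "normal", "lombard")

# Hypothetical weights: the claims only require W_CUES > W_CLASSIFICATION.
W_CLASSIFICATION = 0.3
W_CUES = 0.7

# Hypothetical playback gain per output mode, in dB relative to nominal volume.
PLAYBACK_GAIN_DB = {"whisper": -12.0, "normal": 0.0, "lombard": 6.0}


def compute_snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB from mean signal and noise powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def classification_scores(speech_mode: str) -> Dict[str, float]:
    """One-hot vote: the classifier alone would mirror the talker's mode."""
    return {m: (1.0 if m == speech_mode else 0.0) for m in MODES}


def cue_scores(snr_db: float, is_night: bool) -> Dict[str, float]:
    """Map two example cues (one acoustic, one non-acoustic) to mode preferences."""
    if snr_db < 5.0:  # noisy environment: favor Lombard-style output
        return {"whisper": 0.0, "normal": 0.2, "lombard": 0.8}
    if is_night and snr_db > 25.0:  # quiet room at night: favor whispered output
        return {"whisper": 0.8, "normal": 0.2, "lombard": 0.0}
    return {"whisper": 0.1, "normal": 0.8, "lombard": 0.1}


def select_output_mode(speech_mode: str, snr_db: float, is_night: bool) -> str:
    """Weighted sum of classifier and cue scores; the highest total wins."""
    cls = classification_scores(speech_mode)
    cues = cue_scores(snr_db, is_night)
    combined = {m: W_CLASSIFICATION * cls[m] + W_CUES * cues[m] for m in MODES}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    snr = compute_snr_db(signal_power=2.0, noise_power=1.0)  # about 3 dB
    mode = select_output_mode("whisper", snr, is_night=False)
    print(mode, PLAYBACK_GAIN_DB[mode])
```

With these numbers, a whispered request in a noisy room yields Lombard-style output, an instance of claim 9's output mode differing from the talker's mode. This happens because the cue vote for Lombard (0.7 × 0.8 = 0.56) outweighs the classifier vote for whisper (0.3 × 1.0 = 0.3).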

Assignees

Apple Inc

Inventors

Classifications

  • for comparison or discrimination · CPC title

  • 2D or 3D arrays of transducers · CPC title

  • for correcting frequency response · CPC title

  • Voice editing, e.g. manipulating the voice of the synthesiser · CPC title

  • G10L25/63 (primary): for estimating an emotional state · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11501758B2 cover?
An appliance can include a microphone transducer, a processor, and a memory storing instructions. The appliance is configured to receive an audio signal at the microphone transducer and to detect an utterance in the audio signal. The appliance is further configured to classify a speech mode based on the utterance. The appliance is further configured to determine conditions of an environment of …
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/63. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).