Environment aware voice-assistant devices, and related systems and methods
US-2021097980-A1 · Apr 1, 2021 · US
US12315490B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12315490-B2 |
| Application number | US-202117565826-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2021 |
| Priority date | Dec 31, 2020 |
| Publication date | May 27, 2025 |
| Grant date | May 27, 2025 |
Official abstract text for this publication.
The present disclosure relates generally to speech processing. Humans change their speech patterns in noisy environments. The systems and devices described herein can compensate for noisy environments to be more human-like. Thus, the configurations and implementations herein can determine a sound profile for the sound environment where the user is listening. Based on the sound profile, the devices can determine a transform to apply to output speech from the device. This transform is applied to the wake word, speech recognition, and to the output speech to compensate for the noise level of the environment by mimicking the Lombard effect.
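The flow the abstract describes (measure the environment, pick a sound profile, apply that profile's transform to the output speech) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `SoundProfile` class, the three profiles and their numeric values, the `calibration_offset_db` parameter, and the choice to model the transform as simple duration and pitch scaling are all hypothetical. The level is an unweighted RMS value, whereas a real device would likely use a weighted measurement.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundProfile:
    """Hypothetical sound profile: a reference level plus a speech transform."""
    name: str
    level_db: float          # representative ambient level for this profile
    duration_scale: float    # >1.0 lengthens phonemes (slower speech)
    pitch_scale: float       # >1.0 raises pitch


# Illustrative values only; the patent does not specify these numbers.
PROFILES = (
    SoundProfile("quiet", 40.0, 1.00, 1.00),
    SoundProfile("moderate", 60.0, 1.10, 1.05),
    SoundProfile("loud", 80.0, 1.25, 1.15),
)


def ambient_level_db(samples, calibration_offset_db=0.0):
    """Unweighted RMS level of `samples` in dB, plus a calibration offset.

    A real device would apply A- or C-weighting and a measured microphone
    calibration; both are omitted here.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12)) + calibration_offset_db


def select_profile(level_db, profiles=PROFILES):
    """Pick the profile whose reference level is closest to the measured one."""
    return min(profiles, key=lambda p: abs(p.level_db - level_db))


def transform_prosody(durations_ms, pitches_hz, profile):
    """Scale per-phoneme durations and pitches; playback volume is untouched."""
    return (
        [d * profile.duration_scale for d in durations_ms],
        [f * profile.pitch_scale for f in pitches_hz],
    )


if __name__ == "__main__":
    noise = [0.1, -0.1, 0.1, -0.1]  # stand-in for microphone samples
    level = ambient_level_db(noise, calibration_offset_db=94.0)
    profile = select_profile(level)
    print(profile.name, transform_prosody([100.0, 80.0], [120.0, 200.0], profile))
```

The sketch changes timing and pitch rather than gain, in the spirit of the Lombard effect the abstract refers to; which speech components a real transform adjusts is left open here.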
Opening claim text (preview).
What is claimed is:
1. A method comprising: providing, by a media delivery system, a first audio representation of a first simulated sound environment; receiving, by the media delivery system, first speech from a user speaking subject to audio playout of the first simulated sound environment; providing, by the media delivery system, a second audio representation of a second simulated sound environment, wherein the second simulated sound environment has different acoustic characteristics than the first simulated sound environment; receiving, by the media delivery system, second speech from the user speaking subject to audio playout of the second simulated sound environment; determining a change in a speech component between the first speech and the second speech; and based on the change in the speech component, creating a transform to adjust the speech component.
2. The method of claim 1, wherein the change in the speech component is associated with one or more of a change to a phoneme, in a speed of the phoneme, in a duration of the phoneme, to a separation between phonemes, to a separation between morphemes, in a pitch of the phoneme, in a frequency range for the phoneme, or in a pause between words.
3. The method of claim 2, wherein the change in the speech component mimics the Lombard Effect.
4. The method of claim 2, wherein a first phoneme is pronounced in a first frequency range and a second phoneme is pronounced in a second frequency range, and wherein the change to the speech component involves a first change to the first phoneme that is different than a second change to the second phoneme based on a difference between the first frequency range and the second frequency range.
5. The method of claim 1, further comprising: assigning a desired voice; receiving a request from the user; determining a current sound environment for the user; determining text to output to the user in response to the request; synthesizing the text, by Text-To-Speech (TTS), to create speech output; applying the transform to the speech output; and playing the transformed speech output.
6. The method of claim 5, wherein the request from the user is a wake word, the method further comprising: retrieving the transform; and adjusting a reception of the wake word based on the transform.
7. The method of claim 5, further comprising: receiving third speech from the user in the request; retrieving the transform; and adjusting a speech recognition of the third speech based on the transform.
8. The method of claim 1, wherein the transform is associated with the second simulated sound environment.
9. The method of claim 1, wherein the first simulated sound environment is quieter than the second simulated sound environment.
10. The method of claim 9, wherein the first simulated sound environment has a sound level of 45 dB or less, and wherein the second simulated sound environment has a sound level of over 50 dB.
11. The method of claim 1, wherein the transform is applied to sounds in a partial band of frequencies.
12. A system comprising: a memory; and a processing unit coupled to the memory, wherein the processing unit is operative to: provide, by a media delivery system, a first audio representation of a first simulated sound environment; receive, by the media delivery system, first speech from a user speaking subject to audio playout of the first simulated sound environment; provide, by the media delivery system, a second audio representation of a second simulated sound environment, wherein the second simulated sound environment has different acoustic characteristics than the first simulated sound environment; receive, by the media delivery system, second speech from the user speaking subject to audio playout of the second simulated sound environment; determine a change in a speech component between the first speech and the second speech; and based on the change in the speech component, create a transform to adjust the speech component.
13. The system of claim 12, wherein the change in the speech component is associated with one or more of a change to a phoneme, in a speed of the phoneme, in a duration of the phoneme, to a separation between phonemes, to a separation between morphemes, in a pitch of the phoneme, in a frequency range for the phoneme, and wherein the change in the speech component mimics the Lombard Effect.
14. The system of claim 12, the processing unit further operative to: assign a desired voice; receive a request from the user; determine a current sound environment for the user; determine text to output to the user in response to the request; synthesize the text, by Text-To-Speech (TTS), to create speech output; apply the transform to the speech output; and play the transformed speech output.
15. The system of claim 14, wherein the request from the user is a wake word, the processing unit further operative to: retrieve the transform; and adjust a reception of the wake word based on the transform.
16. The system of claim 14, the processing unit further operative to: receive third speech from the user in the request; retrieve the transform; and adjust a speech recognition of the third speech based on the transform.
17. A method comprising: determining, by a media-playback device, a sound environment from received background noise; selecting a sound profile with similar audio characteristics as the received background noise, wherein the sound profile is associated with a transform for speech; determining speech output; applying the transform to the speech output to create transformed speech; and playing, by the media-playback device, the transformed speech.
18. The method of claim 17, wherein a characteristic associated with the transform comprises one or more of a change to a phoneme, in a speed of the phoneme, in a duration of the phoneme, to a separation between phonemes, to a separation between morphemes, in a pause between words, in a pitch of the phoneme, in a frequency range for the phoneme, and wherein the transform mimics the Lombard effect.
19. The method of claim 17, wherein a user can understand the transformed speech in the sound environment without changing a volume of the media-playback device.
20. The method of claim 17, wherein determining the sound environment comprises determining a dB(A) or a dB(C).
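The steps of claim 1 that derive a transform from the change between two recordings can be illustrated with a small sketch. It assumes, hypothetically, that both recordings have already been aligned to phonemes and reduced to per-phoneme duration and pitch measurements; the `PhonemeObs` type, the phoneme labels, the example numbers, and the use of simple ratios as the "transform" are illustrative choices, not details taken from the patent.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PhonemeObs:
    """One measured phoneme (hypothetical representation)."""
    phoneme: str
    duration_ms: float
    pitch_hz: float


def summarize(observations):
    """Average duration and pitch for each phoneme label."""
    grouped = {}
    for obs in observations:
        grouped.setdefault(obs.phoneme, []).append(obs)
    return {
        label: (mean(o.duration_ms for o in group), mean(o.pitch_hz for o in group))
        for label, group in grouped.items()
    }


def create_transform(first_speech, second_speech):
    """Per-phoneme (duration ratio, pitch ratio) from first to second speech.

    Only phonemes observed in both recordings receive an entry.
    """
    first = summarize(first_speech)
    second = summarize(second_speech)
    transform = {}
    for label in first.keys() & second.keys():
        first_dur, first_pitch = first[label]
        second_dur, second_pitch = second[label]
        transform[label] = (second_dur / first_dur, second_pitch / first_pitch)
    return transform


def apply_transform(speech, transform):
    """Scale each phoneme by its ratios; unknown phonemes pass through unchanged."""
    result = []
    for obs in speech:
        dur_ratio, pitch_ratio = transform.get(obs.phoneme, (1.0, 1.0))
        result.append(
            PhonemeObs(obs.phoneme, obs.duration_ms * dur_ratio, obs.pitch_hz * pitch_ratio)
        )
    return result


if __name__ == "__main__":
    # Made-up measurements: the user speaking over a quieter, then a louder, playout.
    first = [PhonemeObs("AA", 100.0, 120.0), PhonemeObs("S", 80.0, 4000.0)]
    second = [PhonemeObs("AA", 130.0, 150.0), PhonemeObs("S", 88.0, 4000.0)]
    transform = create_transform(first, second)
    print(apply_transform([PhonemeObs("AA", 200.0, 100.0)], transform))
```

Keeping one ratio pair per phoneme lets the sketch apply a different change to different phonemes, which is the kind of behavior claim 4 describes; a production system would need a far richer model of the speech component.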
Speech classification or search · CPC title
for discriminating voice from noise · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Word spotting · CPC title
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title