Low-power, always-listening, voice command detection and capture
US-2018174583-A1 · Jun 21, 2018 · US
US11164584B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11164584-B2 |
| Application number | US-201916563981-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 9, 2019 |
| Priority date | Oct 24, 2017 |
| Publication date | Nov 2, 2021 |
| Grant date | Nov 2, 2021 |
Official abstract text for this publication.
Systems and methods are provided for application awakening and speech recognition. Such system may comprise a microphone configured to record an audio in an audio queue. The system may further comprise a processor configured to monitor the audio queue for an awakening phrase, in response to detecting the awakening phrase, obtain an audio segment from the audio queue, and transmit the obtained audio segment to a server. The recording of the audio may be continuous from a beginning of the awakening phrase to an end of the audio segment.
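The abstract's central requirement is that recording never pauses between the awakening phrase and the end of the command, so the segment can be cut out of the queue after the fact. The sketch below is one possible way to model such a time-indexed audio queue, not the patent's implementation. The class name `AudioQueue`, its methods, the frame layout, and the buffer size are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Tuple

# Hypothetical frame layout: (timestamp in milliseconds, raw audio bytes).
Frame = Tuple[int, bytes]


class AudioQueue:
    """Bounded, time-indexed buffer that the microphone writes to continuously."""

    def __init__(self, max_frames: int = 600) -> None:
        # deque(maxlen=...) silently drops the oldest frame when full,
        # so pushing never has to stop while a detector reads the queue.
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def push(self, timestamp_ms: int, data: bytes) -> None:
        """Append one recorded frame; called for every frame, without gaps."""
        self._frames.append((timestamp_ms, data))

    def slice(self, start_ms: int, end_ms: int) -> bytes:
        """Return the audio whose timestamps fall in [start_ms, end_ms)."""
        return b"".join(d for t, d in self._frames if start_ms <= t < end_ms)


# Usage: record six 100 ms frames, then cut out the 200..500 ms span.
queue = AudioQueue(max_frames=4)
for i in range(6):
    queue.push(i * 100, bytes([i]))
segment = queue.slice(200, 500)
```

Because the wake-phrase detector only reads from the queue, the segment that is later sent to the server can start at any retained timestamp, including the start of the awakening phrase.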
Opening claim text (preview).
The invention claimed is:
1. A computing system, comprising: a microphone configured to record an audio in an audio queue; and a processor configured to: monitor the audio queue for an awakening phrase; in response to detecting the awakening phrase, obtain an audio segment from the audio queue by performing operations comprising: monitoring the audio queue for a first absence of voice activity, wherein the first absence of voice activity corresponds to a first-detected duration in the audio queue after the awakening phrase with no voice recorded and exceeding a first preset threshold; in response to detecting the first absence of voice activity exceeding the first preset threshold, monitoring the audio queue for a first presence of voice activity after the first absence of voice activity, wherein the first presence of voice activity corresponds to a first-detected duration with voice recorded in the audio queue after the first absence of voice activity; and in response to not detecting the first presence of voice activity within a second preset threshold from an end of the awakening phrase, obtaining the audio segment comprising at least a portion of the audio queue from the end of the awakening phrase to a start of the first absence of voice activity; and transmit the obtained audio segment to a server, wherein the recording of the audio is continuous from a beginning of the awakening phrase to an end of the audio segment.
2. The system of claim 1, wherein: the system is implemented on a mobile device including a mobile phone; the server is caused to perform the speech recognition on the audio segment and return information to the mobile device based on the speech recognition.
3. The system of claim 2, further comprising: a display configured to display the returned information, wherein the returned information comprises texts of a machine-recognized speech corresponding to the audio segment.
4. The system of claim 1, wherein: the audio queue is associated with time; and to monitor the audio queue for the awakening phrase, the processor is configured to screen the recorded audio for a match with the awakening phrase.
5. The system of claim 4, wherein: the recording of the audio in the audio queue is continuous throughout the detecting of the awakening phrase.
6. The system of claim 1, wherein: the audio segment further comprises the awakening phrase.
7. The system of claim 1, wherein: to obtain the audio segment from the audio queue in response to detecting the awakening phrase, the processor is further configured to: in response to detecting the first presence of voice activity within the second preset threshold from an end of the awakening phrase, monitor the audio queue for a second absence of voice activity, wherein the second absence of voice activity corresponds to a first-detected duration in the audio queue after the first presence of voice activity with no voice recorded and exceeding the first preset threshold; and in response to detecting the second absence of voice activity, obtain the audio segment comprising at least a portion of the audio queue from a start of the first presence of voice activity to an end of the first presence of voice activity.
8. The system of claim 7, wherein: the first preset threshold is 700 milliseconds; and the second preset threshold is longer than the first preset threshold.
9. A method, comprising: recording an audio in an audio queue; and monitoring the audio queue for an awakening phrase; in response to detecting the awakening phrase, obtaining an audio segment from the audio queue, wherein the obtaining comprises: monitoring the audio queue for a first absence of voice activity, wherein the first absence of voice activity corresponds to a first-detected duration in the audio queue after the awakening phrase with no voice recorded and exceeding a first preset threshold; in response to detecting the first absence of voice activity exceeding the first preset threshold, monitoring the audio queue for a first presence of voice activity after the first absence of voice activity, wherein the first presence of voice activity corresponds to a first-detected duration with voice recorded in the audio queue after the first absence of voice activity; in response to detecting the first presence of voice activity within a second preset threshold from an end of the awakening phrase, monitoring the audio queue for a second absence of voice activity, wherein the second absence of voice activity corresponds to a first-detected duration in the audio queue after the first presence of voice activity with no voice recorded and exceeding the first preset threshold; and in response to detecting the second absence of voice activity, obtaining the audio segment comprising at least a portion of the audio queue from a start of the first presence of voice activity to an end of the first presence of voice activity; and transmitting the obtained audio segment to a server, wherein the recording of the audio is continuous from a beginning of the awakening phrase to an end of the audio segment.
10. The method of claim 9, wherein: the method is implemented by a mobile device including a mobile phone; the server is caused to perform the speech recognition on the audio segment and return information to the mobile device based on the speech recognition.
11. The method of claim 10, further comprising: displaying the returned information, wherein the returned information comprises texts of a machine-recognized speech corresponding to the audio segment.
12. The method of claim 9, wherein: the audio queue is associated with time; and monitoring the audio queue for the awakening phrase comprises screening the recorded audio for a match with the awakening phrase.
13. The method of claim 12, wherein the recording of the audio in the audio queue is continuous throughout the detecting of the awakening phrase.
14. The method of claim 9, wherein: obtaining the audio segment from the audio queue in response to detecting the awakening phrase further comprises: in response to not detecting the first presence of voice activity within the second preset threshold from an end of the awakening phrase, obtaining the audio segment comprising at least a portion of the audio queue from the end of the awakening phrase to a start of the first absence of voice activity.
15. The method of claim 14, wherein: the audio segment further comprises the awakening phrase.
16. The method of claim 9, wherein: the first preset threshold is 700 milliseconds; and the second preset threshold is longer than the first preset threshold.
17. A non-transitory computer-readable medium, comprising instructions stored therein, wherein the instructions, when executed by one or more processors, cause the one or more processors to perform a method comprising: obtaining a recorded audio in an audio queue; and monitoring the audio queue for an awakening phrase; in response to detecting the awakening phrase, obtaining an audio segment from the audio queue, wherein the obtaining comprises: monitoring the audio queue for a first absence of voice activity, wherein the first absence of voice activity corresponds to a first-detected duration in the audio queue after the awakening phrase with no voice recorded and exceeding a first preset threshold; and in response to detecting the first absence of voice activity exceeding the first preset threshold, obtaining the audio segment comprising at least a portion of the audio queue from an end of
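The two-threshold segmentation recited in claims 1 and 7 (and mirrored in claims 9 and 14) may be easier to follow as code. The following is a minimal offline sketch under stated assumptions, not the claimed implementation: voice activity is assumed to be already reduced to one boolean per fixed-length frame, index 0 is the first frame after the awakening phrase ends, and the names `find_silence` and `select_segment` are hypothetical. The 700 ms first threshold comes from claims 8 and 16; the 2000 ms second threshold and the 100 ms frame length are assumed values chosen only so that the second threshold is longer than the first.

```python
from typing import List, Optional, Tuple


def find_silence(vad: List[bool], start: int, min_frames: int) -> Optional[int]:
    """Return the start index of the first silent run at or after `start`
    that is longer than `min_frames` frames, or None if none exists yet."""
    run_start: Optional[int] = None
    for i in range(start, len(vad)):
        if vad[i]:
            run_start = None  # voice interrupts the current silent run
            continue
        if run_start is None:
            run_start = i
        if i - run_start + 1 > min_frames:  # "exceeding" the threshold
            return run_start
    return None


def select_segment(
    vad: List[bool],
    frame_ms: int = 100,             # assumed frame length
    first_threshold_ms: int = 700,   # value given in claims 8 and 16
    second_threshold_ms: int = 2000, # assumed; only "longer than the first"
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the command, end exclusive,
    or None when more audio is needed before a decision can be made."""
    t1 = first_threshold_ms // frame_ms
    t2 = second_threshold_ms // frame_ms

    first_absence = find_silence(vad, 0, t1)
    if first_absence is None:
        return None

    # First voiced frame after the first absence began, if any.
    first_presence = next(
        (i for i in range(first_absence, len(vad)) if vad[i]), None
    )
    if first_presence is None and len(vad) < t2:
        return None  # the second threshold has not elapsed yet

    if first_presence is None or first_presence >= t2:
        # Claim 1 branch: the command directly followed the awakening phrase.
        return (0, first_absence)

    # Claim 7 branch: the speaker paused, then spoke the command.
    second_absence = find_silence(vad, first_presence, t1)
    if second_absence is None:
        return None
    return (first_presence, second_absence)


# Command spoken immediately after the awakening phrase.
immediate = select_segment([True] * 10 + [False] * 25)
# Pause of 1 s after the awakening phrase, then a 1.2 s command.
paused = select_segment([False] * 10 + [True] * 12 + [False] * 10)
```

In the first call the segment runs from the end of the awakening phrase to the start of the first silence; in the second it covers only the speech that follows the pause. A streaming implementation would make the same decisions incrementally as frames arrive in the audio queue.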
Word spotting · CPC title
Word boundary detection · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title