US12236950B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12236950-B2 |
| Application number | US-202318149181-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 3, 2023 |
| Priority date | Mar 18, 2020 |
| Publication date | Feb 25, 2025 |
| Grant date | Feb 25, 2025 |
Official abstract text for this publication.
A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
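The two-stage flow in the abstract (a fast interrupt detector that lowers the volume, followed by a slower, more accurate classifier that accepts or rejects the interrupt) can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Player` class, the `State` names, and the volume values are hypothetical and do not come from the publication.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    PLAYING = auto()   # output audio at normal volume
    DUCKED = auto()    # interrupt event detected, volume lowered
    STOPPED = auto()   # interrupt accepted, output audio ended


@dataclass
class Player:
    """Hypothetical model of the output-audio side of the device."""
    normal_volume: float = 1.0
    ducked_volume: float = 0.2   # assumed value, not from the source
    volume: float = 1.0
    state: State = State.PLAYING

    def on_interrupt_detected(self) -> None:
        """Low-latency stage: lower the volume on a potential voice command."""
        if self.state is State.PLAYING:
            self.volume = self.ducked_volume
            self.state = State.DUCKED

    def on_classifier_decision(self, device_directed: bool) -> bool:
        """High-accuracy stage: accept or reject the interrupt event.

        Returns True when the audio data should go on to speech processing.
        """
        if self.state is not State.DUCKED:
            return False
        if device_directed:
            # Accept: end the output audio and continue speech processing.
            self.volume = 0.0
            self.state = State.STOPPED
            return True
        # Reject: restore the original volume and keep playing.
        self.volume = self.normal_volume
        self.state = State.PLAYING
        return False
```

In this sketch a rejected interrupt only costs a brief volume dip, which is the trade-off the abstract describes: the first stage reacts quickly, and the second stage corrects its mistakes once the whole utterance has been seen.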
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: generating first output audio using a loudspeaker associated with a device; receiving first audio data; processing the first audio data using a first component of the device to determine that the first audio data represents first speech; in response to determining that the first speech is represented in the first audio data, performing a first action; determining, by a natural language processing component, first natural language processing data associated with the first speech; providing the first audio data and the first natural language processing data as inputs to a machine learning component, the machine learning component being configured to classify input data as corresponding to a device-directed speech event; determining, using the machine learning component, that the first audio data and the first natural language processing data correspond to a first device-directed speech event; and based at least in part on the first audio data and the first natural language processing data corresponding to the first device-directed speech event, causing natural language processing to be completed based on the first audio data.
2. The computer-implemented method of claim 1, further comprising: detecting an endpoint of the first speech represented in the first audio data, wherein determining that the first audio data and the first natural language processing data correspond to the first device-directed speech event occurs after detection of the endpoint.
3. The computer-implemented method of claim 1, further comprising: determining, using a wakeword detection component, an indicator that the first speech includes a wakeword; and providing the indicator as an input to the machine learning component together with the first audio data and the first natural language processing data.
4. The computer-implemented method of claim 1, further comprising: processing, by the natural language processing component, a first portion of the first audio data to determine the first natural language processing data, wherein the first natural language processing data corresponds to the first portion of the first audio data.
5. The computer-implemented method of claim 1, further comprising: detecting an endpoint of the first speech represented in the first audio data, wherein determining the first natural language processing data comprises determining the first natural language processing data corresponding to an entirety of the first speech.
6. The computer-implemented method of claim 1, further comprising: processing, by a wakeword detection component, the first audio data; and failing to detect, by the wakeword detection component, a representation of a wakeword in the first audio data.
7. The computer-implemented method of claim 1, wherein the first component comprises a wakeword detection component and the method further comprises: processing, by the wakeword detection component, the first audio data to determine a representation of a wakeword in the first audio data.
8. The computer-implemented method of claim 1, wherein performing the first action comprises: presenting, by the device, a visual output corresponding to an indication that natural language processing is occurring.
9. The computer-implemented method of claim 1, wherein performing the first action comprises: reducing a volume level of the first output audio.
10. The computer-implemented method of claim 1, further comprising: after determination that the first audio data and the first natural language processing data correspond to the first device-directed speech event, discontinuing generating the first output audio using the loudspeaker of the device.
11. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: generate first output audio using a loudspeaker associated with a device; receive first audio data; process the first audio data using a first component of the device to determine that the first audio data represents first speech; in response to determination that the first speech is represented in the first audio data, performing a first action; determine, by a natural language processing component, first natural language processing data associated with the first speech, wherein the first natural language processing data corresponds to a representation of the first speech; provide the first audio data and the first natural language processing data as inputs to a machine learning component, the machine learning component being configured to classify input data as corresponding to a device-directed speech event; determine, using the machine learning component, that the first audio data and the first natural language processing data correspond to a first device-directed speech event; and based at least in part on the first audio data and the first natural language processing data corresponding to the first device-directed speech event, cause natural language processing to be completed based on the first audio data, wherein the natural language processing includes determining an intent associated with the first speech.
12. The system of claim 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: detect an endpoint of the first speech represented in the first audio data, wherein determination that the first audio data and the first natural language processing data correspond to the first device-directed speech event occurs after detection of the endpoint.
13. The system of claim 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using a wakeword detection component, an indicator that the first speech includes a wakeword; and provide the indicator as an input to the machine learning component together with the first audio data and the first natural language processing data.
14. The system of claim 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process, by the natural language processing component, a first portion of the first audio data to determine the first natural language processing data, wherein the first natural language processing data corresponds to the first portion of the first audio data.
15. The system of claim 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: detect an endpoint of the first speech represented in the first audio data, wherein the first natural language processing data corresponds to an entirety of the first speech.
16. The system of claim 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process, by a wakeword detection component, the first audio data; and fail to detect, by the wakeword detection component, a representation of a wakeword in the first audio data.
17. The system of claim 11, wherein the first component comprises a wakeword detection component and wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process, by the wakeword detection component, the first audio data
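
As a rough illustration of the data flow in claims 1 and 3, the sketch below bundles the audio data, the natural language processing data, and an optional wakeword indicator into one classifier input, and completes natural language processing only when the classifier reports a device-directed speech event. Every name here (`ClassifierInput`, `process_utterance`, and the callback parameters) is a hypothetical stand-in; the claims do not specify data types, model architecture, or interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ClassifierInput:
    """Hypothetical bundle of the inputs named in claims 1 and 3."""
    audio: Tuple[float, ...]
    nlp_data: Any                              # opaque NLP output (assumption)
    wakeword_detected: Optional[bool] = None   # claim 3 indicator, if available


def process_utterance(
    audio: Sequence[float],
    run_nlp: Callable[[Sequence[float]], Any],
    classify: Callable[[ClassifierInput], bool],
    finish_nlp: Callable[[Sequence[float]], Any],
    detect_wakeword: Optional[Callable[[Sequence[float]], bool]] = None,
) -> Optional[Any]:
    """Return the completed NLP result, or None if the speech is rejected."""
    # Determine NLP data associated with the speech (partial or full).
    nlp_data = run_nlp(audio)
    # Optionally compute the wakeword indicator as an extra classifier input.
    indicator = detect_wakeword(audio) if detect_wakeword is not None else None
    inputs = ClassifierInput(tuple(audio), nlp_data, indicator)
    # Complete NLP only for a device-directed speech event.
    if classify(inputs):
        return finish_nlp(audio)
    return None
```

Passing the components in as callables keeps the sketch neutral about where each one runs, since the abstract places some processing on the device and some on a remote system.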
of application context · CPC title
Execution procedure of a spoken command · CPC title
Word spotting · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title