Method and apparatus to provide comprehensive smart assistant services
US-12051410-B2 · Jul 30, 2024 · US
US12499892B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499892-B2 |
| Application number | US-202418744076-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 14, 2024 |
| Priority date | Feb 8, 2018 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
An apparatus supports smart assistant services with a plurality of smart service providers. The apparatus includes an audio device that receives a speech signal having a user utterance, captures the user utterance when the user utterance includes a user wake word, and sends the captured utterance to a backend computing device. The backend computing device replaces the user wake word with specific wake words associated with different smart service providers. The processed utterances are then sent to selected smart service providers. The backend computing device subsequently constructs feedback to the user utterance based on voice responses from the different smart service providers. The backend computing device then passes a digital representation of the feedback to the audio device, and the audio device converts the digital representation to an audio reply to the user utterance.
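The backend flow in the abstract (replace the user wake word, send the processed utterance to each selected provider, then build feedback from the responses) can be sketched as below. This is a minimal illustration, not the patented implementation: it operates on text transcripts rather than audio, and the provider names, the wake words, and the helpers `replace_wake_word`, `fan_out`, and `construct_feedback` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical table mapping each provider to its own wake word.
PROVIDER_WAKE_WORDS: Dict[str, str] = {
    "provider_a": "hey alpha",
    "provider_b": "ok beta",
}


def replace_wake_word(utterance: str, user_wake_word: str, provider_wake_word: str) -> str:
    """Swap a leading user wake word for a provider-specific wake word."""
    text = utterance.strip()
    if text.lower().startswith(user_wake_word.lower()):
        # Drop the user wake word and any separator that follows it.
        text = text[len(user_wake_word):].lstrip(" ,")
    return f"{provider_wake_word}, {text}"


def fan_out(
    utterance: str,
    user_wake_word: str,
    providers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send one processed utterance per provider and collect the responses."""
    responses: Dict[str, str] = {}
    for name, send in providers.items():
        processed = replace_wake_word(utterance, user_wake_word, PROVIDER_WAKE_WORDS[name])
        responses[name] = send(processed)
    return responses


def construct_feedback(responses: Dict[str, str]) -> str:
    """Placeholder policy (an assumption): join every non-empty response."""
    return " ".join(r for r in responses.values() if r)
```

In a real system each `send` callable would wrap a provider's voice API and the feedback would be rendered back to audio on the device; here they are plain functions so the control flow stays visible.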
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, configure the apparatus to: detect a user trigger; capture, based on the detected user trigger, a user utterance from a user; include a first specific wake word in the captured user utterance to form a first processed utterance; send the first processed utterance to a first service provider; obtain, from the first service provider, a first response to the first processed utterance; and construct, based on data associated with a history of processed utterances, first feedback to the user utterance based on the first response; and generate, based on the first feedback, a reply to the user utterance.
2. The apparatus of claim 1, wherein the user trigger comprises one or more of a wake word, a sound, an audio condition, a hand gesture, a body gesture, a facial expression or biology signature.
3. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further configure the apparatus to detect the user trigger based on a model trained on user input.
4. The apparatus of claim 1, wherein the data associated with the history of processed utterances comprises one or more of: scoring metrics indicative of accuracies of feedbacks to the processed utterances; a history of user utterances following processed utterances; or data indicating response speeds of a plurality of service providers to one or more previous captured user utterances.
5. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further configure the apparatus to: include a second specific wake word, different from the first specific wake word, in the user utterance to form a second processed utterance; send the second processed utterance to a second service provider; obtain a second response from the second service provider, and construct the first feedback by combining, based on the data associated with the history of processed utterances, the first response and the second response.
6. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further configure the apparatus to: include a second specific wake word, different from the first specific wake word, in the user utterance to form a second processed utterance; send the second processed utterance to a second service provider; obtain a second response from the second service provider, construct second feedback based on the second response; and update, based on the first feedback and on the second feedback, the data associated with the history of processed utterances.
7. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further configure the apparatus to: obtain a scoring function by training on the data associated with the history of processed utterances, wherein the scoring function measures a probabilistic prediction accuracy; and construct the first feedback based on a scoring metric, obtained from the scoring function applied to the first response, satisfying a threshold.
8. A method comprising: detecting, by a computing device via a sensor, a user trigger; capturing, based on the detected user trigger, a user utterance from a user; including a first specific wake word in the captured user utterance to form a first processed utterance; sending the first processed utterance to a first service provider; obtaining, from the first service provider, a first response to the first processed utterance; and constructing, based on data associated with a history of processed utterances, first feedback to the user utterance based on the first response; and generating, based on the first feedback, a reply to the user utterance.
9. The method of claim 8, wherein the user trigger comprises one or more of a wake word, a sound, an audio condition, a hand gesture, a body gesture, a facial expression or biology signature.
10. The method of claim 8, further comprising detecting the user trigger based on a model trained on user input.
11. The method of claim 8, wherein the data associated with the history of processed utterances comprises one or more of: scoring metrics indicative of accuracies of feedbacks to the processed utterances; a history of user utterances following processed utterances; or data indicating response speeds of a plurality of service providers to one or more previous captured user utterances.
12. The method of claim 8, further comprising: including a second specific wake word, different from the first specific wake word, in the user utterance to form a second processed utterance; sending the second processed utterance to a second service provider; obtaining a second response from the second service provider, and constructing the first feedback by combining, based on the data associated with the history of processed utterances, the first response and the second response.
13. The method of claim 8, further comprising including a second specific wake word, different from the first specific wake word, in the user utterance to form a second processed utterance; sending the second processed utterance to a second service provider; obtaining a second response from the second service provider, constructing second feedback based on the second response; and updating, based on the first feedback and on the second feedback, the data associated with the history of processed utterances.
14. The method of claim 8, further comprising: obtaining a scoring function by training on the data associated with the history of processed utterances, wherein the scoring function measures a probabilistic prediction accuracy; and constructing the first feedback based on a scoring metric, obtained from the scoring function applied to the first response, satisfying a threshold.
15. A non-transitory computer readable medium storing instructions that, when executed, cause: detecting a user trigger; capturing, based on the detected user trigger, a user utterance from a user; including a first specific wake word in the captured user utterance to form a first processed utterance; sending the first processed utterance to a first service provider; obtaining, from the first service provider, a first response to the first processed utterance; and constructing, based on data associated with a history of processed utterances, first feedback to the user utterance based on the first response; and generating, based on the first feedback, a reply to the user utterance.
16. The non-transitory computer readable medium of claim 15, wherein the user trigger comprises one or more of a wake word, a sound, an audio condition, a hand gesture, a body gesture, a facial expression or biology signature.
17. The non-transitory computer readable medium of claim 15, further comprising detecting the user trigger based on a model trained on user input.
18. The non-transitory computer readable medium of claim 15, wherein the data associated with the history of processed utterances comprises one or more of: scoring metrics indicative of accuracies of feedbacks to the processed utterances; a history of user utterances following processed utterances; or data indicating response speeds of a plurality of service providers to one or more previous captured user utterances.
19. The non-transitory computer readable med
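Claims 5 and 7 describe combining provider responses using history data and gating feedback on a scoring metric that satisfies a threshold. The sketch below is one possible reading under loud assumptions: the history is a list of hypothetical `(provider, was_accurate)` pairs, "training" is reduced to a Laplace-smoothed accuracy estimate per provider (the claims score the response itself, which this simplification does not do), and the names `train_scoring_function` and `construct_feedback` are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def train_scoring_function(history: List[Tuple[str, bool]]) -> Callable[[str], float]:
    """Fit a per-provider accuracy estimate from past feedback outcomes.

    history holds (provider, was_accurate) pairs; this format is an assumption.
    """
    hits: Dict[str, int] = defaultdict(int)
    trials: Dict[str, int] = defaultdict(int)
    for provider, accurate in history:
        trials[provider] += 1
        hits[provider] += int(accurate)

    def score(provider: str) -> float:
        # Laplace smoothing: a provider with no history scores 0.5.
        return (hits[provider] + 1) / (trials[provider] + 2)

    return score


def construct_feedback(
    responses: Dict[str, str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Keep responses whose score meets the threshold, best-scored first."""
    kept = [(score(p), p, r) for p, r in responses.items() if score(p) >= threshold]
    kept.sort(reverse=True)
    return " ".join(r for _, _, r in kept)
```

A production scoring function would more plausibly be a trained model over response content, response speed, and follow-up utterances, as claim 4 enumerates; the counting estimator here only shows where such a model would plug in.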
Feedback of the input speech · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Execution procedure of a spoken command · CPC title
Word spotting · CPC title
Speech classification or search · CPC title