Targeted Content During Media Downtimes
US-2018124438-A1 · May 3, 2018 · US
US11966986B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11966986-B2 |
| Application number | US-202217878778-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2022 |
| Priority date | Oct 18, 2019 |
| Publication date | Apr 23, 2024 |
| Grant date | Apr 23, 2024 |
Official abstract text for this publication.
In one embodiment, a method includes receiving, at a client system, an audio input, where the audio input comprises a coreference to a target object, accessing visual data from one or more cameras associated with the client system, where the visual data comprises images portraying one or more objects, resolving the coreference to the target object from among the one or more objects, resolving the target object to a specific entity, and providing, at the client system, a response to the audio input, where the response comprises information about the specific entity.
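The flow in the abstract (audio input with a coreference, objects seen by the camera, target object, specific entity, response) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `DetectedObject` type, the `ENTITY_CATALOG` lookup, the small `DEMONSTRATIVES` word list, and in particular the "closest to the frame center" heuristic, which merely stands in for whatever resolution logic an actual implementation would use.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """Hypothetical output of an object detector for one camera frame."""
    object_id: str
    label: str
    center_x: float  # normalized horizontal position in the frame, 0..1
    center_y: float  # normalized vertical position in the frame, 0..1


# Hypothetical mapping from a detected label to a named entity.
ENTITY_CATALOG = {"painting": "Example Painting No. 1", "mug": "a ceramic mug"}

# Assumed, deliberately tiny list of words treated as coreferences.
DEMONSTRATIVES = {"this", "that", "it", "these", "those"}


def find_coreference(utterance):
    """Return the first demonstrative or pronoun in the utterance, or None."""
    for token in utterance.lower().split():
        word = token.strip(".,!?")
        if word in DEMONSTRATIVES:
            return word
    return None


def resolve_target(objects):
    """Pick the object nearest the frame center (an assumed heuristic)."""
    return min(
        objects,
        key=lambda o: (o.center_x - 0.5) ** 2 + (o.center_y - 0.5) ** 2,
    )


def answer(utterance, objects):
    """Resolve the coreference to an object, then to an entity, then respond."""
    if find_coreference(utterance) is None or not objects:
        return "Sorry, I could not tell what you are referring to."
    target = resolve_target(objects)
    entity = ENTITY_CATALOG.get(target.label, target.label)
    return f"That is {entity}."


frame = [
    DetectedObject("o1", "mug", 0.10, 0.90),
    DetectedObject("o2", "painting", 0.52, 0.48),
]
print(answer("What is that?", frame))
```

The center-of-frame rule is only one plausible signal. The claims also mention gesture or gaze information and a dialog state, which a fuller sketch would combine with position.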
Opening claim text (preview).
What is claimed is:
1. A method comprising, by a client system: receiving, at the client system, a user query comprising an audio input from a user, wherein the audio input comprises a coreference to a target object, and wherein the user is associated with a current context; accessing, responsive to receiving the audio input from the user, visual data from one or more cameras associated with the client system; analyzing, by a scene understanding engine, the visual data to identify a plurality of objects portrayed in the visual data; resolving the coreference to the target object from among the plurality of objects by identifying the target object from among the plurality of identified objects portrayed in the visual data; resolving the target object to a specific entity from a plurality of selected entities corresponding to the plurality of objects, wherein the plurality of selected entities are selected based on a respective recency of each of the selected entities and a respective correlation of each of the selected entities to the current context associated with the user; and providing, at the client system, a response to the user query, wherein the response comprises information about the specific entity, and wherein the response is in one or more modalities determined based on device capabilities of the client system.
2. The method of claim 1, further comprising: accessing a knowledge graph; and retrieving attribute information about the specific entity from the knowledge graph.
3. The method of claim 1, further comprising: analyzing, by a computer vision module, the visual data to identify the plurality of objects portrayed in the images; parsing, by a natural-language understanding (NLU) module, an intent of the audio input and the coreference to the target object to one of the plurality of objects portrayed in the images; and updating a dialog state to include the identified objects and the coreference to the target object.
4. The method of claim 3, further comprising: classifying the intent of one or more requests from one or more pre-defined taxonomies of semantic intentions; and generating a confidence score corresponding to the intent of the one or more requests.
5. The method of claim 3, further comprising: receiving, at the client system, gesture or gaze information; and updating the dialog state to include the received gesture or gaze information.
6. The method of claim 3, wherein resolving the coreference to the target object from among the plurality of objects comprises combining additional information with the dialog state.
7. The method of claim 1, further comprising: assigning respective object identifiers to the plurality of objects portrayed in the images; and storing one or more of the object identifiers as entities in a dialog state tracker.
8. The method of claim 1, wherein one or more of the objects portrayed in the images data are virtual objects in a virtual reality environment.
9. The method of claim 1, wherein identifying the target object from among the identified objects is based on its position within a field of view of the visual data.
10. One or more computer-readable non-transitory storage media comprising instructions executable by a processor to: receive, at a client system, a user query comprising an audio input from a user, wherein the audio input comprises a coreference to a target object, and wherein the user is associated with a current context; access, responsive to receiving the audio input from the user, visual data from one or more cameras associated with the client system; analyze, by a scene understanding engine, the visual data to identify a plurality of objects portrayed in the visual data; resolve the coreference to the target object from among the plurality of objects by identifying the target object from among the plurality of identified objects portrayed in the visual data; resolve the target object to a specific entity from a plurality of selected entities corresponding to the plurality of objects, wherein the plurality of selected entities are selected based on a respective recency of each of the selected entities and a respective correlation of each of the selected entities to the current context associated with the user; and provide, at the client system, a response to the user query, wherein the response comprises information about the specific entity, and wherein the response is in one or more modalities determined based on device capabilities of the client system.
11. The media of claim 10, wherein the instructions are further executable by the processor to: access a knowledge graph; and retrieve attribute information about the specific entity from the knowledge graph.
12. The media of claim 10, wherein the instructions are further executable by the processor to: analyze, by a computer vision module, the visual data to identify the plurality of objects portrayed in the images; parse, by a natural-language understanding (NLU) module, an intent of the audio input and the coreference to the target object to one of the plurality of objects portrayed in the images; and update a dialog state to include the identified objects and the coreferences to the target object.
13. The media of claim 12, wherein the instructions are further executable by the processor to: classify the intent of one or more requests from one or more pre-defined taxonomies of semantic intentions; and generate a confidence score corresponding to the intent of the one or more requests.
14. The media of claim 12, wherein the instructions are further executable by the processor to: receive, at the client system, gesture or gaze information; and update the dialog state to include the received gesture or gaze information.
15. The media of claim 12, wherein resolving the coreference to the target object from among the plurality of objects comprises combining additional information with the dialog state.
16. The media of claim 10, wherein the instructions are further executable by the processor to: assign respective object identifiers to the plurality of objects portrayed in the images; and store one or more of the object identifiers as entities in a dialog state tracker.
17. The media of claim 10, wherein one or more of the objects portrayed in the images data are virtual objects in a virtual reality environment.
18. A client system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive, at the client system, a user query comprising an audio input from a user, wherein the audio input comprises a coreference to a target object, and wherein the user is associated with a current context; access, responsive to receiving the audio input from the user, visual data from one or more cameras associated with the client system; analyze, by a scene understanding engine, the visual data to identify a plurality of objects portrayed in the visual data; resolve the coreference to the target object from among the plurality of objects by identifying the target object from among the plurality of identified objects portrayed in the visual data; resolve the target object to a specific entity from a plurality of selected entities corresponding to the plurality of objects, wherein the plurality of selected entities are selected based on a respective recency of each of the selected entities and a respective correlation of each of the sele
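The independent claims say that candidate entities are selected based on their recency and their correlation to the user's current context, and that the response modality depends on device capabilities. The claims do not give a scoring formula, so the Python sketch below is only one possible reading. The exponential recency decay, the Jaccard overlap used as "correlation", the equal weighting, and all names (`CandidateEntity`, `select_entities`, `choose_modalities`, the capability strings) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateEntity:
    """Hypothetical candidate entity tracked by the assistant."""
    name: str
    last_seen_s: float       # seconds since the entity was last observed
    context_tags: frozenset  # tags describing where the entity is relevant


def recency_score(last_seen_s, half_life_s=60.0):
    """Assumed decay: the score halves every `half_life_s` seconds."""
    return 0.5 ** (last_seen_s / half_life_s)


def context_score(entity_tags, context_tags):
    """Assumed correlation measure: Jaccard overlap of tag sets."""
    union = entity_tags | context_tags
    if not union:
        return 0.0
    return len(entity_tags & context_tags) / len(union)


def select_entities(candidates, context_tags, k=3, w_recency=0.5):
    """Return the k candidates with the best combined score."""
    def score(c):
        return (w_recency * recency_score(c.last_seen_s)
                + (1.0 - w_recency) * context_score(c.context_tags, context_tags))
    return sorted(candidates, key=score, reverse=True)[:k]


def choose_modalities(capabilities):
    """Map assumed capability names to response modalities."""
    modalities = []
    if "display" in capabilities:
        modalities.append("visual")
    if "speaker" in capabilities:
        modalities.append("audio")
    return modalities or ["text"]


candidates = [
    CandidateEntity("kettle", 5.0, frozenset({"kitchen"})),
    CandidateEntity("lawnmower", 600.0, frozenset({"garage"})),
]
best = select_entities(candidates, frozenset({"kitchen"}), k=1)
print(best[0].name, choose_modalities({"speaker"}))
```

A weighted sum keeps the two signals easy to inspect. A real system could just as well use a learned ranker, which the claims neither require nor rule out.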
Business processes related to social networking or social networking services · CPC title
Supervised learning · CPC title
Distributed learning, e.g. federated learning · CPC title
Calendar-based scheduling for persons or groups · CPC title
Execution procedure of a spoken command · CPC title