Context interpretation in natural language processing using previous dialog acts
US-2015340033-A1 · Nov 26, 2015 · US
US2021117681A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021117681-A1 |
| Application number | US-202017006339-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 28, 2020 |
| Priority date | Oct 18, 2019 |
| Publication date | Apr 22, 2021 |
| Grant date | — |
Official abstract text for this publication.
In one embodiment, a method includes receiving, from a client system associated with a user, a user request comprising a reference to a target object, accessing visual data from the client system, wherein the visual data comprises images portraying the target object and one or more additional objects, and wherein attribute information of the target object is recorded in a multimodal dialog state, resolving the reference to the target object based on the attribute information recorded in the multimodal dialog state, determining relational information between the target object and one or more of the additional objects portrayed in the visual data, and sending, to the client system, instructions for presenting a response to the user request, wherein the response comprises the attribute information and the determined relational information.
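The flow in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: `TrackedObject`, `MultimodalDialogState`, `resolve_reference`, `relate`, and `build_response` are hypothetical names. Matching the reference by label substring and deriving a left/right relation from a horizontal position are simplifying assumptions that stand in for whatever language understanding and scene understanding components a real system would use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrackedObject:
    object_id: str    # identifier of the object
    label: str        # hypothetical class label from a vision model
    location: str     # where the object was observed
    timestamp: float  # capture time of the image portraying it
    x_center: float   # hypothetical horizontal position in the image, 0.0 to 1.0


@dataclass
class MultimodalDialogState:
    """Hypothetical store for attribute information of tracked objects."""
    objects: dict = field(default_factory=dict)

    def record(self, obj: TrackedObject) -> None:
        self.objects[obj.object_id] = obj


def resolve_reference(request: str, state: MultimodalDialogState) -> Optional[TrackedObject]:
    """Return the recorded object whose label appears in the request (assumed heuristic)."""
    text = request.lower()
    matches = [o for o in state.objects.values() if o.label in text]
    # If several recorded objects match, prefer the most recently observed one.
    return max(matches, key=lambda o: o.timestamp, default=None)


def relate(target: TrackedObject, other: TrackedObject) -> str:
    """Derive a naive left/right relation from horizontal positions."""
    side = "left of" if target.x_center < other.x_center else "right of"
    return f"the {target.label} is {side} the {other.label}"


def build_response(request: str, state: MultimodalDialogState, scene: list) -> str:
    """Combine attribute information with relational information in one reply."""
    target = resolve_reference(request, state)
    if target is None:
        return "I could not find that object."
    relations = [relate(target, o) for o in scene if o.object_id != target.object_id]
    attributes = f"The {target.label} was last seen in the {target.location}"
    if not relations:
        return attributes + "."
    return attributes + "; " + " and ".join(relations) + "."


# Usage: one target object recorded in the state, one additional object in the scene.
wallet = TrackedObject("obj-1", "wallet", "kitchen", 100.0, 0.2)
mug = TrackedObject("obj-2", "mug", "kitchen", 100.0, 0.6)
state = MultimodalDialogState()
state.record(wallet)
print(build_response("Where is my wallet?", state, [wallet, mug]))
```

In this sketch only the target object is recorded in the dialog state, while the additional object appears only in the scene, which mirrors the abstract's distinction between recorded attribute information and relational information determined from the visual data.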
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, from a client system associated with a user, a user request comprising a reference to a target object; accessing visual data from the client system, wherein the visual data comprises images portraying the target object and one or more additional objects, and wherein attribute information of the target object is recorded in a multimodal dialog state; resolving the reference to the target object based on the attribute information recorded in the multimodal dialog state; determining relational information between the target object and one or more of the additional objects portrayed in the visual data; and sending, to the client system, instructions for presenting a response to the user request, wherein the response comprises the attribute information and the determined relational information.
2. The method of claim 1, wherein the attribute information comprises an identifier of the target object.
3. The method of claim 1, wherein the attribute information comprises a location of the target object.
4. The method of claim 1, wherein the attribute information comprises a timestamp of an image of the visual data portraying the target object.
5. The method of claim 1, wherein the target object is an object that has been labeled as an object of significance.
6. The method of claim 1, further comprising: receiving the visual data from the client system; and storing the visual data in a data store.
7. The method of claim 6, further comprising: analyzing, by a computer vision module, the received visual data to identify the target object and the one or more additional objects; assigning respective object identifiers to the target object and one or more of the identified additional objects; and recording one or more of the object identifiers in the multimodal dialog state.
8. The method of claim 7, further comprising: recording the multimodal dialog state to a dialog state tracker, wherein the multimodal dialog state comprises one or more intents, slots, or relational information generated during a current session.
9. The method of claim 1, wherein each image of the plurality of images of the visual data is associated with a respective timestamp, wherein one or more of the images are associated with the target object.
10. The method of claim 9, further comprising: selecting, from among the one or more images associated with the target object, a first image having a most recent timestamp with respect to a time associated with the user request.
11. The method of claim 10, further comprising: analyzing, by a computer vision module, the first image to identify the target object and the one or more additional objects; and processing, by a scene understanding engine, the first image to generate the relational information between the target object and one or more of the additional objects.
12. The method of claim 11, further comprising: passing the first image, the attribute information, and one or more object identifiers of one or more additional objects associated with the first image to the scene understanding engine.
13. The method of claim 10, further comprising: recording, in the multimodal dialog state, a timestamp and location information associated with the first image.
14. The method of claim 13, further comprising: receiving an additional plurality of images, wherein each of the additional images is associated with the target object and a respective additional timestamp; and selecting, from among the additional images, a second image having a most recent timestamp.
15. The method of claim 14, further comprising: updating the multimodal dialog state to replace information of the first image with information of the second image.
16. The method of claim 1, further comprising: generating, by a scene understanding engine, the relational information in response to receiving the user request.
17. The method of claim 1, wherein the response comprises visual information indicating the relational information.
18. The method of claim 1, further comprising: receiving, from the client system, a subsequent user request for additional relational information associated with the target object; and generating, by a scene understanding engine, the additional relational information.
19. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive, from a client system associated with a user, a user request referencing a target object; access visual data from the client system, wherein the visual data comprises images portraying the target object and one or more additional objects, and wherein attribute information of the target object is recorded in a multimodal dialog state; resolve the reference to the target object based on the attribute information recorded in the multimodal dialog state; determine relational information between the target object and one or more of the additional objects portrayed in the visual data; and send, to the client system, instructions for presenting a response to the user request, wherein the response comprises the attribute information and the determined relational information.
20. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive, from a client system associated with a user, a user request referencing a target object; access visual data from the client system, wherein the visual data comprises images portraying the target object and one or more additional objects, and wherein attribute information of the target object is recorded in a multimodal dialog state; resolve the reference to the target object based on the attribute information recorded in the multimodal dialog state; determine relational information between the target object and one or more of the additional objects portrayed in the visual data; and send, to the client system, instructions for presenting a response to the user request, wherein the response comprises the attribute information and the determined relational information.
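The image selection and state update steps recited in claims 9 to 15 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `ImageRecord`, `most_recent_image`, and `update_state` are hypothetical names, the dialog state is reduced to a plain dictionary, and "most recent with respect to a time associated with the user request" is read here as the latest image captured at or before the request time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageRecord:
    image_id: str
    timestamp: float       # capture time of the image
    location: str          # where the image was captured
    object_ids: frozenset  # identifiers of objects detected in the image


def most_recent_image(images: list, target_id: str, request_time: float) -> Optional[ImageRecord]:
    """Pick the latest image portraying the target, no later than the request time."""
    candidates = [
        img for img in images
        if target_id in img.object_ids and img.timestamp <= request_time
    ]
    return max(candidates, key=lambda img: img.timestamp, default=None)


def update_state(state: dict, target_id: str, image: ImageRecord) -> dict:
    """Return a new state in which the target's entry reflects the given image.

    Any earlier entry for the same target is replaced.
    """
    new_state = dict(state)
    new_state[target_id] = {
        "image_id": image.image_id,
        "timestamp": image.timestamp,
        "location": image.location,
    }
    return new_state


# Usage: select a first image, record it, then replace it when newer images arrive.
images = [
    ImageRecord("img-1", 10.0, "hallway", frozenset({"obj-1"})),
    ImageRecord("img-2", 20.0, "kitchen", frozenset({"obj-1", "obj-2"})),
    ImageRecord("img-3", 30.0, "garage", frozenset({"obj-2"})),
]
first = most_recent_image(images, "obj-1", request_time=25.0)
state = update_state({}, "obj-1", first)

additional = [ImageRecord("img-4", 40.0, "office", frozenset({"obj-1"}))]
second = most_recent_image(additional, "obj-1", request_time=45.0)
state = update_state(state, "obj-1", second)
print(state["obj-1"]["location"])
```

Returning a new dictionary from `update_state` rather than mutating in place is a design choice of this sketch; it keeps the earlier state available for comparison, which the claims neither require nor rule out.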
Business processes related to social networking or social networking services · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Facial expression recognition · CPC title