Goal-driven human-machine interaction architecture, and systems and methods of use thereof
US-2024403772-A1 · Dec 5, 2024 · US
US2026024236A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026024236-A1 |
| Application number | US-202418775219-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 17, 2024 |
| Priority date | Jul 17, 2024 |
| Publication date | Jan 22, 2026 |
| Grant date | — |
Official abstract text for this publication.
An apparatus comprises at least one processing device configured to generate, using a first machine learning model, a first data structure comprising input representations of one or more input components from an augmented reality environment. The at least one processing device is also configured to generate, using a second machine learning model that takes as input at least a portion of the first data structure, a second data structure comprising at least one vector representation characterizing relevance of one or more of the input representations in the first data structure. The at least one processing device is further configured to generate, using a third machine learning model that takes as input at least a portion of the first data structure and at least a portion of the second data structure, an output response, and to present the output response to a user in the augmented reality environment.
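The three-stage data flow described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names (`encode_inputs`, `score_relevance`, `generate_response`, `run_pipeline`) are hypothetical, and the three "models" are toy stand-ins rather than the machine learning models of the disclosure. The point is the wiring: the second stage consumes the first data structure, and the third stage consumes both.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InputRepresentation:
    component: str        # e.g. "prompt", "history", "visual"
    vector: List[float]   # toy embedding, not a real model output


def encode_inputs(components: Dict[str, str]) -> List[InputRepresentation]:
    """Stage 1 stand-in: build the first data structure of input representations."""
    reps = []
    for name, text in components.items():
        # Hypothetical featurization (length and vowel count), for illustration only.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        reps.append(InputRepresentation(name, [float(len(text)), float(vowels)]))
    return reps


def score_relevance(reps: List[InputRepresentation]) -> List[float]:
    """Stage 2 stand-in: build the second data structure, one weight per representation."""
    totals = [sum(r.vector) for r in reps]
    norm = sum(totals) or 1.0
    return [t / norm for t in totals]


def generate_response(reps: List[InputRepresentation], relevance: List[float]) -> str:
    """Stage 3 stand-in: produce a response from both data structures."""
    best = max(range(len(reps)), key=lambda i: relevance[i])
    return f"Responding with focus on: {reps[best].component}"


def run_pipeline(components: Dict[str, str]) -> str:
    first = encode_inputs(components)         # first data structure
    second = score_relevance(first)           # second data structure
    return generate_response(first, second)   # output response to present
```

In a real system each stand-in would be replaced by a trained model, and the final string would be rendered in the augmented reality view instead of being returned.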
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured: to generate, using a first machine learning model, a first data structure comprising input representations of one or more input components from an augmented reality environment; to generate, using a second machine learning model that takes as input at least a portion of the first data structure, a second data structure comprising at least one vector representation characterizing relevance of one or more of the input representations in the first data structure; to generate, using a third machine learning model that takes as input at least a portion of the first data structure and at least a portion of the second data structure, an output response; and to present the output response to a user in the augmented reality environment.

2. The apparatus of claim 1 wherein the one or more input components comprise: (i) a user prompt received from the user in the augmented reality environment; (ii) a history of interactions associated with the user in the augmented reality environment; and (iii) visual and spatial information associated with the user in the augmented reality environment.

3. The apparatus of claim 2 wherein the first machine learning model used to generate the first data structure comprises one or more large language models.

4. The apparatus of claim 3 wherein the one or more large language models comprise: at least a first text-based large language model configured for generating input representations of the (i) the user prompt received from the user in the augmented reality environment and (ii) the history of interactions associated with the user in the augmented reality environment; and at least a second vision-based large language model configured for generating input representations of (iii) the visual and spatial information associated with the user in the augmented reality environment.

5. The apparatus of claim 1 wherein the second machine learning model comprises a Continuous Attention Memory Model (CAMM).

6. The apparatus of claim 5 wherein the CAMM comprises: a continuous attention mechanism configured to compute attention weights between the input representations of the first data structure and a query vector; a dynamic memory bank configured to store and update information from the input representations of the first data structure as memory items, each of the memory items comprising a vector representation encoding information from at least one of the input representations of the first data structure; and a context relevance estimator configured to rank the memory items according to a relevance to a current context of the augmented reality environment.

7. The apparatus of claim 6 wherein the query vector is initialized randomly and updated iteratively utilizing a gradient descent algorithm.

8. The apparatus of claim 6 wherein the continuous attention mechanism is configured to utilize a dot product to compute the attention weights and a sigmoid function.

9. The apparatus of claim 6 wherein the dynamic memory bank comprises a set of memory slots, each of the memory slots comprising at least one of the memory items, the dynamic memory bank having a fixed size of memory items and being configured to store the memory items in a chronological order utilizing a first-in, first-out policy for replacing memory items.

10. The apparatus of claim 6 wherein the context relevance estimator comprises a feed-forward neural network configured to compute relevance scores for the memory items in the dynamic memory bank.

11. The apparatus of claim 1 wherein the third machine learning model comprises one or more large language models conditioned on said at least a portion of the second data structure.

12. The apparatus of claim 11 wherein the one or more large language models further incorporates visual and spatial information from the augmented reality environment for customizing the output response based at least in part on a view of the user in the augmented reality environment.

13. The apparatus of claim 1 wherein the at least one processing device is further configured to update at least one of the first machine learning model, the second machine learning model and the third machine learning model according to one or more user preferences of the user in the augmented reality environment.

14. The apparatus of claim 13 wherein the one or more user preferences of the user in the augmented reality environment are determined based at least in part on at least one of: sentiment analysis extracting emotions from text or speech of the user captured in the augmented reality environment; facial expression recognition to detect emotions from one or more images of the user captured in the augmented reality environment; and reinforcement learning to learn from rewards or penalties determined from user interaction in the augmented reality environment.

15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device: to generate, using a first machine learning model, a first data structure comprising input representations of one or more input components from an augmented reality environment; to generate, using a second machine learning model that takes as input at least a portion of the first data structure, a second data structure comprising at least one vector representation characterizing relevance of one or more of the input representations in the first data structure; to generate, using a third machine learning model that takes as input at least a portion of the first data structure and at least a portion of the second data structure, an output response; and to present the output response to a user in the augmented reality environment.

16. The computer program product of claim 15 wherein the one or more input components comprise: (i) a user prompt received from the user in the augmented reality environment; (ii) a history of interactions associated with the user in the augmented reality environment; and (iii) visual and spatial information associated with the user in the augmented reality environment.

17. The computer program product of claim 15 wherein the second machine learning model comprises a Continuous Attention Memory Model (CAMM), the CAMM comprising: a continuous attention mechanism configured to compute attention weights between the input representations of the first data structure and a query vector; a dynamic memory bank configured to store and update information from the input representations of the first data structure as memory items, each of the memory items comprising a vector representation encoding information from at least one of the input representations of the first data structure; and a context relevance estimator configured to rank the memory items according to a relevance to a current context of the augmented reality environment.

18. A method comprising: generating, using a first machine learning model, a first data structure comprising input representations of one or more input components from an augmented reality environment; generating, using a second machine learning model that takes as input at least a portion of the first data structure, a second dat
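The CAMM components recited in claims 6 and 8 to 10 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names (`attention_weights`, `MemoryBank`, `relevance_score`, `rank_memory`) are hypothetical, the attention weight is taken to be a sigmoid applied to a dot product (one possible reading of claim 8), and the feed-forward weights are hand-picked placeholders rather than trained parameters. The gradient-descent query update of claim 7 is not shown.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable form for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_weights(query: Vector, inputs: List[Vector]) -> List[float]:
    """One weight per input: sigmoid of the query-input dot product (assumed reading of claim 8)."""
    return [sigmoid(dot(query, x)) for x in inputs]


class MemoryBank:
    """Fixed-size store that evicts the oldest item first (FIFO, as in claim 9)."""

    def __init__(self, capacity: int) -> None:
        self.items: Deque[Vector] = deque(maxlen=capacity)

    def add(self, item: Vector) -> None:
        # deque(maxlen=...) discards the oldest entry once capacity is reached.
        self.items.append(item)


def relevance_score(item: Vector, context: Vector,
                    w_hidden: List[Vector], w_out: Vector) -> float:
    """Tiny feed-forward scorer: ReLU hidden layer, then a linear output (placeholder weights)."""
    features = item + context  # concatenate memory item and context vectors
    hidden = [max(0.0, dot(w, features)) for w in w_hidden]
    return dot(w_out, hidden)


def rank_memory(bank: MemoryBank, context: Vector,
                w_hidden: List[Vector], w_out: Vector) -> List[Tuple[float, Vector]]:
    """Return (score, item) pairs sorted from most to least relevant."""
    scored = [(relevance_score(m, context, w_hidden, w_out), m) for m in bank.items]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A bank created with `MemoryBank(capacity=2)` that receives three items keeps only the two most recent, and `rank_memory` then orders those survivors by the scorer's output for the given context vector.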
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Semantic analysis · CPC title
Facial expression recognition · CPC title
for estimating an emotional state · CPC title
Generative networks · CPC title