Natural language interactions with interactive visual content

US2025149028A1 · US · A1

Patent metadata
Field                Value
Publication number   US-2025149028-A1
Application number   US-202418923949-A
Country              US
Kind code            A1
Filing date          Oct 23, 2024
Priority date        Dec 10, 2021
Publication date     May 8, 2025
Grant date           (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for facilitating natural language interactions with visual interactive content are described. During a build time, a system analyzes various websites and applications relating to a particular user goal to understand website and application navigation and information relating to the user goal. The learned information is used to store configuration data. During runtime, when a user requests performance of an action, the system engages in a dialog with the user to complete the user's goal. The system uses the stored configuration data to determine actions to be performed at a website or application to complete the user's goal, and determines system responses to present to the user to facilitate completion of the goal. Such system responses may request information from the user, may inform the user of information displayed at the website or application, etc.
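
The build-time step in the abstract (analyze a site, then store configuration data) can be illustrated with a minimal sketch. This is not the patent's implementation: the sample HTML, the `build_config` helper, the `FormFieldCollector` class, and the shape of the stored configuration are all assumptions made for illustration. It uses only Python's standard `html.parser` module to find which form inputs a site marks as required.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: records a form's action URL and its required inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.required_fields = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # valueless attributes such as `required` map to None
        if tag == "form":
            self.action = attr.get("action")
        elif tag == "input" and "required" in attr and "name" in attr:
            self.required_fields.append(attr["name"])


def build_config(goal, html):
    """Build-time sketch: turn page markup into stored configuration data."""
    collector = FormFieldCollector()
    collector.feed(html)
    return {
        "goal": goal,
        "action": collector.action,
        "required": collector.required_fields,
    }


# Illustrative markup only; it does not come from any real website.
SAMPLE_HTML = """
<form action="/book">
  <input name="city" required>
  <input name="date" required>
  <input name="promo_code">
  <button type="submit">Book</button>
</form>
"""

config = build_config("book_hotel", SAMPLE_HTML)
print(config)
```

At runtime, a dialog component could consult a record like `config` to decide which pieces of information it still needs to request from the user before submitting the form.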

First claim

Opening claim text (preview).

1-20. (canceled)

21. A computer-implemented method, comprising:
    accessing a first website to determine first website data corresponding to the first website, the first website data indicating at least one web element operable to perform a first action using the first website;
    processing the first website data to configure a first system component to, in response to a user input, interact with the first website data to provide a response to the user input;
    after configuration of the first system component, receiving first input data corresponding to a first natural language input;
    processing the first input data to determine the first natural language input corresponds to execution of the first action using the first website;
    performing processing using the first system component to determine execution of the first action requires first information;
    determining first output data representing a request for input of the first information;
    causing presentation of the request using the first output data;
    receiving second input data;
    processing the second input data to determine a first value for the first information;
    after determination of the first information, accessing the first website to provide the first value and cause execution of the first action;
    after providing the first value, receiving, from the first website, second output data representing a result of the first action; and
    sending the second output data to a user device to cause presentation of information related to the result.

22. The computer-implemented method of claim 21, wherein the second input data represents a second natural language input and wherein the method further comprises performing natural language processing using the second input data to determine the first information.

23. The computer-implemented method of claim 21, further comprising, prior to receiving the first input data: accessing a second website to determine second website data corresponding to the second website, the second website data indicating at least one second web element operable to perform a second action using the second website; and processing the second website data to further configure the first system component.

24. The computer-implemented method of claim 21, further comprising: receiving first content data corresponding to the first website, the first content data corresponding to interactive visual content, wherein configuration of the first system component uses the first content data.

25. The computer-implemented method of claim 21, wherein: causing presentation of the request comprises causes presentation of the request using the user device.

26. The computer-implemented method of claim 21, wherein: the first website data further comprises image data corresponding to the first website; and configuration of the first system component uses the image data.

27. The computer-implemented method of claim 26, further comprising: processing the image data using bounding box detection to determine image feature data, wherein configuration of the first system component uses the image feature data.

28. The computer-implemented method of claim 26, further comprising: processing the image data using computer vision processing to determine image feature data, wherein configuration of the first system component uses the image feature data.

29. The computer-implemented method of claim 21, further comprising: processing the first website data to determine web elements mapping data representing a relationship of the at least one web element to at least one second web element, wherein configuration of the first system component uses the web elements mapping data.

30. The computer-implemented method of claim 21, further comprising: processing the first website data to determine semantic data corresponding to at least the first action, wherein configuration of the first system component uses the semantic data.

31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to:
    access a first website to determine first website data corresponding to the first website, the first website data indicating at least one web element operable to perform a first action using the first website;
    process the first website data to configure a first system component to, in response to a user input, interact with the first website data to provide a response to the user input;
    after configuration of the first system component, receive first input data corresponding to a first natural language input;
    process the first input data to determine the first natural language input corresponds to execution of the first action using the first website;
    perform processing using the first system component to determine execution of the first action requires first information;
    determine first output data representing a request for input of the first information;
    cause presentation of the request using the first output data;
    receive second input data;
    process the second input data to determine a first value for the first information;
    after determination of the first information, access the first website to provide the first value and cause execution of the first action;
    after providing the first value, receive, from the first website, second output data representing a result of the first action; and
    send the second output data to a user device to cause presentation of information related to the result.

32. The system of claim 31, wherein the second input data represents a second natural language input and wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to perform natural language processing using the second input data to determine the first information.

33. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to, prior to receipt of the first input data: access a second website to determine second website data corresponding to the second website, the second website data indicating at least one second web element operable to perform a second action using the second website; and process the second website data to further configure the first system component.

34. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive first content data corresponding to the first website, the first content data corresponding to interactive visual content, wherein configuration of the first system component uses the first content data.

35. The system of claim 31, wherein the instructions that cause the system to cause presentation of the request comprise instructions that, when executed by the at least one processor, cause the system to cause presentation of the request using the user device.

36. The system of claim 31, wherein: the first website data further comprises image data corresponding to the first website; and configuration of the first system component uses the image data.

37. The system of claim 36, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the image data using bounding box detection to determine image feature da
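
The runtime sequence recited in claim 21 (recognize the requested action, determine that information is missing, request it, take the user's answer, execute the action, and return the result) resembles a slot-filling dialog loop. The sketch below is one possible reading under stated assumptions, not the claimed implementation: `ActionConfig`, `DialogSession`, `handle_turn`, the keyword matching, and the `fake_website_execute` stand-in are hypothetical names, and no real website is contacted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionConfig:
    """Hypothetical configuration data produced before runtime."""
    name: str
    keywords: tuple   # words that signal the user wants this action
    required: tuple   # information the action needs before it can run


@dataclass
class DialogSession:
    config: ActionConfig
    values: dict = field(default_factory=dict)
    pending: Optional[str] = None   # slot the system has asked about, if any


def fake_website_execute(action, values):
    """Stand-in for accessing the website and receiving a result."""
    details = ", ".join(f"{k}={v}" for k, v in sorted(values.items()))
    return f"{action} confirmed for {details}"


def handle_turn(session, text):
    """Process one user input and return the system response."""
    if session.pending is not None:
        # An earlier request is outstanding: treat this input as its answer.
        session.values[session.pending] = text.strip()
        session.pending = None
    elif not any(word in text.lower() for word in session.config.keywords):
        return "Sorry, I can't help with that."
    # Request the first piece of required information that is still missing.
    for slot in session.config.required:
        if slot not in session.values:
            session.pending = slot
            return f"What is the {slot}?"
    # Everything is known: execute the action and relay the result.
    return fake_website_execute(session.config.name, session.values)


session = DialogSession(ActionConfig("book_hotel", ("book", "hotel"), ("city", "date")))
first = handle_turn(session, "I want to book a hotel")
second = handle_turn(session, "Seattle")
third = handle_turn(session, "May 8")
print(first, second, third, sep="\n")
```

Keyword matching is used here only to keep the example short; the claims refer to natural language processing of the input, which a real system would perform with a trained language understanding component.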

Assignees

  • Amazon Tech Inc

Inventors

Classifications

  • Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08)

  • Recognition networks (G10L15/142, G10L15/16 take precedence)

  • Parsing for meaning understanding

  • Execution procedure of a spoken command

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025149028A1 cover?
Techniques for facilitating natural language interactions with visual interactive content are described. During a build time, a system analyzes various websites and applications relating to a particular user goal to understand website and application navigation and information relating to the user goal. The learned information is used to store configuration data. During runtime, when a user req…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/183. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).