Systems and methods for using conjunctions in a voice input to cause a search application to wait for additional inputs

US12524461B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12524461-B2
Application number  US-202318239401-A
Country             US
Kind code           B2
Filing date         Aug 29, 2023
Priority date       Jan 7, 2020
Publication date    Jan 13, 2026
Grant date          Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A search is performed based on a voice input combined with user selection of entities displayed on a display screen as well as real-world entities. A voice input is received from the user by a media device, as well as a selection of a first entity being displayed on the media device. A conjunction spoken in the voice input triggers the media device to wait for selection of a second entity before performing the search. After receiving selection of the second entity, a search query is constructed based on the voice input, the first entity, and the second entity. The search query is transmitted to a database and, in response, the media device receives at least one identifier of at least one content item. The at least one identifier is then generated for display to the user.
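The wait-on-conjunction behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PendingSearch` class, the `CONJUNCTIONS` word list, and the dictionary shape of the query are all assumptions made for illustration. The sketch only shows the control flow, in which a spoken conjunction raises the number of entity selections required before a query is built.

```python
from dataclasses import dataclass, field

# Hypothetical word list; the abstract does not enumerate which conjunctions count.
CONJUNCTIONS = {"and", "or"}


@dataclass
class PendingSearch:
    """Collects a voice input and entity selections until a search can run."""
    voice_text: str = ""
    entities: list = field(default_factory=list)

    def add_voice(self, text: str) -> None:
        self.voice_text = text.strip()

    def select(self, entity: str) -> None:
        self.entities.append(entity)

    def entities_needed(self) -> int:
        # A conjunction in the voice input means a second entity is expected.
        words = {w.strip(".,?!") for w in self.voice_text.lower().split()}
        return 2 if words & CONJUNCTIONS else 1

    def ready(self) -> bool:
        return bool(self.voice_text) and len(self.entities) >= self.entities_needed()

    def build_query(self) -> dict:
        # Assumed query shape: the raw voice text plus the selected entities.
        if not self.ready():
            raise RuntimeError("still waiting for another entity selection")
        return {"voice": self.voice_text, "entities": list(self.entities)}


search = PendingSearch()
search.add_voice("Show me movies with him and her")
search.select("Actor A")
print(search.ready())        # one selection is not enough after "and"
search.select("Actor B")
print(search.build_query())
```

Counting required selections, rather than blocking on a timer, keeps the sketch deterministic; a real device would presumably also need a timeout for the case where the second selection never arrives.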

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving input from a user via a user input interface of a media device; processing the input to identify a particular pronoun; identifying a gesture made by the user; processing an image associated with the gesture to determine a plurality of identities of a plurality of entities, respectively, in the image, displayed on a display of the media device; determining, based on the plurality of identities, a respective pronoun for each entity of the plurality of entities; determining, from among the plurality of entities, a particular entity having a pronoun that corresponds to the particular pronoun identified based on the input, wherein the particular entity is displayed on the display of the media device; querying a database based on the input and the particular entity; based on querying the database, receiving at least one identifier of at least one content item; and generating for presentation, using the media device, the at least one identifier of the at least one content item.

2. The method of claim 1, wherein the input is a voice input, the particular entity is a second entity, and the method further comprises: receiving, at the media device, a selection of a first entity currently being displayed on the display of the media device; and processing the voice input to identify a search operator, wherein querying the database comprises constructing a search query based on the identified search operator, the first entity and the second entity.

3. The method of claim 1, wherein the image is captured by an imaging sensor, and identifying the gesture made by the user comprises determining, based on the image, motion of the user.

4. The method of claim 1, wherein the image is captured by an imaging sensor, and determining the particular entity associated with the gesture further comprises: determining a direction of the gesture; and determining that the direction of the gesture corresponds to the image, the image depicting a real-world scene proximate to the user.

5. The method of claim 4, wherein determining the particular entity associated with the gesture further comprises: performing image processing of the image to determine a plurality of portions of the image that respectively correspond to the plurality of entities; based on the direction of the gesture, extrapolating a path of the gesture to a portion of the image; and determining as the particular entity an entity of the plurality of entities associated with the portion of the image that the path intersects.

6. The method of claim 4, wherein the imaging sensor is a first imaging sensor, the image captured by the first imaging sensor is a first image, and the method further comprises: determining that a second image captured by a second imaging sensor depicts a different perspective of the real-world scene than a perspective of the real-world scene depicted in the first image; extrapolating a first path from the direction of the gesture in the first image; extrapolating a second path from the direction of the gesture in the second image; identifying a point at which the first path crosses the second path; and determining as the particular entity an entity of the plurality of entities associated with the point of the image at which the first path crosses the second path.

7. The method of claim 4, wherein the imaging sensor is a first imaging sensor, the image captured by the first imaging sensor is a first image, and the method further comprises: determining that a second image, captured by a second imaging sensor that is facing a second direction, depicts an area in which the user made the gesture; performing image processing of the second image to identify the gesture; extrapolating a first path from the direction of the gesture in the second image; calculating, based on a position and an angle of the first imaging sensor and a position and an angle of the second imaging sensor, a second path in the first image corresponding to the first path; performing image processing of the first image to determine a plurality of portions of the first image that respectively correspond to the plurality of entities; and determining as the particular entity an entity of the plurality of entities associated with the portion of the image that the second path intersects.

8. The method of claim 1, wherein the media device is a first media device, the method further comprising: generating, for presentation at a second media device proximate to the first media device, a content item, wherein the image associated with the gesture corresponds to a frame of the content item being presented at the second media device.

9. The method of claim 8, wherein the particular entity is a first entity, the content item is a first content item, and the method further comprises: generating, for presentation at the first media device while the first content item is being presented at the second media device, a second content item, wherein the input is a voice input, and the voice input comprises a reference to a second entity depicted in a frame of the second media device, the querying of the database being further based on the second entity.

10. The method of claim 1, wherein determining, based on the plurality of identities, the respective pronoun for each entity of the plurality of entities comprises: determining a first pronoun for a first entity of the plurality of entities; and determining a second pronoun for a second entity of the plurality of entities; wherein determining the particular entity having a pronoun that corresponds to the particular pronoun is based on the first pronoun for the first entity of the plurality of entities and the second pronoun for a second entity of the plurality of entities.

11. A computer-implemented system, comprising: input/output (I/O) circuitry configured to: receive input from a user via a user input interface of a media device; control circuitry configured to: process the input to identify a particular pronoun; identify a gesture made by the user; process an image associated with the gesture to determine a plurality of identities of a plurality of entities, respectively, in the image, the image displayed on a display of the media device; determine, based on the plurality of identities, a respective pronoun for each entity of the plurality of entities; determine, from among the plurality of entities, a particular entity having a pronoun that corresponds to the particular pronoun identified based on the input, wherein the particular entity is displayed on the display of the media device; and query a database based on the input and the particular entity, wherein the I/O circuitry is further configured to: based on querying the database, receive at least one identifier of at least one content item; and generate for presentation, using the media device, the at least one identifier of the at least one content item.

12. The system of claim 11, wherein: the input is a voice input, the particular entity is a second entity; the I/O circuitry is further configured to receive, at the media device, a selection of a first entity currently being displayed on the display of the media device; and the control circuitry is further configured to: process the voice input to identify a search operator; and query the database by constructing a search query based on the identified search operator, the first entity and the second entity.

13. The system of claim 11, wherein the image is captured by an imaging sensor, and
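Two steps recited in the claims above lend themselves to a short illustration: matching a spoken pronoun against per-entity pronouns (claims 1 and 10), and extrapolating a gesture path to the image portion it intersects (claim 5). The following Python sketch is a simplified, assumption-laden reading, not the claimed method itself. The `Entity` record, the `PRONOUN_FORMS` table, the use of axis-aligned boxes as "portions of the image", and the 2D ray test are all hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    pronoun: str   # pronoun assigned from the entity's identity (assumed given)
    box: tuple     # assumed image portion: (x_min, y_min, x_max, y_max)


# Hypothetical table mapping spoken forms to one canonical pronoun.
PRONOUN_FORMS = {
    "he": "he", "him": "he", "his": "he",
    "she": "she", "her": "she", "hers": "she",
    "it": "it", "its": "it",
}


def find_pronoun(text: str) -> Optional[str]:
    """Return the canonical form of the first pronoun in the input, if any."""
    for word in text.lower().split():
        form = PRONOUN_FORMS.get(word.strip(".,?!"))
        if form:
            return form
    return None


def match_by_pronoun(text: str, entities: list) -> list:
    """Keep the entities whose pronoun corresponds to the spoken pronoun."""
    pronoun = find_pronoun(text)
    if pronoun is None:
        return []
    return [e for e in entities if e.pronoun == pronoun]


def ray_hits_box(origin, direction, box) -> Optional[float]:
    """Distance along the ray at which it enters the box, or None (slab test)."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box[:2], box[2:]):
        if d == 0:
            if not lo <= o <= hi:
                return None      # parallel to this axis and outside the slab
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near if t_near <= t_far else None


def entity_on_path(origin, direction, entities) -> Optional[Entity]:
    """Extrapolate the gesture path and return the first entity it intersects."""
    hits = [(t, e) for e in entities
            if (t := ray_hits_box(origin, direction, e.box)) is not None]
    return min(hits, key=lambda h: h[0])[1] if hits else None


people = [Entity("Actor A", "he", (4, 3, 6, 7)),
          Entity("Actor B", "she", (1, 5, 2, 8))]
print([e.name for e in match_by_pronoun("What else has she been in?", people)])
print(entity_on_path((0, 0), (1, 1), people))
```

Choosing the nearest intersected box is one possible tie-break when a path crosses several regions; the claims do not specify one.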

Assignees

Adeia Guides Inc

Inventors

Classifications

  • Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113)

  • Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level (multimodal speaker identification or verification G10L17/10)

  • Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16)

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

  • Presentation of query results

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524461B2 cover?
A search is performed based on a voice input combined with user selection of entities displayed on a display screen as well as real-world entities. A voice input is received from the user by a media device, as well as a selection of a first entity being displayed on the media device. A conjunction spoken in the voice input triggers the media device to wait for selection of a second entity befor…
Who is the assignee on this patent?
Adeia Guides Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/632. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).