Systems and methods for disambiguating a voice search query based on gestures

US11227593B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11227593-B2
Application number	US-201916456275-A
Country	US
Kind code	B2
Filing date	Jun 28, 2019
Priority date	Jun 28, 2019
Publication date	Jan 18, 2022
Grant date	Jan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are described herein for disambiguating a voice search query by determining whether the user made a gesture while speaking a quotation from a content item and whether the user mimicked or approximated a gesture made by a character in the content item when the character spoke the words quoted by the user. If so, a search result comprising an identifier of the content item is generated. A search result representing the content item from which the quotation comes may be ranked highest among other search results returned and therefore presented first in a list of search results. If the user did not mimic or approximate a gesture made by a character in the content item when the quotation is spoken in the content item, then a search result may not be generated for the content item or may be ranked lowest among other search results.
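
The ranking behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Candidate` type, the `text_score` field, and the `drop_mismatches` flag are hypothetical names introduced here, and the sketch assumes the gesture comparison has already produced a yes/no result for each content item.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content_id: str        # identifier of the content item containing the quotation
    text_score: float      # hypothetical score for how well the spoken words match
    gesture_matches: bool  # did the user mimic or approximate the character's gesture?


def rank_candidates(candidates, drop_mismatches=False):
    """Order candidates so that gesture-matching content items come first.

    With drop_mismatches=True, items whose gesture did not match are omitted,
    mirroring the "may not be generated" option in the abstract; otherwise
    they are kept but placed after every gesture-matching item.
    """
    if drop_mismatches:
        candidates = [c for c in candidates if c.gesture_matches]
    # False sorts before True, so "not gesture_matches" puts matches first;
    # ties are broken by descending text score.
    return sorted(candidates, key=lambda c: (not c.gesture_matches, -c.text_score))


results = rank_candidates([
    Candidate("movie_a", 0.90, False),
    Candidate("movie_b", 0.80, True),
    Candidate("movie_c", 0.95, False),
])
ranked_ids = [c.content_id for c in results]
```

Here `movie_b` is placed first even though its text score is the lowest, because it is the only candidate whose gesture matched.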

First claim

Opening claim text (preview).

What is claimed is:

1. A method for disambiguating a voice search query, the method comprising: receiving a voice search query; transcribing the voice search query into a string comprising a plurality of words; capturing, concurrently with receiving the voice search query, an image of a pose of a user, the image of the pose comprising a plurality of pixels of at least one portion of a body of the user; querying a database with the string; identifying, from the database in response to the query, a plurality of quotations matching the string; retrieving, from the database, metadata of a quotation of the plurality of quotations matching the string, the metadata including quotation pose information corresponding to the matched string; comparing the quotation pose information included in the received metadata with the captured image of the pose of the user, wherein the comparing comprises: scaling a first size of the captured image of the pose of the user to match a second size of the quotation pose; superimposing a grid over the captured image of the pose of the user; determining, based on the grid, a second set of pixel coordinates describing a location of the at least one portion of the body of the user in the captured image of the pose; comparing the second set of pixel coordinates with a first set of pixel coordinates describing a location of at least one portion of a body in the quotation pose information included in the received metadata; determining, based on the comparing, whether the captured image of the captured pose of the user matches the quotation pose information; and in response to determining that the captured image of the pose of the user matches the quotation pose, generating for display a search result comprising an identifier of the quotation.

2. The method of claim 1, further comprising: receiving, in response to the query, a plurality of content identifiers of content items having metadata matching the string; and generating for display a plurality of search results comprising the plurality of content identifiers.

3. The method of claim 2, further comprising: ranking each content identifier of the plurality of content identifiers based on a degree to which the metadata corresponding to each respective content identifier matches the string; ranking the identifier of the quotation higher than each of the plurality of content identifiers; and ordering the plurality of content identifiers based on the respective rank of each content identifier of the plurality of content identifiers.

4. The method of claim 1, wherein capturing the image of the pose of the user comprises: receiving image data representing at least a portion of the body of the user; identifying portions of the body of the user represented in the image data; determining a position of each identified portion of the body of the user; and determining a respective relative position of each identified portion of the body of the user relative to each other identified portion of the body of the user.

5. The method of claim 1, wherein capturing the image of the pose of the user comprises: receiving position data from at least one user device placed on the body of the user; identifying a portion of the body of the user on which the at least one user device is located; and determining a position of the identified portion of the body of the user relative to other portions of the body of the user.

6. The method of claim 1, further comprising determining at least one motion associated with the image of the pose.

7. The method of claim 6, wherein capturing the image of the pose of the user comprises capturing a plurality of successive images of poses of the user corresponding to a period of time during which the voice search query originated.

8. The method of claim 7, wherein comparing the captured image of the pose of the user with the pose information in the metadata of the quotation comprises: identifying a plurality of portions of the body of the user captured in a first image of pose of the plurality of successive images of poses; and identifying a travel path for each portion of the body of the user by tracking a position of each respective portion of the body of the user of the plurality of portions of the body of the user through each successive image of pose of the plurality of images of poses; wherein the pose information comprises path information.

9. A system for disambiguating a voice search query, the system comprising: input circuitry configured to: receive a voice search query; and capture, concurrently with receiving the voice search query, an image of a pose of a user, the image of the pose comprising a plurality of pixels of at least one portion of a body of the user; and control circuitry configured to: transcribe the voice search query into a string comprising a plurality of words; query a database with the string; identify, from the database in response to the query, a plurality of quotations matching the string; retrieve, from the database, metadata of a quotation of the plurality of quotations matching the string, the metadata including quotation pose information corresponding to the matched string; compare the quotation pose information included in the received metadata with the captured image of the pose of the user, wherein the comparing comprises: scale a first size of the captured image of the pose of the user to match a second size of the quotation pose; superimpose a grid over the captured image of the pose of the user; determine, based on the grid, a second set of pixel coordinates describing a location of the at least one portion of the body of the user in the captured image of the pose; compare the second set of pixel coordinates with a first set of pixel coordinates describing a location of at least one portion of a body in the quotation pose information included in the received metadata; determine, based on the comparing, whether the captured image of the pose of the user matches the quotation pose information; and in response to determining that the captured image of the pose of the user matches the quotation pose, generate for display a search result comprising an identifier of the quotation.

10. The system of claim 9, wherein the control circuitry is further configured to: receive, in response to the query, a plurality of content identifiers of content items having metadata matching the string; and generate for display a plurality of search results comprising the plurality of content identifiers.

11. The system of claim 10, wherein the control circuitry is further configured to: rank each content identifier of the plurality of content identifiers based on a degree to which the metadata corresponding to each respective content identifier matches the string; rank the identifier of the quotation higher than each of the plurality of content identifiers; and order the plurality of content identifiers based on the respective rank of each content identifier of the plurality of content identifiers.

12. The system of claim 9, wherein the input circuitry configured to capture the image of the pose of the user is further configured to: receive image data representing at least a portion of the body of the user; identify portions of the body of the user represented in the image data; determine a position of each identified portion of the body of the user; and determine a respective relative position of each identified portion of the body of the user relative to each other identified portion of the body of the user.

13. The system of claim 9, wherein the input circuitry con
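
The comparing step recited in claims 1 and 9 (scale, superimpose a grid, derive coordinates, compare) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the patent: it assumes body-part locations have already been detected as named pixel coordinates, and the function names, the `cell` grid size, and the `tolerance` threshold are all hypothetical choices that the claims do not specify.

```python
def scale_points(points, from_size, to_size):
    """Rescale (x, y) pixel coordinates from one image size to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return {name: (x * sx, y * sy) for name, (x, y) in points.items()}


def to_grid(points, cell):
    """Snap pixel coordinates to (column, row) cells of a square grid."""
    return {name: (int(x // cell), int(y // cell)) for name, (x, y) in points.items()}


def poses_match(user_points, user_size, ref_points, ref_size, cell=20, tolerance=1):
    """Return True if every shared body part falls within `tolerance` grid
    cells of its position in the reference (quotation) pose."""
    # Step 1: scale the captured pose to the size of the quotation pose.
    scaled = scale_points(user_points, user_size, ref_size)
    # Step 2: superimpose a grid and read off grid coordinates for both poses.
    user_cells = to_grid(scaled, cell)
    ref_cells = to_grid(ref_points, cell)
    # Step 3: compare the two coordinate sets, body part by body part.
    shared = user_cells.keys() & ref_cells.keys()
    if not shared:
        return False
    return all(
        abs(user_cells[n][0] - ref_cells[n][0]) <= tolerance
        and abs(user_cells[n][1] - ref_cells[n][1]) <= tolerance
        for n in shared
    )


# Hypothetical data: a 640x480 capture compared with a 320x240 reference pose.
user = {"head": (320, 60), "right_hand": (400, 100)}
reference = {"head": (160, 30), "right_hand": (205, 48)}
matched = poses_match(user, (640, 480), reference, (320, 240))
```

Snapping to grid cells before comparing makes the match tolerant of small pixel-level differences; a real system would need to choose the cell size and tolerance empirically.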

Assignees

Rovi Guides Inc

Inventors

Classifications

G06F16/243
Natural language query formulation · CPC title
G06F16/24575
using context · CPC title
G06V20/40
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title
G10L15/22 · Primary
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G06F3/017 · Primary
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title

Patent family

Related publications grouped by family.

View patent family 74043764

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11227593B2 cover? Systems and methods are described herein for disambiguating a voice search query by determining whether the user made a gesture while speaking a quotation from a content item and whether the user mimicked or approximated a gesture made by a character in the content item when the character spoke the words quoted by the user. If so, a search result comprising an identifier of the content item is …
Who is the assignee on this patent? Rovi Guides Inc
What technology area does this patent fall under? Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published? Publication date Jan 18, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Plural-Mode Image-Based Search

Analysis for results of textual image queries

Electronic device and method for controlling the electronic device thereof

Methods and systems for recommending content in context of a conversation

Systems and methods for receiving a segment of a media asset relating to a user image

Pointer projection for natural user input