Plural-Mode Image-Based Search
US-2020356592-A1 · Nov 12, 2020 · US
US11227593B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11227593-B2 |
| Application number | US-201916456275-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2019 |
| Priority date | Jun 28, 2019 |
| Publication date | Jan 18, 2022 |
| Grant date | Jan 18, 2022 |
Official abstract text for this publication.
Systems and methods are described herein for disambiguating a voice search query by determining whether the user made a gesture while speaking a quotation from a content item and whether the user mimicked or approximated a gesture made by a character in the content item when the character spoke the words quoted by the user. If so, a search result comprising an identifier of the content item is generated. A search result representing the content item from which the quotation comes may be ranked highest among other search results returned and therefore presented first in a list of search results. If the user did not mimic or approximate a gesture made by a character in the content item when the quotation is spoken in the content item, then a search result may not be generated for the content item or may be ranked lowest among other search results.
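The ordering behavior described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the `Candidate` type, its fields, the `order_results` function, and the `drop_unmatched` flag are hypothetical names introduced here, and the numeric text score is an assumed stand-in for however well a quotation matches the transcribed query.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical search candidate; none of these field names come from the patent."""
    content_id: str     # identifier of the content item containing the quotation
    text_score: float   # assumed measure of how well the quotation matches the query
    pose_matched: bool  # True if the user's gesture approximated the character's gesture


def order_results(candidates: List[Candidate], drop_unmatched: bool = False) -> List[Candidate]:
    """Place pose-matched candidates first, then sort by descending text score.

    With drop_unmatched=True, candidates whose gesture did not match are
    omitted, mirroring the abstract's "may not be generated" alternative.
    """
    if drop_unmatched:
        candidates = [c for c in candidates if c.pose_matched]
    # False sorts before True, so "not c.pose_matched" puts matched items first.
    return sorted(candidates, key=lambda c: (not c.pose_matched, -c.text_score))


if __name__ == "__main__":
    found = [
        Candidate("movie_a", 0.90, False),
        Candidate("movie_b", 0.80, True),
        Candidate("movie_c", 0.95, False),
    ]
    for c in order_results(found):
        print(c.content_id, c.pose_matched, c.text_score)
```

In this sketch a gesture match outranks any text score, which is one reading of "ranked highest"; a real system could instead blend the two signals.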
Opening claim text (preview).
What is claimed is:

1. A method for disambiguating a voice search query, the method comprising: receiving a voice search query; transcribing the voice search query into a string comprising a plurality of words; capturing, concurrently with receiving the voice search query, an image of a pose of a user, the image of the pose comprising a plurality of pixels of at least one portion of a body of the user; querying a database with the string; identifying, from the database in response to the query, a plurality of quotations matching the string; retrieving, from the database, metadata of a quotation of the plurality of quotations matching the string, the metadata including quotation pose information corresponding to the matched string; comparing the quotation pose information included in the received metadata with the captured image of the pose of the user, wherein the comparing comprises: scaling a first size of the captured image of the pose of the user to match a second size of the quotation pose; superimposing a grid over the captured image of the pose of the user; determining, based on the grid, a second set of pixel coordinates describing a location of the at least one portion of the body of the user in the captured image of the pose; comparing the second set of pixel coordinates with a first set of pixel coordinates describing a location of at least one portion of a body in the quotation pose information included in the received metadata; determining, based on the comparing, whether the captured image of the captured pose of the user matches the quotation pose information; and in response to determining that the captured image of the pose of the user matches the quotation pose, generating for display a search result comprising an identifier of the quotation.

2. The method of claim 1, further comprising: receiving, in response to the query, a plurality of content identifiers of content items having metadata matching the string; and generating for display a plurality of search results comprising the plurality of content identifiers.

3. The method of claim 2, further comprising: ranking each content identifier of the plurality of content identifiers based on a degree to which the metadata corresponding to each respective content identifier matches the string; ranking the identifier of the quotation higher than each of the plurality of content identifiers; and ordering the plurality of content identifiers based on the respective rank of each content identifier of the plurality of content identifiers.

4. The method of claim 1, wherein capturing the image of the pose of the user comprises: receiving image data representing at least a portion of the body of the user; identifying portions of the body of the user represented in the image data; determining a position of each identified portion of the body of the user; and determining a respective relative position of each identified portion of the body of the user relative to each other identified portion of the body of the user.

5. The method of claim 1, wherein capturing the image of the pose of the user comprises: receiving position data from at least one user device placed on the body of the user; identifying a portion of the body of the user on which the at least one user device is located; and determining a position of the identified portion of the body of the user relative to other portions of the body of the user.

6. The method of claim 1, further comprising determining at least one motion associated with the image of the pose.

7. The method of claim 6, wherein capturing the image of the pose of the user comprises capturing a plurality of successive images of poses of the user corresponding to a period of time during which the voice search query originated.

8. The method of claim 7, wherein comparing the captured image of the pose of the user with the pose information in the metadata of the quotation comprises: identifying a plurality of portions of the body of the user captured in a first image of pose of the plurality of successive images of poses; and identifying a travel path for each portion of the body of the user by tracking a position of each respective portion of the body of the user of the plurality of portions of the body of the user through each successive image of pose of the plurality of images of poses; wherein the pose information comprises path information.

9. A system for disambiguating a voice search query, the system comprising: input circuitry configured to: receive a voice search query; and capture, concurrently with receiving the voice search query, an image of a pose of a user, the image of the pose comprising a plurality of pixels of at least one portion of a body of the user; and control circuitry configured to: transcribe the voice search query into a string comprising a plurality of words; query a database with the string; identify, from the database in response to the query, a plurality of quotations matching the string; retrieve, from the database, metadata of a quotation of the plurality of quotations matching the string, the metadata including quotation pose information corresponding to the matched string; compare the quotation pose information included in the received metadata with the captured image of the pose of the user, wherein the comparing comprises: scale a first size of the captured image of the pose of the user to match a second size of the quotation pose; superimpose a grid over the captured image of the pose of the user; determine, based on the grid, a second set of pixel coordinates describing a location of the at least one portion of the body of the user in the captured image of the pose; compare the second set of pixel coordinates with a first set of pixel coordinates describing a location of at least one portion of a body in the quotation pose information included in the received metadata; determine, based on the comparing, whether the captured image of the pose of the user matches the quotation pose information; and in response to determining that the captured image of the pose of the user matches the quotation pose, generate for display a search result comprising an identifier of the quotation.

10. The system of claim 9, wherein the control circuitry is further configured to: receive, in response to the query, a plurality of content identifiers of content items having metadata matching the string; and generate for display a plurality of search results comprising the plurality of content identifiers.

11. The system of claim 10, wherein the control circuitry is further configured to: rank each content identifier of the plurality of content identifiers based on a degree to which the metadata corresponding to each respective content identifier matches the string; rank the identifier of the quotation higher than each of the plurality of content identifiers; and order the plurality of content identifiers based on the respective rank of each content identifier of the plurality of content identifiers.

12. The system of claim 9, wherein the input circuitry configured to capture the image of the pose of the user is further configured to: receive image data representing at least a portion of the body of the user; identify portions of the body of the user represented in the image data; determine a position of each identified portion of the body of the user; and determine a respective relative position of each identified portion of the body of the user relative to each other identified portion of the body of the user.

13. The system of claim 9, wherein the input circuitry con
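The comparison steps recited in claims 1 and 9 (scaling the captured image to the size of the quotation pose, superimposing a grid, deriving pixel coordinates, and comparing the two coordinate sets) can be sketched as follows. This is an illustrative reading, not the claimed implementation: the function names, the dictionary layout mapping body-part labels to `(x, y)` pixel positions, the grid cell size, and the one-cell tolerance are all assumptions made for the example, and the claims do not specify how a match threshold is chosen.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]  # hypothetical layout: body-part label -> (x, y) pixel position


def scale_points(points: Pose, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Pose:
    """Rescale coordinates from the captured image size to the reference pose size."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return {part: (x * sx, y * sy) for part, (x, y) in points.items()}


def to_grid(points: Pose, cell: int) -> Dict[str, Tuple[int, int]]:
    """Snap each coordinate to the (column, row) of the grid cell that contains it."""
    return {part: (int(x // cell), int(y // cell)) for part, (x, y) in points.items()}


def poses_match(user_points: Pose, user_size: Tuple[int, int],
                ref_points: Pose, ref_size: Tuple[int, int],
                cell: int = 20, tolerance: int = 1) -> bool:
    """Return True if every shared body part lies within `tolerance` grid cells.

    `cell` and `tolerance` are assumed tuning values, not taken from the patent.
    """
    user_cells = to_grid(scale_points(user_points, user_size, ref_size), cell)
    ref_cells = to_grid(ref_points, cell)
    shared = user_cells.keys() & ref_cells.keys()
    if not shared:
        return False  # nothing to compare, so treat as no match
    return all(
        abs(user_cells[p][0] - ref_cells[p][0]) <= tolerance
        and abs(user_cells[p][1] - ref_cells[p][1]) <= tolerance
        for p in shared
    )


if __name__ == "__main__":
    user = {"head": (320, 80), "right_hand": (400, 100)}       # captured at 640x480
    reference = {"head": (150, 45), "right_hand": (205, 55)}   # stored at 320x240
    print(poses_match(user, (640, 480), reference, (320, 240)))
```

Comparing grid cells rather than raw pixels gives the match some slack for a user who only approximates the character's gesture; a finer grid or a smaller tolerance would make the test stricter.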
Natural language query formulation · CPC title
using context · CPC title
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title