Methods and apparatus to define virtual scenes using natural language commands and natural gestures

US10403285B1 · US · B1

Patent metadata
Publication number: US-10403285-B1
Application number: US-201715831617-A
Country: US
Kind code: B1
Filing date: Dec 5, 2017
Priority date: Dec 5, 2016
Publication date: Sep 3, 2019
Grant date: Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed methods and apparatus allow a lay person to easily and intuitively define virtual scenes using natural language commands and natural gestures. Natural language commands include statements that a person would naturally (e.g., spontaneously, simply, easily, intuitively, etc.) speak without any or little training. Example natural language commands include “put a cat on the box,” or “put a ball in front of the red box.” Natural gestures include gestures that a person would naturally do, perform or carry out (e.g., spontaneously, simply, easily, intuitively, etc.) without any or little training. Example natural gestures include pointing, a distance between hands, gazing, head tilt, kicking, etc. The person can simply speak and gesture how it naturally occurs to them.
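To make the example commands concrete, the following is a minimal sketch of how a spoken command, once transcribed, might be split into a structured fragment. It is not the parser described in the patent: the `CommandFragment` type, the `parse_command` function, the verb list, and the `RELATIONS` list are all hypothetical names and assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical spatial relations; multi-word phrases come first so that
# "in front of" is matched before the shorter "in".
RELATIONS = ["in front of", "next to", "behind", "under", "on", "in"]


@dataclass
class CommandFragment:
    """Illustrative container for one parsed command (not from the patent)."""
    action: str
    target: str
    relation: Optional[str] = None
    reference: Optional[str] = None


def parse_command(text: str) -> Optional[CommandFragment]:
    """Split a transcribed command into action, target, relation, reference."""
    text = text.lower().strip().rstrip(".")
    # Assumed verbs; a real system would use a far richer grammar.
    match = re.match(r"(put|place|add)\s+(?:a|an|the)\s+(.+)", text)
    if not match:
        return None
    action, rest = match.group(1), match.group(2)
    for relation in RELATIONS:
        head, sep, tail = rest.partition(f" {relation} ")
        if sep:
            # Drop a leading article from the reference object.
            reference = re.sub(r"^(?:a|an|the)\s+", "", tail)
            return CommandFragment(action, head, relation, reference)
    # No spatial relation found: the location may come from a gesture instead.
    return CommandFragment(action, rest)
```

With these assumptions, `parse_command("put a ball in front of the red box")` yields a fragment whose target is `ball`, relation is `in front of`, and reference is `red box`. A command with no spoken location leaves `relation` and `reference` empty, which is where a gesture could supply the missing information.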

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: translating words spoken by a user while in a scene into text, the scene capable of including a computer-generated virtual element; parsing the text into a spoken command fragment; identifying a scene definition gesture from gesture information captured for the user in the scene; performing ray tracing to identify an object or a location based on the identified scene definition gesture; combining the spoken command fragment and the scene definition gesture to form a scene building instruction; and performing the scene building instruction to at least partially define the computer-generated element of the scene.

2. The method of claim 1, further comprising time aligning the spoken command fragment and the scene definition gesture.

3. The method of claim 1, further comprising contextually matching the spoken command fragment and the scene definition gesture.

4. The method of claim 1, wherein the scene is modified while the user is in the scene.

5. The method of claim 1, wherein the scene includes a non computer-generated element.

6. The method of claim 1, wherein the computer-generated element comprises an aspect of the computer-generated element.

7. The method of claim 1, wherein the computer-generated element comprises at least one of a 2D object, a 3D object, a sound, a video and/or a picture.

8. The method of claim 1, wherein the gesture of the user includes at least one of pointing, a separation between hands, an eye gaze, and/or a head tilt.

9. The method of claim 1, further comprising: detecting a spoken start command; and recording the words spoken by the user after the spoken start command is detected.

10. An apparatus comprising: a speech-to-text translator to translate words spoken by a user while in a scene into text, the scene capable of including a computer-generated virtual element; a language parser to parse the text into spoken command fragment; a gesture identifier configured to identify a scene definition gesture from gesture information captured for the user in the scene; a ray tracer configured to identify an object or a location based on the identified scene definition gesture; a gesture/language combiner configured to combine the spoken command fragment and the scene definition gesture to form a scene building instruction; and a builder configured to perform the scene building instruction to at least partially define the computer-generated element of the scene.

11. The apparatus of claim 10, wherein the gesture/language is configured to time aligning the spoken command fragment and the scene definition gesture.

12. The apparatus of claim 10, wherein the gesture/language is configured to contextually match the spoken command fragment and the scene definition gesture.

13. The apparatus of claim 10, wherein the builder is configured to modify the scene while the user is in the scene.

14. The apparatus of claim 10, wherein the scene includes a non computer-generated element.

15. The apparatus of claim 10, wherein the computer-generated element comprises an aspect of the computer-generated element.

16. The apparatus of claim 10, wherein the computer-generated element comprises at least one of a 2D object, a 3D object, a sound, a video and/or a picture.

17. The apparatus of claim 10, wherein the gesture of the user includes at least one of pointing, a separation between hands, an eye gaze, and/or a head tilt.

18. The apparatus of claim 10, wherein the speech-to-text engine is configured to detect a spoken start command; and the words spoken by the user are recorded after the spoken start command is detected.

19. A non-transitory machine-readable media storing machine-readable instructions that, when executed, cause a machine to: translate words spoken by a user while in a scene into text, the scene capable of including a computer-generated virtual element; parse the text into a spoken command fragment; identify a scene definition gesture from gesture information captured for the user in the scene; perform ray tracing to identify an object or a location based on the identified scene definition gesture; combine the spoken command fragment and the scene definition gesture to form a scene building instruction; and perform the scene building instruction to at least partially define the computer-generated element of the scene.

20. The non-transitory machine-readable media of claim 19, wherein the machine-readable instructions, when executed, cause a machine to contextually match the spoken command fragment with the scene definition gesture to form the scene building instruction.
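The steps recited in claims 1 and 2 (identify a gesture, ray trace to an object, time align, and combine with the spoken fragment) can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the patented implementation: objects are modeled as bounding spheres, a pointing gesture is a single timestamped ray with a nonzero direction, alignment picks the gesture nearest in time within a hypothetical `max_gap_s` window, and all names (`Sphere`, `Gesture`, `ray_hit_distance`, `pick_object`, `align`, `build_instruction`) are invented for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Sphere:
    """Hypothetical scene object approximated by a bounding sphere."""
    name: str
    center: Vec3
    radius: float


@dataclass
class Gesture:
    """Hypothetical pointing gesture: a timestamped ray from the user."""
    time_s: float
    origin: Vec3
    direction: Vec3  # assumed nonzero; normalized inside ray_hit_distance


def ray_hit_distance(origin: Vec3, direction: Vec3, sphere: Sphere) -> Optional[float]:
    """Distance along the ray to the sphere's near surface, or None on a miss.

    Simplification: a ray that starts inside the sphere is treated as a miss.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)
    oc = tuple(o - c for o, c in zip(origin, sphere.center))
    b = sum(x * y for x, y in zip(oc, d))
    c = sum(x * x for x in oc) - sphere.radius ** 2
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t >= 0 else None


def pick_object(gesture: Gesture, objects: List[Sphere]) -> Optional[Sphere]:
    """Return the nearest object hit by the gesture's ray, if any."""
    best, best_t = None, math.inf
    for obj in objects:
        t = ray_hit_distance(gesture.origin, gesture.direction, obj)
        if t is not None and t < best_t:
            best, best_t = obj, t
    return best


def align(fragment_time_s: float, gestures: List[Gesture],
          max_gap_s: float = 1.0) -> Optional[Gesture]:
    """Pick the gesture closest in time to the spoken fragment (assumed rule)."""
    if not gestures:
        return None
    nearest = min(gestures, key=lambda g: abs(g.time_s - fragment_time_s))
    return nearest if abs(nearest.time_s - fragment_time_s) <= max_gap_s else None


def build_instruction(fragment: str, fragment_time_s: float,
                      gestures: List[Gesture], objects: List[Sphere]) -> dict:
    """Combine a spoken fragment with the object indicated by a gesture."""
    gesture = align(fragment_time_s, gestures, max_gap_s=1.0)
    target = pick_object(gesture, objects) if gesture else None
    return {"command": fragment, "reference": target.name if target else None}
```

For example, with a "red box" sphere straight ahead of the user and a pointing gesture recorded 0.1 s after the phrase "put a cat on", `build_instruction` would return that command paired with the reference `red box`. Nearest-in-time matching is only one possible alignment rule; the claims also mention contextual matching, which this sketch does not attempt.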

Assignees

Google LLC

Inventors

Classifications

  • Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00)

  • Parsing

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

  • Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883)

  • Eye tracking input arrangements (G06F3/015 takes precedence)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10403285B1 cover?
The disclosed methods and apparatus allow a lay person to easily and intuitively define virtual scenes using natural language commands and natural gestures. Natural language commands include statements that a person would naturally (e.g., spontaneously, simply, easily, intuitively, etc.) speak without any or little training. Example natural language commands include “put a cat on the box,” or “…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/265. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 3, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).