Visual confirmation for a recognized voice-initiated action

US9575720B2 · US · B2

Patent metadata
Publication number: US-9575720-B2
Application number: US-201314109660-A
Country: US
Kind code: B2
Filing date: Dec 17, 2013
Priority date: Jul 31, 2013
Publication date: Feb 21, 2017
Grant date: Feb 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques described herein provide a computing device configured to provide an indication that the computing device has recognized a voice-initiated action. In one example, a method is provided for outputting, by a computing device and for display, a speech recognition graphical user interface (GUI) having at least one element in a first visual format. The method further includes receiving, by the computing device, audio data and determining, by the computing device, a voice-initiated action based on the audio data. The method also includes outputting, while receiving additional audio data and prior to executing a voice-initiated action based on the audio data, and for display, an updated speech recognition GUI in which the at least one element is displayed in a second visual format, different from the first visual format, to indicate that the voice-initiated action has been identified.
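
The sequence in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the `SpeechGui` class, the `run_voice_session` function, and the trigger-word table are hypothetical names, and audio data is stood in for by already-recognized words. It shows the element starting in a first visual format, switching to a second format as soon as an action is identified, and the remaining words still being received before anything is executed.

```python
from dataclasses import dataclass, field

# Hypothetical trigger words; the abstract does not list a vocabulary.
ACTION_TRIGGERS = {"navigate": "navigation", "call": "telephone", "play": "media"}


@dataclass
class SpeechGui:
    """Stand-in for the speech recognition GUI with one visual element."""
    element_format: str = "microphone"  # first visual format
    history: list = field(default_factory=list)

    def show(self, fmt: str) -> None:
        # Record every format the element has been displayed in.
        self.element_format = fmt
        self.history.append(fmt)


def run_voice_session(chunks, gui):
    """Consume recognized words in order; return (action, transcript)."""
    gui.show("microphone")
    action = None
    words = []
    for chunk in chunks:
        words.append(chunk)
        if action is None and chunk.lower() in ACTION_TRIGGERS:
            action = ACTION_TRIGGERS[chunk.lower()]
            # Second visual format, shown while later words are still
            # arriving and before the action is executed.
            gui.show(action)
    return action, " ".join(words)


gui = SpeechGui()
action, transcript = run_voice_session(["navigate", "to", "the", "airport"], gui)
print(gui.history, action, transcript)
```

In this sketch the format change happens on the first word, while "to the airport" is still being collected; execution of the action would follow only after `run_voice_session` returns.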

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising:
    outputting, by a first application executing at a computing device and for display, a speech recognition graphical user interface (GUI) having at least one non-textual element in a first visual format;
    receiving, by the first application executing at the computing device, first audio data of a voice command that indicates one or more words of the voice command;
    determining, by the first application executing at the computing device, based on the one or more words of the voice command, a voice-initiated action indicated by the first audio data of the voice command, wherein the voice-initiated action is a particular voice-initiated action from a plurality of voice-initiated actions and the voice-initiated action is associated with a second application that is different than the first application;
    responsive to determining the voice-initiated action indicated by the first audio data of the voice command, and while receiving second audio data of the voice command that indicates one or more additional words of the voice command, and prior to executing the second application to perform the voice command, outputting, by the first application executing at the computing device, for display, an updated speech recognition GUI in which the at least one non-textual element, from the speech recognition GUI, transitions from being displayed in the first visual format to being displayed in a second visual format, different from the first visual format, indicating that the voice-initiated action is the particular voice-initiated action from the plurality of voice-initiated actions that has been determined from the first audio data of the voice command, wherein: the first visual format of the at least one non-textual element is a first image representative of a speech recognition mode of the first application, the second visual format of the at least one non-textual element is a second image that replaces the first image and corresponds to the voice-initiated action from the plurality of voice-initiated actions, and the second image is different from other images corresponding to one or more other voice-initiated actions from the plurality of voice-initiated actions; and
    after outputting the updated speech recognition GUI and after receiving the second audio data of the voice command, executing, by the computing device, based on the first audio data and the second audio data, the second application that performs the voice-initiated action indicated by the voice command.

2. The method of claim 1, further comprising: determining, by the computing device, based on the first audio data and the second audio data, a transcription comprising the one or more words of the voice command and the one or more additional words of the voice command, wherein outputting the updated speech recognition GUI comprises outputting at least a portion of the transcription.

3. The method of claim 2, wherein outputting the updated speech recognition GUI further comprises outputting the one or more words of the voice command and refraining from outputting the one or more additional words of the voice command.

4. The method of claim 1, wherein the second visual format is further different from the first visual format in at least one of color, font, size, highlighting, style, or position.

5. The method of claim 1, wherein outputting the updated speech recognition GUI comprises outputting the first image representative of the speech recognition mode with an animation that morphs into the second image in response to determining the voice-initiated action based on the first audio data.

6. The method of claim 1, further comprising responsive to determining the voice-initiated action based on the first audio data, performing, by the computing device, based on the second audio data, the voice-initiated action.

7. The method of claim 6, wherein the voice-initiated action is performed in response to receiving, by the computing device, an indication confirming that the voice-initiated action is correct.

8. The method of claim 1, wherein determining the voice-initiated action further comprises determining the voice-initiated action based at least partially on a comparison of at least one of the one or more words of the voice command to a preconfigured set of actions.

9. The method of claim 1, wherein determining the voice-initiated action further comprises: identifying, by the computing device, at least one verb in the one or more words of the voice command; and comparing the at least one verb to one or more verbs from a set of verbs, each verb in the set of verbs corresponding to at least one action from the plurality of voice-initiated actions.

10. The method of claim 1, wherein determining the voice-initiated action further comprises: determining, by the computing device, a context based at least part on data from the computing device; and determining, by the computing device, based at least partially on the context and the first audio data, the voice-initiated action.

11. The method of claim 1, further comprising: responsive to receiving an indication of a cancellation input, outputting, by the computing device, the at least one non-textual element for display in the first visual format.

12. The method of claim 1, wherein the first image representative of a speech recognition mode of the first application comprises a microphone.

13. The method of claim 1, wherein the second image is selected from a group consisting of: a compass arrow associated with a navigation feature of the second application, a play button associated with a media output feature of the second application, a pause button associated with the media output feature of the second application, a stop button associated with the media output feature of the second application, a telephone button associated with a telephone feature of the second application, and a search engine icon associated with a search feature of the second application.

14. The method of claim 1, wherein: the at least one non-textual element is displayed within a particular region of a display while being output for display in the first visual format; and the at least one non-textual element is displayed within the particular region of the display while being output for display in the second visual format.

15. The method of claim 1, wherein outputting the updated speech recognition GUI comprises: prior to outputting the at least one non-textual element in the second visual format, outputting, by the first application executing at the computing device, for display, an animation of the at least one non-textual element transitioning from the first visual format to the second visual format.

16. The method of claim 1, wherein the first audio data is associated with command speech from a user of the computing device and the second audio data is associated with non-command speech from the user.

17. A computing device, comprising: a display device; one or more processors; and a memory that stores instructions associated with a first application that when executed cause the one or more processors to: output, for display at the display device, a speech recognition graphical user interface (GUI) having at least one non-textual element in a first visual format; receive first audio data of a voice command that indicates one or more words of the voice command; determine, based on the one or more words of the voice command, a voice-initiated action indicated by the first audio data of the voice command, wherein the voice-initiat
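
As a rough illustration of dependent claims 9, 11, 12, and 13, the Python sketch below compares words of a command against per-action verb sets and picks the image to display. The verb lists, the action keys, and the function names `determine_action` and `icon_for` are assumptions made for this example; the claims name the images (microphone, compass arrow, play button, telephone button, search engine icon) but do not specify any vocabulary or data structure.

```python
from typing import Optional

# Hypothetical verb sets: claim 9 only requires that each verb in the set
# correspond to at least one voice-initiated action.
VERBS_BY_ACTION = {
    "navigation": {"navigate", "drive"},
    "telephone": {"call", "dial"},
    "media_play": {"play", "listen"},
    "search": {"search", "find"},
}

# Second images drawn from the group in claim 13; keys are assumed names.
ICON_BY_ACTION = {
    "navigation": "compass_arrow",
    "telephone": "telephone_button",
    "media_play": "play_button",
    "search": "search_engine_icon",
}

# First image representing speech recognition mode (claim 12).
LISTENING_ICON = "microphone"


def determine_action(words) -> Optional[str]:
    """Return the first action whose verb set contains one of the words."""
    for word in words:
        for action, verbs in VERBS_BY_ACTION.items():
            if word.lower() in verbs:
                return action
    return None


def icon_for(words, cancelled: bool = False) -> str:
    """Pick the image to display for the words received so far."""
    if cancelled:
        # Claim 11: a cancellation input restores the first visual format.
        return LISTENING_ICON
    action = determine_action(words)
    return ICON_BY_ACTION.get(action, LISTENING_ICON)


print(icon_for(["call", "mom"]))
print(icon_for(["play", "jazz"], cancelled=True))
```

A plain dictionary lookup is used here for brevity; the context-based determination of claim 10 would need additional inputs that this sketch does not model.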

Assignees

Google Inc

Inventors

Classifications

  • of application context · CPC title

  • G06F3/167 (Primary)

    Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Execution procedure of a spoken command · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9575720B2 cover?
Techniques described herein provide a computing device configured to provide an indication that the computing device has recognized a voice-initiated action. In one example, a method is provided for outputting, by a computing device and for display, a speech recognition graphical user interface (GUI) having at least one element in a first visual format. The method further includes receiving, by…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/167. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).