Devices and Methods for a Speech-Based User Interface

US2016336003A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016336003-A1
Application number: US-201514711264-A
Country: US
Kind code: A1
Filing date: May 13, 2015
Priority date: May 13, 2015
Publication date: Nov 17, 2016
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
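The flow in the abstract (identify sources, assign each a distinct voice, then resolve a speech request to the voice of its source) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Voice` fields, the source labels, and the functions `assign_voices` and `build_speech_request`. No actual synthesis is performed; the last function only returns the parameters a text-to-speech engine might be given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice description; the field names are assumptions."""
    name: str
    pitch_hz: float
    rate_wpm: int


def assign_voices(sources, voices):
    """Pair each source with its own voice, one voice per source."""
    if len(set(voices)) < len(sources):
        raise ValueError("need at least one distinct voice per source")
    return dict(zip(sources, voices))


def build_speech_request(text, source, assignment):
    """Select the source's assigned voice and describe the speech to generate."""
    voice = assignment[source]  # KeyError if the source was never identified
    return {
        "text": text,
        "source": source,
        "voice": voice.name,
        "pitch_hz": voice.pitch_hz,
        "rate_wpm": voice.rate_wpm,
    }


# Example sources: an application, the operating system, a display area.
sources = ["app:mail", "os", "display:status_bar"]
voices = [Voice("alto", 220.0, 170), Voice("tenor", 140.0, 160), Voice("bass", 95.0, 150)]
assignment = assign_voices(sources, voices)
request = build_speech_request("Battery low", "os", assignment)
```

Keeping the assignment in a plain dictionary keeps the source lookup for each speech request a single step; a real device would presumably rebuild it whenever the set of sources changes.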

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying, by a device that includes one or more processors, a plurality of sources for outputs that the device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface (GUI) object; assigning a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receiving a request for speech output; selecting, from within the plurality of sources, a particular source that is associated with the requested speech output; and generating, for the requested speech output, speech having particular voice characteristics of a particular voice assigned to the particular source.

2. The method of claim 1, further comprising: obtaining voice data associated with a plurality of voices; determining, based on the voice data, a similarity metric characterizing similarity between the plurality of voices; and selecting, from within the plurality of voices, the set of distinct voices based on the similarity metric indicating similarity between the set of distinct voices being less than a threshold.

3. The method of claim 2, further comprising: determining a quantity of the identified plurality of sources; and determining, based on the quantity, the threshold for the similarity between the set of distinct voices.

4. The method of claim 2, wherein the voice data is indicative of a subjective similarity comparison between the plurality of voices, and wherein determining the similarity metric is based on the subjective similarity comparison.

5. The method of claim 2, wherein the voice data comprises acoustic feature parameters characterizing speech sounds having the plurality of voices, and wherein determining the similarity metric is based on a comparison between the acoustic feature parameters.

6. The method of claim 2, further comprising: determining, based on the voice data, a naturalness metric characterizing acoustic transitions between speech sounds having a given voice of the plurality of voices, wherein selecting the set of distinct voices is based also on the naturalness metric indicating naturalness of the set of distinct voices being greater than a given threshold.

7. The method of claim 2, further comprising: determining, based on the voice data, an intelligibility metric characterizing cognitive perception of speech sounds having the given voice, wherein selecting the set of distinct voices is based also on the intelligibility metric indicating intelligibility of the set of distinct voices being greater than a given threshold.

8. The method of claim 2, wherein the voice data is indicative of voice characteristics of one or more voices, the method further comprising: determining morphing parameters associated with one or more of a tonality, duration, frequency, or quality of a given voice; determining, based on the morphing parameters and the one or more voices, one or more additional voices; and determining the plurality of voices to include the one or more voices indicated by the voice data and the one or more additional voices determined based on the one or more morphing parameters, wherein selecting the set of distinct voices is from within the determined plurality of voices.

9. The method of claim 2, wherein the voice data is indicative of voice characteristics of one or more voices, the method further comprising: receiving one or more transforms, wherein a given transform is configured to associate a first voice of the one or more voices with a second voice other than the one or more voices; determining, based on the one or more transforms and the one or more voices, one or more additional voices; and determining the plurality of voices to include the one or more voices indicated by the voice data and the one or more additional voices determined based on the one or more transforms, wherein selecting the set of distinct voices is from within the determined plurality of voices.

10. The method of claim 1, further comprising: determining a context of the requested speech output, wherein assigning the set of distinct voices comprises assigning at least two voices of the set of distinct voices to the particular source; and selecting, based on the context, a given voice from within the at least two voices assigned to the particular source, wherein the particular voice of the generated speech corresponds to the selected given voice.

11. The method of claim 10, wherein the context is indicative of font characteristics of text associated with the particular source.

12. The method of claim 10, wherein the context is indicative of an author of text associated with the particular source, or a type of content in text associated with the particular source.

13. The method of claim 10, wherein the context is indicative of a type of the particular source, a status of the particular source, or a status of the device.

14. A device comprising: one or more processors; data storage storing instructions executable by the one or more processors to cause the device to: identify a plurality of sources for outputs that the device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the device, or an operating system of the device; assign a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receive a request for speech output; select, from within the plurality of sources, a particular source that is associated with the requested speech output; and generate, for the requested speech output, speech having particular voice characteristics of a particular voice assigned to the particular source.

15. The device of claim 14, further comprising: a display, wherein the plurality of sources includes at least one of a particular area within the display, or a particular graphical user interface (GUI) object in the display.

16. The device of claim 15, wherein the instructions further cause the device to receive an input indicative of selection of the particular area within the display, wherein selecting the particular source is based on the input.

17. A computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising: identifying a plurality of sources for outputs that the computing device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the computing device, an operating system of the computing device, a particular area within a display of the computing device, or a particular graphical user interface (GUI) object; assigning a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receiving a request for speech output; selecting, from within the plurality of sources, a particular source that is associated with the requested speech output; and
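Claims 2, 3, and 5 describe choosing the set of distinct voices by comparing acoustic feature parameters, keeping only voices whose mutual similarity is below a threshold that depends on the number of sources. A rough sketch of that idea follows. The claims do not specify a metric, a selection strategy, or a threshold formula, so the cosine similarity, the greedy selection order, and the constants in `threshold_for` are all assumptions made for illustration, as are the function names and the toy feature vectors.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two non-zero acoustic feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_for(num_sources, base=0.5, step=0.05, cap=0.95):
    """Hypothetical rule: tolerate more similarity as more sources need voices."""
    return min(cap, base + step * max(0, num_sources - 2))


def select_distinct_voices(features, num_sources):
    """Greedily pick voices whose similarity to every chosen voice is below the threshold."""
    threshold = threshold_for(num_sources)
    chosen = []
    for name, vector in features.items():
        if all(cosine_similarity(vector, features[c]) < threshold for c in chosen):
            chosen.append(name)
        if len(chosen) == num_sources:
            return chosen
    raise ValueError("not enough sufficiently distinct voices")


# Toy feature vectors; "b" is nearly identical to "a" and should be skipped.
features = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
selected = select_distinct_voices(features, 3)
```

A greedy pass is order-dependent and may miss a valid set that a more exhaustive search would find; it is used here only because it keeps the threshold test easy to follow.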

Assignees

  • Google Inc

Inventors

Classifications

  • G10L13/033 · Primary

    Voice editing, e.g. manipulating the voice of the synthesiser · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Prosody rules derived from text; Stress or intonation · CPC title

  • Voice conversion or morphing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016336003A1 cover?
A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sourc…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/033. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 17, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).