Audio processing apparatus
US-12123736-B2 · Oct 22, 2024 · US
US2016336003A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016336003-A1 |
| Application number | US-201514711264-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 13, 2015 |
| Priority date | May 13, 2015 |
| Publication date | Nov 17, 2016 |
| Grant date | — |
Official abstract text for this publication.
A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
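The flow in the abstract (identify sources, assign each a distinct voice, then route a speech request to the voice of its source) can be sketched roughly as below. This is only an illustration, not the implementation disclosed in the patent: the `Voice` fields, the `VoiceRouter` class, the source identifiers such as `"app:email"`, and the dictionary returned by `speak` are all hypothetical names chosen for the example, and no real text-to-speech engine is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice description; the fields are illustrative only."""
    name: str
    pitch_hz: float
    rate_wpm: int


class VoiceRouter:
    """Maps output sources (apps, OS, display areas, GUI objects) to voices."""

    def __init__(self, voices):
        self._unassigned = list(voices)
        self._by_source = {}

    def assign(self, source_id):
        """Give a source the next unused voice so no two sources share one."""
        if source_id in self._by_source:
            return self._by_source[source_id]
        if not self._unassigned:
            raise ValueError("no distinct voice left for " + source_id)
        voice = self._unassigned.pop(0)
        self._by_source[source_id] = voice
        return voice

    def speak(self, source_id, text):
        """Build a synthesis request; a real device would invoke a TTS engine here."""
        voice = self._by_source[source_id]
        return {
            "text": text,
            "voice": voice.name,
            "pitch_hz": voice.pitch_hz,
            "rate_wpm": voice.rate_wpm,
        }


# Usage: two sources, each with its own voice.
router = VoiceRouter([Voice("alto", 210.0, 160), Voice("bass", 110.0, 150)])
router.assign("app:email")
router.assign("os")
request = router.speak("os", "Battery low")
```

Keeping the assignment in a single mapping means the source of a request is the only thing needed to pick the voice, which mirrors the "select a particular source, then generate speech with its assigned voice" ordering in the abstract.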
Opening claim text (preview).
What is claimed is:

1. A method comprising: identifying, by a device that includes one or more processors, a plurality of sources for outputs that the device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface (GUI) object; assigning a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receiving a request for speech output; selecting, from within the plurality of sources, a particular source that is associated with the requested speech output; and generating, for the requested speech output, speech having particular voice characteristics of a particular voice assigned to the particular source.
2. The method of claim 1, further comprising: obtaining voice data associated with a plurality of voices; determining, based on the voice data, a similarity metric characterizing similarity between the plurality of voices; and selecting, from within the plurality of voices, the set of distinct voices based on the similarity metric indicating similarity between the set of distinct voices being less than a threshold.
3. The method of claim 2, further comprising: determining a quantity of the identified plurality of sources; and determining, based on the quantity, the threshold for the similarity between the set of distinct voices.
4. The method of claim 2, wherein the voice data is indicative of a subjective similarity comparison between the plurality of voices, and wherein determining the similarity metric is based on the subjective similarity comparison.
5. The method of claim 2, wherein the voice data comprises acoustic feature parameters characterizing speech sounds having the plurality of voices, and wherein determining the similarity metric is based on a comparison between the acoustic feature parameters.
6. The method of claim 2, further comprising: determining, based on the voice data, a naturalness metric characterizing acoustic transitions between speech sounds having a given voice of the plurality of voices, wherein selecting the set of distinct voices is based also on the naturalness metric indicating naturalness of the set of distinct voices being greater than a given threshold.
7. The method of claim 2, further comprising: determining, based on the voice data, an intelligibility metric characterizing cognitive perception of speech sounds having the given voice, wherein selecting the set of distinct voices is based also on the intelligibility metric indicating intelligibility of the set of distinct voices being greater than a given threshold.
8. The method of claim 2, wherein the voice data is indicative of voice characteristics of one or more voices, the method further comprising: determining morphing parameters associated with one or more of a tonality, duration, frequency, or quality of a given voice; determining, based on the morphing parameters and the one or more voices, one or more additional voices; and determining the plurality of voices to include the one or more voices indicated by the voice data and the one or more additional voices determined based on the one or more morphing parameters, wherein selecting the set of distinct voices is from within the determined plurality of voices.
9. The method of claim 2, wherein the voice data is indicative of voice characteristics of one or more voices, the method further comprising: receiving one or more transforms, wherein a given transform is configured to associate a first voice of the one or more voices with a second voice other than the one or more voices; determining, based on the one or more transforms and the one or more voices, one or more additional voices; and determining the plurality of voices to include the one or more voices indicated by the voice data and the one or more additional voices determined based on the one or more transforms, wherein selecting the set of distinct voices is from within the determined plurality of voices.
10. The method of claim 1, further comprising: determining a context of the requested speech output, wherein assigning the set of distinct voices comprises assigning at least two voices of the set of distinct voices to the particular source; and selecting, based on the context, a given voice from within the at least two voices assigned to the particular source, wherein the particular voice of the generated speech corresponds to the selected given voice.
11. The method of claim 10, wherein the context is indicative of font characteristics of text associated with the particular source.
12. The method of claim 10, wherein the context is indicative of an author of text associated with the particular source, or a type of content in text associated with the particular source.
13. The method of claim 10, wherein the context is indicative of a type of the particular source, a status of the particular source, or a status of the device.
14. A device comprising: one or more processors; data storage storing instructions executable by the one or more processors to cause the device to: identify a plurality of sources for outputs that the device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the device, or an operating system of the device; assign a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receive a request for speech output; select, from within the plurality of sources, a particular source that is associated with the requested speech output; and generate, for the requested speech output, speech having particular voice characteristics of a particular voice assigned to the particular source.
15. The device of claim 14, further comprising: a display, wherein the plurality of sources includes at least one of a particular area within the display, or a particular graphical user interface (GUI) object in the display.
16. The device of claim 15, wherein the instructions further cause the device to receive an input indicative of selection of the particular area within the display, wherein selecting the particular source is based on the input.
17. A computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising: identifying a plurality of sources for outputs that the computing device is configured to provide, wherein the plurality of sources includes at least one of a particular application in the computing device, an operating system of the computing device, a particular area within a display of the computing device, or a particular graphical user interface (GUI) object; assigning a set of distinct voices to respective sources of the plurality of sources, wherein a voice assigned to one source is characterized by voice characteristics different from voice characteristics of other voices assigned to other sources; receiving a request for speech output; selecting, from within the plurality of sources, a particular source that is associated with the requested speech output; and
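As an illustration of the voice selection described in claims 2 and 5 (choosing a set of voices whose mutual similarity stays below a threshold, with similarity computed from acoustic feature parameters), the sketch below may help. It rests on several assumptions that the claim text does not state: cosine similarity stands in for the similarity metric, each voice is reduced to a short made-up feature vector, and a simple greedy pass is used to pick the set. The function names and the sample values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Assumed similarity metric; the claims do not name a specific one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def select_distinct_voices(features, threshold, count):
    """Greedily pick up to `count` voices whose pairwise similarity is below `threshold`."""
    chosen = []
    for name, vec in features.items():
        # Accept a candidate only if it is dissimilar to every voice chosen so far.
        if all(cosine_similarity(vec, features[other]) < threshold for other in chosen):
            chosen.append(name)
        if len(chosen) == count:
            break
    return chosen


# Made-up acoustic feature vectors, one per candidate voice.
features = {
    "voice_a": [1.0, 0.0],
    "voice_b": [0.9, 0.1],  # nearly identical to voice_a
    "voice_c": [0.0, 1.0],
}
selected = select_distinct_voices(features, threshold=0.5, count=2)
```

Here `count` would typically be the number of identified sources. Claim 3 ties the threshold itself to that quantity; the sketch leaves the threshold as a plain argument because the claim does not say how it should vary.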
Voice editing, e.g. manipulating the voice of the synthesiser · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Prosody rules derived from text; Stress or intonation · CPC title
Voice conversion or morphing · CPC title