User interface for generating expressive content
US-11321890-B2 · May 3, 2022 · US
US2022230374A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022230374-A1 |
| Application number | US-202217713749-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 5, 2022 |
| Priority date | Nov 9, 2016 |
| Publication date | Jul 21, 2022 |
| Grant date | — |
Official abstract text for this publication.
Generation of expressive content is provided. An expressive synthesized speech system provides improved voice authoring user interfaces by which a user is enabled to efficiently author content for generating expressive output. An expressive synthesized speech system provides an expressive keyboard for enabling input of textual content and for selecting expressive operators, such as emoji objects or punctuation objects for applying predetermined prosody attributes or visual effects to the textual content. A voicesetting editor mode enables the user to author and adjust particular prosody attributes associated with the content for composing carefully-crafted synthetic speech. An active listening mode (ALM) is provided, which when selected, a set of ALM effect options are displayed, wherein each option is associated with a particular sound effect and/or visual effect. The user is enabled to rapidly respond with expressive vocal sound effects or visual effects while listening to others speak.
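As an illustration of the flow the abstract describes (an expressive operator selects a predetermined set of prosody attributes, which is combined with the text and handed to a speech engine), here is a minimal sketch. It assumes the combined output is serialized as SSML; the publication text shown here does not say that. The `Prosody` class, the `OPERATORS` table, the preset values, and the `.wav` file naming are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Prosody:
    """Hypothetical prosody preset; field names are illustrative only."""
    pitch: str = "medium"
    rate: str = "medium"
    pause_ms: int = 0
    sound_effect: Optional[str] = None  # name of a vocal sound effect clip


# Hypothetical operator table: emoji or punctuation -> predefined preset.
# The values are invented for this sketch, not taken from the patent.
OPERATORS = {
    "\U0001F604": Prosody(pitch="+15%", rate="fast", sound_effect="laugh"),
    "\U0001F620": Prosody(pitch="-10%", rate="slow"),
    "...": Prosody(pause_ms=600),
}


def combine(text: str, operator: str) -> str:
    """Combine text with the preset for `operator` into an SSML string."""
    preset = OPERATORS[operator]
    parts = [
        f'<prosody pitch="{preset.pitch}" rate="{preset.rate}">'
        f"{escape(text)}</prosody>"
    ]
    if preset.pause_ms:
        # Pause length is expressed with the SSML <break> element.
        parts.append(f'<break time="{preset.pause_ms}ms"/>')
    if preset.sound_effect:
        # Assumed convention: each sound effect is a clip named <effect>.wav.
        parts.append(f'<audio src="{preset.sound_effect}.wav"/>')
    return "<speak>" + "".join(parts) + "</speak>"
```

Calling `combine("Great", "\U0001F604")` would yield a string that a speech generation engine accepting SSML could render. A real system would also need per-engine handling, since engines differ in which SSML elements they support.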
Opening claim text (preview).
We claim:
1. A computer-implemented method for generating expressive content comprising: displaying an expressive keyboard, wherein the expressive keyboard includes an alpha-numeric keyboard for receiving textual input and a plurality of expressive operators for selectively applying an emotional tone or a vocal sound effect associated with each of the plurality of expressive operators to received textual input; receiving textual input; in response to receiving a selection of an expressive operator, identifying a predefined set of one or more prosody attributes and vocal sound effects associated with the selected expressive operator; combining the associated predefined set of prosody attributes or the vocal sound effect with the received textual input; and outputting the combined set of prosody attributes or the vocal sound effect and textual input to a speech generation engine for generating expressive synthesized speech.
2. The method of claim 1, wherein displaying the expressive keyboard including the plurality of expressive operators comprises displaying a plurality of emoji objects, wherein each emoji object is illustrative of an emotion.
3. The method of claim 2, further comprising displaying a plurality of emoji objects for selectively providing a visual effect associated with a selected emoji object as output.
4. The method of claim 2, wherein in response to a selection of an emoji object: identifying a visual effect associated with the selected emoji object; and outputting the visual effect to a visualization generation engine for generating an expressive display of the visual effect.
5. The method of claim 2, wherein displaying the plurality of emoji objects comprises: determining a set of emoji objects to display based on data associated with a user's emotional state, wherein the set includes emoji objects associated with an emotion corresponding to the user's emotional state; and displaying the set of emoji objects.
6. The method of claim 1, wherein combining the associated predefined set of prosody attributes with the received textual input comprises applying at least one of pause length, pitch, speed, and emphasis properties to the textual input.
7. The method of claim 1, wherein combining the vocal sound effect with the received textual input comprises combining a vocal sound effect selected from a group comprised of: a laugh; a sarcastic scoff, a sharp breath in; a disgusted “ugh” sound; an angry “argh” sound; and one or more user-provided sound effects.
8. The method of claim 1, wherein displaying the expressive keyboard including the plurality of expressive operators comprises displaying a plurality of punctuation objects.
9. The method of claim 1, further comprising; providing a voicesetting editor interface; in response to receiving a selection to launch the voicesetting editor interface: parsing the textual input and any received expressive operator selections; displaying the parsed textual input and any received expressive operator selections as selectable tokens; in response to receiving a selection of a token, displaying a set of prosodic properties that can be applied to the selected token; and in response to receiving a selection of a prosodic property, displaying a value associated with the selected prosodic property for allowing a user to adjust the value for controlling expressivity of the textual input when rendered.
10. The method of claim 9, wherein receiving textual input comprises receiving an upload of an existing text file.
11. The method of claim 1, further comprising: providing an active listening mode; and in response to receiving a selection to launch the active listening mode, displaying a plurality of selectable active listening mode effect options, wherein each active listening mode effect option has an associated sound effect; in response to receiving a selection of an active listening mode effect option: identifying the associated sound effect; and outputting the associated sound effect to a speech generation engine for playing the associated sound effect on a conversation partner's audio output device.
12. The method of claim 11, wherein: displaying the plurality of selectable active listening mode effect options comprises displaying a plurality of selectable active listening mode effect options wherein each active listening mode effect option has an associated visual effect; and in response to receiving a selection of an active listening mode effect option: identifying the associated visual effect; and outputting the associated visual effect to a visualization generation engine for rendering the associated visual effect on a visual output device.
13. A system for generating expressive content, the computing device comprising: at least one processing device; and at least one computer readable data storage device storing instructions that, when executed by the at least one processing device, cause the computing device to provide an expressive synthesized speech system, the expressive synthesized speech system operative to: display an expressive keyboard, wherein the expressive keyboard includes an alpha-numeric keyboard for receiving textual input and a plurality of expressive operators for selectively applying an emotional tone or a vocal sound effect associated with each of the plurality of expressive operators to received textual input; receive textual input; in response to receiving a selection of an expressive operator, identify a predefined set of one or more prosody attributes and vocal sound effects associated with the selected expressive operator; combine the associated predefined set of prosody attributes or the vocal sound effect with the received textual input; and output the combined set of prosody attributes or the vocal sound effect and textual input to a speech generation engine for generating expressive synthesized speech.
14. The system of claim 13, wherein: the plurality of expressive operators comprises a plurality of emoji objects, each emoji object illustrating an emotion; one or more of the plurality of emoji objects has an associated visual effect; and in response to a selection of an emoji object, the expressive synthesized speech system is further operative to: identify a visual effect associated with the selected emoji object; and output the visual effect to a visualization generation engine for generating an expressive display of the visual effect.
15. The system of claim 13, wherein in combining the associated predefined set of prosody attributes with the received textual input, the expressive synthesized speech system is operative to apply at least one of pause length, pitch, speed, and emphasis properties to the textual input.
16. The system of claim 13, wherein the expressive synthesized speech system is further operative to: provide a voicesetting editor interface; in response to receiving a selection to launch the voicesetting editor interface: parse the textual input and any received expressive operator selections; display the parsed textual input and any received expressive operator selections as selectable tokens; in response to receiving a selection of a token, display a set of prosodic properties that can be applied to the selected token; and in response to receiving a selection of a prosodic property, display a value associated with the selected prosodic property for allowing a user to adjust the value for controlling expressivity of the textual input when rendered.
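The voicesetting editor steps in claims 9 and 16 (parse the input and operator selections into selectable tokens, then let the user adjust a prosodic property on a chosen token) can be pictured with the sketch below. It is only one possible reading of the claim language: the `Token` class, the regular-expression tokenizer, the default values, and their units are assumptions for illustration. Only the four property names come from claims 6 and 15.

```python
import re
from dataclasses import dataclass, field

# Property names follow claims 6 and 15; defaults and units are assumptions.
DEFAULTS = {"pause_length": 0.0, "pitch": 1.0, "speed": 1.0, "emphasis": 0.0}


@dataclass
class Token:
    """A selectable unit in the editor: a word, punctuation mark, or operator."""
    text: str
    is_operator: bool = False
    props: dict = field(default_factory=lambda: dict(DEFAULTS))


def parse(text: str, operators: set) -> list:
    """Split input into tokens, keeping each expressive operator whole."""
    # Longest operators first so ":))" would win over ":)" if both exist.
    alternatives = [re.escape(op) for op in sorted(operators, key=len, reverse=True)]
    alternatives += [r"\w+", r"[^\w\s]"]  # then words, then single symbols
    pattern = re.compile("|".join(alternatives))
    return [Token(m.group(0), m.group(0) in operators) for m in pattern.finditer(text)]


def adjust(token: Token, prop: str, value: float) -> None:
    """Set one prosodic property on the selected token."""
    if prop not in token.props:
        raise KeyError(f"unknown prosodic property: {prop}")
    token.props[prop] = value
```

For example, `parse("Well, hello :) there", {":)"})` produces five tokens, and `adjust(tokens[2], "pitch", 1.2)` changes the pitch of "hello" only, because each token carries its own copy of the property dictionary.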
using icons (graphical or visual programming using iconic symbols G06F8/34) · CPC title
using selection techniques to select from displayed items · CPC title
using prediction or retrieval techniques · CPC title
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
Animation · CPC title