User interface for generating expressive content

US2022230374A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022230374-A1
Application number	US-202217713749-A
Country	US
Kind code	A1
Filing date	Apr 5, 2022
Priority date	Nov 9, 2016
Publication date	Jul 21, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generation of expressive content is provided. An expressive synthesized speech system provides improved voice authoring user interfaces by which a user is enabled to efficiently author content for generating expressive output. An expressive synthesized speech system provides an expressive keyboard for enabling input of textual content and for selecting expressive operators, such as emoji objects or punctuation objects for applying predetermined prosody attributes or visual effects to the textual content. A voicesetting editor mode enables the user to author and adjust particular prosody attributes associated with the content for composing carefully-crafted synthetic speech. An active listening mode (ALM) is provided, which when selected, a set of ALM effect options are displayed, wherein each option is associated with a particular sound effect and/or visual effect. The user is enabled to rapidly respond with expressive vocal sound effects or visual effects while listening to others speak.
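The abstract's central idea, an expressive operator that applies predetermined prosody attributes to typed text, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the publication as shown on this page does not name an output format, so SSML is used as a stand-in, and the operator table, preset values, `Prosody` class, and `combine_with_operator` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Prosody:
    """Hypothetical preset; pitch and rate use SSML <prosody> value syntax."""
    pitch: str = "medium"
    rate: str = "medium"
    pause_ms: int = 0                    # pause appended after the text
    sound_effect: Optional[str] = None   # vocal sound effect name, if any


# Illustrative operator table. These mappings are invented for the example
# and are not taken from the patent.
OPERATORS = {
    "😀": Prosody(pitch="+15%", rate="fast"),
    "😢": Prosody(pitch="-10%", rate="slow", pause_ms=300),
    "😂": Prosody(pitch="+10%", sound_effect="laugh"),
    "!": Prosody(pitch="+5%"),
}


def combine_with_operator(text: str, operator: str) -> str:
    """Look up the operator's preset and combine it with the text as SSML."""
    preset = OPERATORS[operator]
    parts = [
        f'<prosody pitch="{preset.pitch}" rate="{preset.rate}">'
        f"{escape(text)}</prosody>"
    ]
    if preset.pause_ms:
        parts.append(f'<break time="{preset.pause_ms}ms"/>')
    if preset.sound_effect:
        # Assumed file-naming scheme for sound effect clips.
        parts.append(f'<audio src="{preset.sound_effect}.wav"/>')
    return "<speak>" + "".join(parts) + "</speak>"
```

Calling `combine_with_operator("See you soon", "😀")` yields a string that could be passed to any SSML-capable speech engine; the patent's own speech generation engine may accept a different representation.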

First claim

Opening claim text (preview).

We claim:
1. A computer-implemented method for generating expressive content comprising: displaying an expressive keyboard, wherein the expressive keyboard includes an alpha-numeric keyboard for receiving textual input and a plurality of expressive operators for selectively applying an emotional tone or a vocal sound effect associated with each of the plurality of expressive operators to received textual input; receiving textual input; in response to receiving a selection of an expressive operator, identifying a predefined set of one or more prosody attributes and vocal sound effects associated with the selected expressive operator; combining the associated predefined set of prosody attributes or the vocal sound effect with the received textual input; and outputting the combined set of prosody attributes or the vocal sound effect and textual input to a speech generation engine for generating expressive synthesized speech.
2. The method of claim 1, wherein displaying the expressive keyboard including the plurality of expressive operators comprises displaying a plurality of emoji objects, wherein each emoji object is illustrative of an emotion.
3. The method of claim 2, further comprising displaying a plurality of emoji objects for selectively providing a visual effect associated with a selected emoji object as output.
4. The method of claim 2, wherein in response to a selection of an emoji object: identifying a visual effect associated with the selected emoji object; and outputting the visual effect to a visualization generation engine for generating an expressive display of the visual effect.
5. The method of claim 2, wherein displaying the plurality of emoji objects comprises: determining a set of emoji objects to display based on data associated with a user's emotional state, wherein the set includes emoji objects associated with an emotion corresponding to the user's emotional state; and displaying the set of emoji objects.
6. The method of claim 1, wherein combining the associated predefined set of prosody attributes with the received textual input comprises applying at least one of pause length, pitch, speed, and emphasis properties to the textual input.
7. The method of claim 1, wherein combining the vocal sound effect with the received textual input comprises combining a vocal sound effect selected from a group comprised of: a laugh; a sarcastic scoff, a sharp breath in; a disgusted “ugh” sound; an angry “argh” sound; and one or more user-provided sound effects.
8. The method of claim 1, wherein displaying the expressive keyboard including the plurality of expressive operators comprises displaying a plurality of punctuation objects.
9. The method of claim 1, further comprising; providing a voicesetting editor interface; in response to receiving a selection to launch the voicesetting editor interface: parsing the textual input and any received expressive operator selections; displaying the parsed textual input and any received expressive operator selections as selectable tokens; in response to receiving a selection of a token, displaying a set of prosodic properties that can be applied to the selected token; and in response to receiving a selection of a prosodic property, displaying a value associated with the selected prosodic property for allowing a user to adjust the value for controlling expressivity of the textual input when rendered.
10. The method of claim 9, wherein receiving textual input comprises receiving an upload of an existing text file.
11. The method of claim 1, further comprising: providing an active listening mode; and in response to receiving a selection to launch the active listening mode, displaying a plurality of selectable active listening mode effect options, wherein each active listening mode effect option has an associated sound effect; in response to receiving a selection of an active listening mode effect option: identifying the associated sound effect; and outputting the associated sound effect to a speech generation engine for playing the associated sound effect on a conversation partner's audio output device.
12. The method of claim 11, wherein: displaying the plurality of selectable active listening mode effect options comprises displaying a plurality of selectable active listening mode effect options wherein each active listening mode effect option has an associated visual effect; and in response to receiving a selection of an active listening mode effect option: identifying the associated visual effect; and outputting the associated visual effect to a visualization generation engine for rendering the associated visual effect on a visual output device.
13. A system for generating expressive content, the computing device comprising: at least one processing device; and at least one computer readable data storage device storing instructions that, when executed by the at least one processing device, cause the computing device to provide an expressive synthesized speech system, the expressive synthesized speech system operative to: display an expressive keyboard, wherein the expressive keyboard includes an alpha-numeric keyboard for receiving textual input and a plurality of expressive operators for selectively applying an emotional tone or a vocal sound effect associated with each of the plurality of expressive operators to received textual input; receive textual input; in response to receiving a selection of an expressive operator, identify a predefined set of one or more prosody attributes and vocal sound effects associated with the selected expressive operator; combine the associated predefined set of prosody attributes or the vocal sound effect with the received textual input; and output the combined set of prosody attributes or the vocal sound effect and textual input to a speech generation engine for generating expressive synthesized speech.
14. The system of claim 13, wherein: the plurality of expressive operators comprises a plurality of emoji objects, each emoji object illustrating an emotion; one or more of the plurality of emoji objects has an associated visual effect; and in response to a selection of an emoji object, the expressive synthesized speech system is further operative to: identify a visual effect associated with the selected emoji object; and output the visual effect to a visualization generation engine for generating an expressive display of the visual effect.
15. The system of claim 13, wherein in combining the associated predefined set of prosody attributes with the received textual input, the expressive synthesized speech system is operative to apply at least one of pause length, pitch, speed, and emphasis properties to the textual input.
16. The system of claim 13, wherein the expressive synthesized speech system is further operative to: provide a voicesetting editor interface; in response to receiving a selection to launch the voicesetting editor interface: parse the textual input and any received expressive operator selections; display the parsed textual input and any received expressive operator selections as selectable tokens; in response to receiving a selection of a token, display a set of prosodic properties that can be applied to the selected token; and in response to receiving a selection of a prosodic property, display a value associated with the selected prosodic property for allowing a user to adjust the value for controlling expressivity of the textual input when rendered.
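The voicesetting editor steps recited in claims 9 and 16 (parse the input into selectable tokens, then expose adjustable prosodic properties per token) can be sketched roughly as follows. This is only one possible reading: the `Token` class, the `OPERATOR_CHARS` set, the property names and default values, and the whitespace-based tokenization are all assumptions; the claims name pause length, pitch, speed, and emphasis but do not specify units or ranges.

```python
import re
from dataclasses import dataclass, field

# Assumed set of expressive operators (emoji and punctuation objects).
OPERATOR_CHARS = "😀😢😂!?"

# One operator character, or a run of non-space, non-operator characters.
_TOKEN_RE = re.compile(
    rf"[{re.escape(OPERATOR_CHARS)}]|[^\s{re.escape(OPERATOR_CHARS)}]+"
)


def _default_properties() -> dict:
    # Hypothetical defaults; units and scales are invented for this example.
    return {"pause_ms": 0, "pitch": 1.0, "speed": 1.0, "emphasis": 0}


@dataclass
class Token:
    text: str
    is_operator: bool
    properties: dict = field(default_factory=_default_properties)


def parse(content: str) -> list:
    """Split authored content into selectable word and operator tokens."""
    return [
        Token(piece, len(piece) == 1 and piece in OPERATOR_CHARS)
        for piece in _TOKEN_RE.findall(content)
    ]


def adjust(token: Token, prop: str, value) -> None:
    """Set one prosodic property on a selected token."""
    if prop not in token.properties:
        raise KeyError(f"unknown prosodic property: {prop}")
    token.properties[prop] = value


tokens = parse("Well done 😀 really?")
adjust(tokens[0], "pitch", 1.2)   # raise the pitch of "Well" only
```

Keeping properties per token rather than per sentence mirrors the claim language, where a property is displayed and adjusted after a single token is selected.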

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06F3/04817 (Primary)
using icons (graphical or visual programming using iconic symbols G06F8/34) · CPC title
G06F3/0236
using selection techniques to select from displayed items · CPC title
G06F3/0237
using prediction or retrieval techniques · CPC title
G10L13/04
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
G06T13/00 (Primary)
Animation · CPC title

Patent family

Related publications grouped by family.

View patent family 62063961

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022230374A1 cover?: Generation of expressive content is provided. An expressive synthesized speech system provides improved voice authoring user interfaces by which a user is enabled to efficiently author content for generating expressive output. An expressive synthesized speech system provides an expressive keyboard for enabling input of textual content and for selecting expressive operators, such as emoji object…
Who is the assignee on this patent?: Microsoft Technology Licensing, LLC
What technology area does this patent fall under?: Primary CPC classification G06F3/04817. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 21, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).