Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G10L13/033. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Artificially generated speech for a communication session

US10930262B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10930262-B2
Application number	US-201816147838-A
Country	US
Kind code	B2
Filing date	Sep 30, 2018
Priority date	Feb 2, 2017
Publication date	Feb 23, 2021
Grant date	Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing a speech spoken by a user and generating audio data representing the captured speech by the user; encoding the audio data for transmission to the remote device via the communication network; converting the audio data to text data representing the captured speech; and transmitting, during the communication session, the encoded audio data and the text data to the remote device via the communication network. The device thus can provide the text data representing the captured speech when a quality of the encoded audio signal received by the remote device is below a predetermined level.
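The fallback behaviour described in the abstract can be illustrated with a short sketch of the receiving side. This is not the patented implementation, only a minimal illustration under stated assumptions: the `Frame` type, the `choose_output` function, the 0.0 to 1.0 quality score, and the 0.5 threshold are all hypothetical names and values that do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical stand-in for the "predetermined level" in the abstract.
QUALITY_THRESHOLD = 0.5


@dataclass
class Frame:
    """One received unit of a session: encoded audio plus its transcript."""
    audio: Optional[bytes]   # encoded audio, or None if it never arrived
    text: str                # text data converted from the same speech
    audio_quality: float     # assumed score: 0.0 (unusable) to 1.0 (clean)


def choose_output(frame: Frame,
                  threshold: float = QUALITY_THRESHOLD
                  ) -> Tuple[str, Union[bytes, str]]:
    """Play the audio when it is good enough, otherwise fall back to text."""
    if frame.audio is not None and frame.audio_quality >= threshold:
        return ("audio", frame.audio)
    # Audio is missing or degraded: use the text data instead.
    return ("text", frame.text)


good = choose_output(Frame(b"\x01\x02", "hello", 0.9))
poor = choose_output(Frame(b"\x01\x02", "hello", 0.2))
lost = choose_output(Frame(None, "hello", 0.0))
```

In this sketch the text branch returns the transcript itself; the claims go further and convert that text back to speech at the second device using a speaker-specific speech model.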

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising first and second devices in communication with each other via a communication network, the system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the processor to control the system to perform functions of: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second device storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model stored at the first device, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.

2. The system of claim 1, wherein: the voice parameter value is dynamically updated in response to the user input received during the communication session, and in response to the updating of the voice parameter value, the updated voice parameter value is dynamically transmitted to the second device during the communication session.

3. The system of claim 1, wherein the text data and the audio data are continuously transmitted in parallel to the second device via the communication network.

4. The system of claim 1, wherein the text data is selectively transmitted to the second device when a predetermined condition is met.

5. The system of claim 4, wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.

6. The system of claim 4, wherein, for selectively transmitting the audio data, the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform functions of: transmitting the audio data to the second device via the communication network; receiving, from the second device, a feedback signal indicating that the predetermined condition is met; and in response to receiving the feedback signal, stopping transmitting the audio data and starting to transmit the text data to the second device via the communication network.

7. The system of claim 1, wherein the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform a function of synchronizing the text data with the audio data.

8. The system of claim 7, wherein, for synchronizing the text data with the audio data, the instructions further include instructions that, when executed, cause the processor to control the system to perform functions of: inserting a first time stamp into a portion of the audio data; and inserting a second time stamp into a portion of the text data corresponding to the portion of the audio data.

9. The system of claim 1, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of transmitting the audio data and the text data in separate packets to the second device via the communication network.

10. The system of claim 1, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the audio data via a first communication modality; and transmitting, to the second device, the text data via a second communication modality having a higher robustness than the first communication modality.

11. The system of claim 10, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data via a first transport layer protocol requiring retransmission of unreceived packets, and transmitting, to the second device, the audio data via a second transport layer protocol not involving retransmission of unreceived packets.

12. The system of claim 11, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data at a first quality of service level, and transmitting, to the second device, the audio data at a second quality of service level that is lower than the first quality of service level.

13. A method of operating a system comprising first and second devices in communication with each other via a communication network, comprising: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second devices storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model specific to the person, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.

14. The method of claim 13, wherein transmitting the audio data and the text data comprises continuously transmitting the text data and the audio data in parallel to the second device via the communication network.

15. The method of claim 13, wherein transmitting the audio data and the text data comprises selectively transmitting the text data to the second device when a predetermined condition is met.

16. The method of claim 15, wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.

17. T
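The time-stamp synchronization of claim 8 and the separate-packet transmission of claim 9 can be pictured with a small sketch. It is illustrative only and rests on assumptions not found in the patent: the `Packet` type, the `make_packets` and `align` helpers, millisecond time stamps, and the choice to give the audio portion and its corresponding text portion the same time-stamp value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Packet:
    """A separately transmitted packet carrying either audio or text."""
    kind: str          # "audio" or "text"
    timestamp_ms: int  # assumed time-stamp format: milliseconds into the call
    payload: bytes


def make_packets(audio_chunk: bytes, transcript: str,
                 timestamp_ms: int) -> Tuple[Packet, Packet]:
    """Sender side: stamp an audio portion and its matching text portion."""
    return (
        Packet("audio", timestamp_ms, audio_chunk),
        Packet("text", timestamp_ms, transcript.encode("utf-8")),
    )


def align(packets: Iterable[Packet]) -> Dict[int, Dict[str, bytes]]:
    """Receiver side: regroup packets by time stamp, whatever order they arrive in."""
    aligned: Dict[int, Dict[str, bytes]] = {}
    for packet in packets:
        aligned.setdefault(packet.timestamp_ms, {})[packet.kind] = packet.payload
    return aligned


a0, t0 = make_packets(b"\x10\x11", "hello", 0)
a1, t1 = make_packets(b"\x12\x13", "world", 20)

# Text and audio travel separately, so they may arrive interleaved or out of order.
received = align([t1, a0, t0, a1])
```

Because each text portion carries a time stamp matching its audio portion, the receiver in this sketch can tell which text covers a given stretch of audio, which is what would allow it to switch between the two without repeating or skipping speech.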

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

H04L65/752
adapting media to network capabilities
H04L65/762
at the source (reformatting of additional data in video distribution servers H04N21/2355)
G10L19/0018
Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
G10L13/04
Details of speech synthesis systems, e.g. synthesiser structure or memory management
H04R27/00
Public address systems (circuits for preventing acoustic reaction H04R3/02; circuits for distributing signals to loudspeakers H04R3/12; {monitoring or testing arrangements for public address systems H04R29/007}; amplifiers H03F)

Patent family

Related publications grouped by family.

View patent family 62980166

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10930262B2 cover?: A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing a speech spoken by a user …

Related patents

Multi-User Personalization at a Voice Interface Device

Method of outputting content of text data to sender voice

Speaker recognition including proactive voice model retrieval and sharing features
