Multi-User Personalization at a Voice Interface Device
US-2018096690-A1 · Apr 5, 2018 · US
US10930262B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10930262-B2 |
| Application number | US-201816147838-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2018 |
| Priority date | Feb 2, 2017 |
| Publication date | Feb 23, 2021 |
| Grant date | Feb 23, 2021 |
Official abstract text for this publication.
A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing a speech spoken by a user and generating audio data representing the captured speech by the user; encoding the audio data for transmission to the remote device via the communication network; converting the audio data to text data representing the captured speech; and transmitting, during the communication session, the encoded audio data and the text data to the remote device via the communication network. The device thus can provide the text data representing the captured speech when a quality of the encoded audio signal received by the remote device is below a predetermined level.
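The fallback described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Frame`, `render`, `received_fraction`, and `min_fraction` are hypothetical, and measuring audio quality as the fraction of audio packets received is an assumption, since the abstract refers only to "a predetermined level".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Frame:
    """One unit sent during the session: encoded audio plus its transcript."""
    encoded_audio: bytes
    text: str


def render(frame: Frame, received_fraction: float,
           min_fraction: float = 0.8) -> Tuple[str, Union[bytes, str]]:
    """Pick what the remote device presents for this frame.

    received_fraction is a hypothetical quality measure (share of audio
    packets that arrived). Below min_fraction the text data is used
    instead of the degraded audio.
    """
    if received_fraction < min_fraction:
        return ("text", frame.text)
    return ("audio", frame.encoded_audio)


frame = Frame(encoded_audio=b"\x01\x02", text="hello")
print(render(frame, received_fraction=0.95))  # audio is good enough
print(render(frame, received_fraction=0.40))  # falls back to text
```

Because both representations travel with every frame in this sketch, the receiver can switch between them frame by frame without renegotiating the session.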
Opening claim text (preview).
What is claimed is:
1. A system comprising first and second devices in communication with each other via a communication network, the system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the processor to control the system to perform functions of: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second device storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model stored at the first device, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.
2. The system of claim 1, wherein: the voice parameter value is dynamically updated in response to the user input received during the communication session, and in response to the updating of the voice parameter value, the updated voice parameter value is dynamically transmitted to the second device during the communication session.
3. The system of claim 1, wherein the text data and the audio data are continuously transmitted in parallel to the second device via the communication network.
4. The system of claim 1, wherein the text data is selectively transmitted to the second device when a predetermined condition is met.
5. The system of claim 4, wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.
6. The system of claim 4, wherein, for selectively transmitting the audio data, the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform functions of: transmitting the audio data to the second device via the communication network; receiving, from the second device, a feedback signal indicating that the predetermined condition is met; and in response to receiving the feedback signal, stopping transmitting the audio data and starting to transmit the text data to the second device via the communication network.
7. The system of claim 1, wherein the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform a function of synchronizing the text data with the audio data.
8. The system of claim 7, wherein, for synchronizing the text data with the audio data, the instructions further include instructions that, when executed, cause the processor to control the system to perform functions of: inserting a first time stamp into a portion of the audio data; and inserting a second time stamp into a portion of the text data corresponding to the portion of the audio data.
9. The system of claim 1, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of transmitting the audio data and the text data in separate packets to the second device via the communication network.
10. The system of claim 1, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the audio data via a first communication modality; and transmitting, to the second device, the text data via a second communication modality having a higher robustness than the first communication modality.
11. The system of claim 10, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data via a first transport layer protocol requiring retransmission of unreceived packets, and transmitting, to the second device, the audio data via a second transport layer protocol not involving retransmission of unreceived packets.
12. The system of claim 11, wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data at a first quality of service level, and transmitting, to the second device, the audio data at a second quality of service level that is lower than the first quality of service level.
13. A method of operating a system comprising first and second devices in communication with each other via a communication network, comprising: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second devices storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model specific to the person, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.
14. The method of claim 13, wherein transmitting the audio data and the text data comprises continuously transmitting the text data and the audio data in parallel to the second device via the communication network.
15. The method of claim 13, wherein transmitting the audio data and the text data comprises selectively transmitting the text data to the second device when a predetermined condition is met.
16. The method of claim 15, wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.
17. T
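The time-stamp synchronization recited in claims 7 and 8 could look roughly like the following Python sketch. It rests on assumptions the claims do not state: that the first and second time stamps carry the same millisecond value, and that the receiver pairs packets by exact match. The names `AudioPacket`, `TextPacket`, `stamp`, and `align` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AudioPacket:
    timestamp_ms: int  # "first time stamp" inserted into the audio portion
    payload: bytes


@dataclass(frozen=True)
class TextPacket:
    timestamp_ms: int  # "second time stamp" inserted into the matching text
    text: str


def stamp(timestamp_ms: int, audio: bytes,
          text: str) -> Tuple[AudioPacket, TextPacket]:
    """Sender side: tag an audio portion and its transcript with one time."""
    return AudioPacket(timestamp_ms, audio), TextPacket(timestamp_ms, text)


def align(audio_packets: List[AudioPacket],
          text_packets: List[TextPacket]
          ) -> List[Tuple[int, bytes, Optional[str]]]:
    """Receiver side: pair each audio portion with text of the same time.

    Audio is ordered by time stamp. A portion whose text has not arrived
    is paired with None.
    """
    text_by_time = {p.timestamp_ms: p.text for p in text_packets}
    ordered = sorted(audio_packets, key=lambda p: p.timestamp_ms)
    return [(a.timestamp_ms, a.payload, text_by_time.get(a.timestamp_ms))
            for a in ordered]


a1, t1 = stamp(0, b"aa", "hello")
a2, t2 = stamp(20, b"bb", "world")
# Packets may arrive out of order, and the second text packet is missing.
print(align([a2, a1], [t1]))
```

Keeping audio and text in separate packet types also fits claim 9, which recites sending the two in separate packets.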
adapting media to network capabilities · CPC title
at the source (reformatting of additional data in video distribution servers H04N21/2355) · CPC title
Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis · CPC title
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
Public address systems (circuits for preventing acoustic reaction H04R3/02; circuits for distributing signals to loudspeakers H04R3/12; {monitoring or testing arrangements for public address systems H04R29/007}; amplifiers H03F) · CPC title