US2023410788A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023410788-A1 |
| Application number | US-202318241126-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 31, 2023 |
| Priority date | Mar 2, 2021 |
| Publication date | Dec 21, 2023 |
| Grant date | — |
Official abstract text for this publication.
An electronic device includes a communication module and a processor operatively connected to the communication module. The processor is configured to: receive and store a first speech voice related to at least a first external device, and a second speech voice related to a second external device; if individual speech is detected, transmit the first speech voice or the second speech voice having a first playback speed to at least a first external device and a second external device; and, if simultaneous speech is detected, convert, into a second playback speed different from the first playback speed, at least a part of a synthesized voice in which at least first overlap speech of the first speech voice and at least second overlap speech of the second speech voice are successively connected, and transmit the synthesized voice to the at least first external device and the second external device.
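The distinction the abstract draws between individual and simultaneous speech can be illustrated with a minimal sketch. It assumes each stored speech voice has already been reduced to a `(start, end)` interval in seconds; the publication does not say how detection is performed, so the interval representation and the names `overlap` and `classify` are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: an utterance is summarized as (start_seconds, end_seconds).
Interval = Tuple[float, float]


def overlap(first: Interval, second: Interval) -> Optional[Interval]:
    """Return the time span where both utterances are active, or None."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    # Intervals that merely touch at a boundary do not count as overlapping.
    return (start, end) if start < end else None


def classify(first: Interval, second: Interval) -> str:
    """Label a pair of utterances as 'simultaneous' or 'single'."""
    return "simultaneous" if overlap(first, second) is not None else "single"


if __name__ == "__main__":
    print(classify((0.0, 2.0), (1.5, 3.0)))
    print(classify((0.0, 1.0), (1.0, 2.0)))
```

In this sketch the overlapping span is what would be cut out of each voice and joined into the synthesized voice, while non-overlapping speech is passed through at the first playback speed.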
Opening claim text (preview).
1. An electronic device comprising: a communication module; and a processor operatively connected to the communication module, wherein the processor is configured to: receive and store a first uttered voice related to at least a first external device and a second uttered voice related to a second external device; transmit the first uttered voice or the second uttered voice having a first reproduction speed to at least the first external device and the second external device when a single utterance is sensed based on the first uttered voice and the second uttered voice; and convert a reproduction speed of at least a portion of a synthesized voice where at least a first overlapping utterance of the first uttered voice and at least a second overlapping utterance of the second uttered voice are continuously connected to each other to a second reproduction speed different from the first reproduction speed and transfer the synthesized voice to at least the first external device and the second external device when simultaneous utterances are sensed based on the first uttered voice and the second uttered voice.
2. The electronic device of claim 1, wherein the first reproduction speed is a speed substantially equal to an utterance speed of a speaker, and the second reproduction speed includes a speed higher than the first reproduction speed.
3. The electronic device of claim 1, wherein the processor is configured to: identify a first utterance time period related to the first overlapping utterance and a second utterance time period related to the second overlapping utterance; and determine the second reproduction speed such that the synthesized voice is reproduced within a time period smaller than a sum of the first utterance time period and the second utterance time period.
4. The electronic device of claim 1, wherein the processor is configured to convert a reproduction speed of at least one of at least a portion of the first overlapping utterance and at least a portion of the second overlapping utterance into the second reproduction speed.
5. The electronic device of claim 1, wherein the processor is configured to generate the synthesized voice with a silence period added between the first overlapping utterance and the second overlapping utterance.
6. The electronic device of claim 1, wherein the processor is configured to: obtain a portion of the first uttered voice corresponding to a certain range based on the first overlapping utterance as a first additional utterance; obtain a portion of the second uttered voice corresponding to a certain range based on the second overlapping utterance as a second additional utterance; and use the first additional utterance and the second additional utterance to generate the synthesized voice.
7. The electronic device of claim 1, wherein the processor is configured to: receive information related to the second reproduction speed from the first external device or the second external device; and convert the synthesized voice based on the received information.
8. The electronic device of claim 1, wherein the processor is configured to convert the synthesized voice such that a certain level of pitch is maintained for the first overlapping utterance and the second overlapping utterance.
9. A method for operating an electronic device, the method comprising: receiving and storing a first uttered voice related to at least a first external device and a second uttered voice related to a second external device; sensing a single utterance or simultaneous utterances based on the first uttered voice and the second uttered voice; transmitting the first uttered voice or the second uttered voice having a first reproduction speed to at least the first external device and the second external device when the single utterance is sensed; and converting a reproduction speed of at least a portion of a synthesized voice where at least a first overlapping utterance of the first uttered voice and at least a second overlapping utterance of the second uttered voice are continuously connected to each other to a second reproduction speed different from the first reproduction speed and transferring the synthesized voice to at least the first external device and the second external device when the simultaneous utterances are sensed.
10. The method of claim 9, wherein the first reproduction speed is a speed substantially equal to an utterance speed of a speaker, and the second reproduction speed includes a speed higher than the first reproduction speed.
11. The method of claim 9, further comprising: identifying a first utterance time period related to the first overlapping utterance and a second utterance time period related to the second overlapping utterance; and determining the second reproduction speed such that the synthesized voice is reproduced within a time period smaller than a sum of the first utterance time period and the second utterance time period.
12. The method of claim 9, further comprising: converting a reproduction speed of at least one of at least a portion of the first overlapping utterance and at least a portion of the second overlapping utterance into the second reproduction speed.
13. The method of claim 9, further comprising: generating the synthesized voice with a silence period added between the first overlapping utterance and the second overlapping utterance.
14. The method of claim 9, further comprising: obtaining a portion of the first uttered voice corresponding to a certain range based on the first overlapping utterance as a first additional utterance; obtaining a portion of the second uttered voice corresponding to a certain range based on the second overlapping utterance as a second additional utterance; and using the first additional utterance and the second additional utterance to generate the synthesized voice.
15. The method of claim 10, further comprising: converting the synthesized voice such that a certain level of pitch is maintained for the first overlapping utterance and the second overlapping utterance.
16. An electronic device operating method, comprising: receiving and storing first and second uttered phrases from first and second external devices, respectively; sensing simultaneous portions of the first and second uttered phrases; generating a synthesized voice for combining the simultaneous portions; and outputting the synthesized voice with a combination of the simultaneous portions to a user.
17. The method of claim 16, wherein the outputting comprises connecting the first and second uttered phrases.
18. The method of claim 16, wherein the outputting comprises connecting portions of the first uttered phrases with portions of the second uttered phrase.
19. The method of claim 16, wherein: the outputting comprises executing natural language processing with respect to the first and second uttered phrases and modifying the first and second uttered phrases in accordance with results of the natural language processing, the generating comprises generating the synthesized voice for combining the simultaneous portions as modified; and the outputting further includes outputting the synthesized voice with a combination of the simultaneous portions as modified to a user.
20. The method of claim 16, further comprising delaying the outputting in accordance with user instructions.
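The timing constraint in claims 3 and 11, combined with the silence period of claims 5 and 13, can be worked through with a small calculation. This is a sketch under stated assumptions, not the claimed implementation: the function names, the `target` playback window, and the choice to play the silence gap at normal speed are all hypothetical and do not come from the publication.

```python
def second_reproduction_speed(d1: float, d2: float, target: float,
                              gap: float = 0.0) -> float:
    """Speed factor so two joined utterances plus a gap fit in `target` seconds.

    d1, d2 : durations (seconds) of the first and second overlapping utterances
    target : desired playback time; must be less than d1 + d2 (claims 3 and 11)
    gap    : silence inserted between the utterances (assumed not sped up)
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("utterance durations must be positive")
    total = d1 + d2
    if not (gap < target < total):
        raise ValueError("target must exceed the gap and be below d1 + d2")
    # Speech must fit in the time left after the unscaled silence gap.
    return total / (target - gap)


def playback_time(d1: float, d2: float, speed: float, gap: float = 0.0) -> float:
    """Playback time of the joined utterances at `speed`, plus the gap."""
    return (d1 + d2) / speed + gap


if __name__ == "__main__":
    speed = second_reproduction_speed(2.0, 3.0, target=3.0, gap=0.5)
    print(speed, playback_time(2.0, 3.0, speed, gap=0.5))
```

Because `target` is required to be below `d1 + d2`, the resulting factor is always greater than 1, which matches the case in claims 2 and 10 where the second reproduction speed is higher than the first. Keeping pitch at a certain level while changing speed (claims 8 and 15) would need a time-stretching method and is outside this arithmetic.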
Pitch control · CPC title
using natural language modelling · CPC title
Overlap-add techniques · CPC title
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title