Automatic interpretation method and apparatus
US-2018011843-A1 · Jan 11, 2018 · US
US11514885B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11514885-B2 |
| Application number | US-201616342416-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 21, 2016 |
| Priority date | Nov 21, 2016 |
| Publication date | Nov 29, 2022 |
| Grant date | Nov 29, 2022 |
Official abstract text for this publication.
An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
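The final step of the abstract (510), replacing extracted speeches with generated replacement speeches, can be illustrated with a minimal sketch. This is not the patent's implementation: the `SpeechSegment` type, the `replace_speeches` function, the representation of audio as a plain list of samples, and muting by subtracting the extracted speech from the mix are all assumptions made for illustration. The extraction and voice print steps (504 to 508) are taken as already done.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechSegment:
    """Hypothetical container: a run of speech samples and where it starts."""
    start: int            # index of the first sample within the audio portion
    samples: List[float]  # speech samples for this segment


def replace_speeches(audio: List[float],
                     extracted: List[SpeechSegment],
                     replacements: List[SpeechSegment]) -> List[float]:
    """Mute each extracted speech and add its replacement in place.

    Assumption: the extracted speech is an additive component of the mix,
    so subtracting it leaves only the background audio.
    """
    if len(extracted) != len(replacements):
        raise ValueError("each extracted speech needs one replacement")

    out = list(audio)  # work on a copy; the input is left untouched
    for old, new in zip(extracted, replacements):
        # Mute: remove the extracted speech from the mix.
        for offset, value in enumerate(old.samples):
            index = old.start + offset
            if 0 <= index < len(out):
                out[index] -= value
        # Add: mix the replacement speech into the muted region.
        for offset, value in enumerate(new.samples):
            index = new.start + offset
            if 0 <= index < len(out):
                out[index] += value
    return out


# Usage: background at 0.25, original speech of 0.5 on samples 2 and 3.
audio = [0.25, 0.25, 0.75, 0.75, 0.25, 0.25]
original = [SpeechSegment(start=2, samples=[0.5, 0.5])]
dubbed = [SpeechSegment(start=2, samples=[0.125, 0.125])]
result = replace_speeches(audio, original, dubbed)
```

Subtracting the extracted signal is only one conceivable way to mute. The claims also mention muting via multiple audio channels with positional data, or via speech regenerated from the voice print model.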
Opening claim text (preview).
The invention claimed is:

1. An automatic dubbing method, comprising: extracting speeches of a first voice from an audio portion of a media content; receiving an audio input of a second voice of a user of a user device; extracting speeches of the second voice from the audio input including a set of phonemes of the second voice; generating a voice print model for the extracted speeches of the second voice; receiving a selection of the media content for playback on the user device by the user of the user device; responsive to receiving the selection of the media content for playback and during playback of the media content on the user device; processing the extracted speeches of the first voice by utilizing the voice print model to generate replacement speeches, the replacement speeches generated using the set of phonemes of the second voice; replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content; and outputting the audio portion with the generated replacement speeches for playing on the user device.

2. The method of claim 1, wherein the generating the voice print model further comprises: sampling speeches of the user by using a speech capturing device and creating the voice print model based on the sampled speeches of the user.

3. The method of claim 1, wherein the generating the voice print model further comprises: creating the voice print model for the voice based further on at least one of a closed caption, a subtitle, a script, a transcript, and a lyric of the media content.

4. The method of claim 1, further comprises: determining positional data of speeches in the audio portion based on predefined speaker locations for the audio portion and a virtual microphone array.

5. The method of claim 1, wherein the extracting speeches of the first voice comprises: detecting the speeches from the audio portion of the media content based on a plurality of audio versions in different languages; or detecting the speeches from the audio portion of the media content based on a plurality of audio channels and positional data obtained from the audio portion; or detecting the speeches from the audio portion of the media content based on predefined speaker locations and a virtual microphone array.

6. The method of claim 1, wherein the processing the extracted speeches further comprises: translating the extracted speeches of the first voice in a first language to the replacement speeches in a second language by utilizing the voice print model.

7. The method of claim 6, wherein the translating further comprises: generating the translated replacement speeches by further utilizing characteristics of the extracted speeches of the first voice, wherein the characteristics includes at least one of a stress, a tonality, a speed, a volume and an inflection of the speeches.

8. The method of claim 7, wherein the translating further comprises: performing speech-to-text conversion for the extracted speeches of the first voice based on at least one of a closed caption, a subtitle, a script, a transcript and a lyric of the media content; and/or performing text-to-text translation for text converted from the first language to the second language based on at least one of the characteristics of the speeches, a genre information of the media content, a scene knowledge; and generating the translated replacement speeches for the first voice by performing text-to-speech conversion for the translated text based on the voice print model and the characteristics of the extracted speeches.

9. The method of claim 1, wherein the extracting speeches of the first voice comprises: grouping the speeches to be associated with the first voice based on at least one of: voice characteristic of the speeches, audio positional data, detection of visual scene transition, visual recognition of speaker, subtitles, and closed captions.

10. The method of claim 1, wherein the replacing comprises: muting the speeches of the first voice from the audio portion; and adding the replacement speeches in place of the muted speeches in the audio portion.

11. The method of claim 10, wherein the muting comprises: muting the speeches of the first voice by utilizing the extracted speeches from the audio portion; or muting the speeches of the voice by utilizing a plurality of audio channels obtained from the audio portion based on positional data; or regenerating speeches for the voice based on the voice print model of the voice and positional data, and muting the speeches based on the regenerated speeches.

12. An automatic dubbing apparatus, comprising: a speech extracting module configured to extract speeches of a first voice from an audio portion of a media content, receive an audio input of a second voice of a user of a user device and configured to extract speeches of the second voice from the audio input including a set of phonemes of the second voice; a voice print model obtaining module configured to generate a voice print model for the extracted speeches of the second voice; and a speech processing module configured to, responsive to receiving a selection of the media content for playback on the user device and during playback of the media content on the user device: process the extracted speeches of the first voice by utilizing the voice print model to generate replacement speeches, the replacement speeches generated using the set of phonemes of the second voice; replace the extracted speeches of the first voice with the generated replacement speeches of the second voice in the audio portion of the media content; and output the audio portion with the generated replacement speeches for playing on the user device.

13. The apparatus of claim 12, wherein the speech extracting module is further configured to: detect the speeches from the audio portion of the media content based on a plurality of audio versions in different languages; or detect the speeches from the audio portion of the media content based on a plurality of audio channels and positional data obtained from the audio portion; or detect the speeches from the audio portion of the media content based on predefined speaker locations and a virtual microphone array.

14. The apparatus of claim 12, wherein the speech extracting module is further configured to: group the speeches to be associated with the first voice based on at least one of: voice characteristic of the speeches, audio positional data, detection of visual scene transition, visual recognition of speaker, subtitles, and closed captions.

15. The apparatus of claim 12, wherein the voice print model obtaining module is further configured to: create the voice print model based on speeches of the user, which are sampled by using a speech capturing device.

16. The apparatus of claim 15, wherein the voice print model obtaining module is further configured to: create the voice print model for the first voice based on the extracted speeches of the first voice and at least one of a closed caption, a subtitle, a script, a transcript, and a lyric of the media content.

17. The apparatus of one of claim 12, wherein the speech processing module is further configured to: translate the extracted speeches of the first voice in a first language to the replacement speeches in a second language by utilizing the voice print model.

18. The apparatus of claim 17, wherein the speech processing module is further configured to: generate the translated replacement speeches
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Detection of discrete points within a voice signal · CPC title
Speech synthesis; Text to speech systems · CPC title
Speaker identification or verification techniques · CPC title