US12562147B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12562147-B2 |
| Application number | US-202218283433-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 15, 2022 |
| Priority date | Mar 31, 2021 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Official abstract text for this publication.
Provided are a synchronization method and apparatus for audio and text, a device, and a medium. The method includes: determining a plurality of first text segments for audio conversion and a second text for reading display, in which the plurality of first text segments and the second text are from an initial text; converting the plurality of first text segments into audio segments, to obtain a first mapping relationship between the first text segments and the audio segments; performing matching on the first text segments and the second text, to obtain a second mapping relationship between the first text segments and second text segments in the second text; determining the second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship.
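The last step of the abstract, combining the two mapping relationships, amounts to composing two lookups. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `compose_mappings`, the dictionary representation, and the audio identifiers such as `audio_0` are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def compose_mappings(
    first_to_audio: Dict[str, str],
    first_to_second: Dict[str, str],
) -> Dict[str, str]:
    """Map each audio segment to the display (second) text segment.

    first_to_audio:  first text segment -> audio segment id (first mapping)
    first_to_second: first text segment -> second text segment (second mapping)
    """
    synced: Dict[str, str] = {}
    for first_segment, audio_id in first_to_audio.items():
        second_segment = first_to_second.get(first_segment)
        if second_segment is not None:
            # Both mappings share the first text segment as their key,
            # so the audio can be tied directly to the display text.
            synced[audio_id] = second_segment
    return synced


# Hypothetical data: the first text is what a speech synthesizer reads,
# the second text is what the reader sees on screen.
first_to_audio = {
    "Chapter one": "audio_0",
    "It was a dark night": "audio_1",
}
first_to_second = {
    "Chapter one": "Chapter one.",
    "It was a dark night": "It was a dark night,",
}
synced = compose_mappings(first_to_audio, first_to_second)
```

Keying both mappings on the first text segment is what makes the composition a plain lookup; any first segment without a match in the second text is simply skipped in this sketch.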
Opening claim text (preview).
What is claimed is:
1. A synchronization method for audio and text, performed by a server, comprising: determining a plurality of first text segments for audio conversion and a second text for reading display, the plurality of first text segments and the second text being from an initial text; converting the plurality of first text segments into audio segments playable by an audio device of a terminal, to obtain a first mapping relationship between the plurality of first text segments and the audio segments; performing matching on the plurality of first text segments and the second text, to obtain a second mapping relationship between the plurality of first text segments and second text segments in the second text; and determining a second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship; sending each of the audio segments and the second text segment synchronized with each of the audio segments to a client installed on the terminal, to enable the client to play each of the audio segments via the audio device while displaying the second text segment synchronized with the played audio segment on a user interface of the client.
2. The method according to claim 1, wherein the performing the matching on each of the plurality of first text segments and the second text comprises: performing matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text.
3. The method according to claim 2, wherein the performing the matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text comprises: deleting the one or more symbols in the second text to obtain a third text; and for each of the plurality of first text segments: deleting the one or more symbols in the first text segment to obtain a first temporary text segment; searching the third text for a second temporary text segment same as the first temporary text segment; searching the second text for a first symbol previous to the second temporary text segment and a second symbol following the second temporary text segment; and determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment.
4. The method according to claim 3, wherein the determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment comprises: determining, based on the first text segment, a third symbol previous to the first temporary text segment and a fourth symbol following the first temporary text segment; performing matching on the first symbol and the third symbol and on the second symbol and the fourth symbol, respectively; and determining, based on a result of the matching, the second text segment in the second text that matches with the first text segment.
5. The method according to claim 4, wherein the determining, based on the result of the matching, the second text segment in the second text that matches with the first text segment comprises: determining a starting position of the second text segment as the first symbol and an ending position of the second text segment as the second symbol, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is same as the fourth symbol; determining the starting position of the second text segment as the first symbol and the ending position as an end of the second text segment, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is different from the fourth symbol; determining the starting position of the second text segment as a beginning of the second text segment and the ending position as the second symbol, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is same as the fourth symbol; and determining the starting position of the second text segment as the beginning of the second text segment and the ending position as the end of the second text segment, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is different from the fourth symbol.
6. The method according to claim 3, further comprising: merging the first text segment with a next first text segment to obtain a merged text segment, when no second temporary text segment same as the first temporary text segment is found in the third text; determining an ending position of a previous first text segment to the first text segment in the second text as a starting position of the merged text segment in the second text; and determining an ending position of a next first text segment in the second text as an ending position of the merged text segment in the second text.
7. The method according to claim 1, wherein the determining the plurality of first text segments for audio conversion and the second text for reading display comprises: obtaining the initial text, and determining, based on the initial text, a first text for audio conversion and the second text for the reading display; and splitting the first text into the plurality of first text segments.
8. The method according to claim 7, wherein the determining, based on the initial text, the first text for audio conversion and the second text for reading display comprises: performing first text normalization processing on the initial text to obtain the first text; and performing second text normalization processing on the initial text to obtain the second text.
9. The method according to claim 8, wherein: the first text normalization processing comprises one or more of: deleting target content satisfying a first predetermined condition from the initial text; and performing punctuating on a sentence exceeding a length threshold; and the second text normalization processing comprises deleting target content satisfying a second predetermined condition from the initial text.
10. The method according to claim 7, wherein the splitting the first text into the plurality of first text segments comprises: determining one or more symbols in the first text, and splitting the first text based on the one or more symbols, to obtain the plurality of first text segments.
11. The method according to claim 1, further comprising: synthesizing the audio segments into a complete audio, and determining an audio starting time of each of the audio segments in the complete audio; and determining, based on the second text segment synchronized with each of the audio segments, a synchronization relationship between the audio starting time and a text starting position of the second text segment in the second text.
12. The method according to claim 11, further comprising: obtaining an association relationship by associating the complete audio, the second text, and the synchronization relationship.
13. A synchronization method for audio and text, performed by a client installed on a terminal, comprising: obtaining a plurality of audio segments and a second text segment synchronized with each of the plurality of audio segments from a server, wherein the plurality of audio segments and the second text segment synchronized with each of the plurality of audio segments
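As an illustration of the symbol-based matching recited in claims 3 to 5, the sketch below strips symbols from both texts, locates the stripped first segment in the stripped second text, and then extends the match over an adjacent symbol only when the same symbol borders the first segment. It is a simplified reading under stated assumptions, not the patented implementation: the `SYMBOLS` set, the helper names, the choice to inspect only the single character immediately before and after the match, and the interpretation of "beginning" and "end" of the second text segment as the first and last matched characters are all assumptions made for this example.

```python
import string
from typing import List, Optional, Tuple

# Assumption: "symbols" means punctuation; the claims do not enumerate them.
SYMBOLS = set(string.punctuation)


def strip_symbols(text: str) -> Tuple[str, List[int]]:
    """Delete symbols; also return each kept character's index in `text`."""
    kept, positions = [], []
    for i, ch in enumerate(text):
        if ch not in SYMBOLS:
            kept.append(ch)
            positions.append(i)
    return "".join(kept), positions


def symbol_at(text: str, index: int) -> Optional[str]:
    """Return the character at `index` if it exists and is a symbol."""
    if 0 <= index < len(text) and text[index] in SYMBOLS:
        return text[index]
    return None


def match_segment(first_segment: str, second_text: str) -> Optional[Tuple[int, int]]:
    """Return slice bounds (start, end) of the match in `second_text`, or None."""
    third_text, third_pos = strip_symbols(second_text)   # claim 3: third text
    temp, temp_pos = strip_symbols(first_segment)        # first temporary segment
    if not temp:
        return None
    hit = third_text.find(temp)                          # second temporary segment
    if hit == -1:
        return None  # claim 6 would merge with the next first text segment here
    start = third_pos[hit]
    end = third_pos[hit + len(temp) - 1]
    first_symbol = symbol_at(second_text, start - 1)
    second_symbol = symbol_at(second_text, end + 1)
    third_symbol = symbol_at(first_segment, temp_pos[0] - 1)
    fourth_symbol = symbol_at(first_segment, temp_pos[-1] + 1)
    # Claim 5: include a bordering symbol only when both texts agree on it.
    if first_symbol is not None and first_symbol == third_symbol:
        start -= 1
    if second_symbol is not None and second_symbol == fourth_symbol:
        end += 1
    return start, end + 1


second_text = 'He said, "Wait!" and left.'
bounds = match_segment("Wait!", second_text)
matched = second_text[bounds[0]:bounds[1]] if bounds else None
```

Here the opening quotation mark is left out of the match because the first segment has no leading symbol, while the exclamation mark is included because both texts agree on it. The sketch always takes the first occurrence found by `str.find`; a fuller version would presumably search from the end of the previously matched segment so that repeated phrases stay in order.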
for synchronising with other signals, e.g. video signals · CPC title
Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
Methods for producing synthetic speech; Speech synthesisers · CPC title