Synchronization method and apparatus for audio and text, device, and medium

US12562147B2 · US · B2

Patent metadata
Publication number: US-12562147-B2
Application number: US-202218283433-A
Country: US
Kind code: B2
Filing date: Feb 15, 2022
Priority date: Mar 31, 2021
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

Abstract

Official abstract text for this publication.

Provided are a synchronization method and apparatus for audio and text, a device, and a medium. The method includes: determining a plurality of first text segments for audio conversion and a second text for reading display, in which the plurality of first text segments and the second text are from an initial text; converting the plurality of first text segments into audio segments, to obtain a first mapping relationship between the first text segments and the audio segments; performing matching on the first text segments and the second text, to obtain a second mapping relationship between the first text segments and second text segments in the second text; and determining the second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship.
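
The last step of the abstract amounts to composing the two mapping relationships. The sketch below illustrates that composition only. It assumes each first text segment is identified by an integer index and each mapping is a plain dictionary; the function name `synchronize` and the data layout are hypothetical and do not come from the patent.

```python
def synchronize(first_to_audio, first_to_second):
    """Compose the two mappings into audio segment -> second text segment.

    first_to_audio:  first-segment index -> audio segment (first mapping)
    first_to_second: first-segment index -> second text segment (second mapping)
    Both layouts are assumptions made for this illustration.
    """
    return {
        audio: first_to_second[index]
        for index, audio in first_to_audio.items()
        if index in first_to_second  # skip segments with no displayed counterpart
    }


first_to_audio = {0: "seg0.mp3", 1: "seg1.mp3"}
first_to_second = {0: "Chapter 1.", 1: "It was late."}
print(synchronize(first_to_audio, first_to_second))
```

Keying both mappings on the first text segment is what lets the audio and the displayed text differ (for example, after different normalization) while still being paired.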

First claim

Opening claim text (preview).

What is claimed is:

1. A synchronization method for audio and text, performed by a server, comprising: determining a plurality of first text segments for audio conversion and a second text for reading display, the plurality of first text segments and the second text being from an initial text; converting the plurality of first text segments into audio segments playable by an audio device of a terminal, to obtain a first mapping relationship between the plurality of first text segments and the audio segments; performing matching on the plurality of first text segments and the second text, to obtain a second mapping relationship between the plurality of first text segments and second text segments in the second text; and determining a second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship; sending each of the audio segments and the second text segment synchronized with each of the audio segments to a client installed on the terminal, to enable the client to play each of the audio segments via the audio device while displaying the second text segment synchronized with the played audio segment on a user interface of the client.

2. The method according to claim 1, wherein the performing the matching on each of the plurality of first text segments and the second text comprises: performing matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text.

3. The method according to claim 2, wherein the performing the matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text comprises: deleting the one or more symbols in the second text to obtain a third text; and for each of the plurality of first text segments: deleting the one or more symbols in the first text segment to obtain a first temporary text segment; searching the third text for a second temporary text segment same as the first temporary text segment; searching the second text for a first symbol previous to the second temporary text segment and a second symbol following the second temporary text segment; and determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment.

4. The method according to claim 3, wherein the determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment comprises: determining, based on the first text segment, a third symbol previous to the first temporary text segment and a fourth symbol following the first temporary text segment; performing matching on the first symbol and the third symbol and on the second symbol and the fourth symbol, respectively; and determining, based on a result of the matching, the second text segment in the second text that matches with the first text segment.

5. The method according to claim 4, wherein the determining, based on the result of the matching, the second text segment in the second text that matches with the first text segment comprises: determining a starting position of the second text segment as the first symbol and an ending position of the second text segment as the second symbol, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is same as the fourth symbol; determining the starting position of the second text segment as the first symbol and the ending position as an end of the second text segment, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is different from the fourth symbol; determining the starting position of the second text segment as a beginning of the second text segment and the ending position as the second symbol, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is same as the fourth symbol; and determining the starting position of the second text segment as the beginning of the second text segment and the ending position as the end of the second text segment, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is different from the fourth symbol.

6. The method according to claim 3, further comprising: merging the first text segment with a next first text segment to obtain a merged text segment, when no second temporary text segment same as the first temporary text segment is found in the third text; determining an ending position of a previous first text segment to the first text segment in the second text as a starting position of the merged text segment in the second text; and determining an ending position of a next first text segment in the second text as an ending position of the merged text segment in the second text.

7. The method according to claim 1, wherein the determining the plurality of first text segments for audio conversion and the second text for reading display comprises: obtaining the initial text, and determining, based on the initial text, a first text for audio conversion and the second text for the reading display; and splitting the first text into the plurality of first text segments.

8. The method according to claim 7, wherein the determining, based on the initial text, the first text for audio conversion and the second text for reading display comprises: performing first text normalization processing on the initial text to obtain the first text; and performing second text normalization processing on the initial text to obtain the second text.

9. The method according to claim 8, wherein: the first text normalization processing comprises one or more of: deleting target content satisfying a first predetermined condition from the initial text; and performing punctuating on a sentence exceeding a length threshold; and the second text normalization processing comprises deleting target content satisfying a second predetermined condition from the initial text.

10. The method according to claim 7, wherein the splitting the first text into the plurality of first text segments comprises: determining one or more symbols in the first text, and splitting the first text based on the one or more symbols, to obtain the plurality of first text segments.

11. The method according to claim 1, further comprising: synthesizing the audio segments into a complete audio, and determining an audio starting time of each of the audio segments in the complete audio; and determining, based on the second text segment synchronized with each of the audio segments, a synchronization relationship between the audio starting time and a text starting position of the second text segment in the second text.

12. The method according to claim 11, further comprising: obtaining an association relationship by associating the complete audio, the second text, and the synchronization relationship.

13. A synchronization method for audio and text, performed by a client installed on a terminal, comprising: obtaining a plurality of audio segments and a second text segment synchronized with each of the plurality of audio segments from a server, wherein the plurality of audio segments and the second text segment synchronized with each of the plurality of audio segments …
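
The symbol-based matching of claims 3 to 5 can be illustrated with a short sketch. This is one possible reading of the claim language, not the patentee's implementation. It assumes that "symbols" means punctuation characters (the claims do not enumerate them), that only the character directly adjacent to the matched span is inspected, and that a span is reported as a half-open `(start, end)` index pair into the second text. All function names are hypothetical, and the merging fallback of claim 6 is not shown.

```python
import string

# Assumption: "symbols" are punctuation marks; the claims do not list them.
SYMBOLS = set(string.punctuation) | set("“”‘’…")


def strip_symbols(text):
    """Delete symbols; also return each kept character's index in `text`."""
    kept = [(i, ch) for i, ch in enumerate(text) if ch not in SYMBOLS]
    return "".join(ch for _, ch in kept), [i for i, _ in kept]


def symbol_at(text, index):
    """Return the symbol at `index`, or None if out of range or not a symbol."""
    if 0 <= index < len(text) and text[index] in SYMBOLS:
        return text[index]
    return None


def match_segment(first_segment, second_text):
    """Return (start, end) of the matching span in second_text, end exclusive.

    Returns None when the stripped segment is not found (the case that
    claim 6 handles by merging with the next segment, not sketched here).
    """
    third_text, third_to_second = strip_symbols(second_text)
    temp, temp_to_first = strip_symbols(first_segment)
    if not temp:
        return None
    pos = third_text.find(temp)
    if pos < 0:
        return None
    # Map the match in the symbol-free third text back to the second text.
    start = third_to_second[pos]
    end = third_to_second[pos + len(temp) - 1]
    first_sym = symbol_at(second_text, start - 1)    # before the match
    second_sym = symbol_at(second_text, end + 1)     # after the match
    third_sym = symbol_at(first_segment, temp_to_first[0] - 1)
    fourth_sym = symbol_at(first_segment, temp_to_first[-1] + 1)
    # Claim 5: extend the span over a boundary symbol only when it agrees
    # with the corresponding symbol of the first text segment.
    if first_sym is not None and first_sym == third_sym:
        start -= 1
    if second_sym is not None and second_sym == fourth_sym:
        end += 1
    return start, end + 1


second_text = 'He said, "Hello world." Then he left.'
span = match_segment("Hello world.", second_text)
print(second_text[span[0]:span[1]])
```

In this example the trailing period is included in the span because it also ends the first text segment, while the opening quotation mark is left out because the first text segment has no matching leading symbol.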

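Claims 11 and 12 describe pairing each audio segment's starting time in the complete audio with the starting position of its text segment in the second text. The sketch below shows one way such a table could be built and queried. It assumes segment durations in seconds are already known and that text positions are character offsets; the names `build_sync_table` and `text_position_at` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_sync_table(durations, text_starts):
    """Pair each segment's start time in the concatenated audio with the
    starting offset of its synchronized text segment."""
    if len(durations) != len(text_starts):
        raise ValueError("one text start is required per audio segment")
    # Start of segment k is the total duration of segments 0..k-1.
    start_times = [0.0, *accumulate(durations)][:-1]
    return list(zip(start_times, text_starts))


def text_position_at(table, playback_time):
    """Return the text offset to highlight at `playback_time` (non-empty table)."""
    times = [start for start, _ in table]
    index = bisect_right(times, playback_time) - 1
    return table[max(index, 0)][1]


table = build_sync_table([1.5, 2.0, 0.5], [0, 12, 30])
print(table)
print(text_position_at(table, 2.0))
```

A sorted table of start times lets a client look up the current text position with a binary search whenever the playback position changes.
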
Assignees

Beijing Bytedance Network Tech Co Ltd

Inventors

Classifications

  • for synchronising with other signals, e.g. video signals · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • G10L13/04 · Primary

    Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title

  • G10L13/02 · Primary

    Methods for producing synthetic speech; Speech synthesisers · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12562147B2 cover?
Provided are a synchronization method and apparatus for audio and text, a device, and a medium. The method includes: determining a plurality of first text segments for audio conversion and a second text for reading display, in which the plurality of first text segments and the second text are from an initial text; converting the plurality of first text segments into audio segments, to obtain a …
Who is the assignee on this patent?
Beijing Bytedance Network Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L13/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).