Who is the assignee on this patent?

Beijing Bytedance Network Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G10L13/04. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 24, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Synchronization method and apparatus for audio and text, device, and medium

US12562147B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12562147-B2
Application number	US-202218283433-A
Country	US
Kind code	B2
Filing date	Feb 15, 2022
Priority date	Mar 31, 2021
Publication date	Feb 24, 2026
Grant date	Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a synchronization method and apparatus for audio and text, a device, and a medium. The method includes: determining a plurality of first text segments for audio conversion and a second text for reading display, in which the plurality of first text segments and the second text are from an initial text; converting the plurality of first text segments into audio segments, to obtain a first mapping relationship between the first text segments and the audio segments; performing matching on the first text segments and the second text, to obtain a second mapping relationship between the first text segments and second text segments in the second text; determining the second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship.
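The last step of the abstract, combining the two mapping relationships, can be sketched as a simple composition of lookups. This is only an illustration under stated assumptions, not the patented implementation: the dictionary representation, the function name `synchronize`, and the sample segments and audio file names are all hypothetical, and the file names stand in for the output of a text-to-speech step.

```python
from typing import Dict


def synchronize(first_to_audio: Dict[str, str],
                first_to_second: Dict[str, str]) -> Dict[str, str]:
    """Compose the two mappings into: audio segment -> displayed text segment."""
    audio_to_second: Dict[str, str] = {}
    for first_segment, audio in first_to_audio.items():
        # Skip first text segments for which matching found no display text.
        if first_segment in first_to_second:
            audio_to_second[audio] = first_to_second[first_segment]
    return audio_to_second


# Hypothetical data: first text segments (TTS input) keyed to audio files,
# and the same segments keyed to the matching spans of the display text.
first_to_audio = {"Hello world": "seg0.mp3", "How are you": "seg1.mp3"}
first_to_second = {"Hello world": "Hello, world!", "How are you": "How are you?"}

sync = synchronize(first_to_audio, first_to_second)
```

With this result, a client could look up the text to highlight for whichever audio segment is currently playing.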

First claim

Opening claim text (preview).

What is claimed is:
1. A synchronization method for audio and text, performed by a server, comprising: determining a plurality of first text segments for audio conversion and a second text for reading display, the plurality of first text segments and the second text being from an initial text; converting the plurality of first text segments into audio segments playable by an audio device of a terminal, to obtain a first mapping relationship between the plurality of first text segments and the audio segments; performing matching on the plurality of first text segments and the second text, to obtain a second mapping relationship between the plurality of first text segments and second text segments in the second text; and determining a second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship; sending each of the audio segments and the second text segment synchronized with each of the audio segments to a client installed on the terminal, to enable the client to play each of the audio segments via the audio device while displaying the second text segment synchronized with the played audio segment on a user interface of the client.
2. The method according to claim 1, wherein the performing the matching on each of the plurality of first text segments and the second text comprises: performing matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text.
3. The method according to claim 2, wherein the performing the matching on each of the plurality of first text segments and the second text based on one or more symbols in each of the plurality of first text segments and one or more symbols in the second text comprises: deleting the one or more symbols in the second text to obtain a third text; and for each of the plurality of first text segments: deleting the one or more symbols in the first text segment to obtain a first temporary text segment; searching the third text for a second temporary text segment same as the first temporary text segment; searching the second text for a first symbol previous to the second temporary text segment and a second symbol following the second temporary text segment; and determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment.
4. The method according to claim 3, wherein the determining, based on the first symbol and the second symbol, the second text segment in the second text that matches with the first text segment comprises: determining, based on the first text segment, a third symbol previous to the first temporary text segment and a fourth symbol following the first temporary text segment; performing matching on the first symbol and third second symbol and on the second symbol and the fourth symbol, respectively; and determining, based on a result of the matching, the second text segment in the second text that matches with the first text segment.
5. The method according to claim 4, wherein the determining, based on the result of the matching, the second text segment in the second text that matches with the first text segment comprises: determining a starting position of the second text segment as the first symbol and an ending position of the second text segment as the second symbol, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is same as the fourth symbol; determining the starting position of the second text segment as the first symbol and the ending position as an end of the second text segment, when the result of the matching indicates that the first symbol is same as the third symbol and the second symbol is different from the fourth symbol; determining that the starting position of the second text segment as a beginning of the second text segment and the ending position as the second symbol, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is same as the fourth symbol; and determining the starting position of the second text segment as the beginning of the second text segment and the ending position as the end of the second text segment, when the result of the matching indicates that the first symbol is different from the third symbol and the second symbol is different from the fourth symbol.
6. The method according to claim 3, further comprising: merging the first text segment with a next first text segment to obtain a merged text segment, when no second temporary text segment same as the first temporary text segment is found in the third text; determining an ending position of a previous first text segment to the first text segment in the second text as a starting position of the merged text segment in the second text; and determining an ending position of a next first text segment in the second text as an ending position of the merged text segment in the second text.
7. The method according to claim 1, wherein the determining the plurality of first text segments for audio conversion and the second text for reading display comprises: obtaining the initial text, and determining, based on the initial text, a first text for audio conversion and the second text for the reading display; and splitting the first text into the plurality of first text segments.
8. The method according to claim 7, wherein the determining, based on the initial text, the first text for audio conversion and the second text for reading display comprises: performing first text normalization processing on the initial text to obtain the first text; and performing second text normalization processing on the initial text to obtain the second text.
9. The method according to claim 8, wherein: the first text normalization processing comprises one or more of: deleting target content satisfying a first predetermined condition from the initial text; and performing punctuating on a sentence exceeding a length threshold; and the second text normalization processing comprises deleting target content satisfying a second predetermined condition from the initial text.
10. The method according to claim 7, wherein the splitting the first text into the plurality of first text segments comprises: determining one or more symbols in the first text, and splitting the first text based on the one or more symbols, to obtain the plurality of first text segments.
11. The method according to claim 1, further comprising: synthesizing the audio segments into a complete audio, and determining an audio starting time of each of the audio segments in the complete audio; and determining, based on the second text segment synchronized with each of the audio segments, a synchronization relationship between the audio starting time and a text starting position of the second text segment in the second text.
12. The method according to claim 11, further comprising: obtaining an association relationship by associating the complete audio, the second text, and the synchronization relationship.
13. A synchronization method for audio and text, performed by a client installed on a terminal, comprising: obtaining a plurality of audio segments and a second text segment synchronized with each of the plurality of audio segments from a server, wherein the plurality of audio segments and the second text segment synchronized with each of the plurality of audio segments
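The symbol-based matching described in claims 3 to 5 can be illustrated with a short sketch. It is a simplified reading of the claim language, not the patented implementation. It assumes that "symbol" means any Unicode punctuation character, that only the character immediately adjacent to the matched span is checked, and that the first occurrence in the text is the right one. The names `is_symbol`, `strip_symbols`, and `match_segment` are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple


def is_symbol(ch: str) -> bool:
    # Assumption: a "symbol" is any Unicode punctuation character.
    return unicodedata.category(ch).startswith("P")


def strip_symbols(text: str) -> Tuple[str, List[int]]:
    """Delete symbols; also return the original index of each kept character."""
    kept = [(i, ch) for i, ch in enumerate(text) if not is_symbol(ch)]
    return "".join(ch for _, ch in kept), [i for i, _ in kept]


def match_segment(first_segment: str, second_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the matching span in second_text, end exclusive."""
    third_text, index_map = strip_symbols(second_text)   # the "third text"
    temp = strip_symbols(first_segment)[0].strip()       # "first temporary text segment"
    if not temp:
        return None
    pos = third_text.find(temp)
    if pos < 0:
        return None  # claim 6 handles this case by merging with the next segment
    # Map the match back to positions in the second (display) text.
    start = index_map[pos]
    end = index_map[pos + len(temp) - 1] + 1
    stripped = first_segment.strip()
    # Include a surrounding symbol only when the same symbol also bounds
    # the first text segment (the four cases of claim 5, simplified).
    if start > 0 and is_symbol(second_text[start - 1]) \
            and stripped[0] == second_text[start - 1]:
        start -= 1
    if end < len(second_text) and is_symbol(second_text[end]) \
            and stripped[-1] == second_text[end]:
        end += 1
    return start, end


second_text = "Chapter 1. It was late; nobody spoke."
start, end = match_segment("It was late;", second_text)
print(second_text[start:end])  # prints "It was late;"
```

If the first text segment ended in a different symbol, for example `"It was late,"`, the trailing semicolon would be left out of the matched span, which corresponds to the "second symbol is different from the fourth symbol" case.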

Assignees

Beijing Bytedance Network Tech Co Ltd

Inventors

Classifications

G10L21/055
for synchronising with other signals, e.g. video signals · CPC title
G06F40/10
Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title
G10L13/04 · Primary
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
G10L13/02 · Primary
Methods for producing synthetic speech; Speech synthesisers · CPC title

Patent family

Related publications grouped by family.

View patent family 76672952
