What technology area does this patent fall under?

Primary CPC classification G06F40/284. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for character-to-phone conversion

US12555563B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12555563-B2
Application number	US-202217888243-A
Country	US
Kind code	B2
Filing date	Aug 15, 2022
Priority date	Aug 15, 2022
Publication date	Feb 17, 2026
Grant date	Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for training a model to perform end-to-end character-to-phoneme (C2P) conversion include: selecting a plurality of unlabeled sentences from a first data source, selecting a plurality of labeled sentences from a second data source, preprocessing a combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features, generating mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features, and training a pre-trained model, using the mixed training data, to perform end-to-end C2P conversion.
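The sentence-selection step can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `select_sentences`, the greedy single-pass strategy, and the sample sentences are all assumptions made for illustration. The sketch picks unlabeled sentences so that each target polyphone character reaches a minimum count (as recited in the first claim), and no sentence is picked more than once.

```python
from collections import defaultdict


def select_sentences(sentences, target_chars, min_per_char):
    """Greedy single pass (an assumed strategy, not taken from the patent).

    A sentence is selected if it contains at least one target polyphone
    character that has not yet reached min_per_char selected sentences.
    Each sentence is visited once, so it can be selected at most once.
    """
    selected = []
    counts = defaultdict(int)  # selected sentences seen per target character
    for sentence in sentences:
        present = [c for c in target_chars if c in sentence]
        still_needed = [c for c in present if counts[c] < min_per_char]
        if still_needed:
            selected.append(sentence)
            # Credit every target character the sentence contains.
            for c in present:
                counts[c] += 1
    return selected, dict(counts)


# Hypothetical unlabeled corpus; "行" and "乐" are polyphone characters.
corpus = [
    "银行在这条路上",
    "他行走得很快",
    "行不行",
    "今天天气很好",
    "我喜欢音乐",
    "他很快乐",
]
selected, counts = select_sentences(corpus, ["行", "乐"], min_per_char=2)
```

With a minimum of two sentences per character, the third sentence is skipped because "行" is already covered, and the fourth is skipped because it contains no target character.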

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a model to perform end-to-end character-to-phoneme (C2P) conversion, performed by at least one processor and comprising: selecting, for a combined corpus of selected unlabeled and labeled sentences, a plurality of unlabeled sentences from a first data source wherein at least a predetermined minimum number of sentences are selected with respect to a target polyphone character; selecting, for the combined corpus, a plurality of labeled sentences from a second data source; preprocessing the combined corpus to extract a plurality of linguistic features; generating mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features; further training a pre-trained model, using the mixed training data, to perform end-to-end C2P conversion, wherein the pre-trained model is configured to output a plurality of phoneme labels of a plurality of characters of a sentence that is input into the pre-trained model; and inputting a plurality of characters into the further trained model to obtain a plurality of phonemes corresponding to the plurality of characters.

2. The method of claim 1, wherein selecting the plurality of unlabeled sentences from the first data source comprises: selecting at least one sentence, from the plurality of unlabeled sentences, that includes a target polyphone character, for each of a plurality of target polyphone characters.

3. The method of claim 1, wherein selecting the plurality of unlabeled sentences from the first data source comprises: selecting a sentence, from the plurality of unlabeled sentences, only once for each of a plurality of target polyphone characters; and selecting at least a predetermined minimum number of sentences, from the plurality of unlabeled sentences, for each of the plurality of target polyphone characters.

4. The method of claim 1, wherein preprocessing the combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features comprises: identifying one or more of a word boundary, part-of-speech (POS) tag, or named entity phrase using one or more linguistic tools; and extracting each identified word boundary, POS tag, or named entity phrase as a linguistic feature.

5. The method of claim 1, wherein preprocessing the combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features comprises: identifying one or more tokens of the combined corpus that are associated with an extracted linguistic feature and a manually-labeled linguistic feature, wherein the extracted and manually-labeled linguistic features are mismatched; and masking the mismatched extracted linguistic feature associated with each of the one or more identified tokens.

6. The method of claim 1, wherein generating mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features comprises: inputting the preprocessed corpus to a baseline system for automatic labeling, wherein the baseline system includes a plurality of rules and decision trees configured to label one or more tokens in the preprocessed corpus; obtaining a plurality of labeled tokens, output by the baseline system in response to inputting the preprocessed corpus, each token corresponding to a character in the preprocessed corpus, and each label corresponding to a phoneme associated with the character; and mixing the plurality of labeled tokens.

7. The method of claim 6, wherein mixing the plurality of labeled tokens comprises: identifying a mismatch among the plurality of labeled tokens, between an automatically generated label associated with a token and a manually assigned label associated with a token; and converting the mismatched automatically generated label to be consistent with the manually assigned label.

8. The method of claim 1, wherein training the pre-trained model, using the mixed training data, to perform end-to-end C2P conversion comprises: obtaining the pre-trained model that is previously trained to perform C2P conversion; and retraining the pre-trained model, using the mixed training data, to simultaneously label all characters in an input sentence when performing C2P conversion on the input sentence.

9. An electronic device comprising: at least one memory configured to store computer program code; and at least one processor configured to operate as instructed by the computer program code, the computer program code including: selecting code configured to cause the at least one processor to select, for a combined corpus of selected unlabeled and labeled sentences, a plurality of unlabeled sentences from a first data source wherein at least a predetermined minimum number of sentences are selected with respect to a target polyphone character, and select, for the combined corpus, a plurality of labeled sentences from a second data source, preprocessing code configured to cause the at least one processor to preprocess the combined corpus to extract a plurality of linguistic features, generating code configured to cause the at least one processor to generate mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features, training code configured to cause the at least one processor to further train a pre-trained model, using the mixed training data, to perform end-to-end C2P conversion, wherein the pre-trained model is configured to output a plurality of phoneme labels of a plurality of characters of a sentence that is input into the pre-trained model; and inference code configured to cause the at least one processor to input a plurality of characters into the further trained model to obtain a plurality of phonemes corresponding to the plurality of characters.

10. The electronic device of claim 9, wherein the selecting code is configured to cause the at least one processor to: select a sentence, from the plurality of unlabeled sentences, only once for each of a plurality of target polyphone characters; and select at least a predetermined minimum number of sentences, from the plurality of unlabeled sentences, for each of the plurality of target polyphone characters.

11. The electronic device of claim 9, wherein the preprocessing code is configured to cause the at least one processor to: identify one or more of a word boundary, part-of-speech (POS) tag, or named entity phrase using one or more linguistic tools; and extract each identified word boundary, POS tag, or named entity phrase as a linguistic feature.

12. The electronic device of claim 9, wherein the generating code is configured to cause the at least one processor to: input the preprocessed corpus to a baseline system for automatic labeling, wherein the baseline system includes a plurality of rules and decision trees configured to label one or more tokens in the preprocessed corpus; obtain a plurality of labeled tokens, output by the baseline system in response to inputting the preprocessed corpus, each token corresponding to a character in the preprocessed corpus, and each label corresponding to a phoneme associated with the character; and mix the plurality of labeled tokens.

13. The electronic device of claim 9, wherein the training code is configured to cause the at least one processor to: obtain the pre-trained model that is previously trained to perform C2P conversion; and retrain the pre-trained model, using the mixed training data, to simultaneously label all characters in an input
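The label-mixing step recited in the claims (a mismatched automatically generated label is converted to agree with the manually assigned label) can be sketched in Python as follows. This is an illustrative assumption, not the patented implementation: the function name `mix_labels`, the use of `None` for tokens without a manual label, and the pinyin-style label strings are all hypothetical.

```python
def mix_labels(auto_labels, manual_labels):
    """Combine per-character labels from two sources.

    auto_labels:   labels produced by an automatic baseline labeler.
    manual_labels: manually assigned labels, or None where no manual
                   label exists (an assumed convention for this sketch).
    Where both exist and disagree, the manual label replaces the
    automatic one. Returns the mixed labels and the override count.
    """
    if len(auto_labels) != len(manual_labels):
        raise ValueError("label sequences must align one-to-one")
    mixed = []
    overridden = 0
    for auto, manual in zip(auto_labels, manual_labels):
        if manual is not None and manual != auto:
            mixed.append(manual)  # manual label wins on mismatch
            overridden += 1
        else:
            mixed.append(auto)
    return mixed, overridden


# Hypothetical labels for the four characters of "银行很行".
auto = ["yin2", "xing2", "hen3", "xing2"]
manual = [None, "hang2", None, None]
mixed, overridden = mix_labels(auto, manual)
```

Here only the second character carries a manual label, and it disagrees with the automatic one, so a single label is overridden and the rest pass through unchanged.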

Assignees

Tencent America LLC

Inventors

Cui Jia

Classifications

G06F40/284 · Primary
Lexical analysis, e.g. tokenisation or collocates · CPC title
G10L13/08
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
G06F40/30
Semantic analysis · CPC title
G06F40/295
Named entity recognition · CPC title
G10L13/047 · Primary
Architecture of speech synthesisers · CPC title

Patent family

Related publications grouped by family.

View patent family 89846555

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12555563B2 cover?: Systems and methods for training a model to perform end-to-end character-to-phoneme (C2P) conversion include: selecting a plurality of unlabeled sentences from a first data source, selecting a plurality of labeled sentences from a second data source, preprocessing a combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features, generating mixed tr…
Who is the assignee on this patent?: Tencent America LLC

Related patents

Speech synthesis method, device and computer readable storage medium

Phoneme-based natural language processing

Active featuring in computer-human interactive learning
