Text processing method and apparatus, electronic device, and medium

US2023326466A1 · US · A1

Patent metadata
Publication number: US-2023326466-A1
Application number: US-202118043514-A
Country: US
Kind code: A1
Filing date: Aug 24, 2021
Priority date: Aug 31, 2020
Publication date: Oct 12, 2023
Grant date: (not listed)


Abstract


Provided are a text processing method and apparatus, an electronic device, and a medium. The method includes the following: target text information generated based on audio information is acquired; a to-be-error-corrected word in the target text information and a target candidate replacement word corresponding to the to-be-error-corrected word are determined; and a target replacement word corresponding to the to-be-error-corrected word is determined according to the target candidate replacement word, and the target text information is updated based on the target replacement word.
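The abstract's three steps (acquire target text, find a to-be-error-corrected word and its candidate, then update the text) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `Segment` record, the `CONFUSION_LEXICON` contents, and all function names are hypothetical. The time-window selection loosely follows claim 3, and the lexicon lookup follows the confusion-word option of claims 9 and 10, where the lexicon entry is taken directly as the replacement. Matching is plain case-insensitive substring search, which is a simplifying assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech-to-text result shown on the client (hypothetical record, cf. claim 2)."""
    speaker_id: str
    timestamp: float        # speech start time in seconds
    text: str
    corrected: bool = False


# Hypothetical confusion-word lexicon: misrecognized form (lowercase) -> intended word.
CONFUSION_LEXICON = {"bite dance": "ByteDance", "lark sweet": "Lark Suite"}


def acquire_target_text(segments, now, window):
    """Claim 3 (sketch): uncorrected segments stamped within the last `window` seconds."""
    return [s for s in segments if not s.corrected and now - window <= s.timestamp <= now]


def find_candidates(text, lexicon):
    """Claim 10 (sketch): (to_be_corrected, candidate) pairs for lexicon keys found in text."""
    lowered = text.lower()
    return [(wrong, right) for wrong, right in lexicon.items() if wrong in lowered]


def update_text(text, pairs):
    """Replace each to-be-corrected word with its replacement, ignoring case."""
    for wrong, right in pairs:
        # A function replacement avoids re interpreting backslashes in `right`.
        text = re.sub(re.escape(wrong), lambda _m: right, text, flags=re.IGNORECASE)
    return text


def process(segments, now, window, lexicon):
    """Abstract-level pipeline: acquire target text, find corrections, update in place."""
    for seg in acquire_target_text(segments, now, window):
        seg.text = update_text(seg.text, find_candidates(seg.text, lexicon))
        seg.corrected = True
    return segments


if __name__ == "__main__":
    segs = [Segment("spk1", 5.0, "Welcome to bite dance"),
            Segment("spk2", 0.5, "bite dance intro")]
    process(segs, now=12.0, window=10.0, lexicon=CONFUSION_LEXICON)
    for s in segs:
        print(s.speaker_id, s.corrected, s.text)
```

With `now=12.0` and `window=10.0`, only the segment stamped at 5.0 s falls inside the window, so `process` rewrites it to "Welcome to ByteDance" and leaves the 0.5 s segment untouched. A production system would also need word-boundary checks, since substring search can match inside longer words.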

Claims (preview)

1. A text processing method, comprising: acquiring target text information generated based on audio information; determining a to-be-error-corrected word in the target text information and a target candidate replacement word corresponding to the to-be-error-corrected word; and determining, according to the target candidate replacement word, a target replacement word corresponding to the to-be-error-corrected word, and updating the target text information based on the target replacement word.

2. The method according to claim 1, before acquiring the target text information generated based on the audio information, further comprising: collecting the audio information of a speaker and converting the audio information to corresponding text information; and generating, according to the text information, a speech timestamp corresponding to the speaker, and an identifier of the speaker, current text content displayed on a client, and determining the target text information based on the current text content.

3. The method according to claim 2, wherein acquiring the target text information generated based on the audio information comprises: determining a timestamp of text content without error correction among all text content, and acquiring text content without error correction within a preset duration based on the timestamp; and determining the target text information based on the text content without error correction within the preset duration.

4. The method according to claim 3, wherein the all text content is determined based on text information displayed in a preset region of the client, or the all text content is retrieved from a speech-to-text module.

5. The method according to claim 1, wherein determining the to-be-error-corrected word in the target text information and the target candidate replacement word corresponding to the to-be-error-corrected word comprises: determining the to-be-error-corrected word in the target text information and the target candidate replacement word corresponding to the to-be-error-corrected word in a correction manner corresponding to a correction type, to determine the target replacement word based on the target candidate replacement word, wherein the error correction type comprises a type of text-pronunciation-based error correction and a type of text-content-based error correction.

6. The method according to claim 5, wherein the error correction type comprises the type of text-pronunciation-based error correction, and adopting the error correction manner corresponding to the error correction type to determine the to-be-error-corrected word in the target text information and the target candidate replacement word corresponding to the to-be-error-corrected word comprises: acquiring a pronunciation of each piece of text in the target text information; determining, according to the pronunciation of the each piece of text and hot words pre-stored in a hot word dictionary, whether a target hot word corresponding to the pronunciation of the each piece of text exists in the target text information, wherein the hot word dictionary is configured to store a plurality of hot words, and the plurality of hot words are determined based on audio information and text information that are collected in a real-time interactive process; and in response to the target hot word corresponding to the pronunciation of the each piece of text existing in the target text information, determining the to-be-error-corrected word in the target text information according to the target hot word, and determining the target candidate replacement word based on the to-be-error-corrected word.

7. The method according to claim 6, wherein the type of text-pronunciation-based error correction comprises a type of pinyin-based error correction.

8. The method according to claim 6, wherein determining, according to the target candidate replacement word, the target replacement word corresponding to the to-be-error-corrected word, and updating the target text information based on the target replacement word comprises: acquiring a first to-be-processed sentence to which the to-be-error-corrected word belongs in the target text information, and updating the first to-be-processed sentence based on the target candidate replacement word to acquire a second to-be-processed sentence; determining a perplexity value of the second to-be-processed sentence; in response to the perplexity value being greater than or equal to a preset perplexity threshold, determining the target replacement word according to the to-be-error-corrected word; and in response to the perplexity value being less than the preset perplexity threshold, determining the target replacement word according to the target candidate replacement word.

9. The method according to claim 5, wherein the error correction type comprises the type of text-content-based error correction, and adopting the error correction manner corresponding to the error correction type to determine the to-be-error-corrected word in the target text information and the target candidate replacement word corresponding to the to-be-error-corrected word so as to determine the target replacement word based on the target candidate replacement word comprises at least one of: performing matching for text content in the target text information based on a pre-determined confusion word lexicon to determine a confusion word in the target text information, and determining the to-be-error-corrected word and the corresponding target candidate replacement word based on the confusion word; or determining a suspected to-be-error-corrected word in the target text information based on a pre-determined common word lexicon, and determining the to-be-error-corrected word and the corresponding target candidate replacement word based on the suspected to-be-error-corrected word.

10. The method according to claim 9, wherein performing the matching for the text content in the target text information based on the pre-determined confusion word lexicon to determine the confusion word in the target text information, and determining the to-be-error-corrected word and the corresponding target candidate replacement word based on the confusion word comprises: determining the confusion word in the target text information based on the pre-determined confusion word lexicon, and taking a word in the pre-determined confusion word lexicon corresponding to the confusion word as the target candidate replacement word; and determining, according to the target candidate replacement word, the target replacement word corresponding to the to-be-error-corrected word, and updating the target text information based on the target replacement word comprises: determining the target replacement word according to the target candidate replacement word; and updating, based on the target replacement word, the to-be-error-corrected word in the target text information corresponding to the target replacement word.

11. The method according to claim 9, before determining the suspected to-be-error-corrected word in the target text information based on the pre-determined common word lexicon, and determining the to-be-error-corrected word and the corresponding target candidate replacement word based on the suspected to-be-error-corrected word, further comprising: segmenting a sentence in the target text information to acquire at least one key word to determine the to-be-error-corrected word from the at least one key word.

12. The method according to claim 11, wherein determining the to-be-error-corrected word in the target text information and the target candidate replacement word cor
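Claims 6 to 8 describe a pronunciation-based path: find text whose pinyin matches a hot word but whose characters differ, substitute the hot word to form a second sentence, and keep the substitution only if that sentence's perplexity is below a preset threshold. The Python sketch below illustrates that flow under stated assumptions. The character-to-pinyin table, the one-entry hot-word list, and the training sentences are hypothetical toy data, and the claims do not name a language model, so the character bigram model with add-one smoothing used here as the perplexity scorer is a stand-in, as are the threshold values and every identifier.

```python
import math
from collections import Counter

# Hypothetical character-to-pinyin table (tones dropped); a real system would
# use a full pronunciation lexicon. Covers only the toy example.
PINYIN = {"飞": "fei", "非": "fei", "书": "shu", "开": "kai", "会": "hui", "用": "yong"}

# Hypothetical hot-word dictionary (claim 6): terms collected during the session.
HOT_WORDS = ["飞书"]


def pinyin_of(text):
    """Pronunciation of each character; characters missing from the table map to themselves."""
    return [PINYIN.get(ch, ch) for ch in text]


def find_hot_word_errors(text, hot_words):
    """Claims 6-7 (sketch): spans that sound like a hot word but are spelled differently.

    Returns (start, to_be_corrected, candidate) tuples. Matching is per character,
    so each span has the same length as its hot word.
    """
    hits = []
    for hot in hot_words:
        target = pinyin_of(hot)
        n = len(hot)
        for i in range(len(text) - n + 1):
            span = text[i:i + n]
            if span != hot and pinyin_of(span) == target:
                hits.append((i, span, hot))
    return hits


class BigramLM:
    """Character bigram model with add-one smoothing, used only to score perplexity."""

    def __init__(self, corpus):
        self.bigrams = Counter()
        self.contexts = Counter()
        vocab = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            vocab.update(sentence)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        self.v = len(vocab) + 1  # +1 reserves probability mass for unseen characters

    def perplexity(self, sentence):
        """exp of the mean negative log-probability over all bigram transitions."""
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + self.v)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))


def choose_replacement(sentence, start, wrong, candidate, lm, threshold):
    """Claim 8 (sketch): build the second sentence with the candidate; accept the
    candidate if its perplexity is below the threshold, otherwise keep the original."""
    second = sentence[:start] + candidate + sentence[start + len(wrong):]
    if lm.perplexity(second) < threshold:
        return candidate, second
    return wrong, sentence


if __name__ == "__main__":
    lm = BigramLM(["用飞书开会", "飞书开会", "我们用飞书"])  # hypothetical training text
    sentence = "用非书开会"  # ASR output: 非书 sounds like the hot word 飞书
    for start, wrong, cand in find_hot_word_errors(sentence, HOT_WORDS):
        _, sentence = choose_replacement(sentence, start, wrong, cand, lm, threshold=10.0)
    print(sentence)
```

Because a span matches only when every character's pinyin agrees, the candidate always has the same length as the span it replaces, so the offsets returned by `find_hot_word_errors` remain valid across successive replacements. On this toy corpus the corrected sentence scores a perplexity of roughly 3.9 versus roughly 5.6 for the original, so a threshold of 10.0 accepts the hot word while a threshold of 2.0 would keep the recognized text.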

Assignees

Beijing Bytedance Network Tech Co Ltd

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • G10L17/20 (Primary)

    Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • Use of phonemic categorisation or speech recognition prior to speaker recognition or verification · CPC title

  • G06F40/232 (Primary)

    Orthographic correction, e.g. spell checking or vowelisation · CPC title


Frequently asked questions


Who is the assignee on this patent?
Beijing Bytedance Network Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 12, 2023 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).