Who is the assignee on this patent?

Beijing Bytedance Network Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06F16/353. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 2, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Annotation data determination method and apparatus, and readable medium and electronic device

US12405987B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12405987-B2
Application number	US-202218552781-A
Country	US
Kind code	B2
Filing date	Mar 17, 2022
Priority date	Mar 31, 2021
Publication date	Sep 2, 2025
Grant date	Sep 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to an annotation data determination method and apparatus, and a readable medium and an electronic device. By means of the present disclosure, high-quality data to be annotated is obtained for model performance evaluation. The method includes: acquiring candidate data from a candidate data set; respectively inputting the candidate data into a first text recognition model and a second text recognition model, so as to obtain a first recognition result output by the first text recognition model and a second recognition result output by the second text recognition model, wherein both the first text recognition model and the second text recognition model can recognize whether text data is of a target category; according to the first recognition result and the second recognition result, determining whether the candidate data meets an annotation condition, wherein the annotation condition is the category of the candidate data being recognized by the first text recognition model or the second text recognition model as at least one target category among target categories; and if it is determined that the candidate data meets the annotation condition, determining the candidate data as text data to be annotated.
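
To illustrate the annotation condition described in the abstract, here is a minimal Python sketch. It assumes each text recognition model can be treated as a function that returns True when it recognizes a text as belonging to the target category. The names `select_for_annotation`, `keyword_model`, and `length_model` are hypothetical, and the two toy models are stand-ins, not anything disclosed in the patent.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: a recognizer returns True when it judges
# the text to belong to the target category.
Recognizer = Callable[[str], bool]


def select_for_annotation(
    candidates: Iterable[str],
    model_a: Recognizer,
    model_b: Recognizer,
) -> List[str]:
    """Keep a candidate if at least one of the two models flags it."""
    return [text for text in candidates if model_a(text) or model_b(text)]


# Toy stand-in models (assumptions for illustration only).
def keyword_model(text: str) -> bool:
    return "refund" in text.lower()


def length_model(text: str) -> bool:
    return len(text) > 40


candidates = [
    "I want a refund for my order",
    "hi",
    "This message is long enough that the length-based stand-in flags it.",
]
to_annotate = select_for_annotation(candidates, keyword_model, length_model)
print(to_annotate)
```

Taking the union of the two models' positives means a text that only one model flags still reaches the annotators, which is where the two models disagree and where evaluation data is most informative.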

First claim

Opening claim text (preview).

What is claimed is:

1. A method for determining labeled data, comprising steps of: obtaining candidate data from a candidate data set, wherein the candidate data set is a set constituted by a plurality of unlabeled text data; inputting the candidate data into a first text recognition model and a second text recognition model respectively to obtain a first recognition result output from the first text recognition model and a second recognition result output from the second text recognition model, wherein the first text recognition model and the second text recognition model are both capable of recognizing whether text data belongs to a target category; determining whether the candidate data meets a labeling condition according to the first recognition result and the second recognition result, wherein the labeling condition is that the candidate data is recognized by at least one of the first text recognition model or the second text recognition model as belonging to the target category; determining the candidate data as text data needing to be labeled if it is determined that the candidate data meets the labeling condition; and determining the candidate data as text data not needing to be labeled if it is determined that the candidate data does not meet the labeling condition, wherein the first recognition result is a first score output from the first text recognition model for the candidate data, the second recognition result is a second score output from the second text recognition model for the candidate data, the determining whether the candidate data meets the labeling condition according to the first recognition result and the second recognition result comprises: determining that the candidate data meets the labeling condition if the first score is greater than or equal to a score threshold, or if the second score is greater than or equal to the score threshold, and wherein the score threshold is determined by the following steps: determining whether the text data meets the labeling condition for each text data in the candidate data set according to the first text recognition model, the second text recognition model and a target score used this time; increasing the target score if a number of text data in the candidate data set that meets the labeling condition is greater than a maximum sampling number; performing the step of determining whether the text data meets the labeling condition for each text data in the candidate data set again based on the increased target score and determining whether a number of text data in the candidate data set that meets the labeling condition is greater than the maximum sampling number; and determining the increased target score as the score threshold if the number of text data in the candidate data set that meets the labeling condition is less than or equal to the maximum sampling number.

2. The method for determining labeled data according to claim 1, wherein the first recognition result and the second recognition result are both configured to indicate whether the candidate data belongs to the target category, and the determining whether the candidate data meets the labeling condition according to the first recognition result and the second recognition result comprises: determining that the candidate data meets the labeling condition if the first identification result indicates that the candidate data belongs to the target category, or if the second identification result indicates that the candidate data belongs to the target category.

3. The method for determining labeled data according to claim 1, wherein the determining whether the text data meets the labeling condition for each text data in the candidate data set according to the first text recognition model, the second text recognition model and the target score used this time comprises: inputting the text data into the first text recognition model and the second text recognition model to obtain a third score output from the first text recognition model and a fourth score output from the second text model; and determining that the text data meets the labeling condition if the third score is greater than or equal to the target score, or if the fourth score is greater than or equal to the target score.

4. The method for determining labeled data according to claim 1, further comprising, after the determining the candidate data as the text data to be labeled: repeating the steps until any of following two conditions is satisfied: all text data in the candidate data set are traversed; or a number of text data to be labeled reaches a preset sampling number.

5. The method for determining labeled data according to claim 1, further comprising: obtaining labeling information for the text data to be labeled; labeling the text data to be labeled by using the labeling information to obtain labeled data; and adding the labeled data to an evaluation data set for performing model evaluation on the first text recognition model and the second text recognition model.

6. A non-transitory computer-readable medium having a computer program stored thereon that, when executed by a processing device, implements a method for determining labeled data, comprising: obtaining candidate data from a candidate data set, wherein the candidate data set is a set constituted by a plurality of unlabeled text data; inputting the candidate data into a first text recognition model and a second text recognition model respectively to obtain a first recognition result output from the first text recognition model and a second recognition result output from the second text recognition model, wherein the first text recognition model and the second text recognition model are both capable of recognizing whether text data belongs to a target category; determining whether the candidate data meets a labeling condition according to the first recognition result and the second recognition result, wherein the labeling condition is that the candidate data is recognized by at least one of the first text recognition model or the second text recognition model as belonging to the target category; determining the candidate data as text data needing to be labeled if it is determined that the candidate data meets the labeling condition; and determining the candidate data as text data not needing to be labeled if it is determined that the candidate data does not meet the labeling condition, wherein the first recognition result is a first score output from the first text recognition model for the candidate data, the second recognition result is a second score output from the second text recognition model for the candidate data, and the computer program implements following steps: determining that the candidate data meets the labeling condition if the first score is greater than or equal to a score threshold, or if the second score is greater than or equal to the score threshold, and wherein the score threshold is determined by the following steps: determining whether the text data meets the labeling condition for each text data in the candidate data set according to the first text recognition model, the second text recognition model and a target score used this time; increasing the target score if a number of text data in the candidate data set that meets the labeling condition is greater than a maximum sampling number, performing the determining whether the text data meets the labeling condition for each text data in the candidate data set, the second text recognition model and a target score used this time again based on the increased target score, and determining whether a number of text data in the candidate data set that meets the labeling condition is greater than the maximum sampling number; and determining the increase
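
The score-threshold steps recited in claim 1 can be sketched roughly as follows. This is an illustrative reading, not the patentee's implementation. The claim does not say how much the target score is increased per round or what happens when the starting score already yields few enough texts, so the fixed `step` and the choice to return the starting score unchanged in that case are assumptions. All function and variable names are hypothetical, and the dictionary-backed scorers are toy stand-ins for the two text recognition models.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a scorer maps a text to a score for the target category.
Scorer = Callable[[str], float]


def calibrate_threshold(
    texts: Sequence[str],
    score_a: Scorer,
    score_b: Scorer,
    initial_score: float,
    max_samples: int,
    step: float = 0.1,  # assumption: the claim does not fix the increment
) -> float:
    """Raise the target score until at most max_samples texts qualify."""
    if step <= 0 or max_samples < 0:
        raise ValueError("step must be positive and max_samples non-negative")
    # "score_a >= t or score_b >= t" is equivalent to "max(score_a, score_b) >= t",
    # so each text is scored once and only its higher score is kept.
    best = [max(score_a(t), score_b(t)) for t in texts]
    target = initial_score
    while sum(s >= target for s in best) > max_samples:
        target += step
    return target


def select(texts: Sequence[str], score_a: Scorer, score_b: Scorer,
           threshold: float) -> List[str]:
    """Texts that meet the labeling condition at the given threshold."""
    return [t for t in texts
            if score_a(t) >= threshold or score_b(t) >= threshold]


# Toy scores standing in for real model outputs (illustration only).
scores_a = {"t1": 0.95, "t2": 0.10, "t3": 0.65, "t4": 0.35, "t5": 0.05}
scores_b = {"t1": 0.20, "t2": 0.85, "t3": 0.15, "t4": 0.30, "t5": 0.25}
texts = list(scores_a)

threshold = calibrate_threshold(texts, scores_a.get, scores_b.get,
                                initial_score=0.1, max_samples=2)
selected = select(texts, scores_a.get, scores_b.get, threshold)
print(threshold, selected)
```

With these toy scores the loop stops once only two texts remain above the target score, so the annotation workload is capped at the maximum sampling number.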

Assignees

Beijing Bytedance Network Tech Co Ltd

Inventors

Classifications

G06F16/335
Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title
G06F16/353 (Primary)
into predefined classes · CPC title

Patent family

Related publications grouped by family.

View patent family 76516828
