US12283085B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12283085-B2 |
| Application number | US-202217902323-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 2, 2022 |
| Priority date | Mar 31, 2022 |
| Publication date | Apr 22, 2025 |
| Grant date | Apr 22, 2025 |
Official abstract text for this publication.
Provided is a data labeling method based on artificial intelligence, an apparatus, and a storage medium relating to the field of artificial intelligence, particularly data labeling, image recognition, and natural language processing. The method includes: determining a plurality of samples involved in clustering; performing a plurality of following operations circularly to realize iterative processing, until a convergence condition is satisfied or a quantity of iterations reaches a number threshold, comprising: pre-clustering the plurality of samples according to a vector representation of the respective samples to obtain a plurality of class clusters, each class cluster containing at least one sample; receiving labeling information for the respective class clusters and re-determining the plurality of samples according to the labeling information; and determining a clustering result according to the labeling information for the respective class clusters.
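The loop described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `iterative_labeling`, `toy_pre_cluster`, and `toy_label` are hypothetical, the annotator is simulated by a callback, each sub-cluster is assumed to list its representative first, and the convergence test (everything remaining fits in one class cluster) is one possible reading of the convergence condition.

```python
from typing import Callable, Dict, List, Sequence

Cluster = List[str]
# Labeling information for one class cluster: a list of sub-clusters,
# each written with its representative sample first (assumed convention).
SubClusters = List[List[str]]


def iterative_labeling(
    samples: Sequence[str],
    pre_cluster: Callable[[List[str]], List[Cluster]],
    label_cluster: Callable[[Cluster], SubClusters],
    max_iterations: int,
) -> List[List[str]]:
    """Return the final groups of original samples."""
    # members[rep] lists every original sample currently represented by rep.
    members: Dict[str, List[str]] = {s: [s] for s in samples}
    for _ in range(max_iterations):
        # Only the current representatives take part in pre-clustering.
        clusters = pre_cluster(list(members))
        merged: Dict[str, List[str]] = {}
        for cluster in clusters:
            for sub in label_cluster(cluster):
                rep = sub[0]
                # Non-representatives from earlier iterations follow their
                # representative into its current sub-cluster.
                merged[rep] = [m for s in sub for m in members[s]]
        members = merged
        # Assumed convergence test: everything left fit into one class
        # cluster, so the annotator has compared all representatives.
        if len(clusters) <= 1:
            break
    return list(members.values())


# Toy stand-ins (hypothetical): the first character of each name plays the
# role of both the vector representation and the annotator's judgment.
def toy_pre_cluster(active: List[str], size: int = 3) -> List[Cluster]:
    ordered = sorted(active)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def toy_label(cluster: Cluster) -> SubClusters:
    groups: Dict[str, List[str]] = {}
    for s in cluster:
        groups.setdefault(s[0], []).append(s)
    return list(groups.values())
```

Because only representatives are passed to the next round, the number of samples the annotator must review shrinks with each iteration, while the `members` map keeps every earlier non-representative attached to its representative.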
Opening claim text (preview).
What is claimed is:

1. A data labeling method based on artificial intelligence, comprising: determining a plurality of samples involved in clustering; performing a plurality of following operations circularly to realize iterative processing, until a convergence condition is satisfied, or a quantity of iterations reaches a number threshold, comprising: pre-clustering the plurality of samples involved in clustering, according to a vector representation of the respective samples involved in clustering, to obtain a plurality of class clusters, wherein each class cluster contains at least one sample involved in clustering; receiving labeling information for the respective class clusters, wherein the labeling information for the respective class clusters comprises: at least one sub-cluster contained in the respective class clusters, and a representative sample in each sub-cluster, wherein the sub-cluster comprises one representative sample and at least one non-representative sample; re-determining the plurality of samples involved in clustering, according to the labeling information by: taking the representative sample in the sub-cluster in the labeling information for the respective class clusters, as the re-determined plurality of samples involved in clustering; for the representative sample, determining a non-representative sample that belongs to, in a previous iteration process, a same sub-cluster as the representative sample; and determining a sub-cluster to which the non-representative sample belongs in a current iteration process, to be the same as a sub-cluster to which the representative sample belongs in the current iteration process; and determining a clustering result according to the labeling information for the respective class clusters.

2. The method of claim 1, wherein pre-clustering the plurality of samples involved in clustering, according to the vector representation of the respective samples involved in clustering, comprises: pre-clustering the plurality of samples involved in clustering, by using a cluster algorithm in combination with a restriction condition, to enable the respective class clusters obtained by the pre-clustering to satisfy the restriction condition.

3. The method of claim 2, wherein the restriction condition comprises at least one of: that a quantity of samples involved in clustering contained in each class cluster is not greater than a sample number threshold; or that respective samples involved in clustering contained in each class cluster belong to, in a pre-clustering process of a last iterative processing, different class clusters.

4. The method of claim 3, wherein pre-clustering the plurality of samples involved in clustering, by using the cluster algorithm in combination with the restriction condition, comprises: determining a density of the respective samples involved in clustering; and performing following operations for the respective samples involved in clustering respectively, according to a descending order of densities: determining a plurality of neighboring samples of a sample involved in clustering; and traversing the respective neighboring samples in sequence according to a descending order of similarities between the respective neighboring samples and the sample involved in clustering, wherein the sample involved in clustering is added to a class cluster to which a neighboring sample belongs, in a case of all of first judgment conditions are satisfied, wherein the first judgment conditions comprise: that a density of the neighboring sample is greater than a density of the sample involved in clustering; that the class cluster to which the neighboring sample belongs exists; that a similarity between the neighboring sample and the sample involved in clustering is greater than or equal to a similarity threshold; that a quantity of samples contained in the class cluster to which the neighboring sample belongs is less than the sample number threshold; and that the neighboring sample and the sample involved in clustering belong to, in the pre-clustering process of the last iterative processing, different class clusters.

5. The method of claim 4, further comprising: establishing a new class cluster, in a case of at least one of the first judgment conditions is not satisfied, the new class cluster including the sample involved in clustering.

6. The method of claim 3, wherein pre-clustering the plurality of samples involved in clustering, by using the cluster algorithm in combination with the restriction condition, comprises: selecting a part from the plurality of samples involved in clustering; taking each selected sample involved in clustering as a cluster center; and for each sample involved in clustering other than the cluster center, adding the sample involved in clustering to a class cluster to which a nearest cluster center belongs, in a case of all of second judgment conditions are satisfied, wherein the second judgment conditions comprise: that a quantity of samples contained in the class cluster to which the nearest cluster center belongs is less than the sample number threshold; and that the class cluster to which the nearest cluster center belongs does not include a sample, wherein the sample belongs to, in the pre-clustering process of the last iterative processing, a same class cluster as the sample involved in clustering.

7. The method of claim 6, further comprising: adding the sample involved in clustering to a class cluster to which another cluster center belongs, in a case of at least one of the second judgment conditions is not satisfied.

8. The method of claim 3, wherein the convergence condition comprises that a quantity of samples contained in the respective class clusters is less than the sample number threshold.

9. The method of claim 3, wherein the number threshold is determined by the sample number threshold and a quantity of samples involved in clustering in a first iteration process.

10. The method of claim 1, wherein the sample involved in clustering comprises an image sample or a text sample.

11. An electronic apparatus, comprising: at least one processor; and a memory connected in communication with the at least one processor, wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute: determining a plurality of samples involved in clustering; performing following operations circularly to realize iterative processing, until a convergence condition is satisfied, or a quantity of iterations reaches a number threshold: pre-clustering the plurality of samples involved in clustering, according to a vector representation of the respective samples involved in clustering, to obtain a plurality of class clusters, wherein each class cluster contains at least one sample involved in clustering; receiving labeling information for the respective class clusters, wherein the labeling information for the respective class clusters comprises: at least one sub-cluster contained in the respective class clusters, and a representative sample in each sub-cluster, wherein the sub-cluster comprises one representative sample and at least one non-representative sample; re-determining the plurality of samples involved in clustering, according to the labeling information by: taking the representative sample in the sub-cluster in the labeling information for the respective class clusters, as the re-determined plurality of samples involved in clustering; for the representative sample, determining a non-representative sample that belongs to, in a previous iteration process, a
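
As a reading aid for the density-ordered pre-clustering of claims 4 and 5, the following Python sketch walks through the five first judgment conditions. It rests on several assumptions the claims do not fix: cosine similarity as the similarity measure, density taken as the count of other samples at or above the similarity threshold, neighbors taken as the `k` most similar samples, and a new class cluster opened when no neighbor passes every condition. The function names `cosine` and `density_pre_cluster` and the parameter names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density_pre_cluster(
    vectors: Sequence[Sequence[float]],
    sim_threshold: float,
    max_cluster_size: int,
    prev_cluster: Optional[Sequence[int]] = None,
    k: int = 5,
) -> List[int]:
    """Return a class-cluster id for each sample index."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Assumed density: number of other samples at least sim_threshold similar.
    density = [
        sum(1 for j in range(n) if j != i and sim[i][j] >= sim_threshold)
        for i in range(n)
    ]
    assigned: Dict[int, int] = {}   # sample index -> class-cluster id
    sizes: List[int] = []           # sizes[c] = samples in class cluster c
    # Visit samples in descending order of density (ties keep index order).
    for i in sorted(range(n), key=lambda x: -density[x]):
        # Neighbors in descending order of similarity to sample i.
        neighbors = sorted(
            (j for j in range(n) if j != i), key=lambda j: -sim[i][j]
        )[:k]
        target = None
        for j in neighbors:
            if (
                density[j] > density[i]                      # denser neighbor
                and j in assigned                            # its cluster exists
                and sim[i][j] >= sim_threshold               # similar enough
                and sizes[assigned[j]] < max_cluster_size    # room left
                and (prev_cluster is None
                     or prev_cluster[i] != prev_cluster[j])  # apart last time
            ):
                target = assigned[j]
                break
        if target is None:
            # No neighbor passed every condition: open a new class cluster.
            target = len(sizes)
            sizes.append(0)
        assigned[i] = target
        sizes[target] += 1
    return [assigned[i] for i in range(n)]
```

With two tight groups of three vectors each, a size cap of 3 yields two class clusters, while a cap of 2 forces the third member of each group into its own cluster; the cap is what keeps each class cluster small enough for an annotator to review at once.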
using classification, e.g. of video objects · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Clustering or classification · CPC title
Clustering; Classification · CPC title
Machine learning · CPC title