Method for selecting annotated sample, apparatus, electronic device and storage medium

US11907668B2 · US · B2

Patent metadata
Publication number: US-11907668-B2
Application number: US-202218148904-A
Country: US
Kind code: B2
Filing date: Dec 30, 2022
Priority date: Feb 9, 2022
Publication date: Feb 20, 2024
Grant date: Feb 20, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method for selecting an annotated sample. The method includes: determining a first attribute and a second attribute of a sample characteristic; in which the first attribute is a characteristic attribute of the sample characteristic in a source field sample set, and the second attribute is a characteristic attribute of the sample characteristic in a target field sample set; and determining a target annotated sample from a plurality of candidate annotated samples of the source field sample set according to the first attribute and the second attribute; in which the target annotated sample is configured to train a classification model, the classification model includes a model for determining an emotion polarity by analyzing an input sample to be classified.
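The selection step summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes each sample characteristic is a word, and that its attribute in each field has already been reduced to an (importance, polarity) pair, as the claims describe. The names `Attribute`, `select_target_features`, `filter_samples`, the threshold value, and the toy data are all hypothetical.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    importance: float  # assumed to be a score such as a chi-square value
    polarity: int      # +1 for positive, -1 for negative


def select_target_features(source, target, min_importance):
    """Keep features that are important in both fields and agree in polarity.

    The threshold test is an assumed stand-in for the "predetermined
    condition" mentioned in the claims.
    """
    selected = set()
    for feature, src in source.items():
        tgt = target.get(feature)
        if tgt is None:
            continue
        important = (src.importance >= min_importance
                     and tgt.importance >= min_importance)
        if important and src.polarity == tgt.polarity:
            selected.add(feature)
    return selected


def filter_samples(samples, features):
    """Keep annotated source samples containing at least one selected feature.

    This filtering rule is an assumption; the page does not spell it out.
    """
    return [(text, label) for text, label in samples
            if features & set(text.lower().split())]


# Toy attributes: "unpredictable" is positive for films but negative for cars.
source_attrs = {
    "great": Attribute(9.0, 1),
    "unpredictable": Attribute(8.0, 1),
    "slow": Attribute(1.0, -1),
}
target_attrs = {
    "great": Attribute(7.5, 1),
    "unpredictable": Attribute(6.0, -1),
    "slow": Attribute(5.0, -1),
}
source_samples = [
    ("A great film", 1),
    ("An unpredictable plot", 1),
    ("Slow and dull", -1),
]

features = select_target_features(source_attrs, target_attrs, min_importance=5.0)
selected_samples = filter_samples(source_samples, features)
```

In this toy data only "great" survives: "unpredictable" is important in both fields but flips polarity, and "slow" falls below the assumed importance threshold in the source field.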

First claim

Opening claim text (preview).

What is claimed is:

1. A method for selecting an annotated sample, comprising: determining a first attribute and a second attribute of a sample characteristic; wherein the first attribute is a characteristic attribute of the sample characteristic in a source field sample set, and the second attribute is a characteristic attribute of the sample characteristic in a target field sample set; and determining a target annotated sample from a plurality of candidate annotated samples of the source field sample set according to the first attribute and the second attribute; wherein the target annotated sample is configured to train a classification model, the classification model comprises a model for determining an emotion polarity by analyzing an input sample to be classified; wherein determining the first attribute comprises: determining a first frequency value of the sample characteristic in the source field sample set; determining an importance level and an emotion polarity of the sample characteristic in the source field sample set using the first frequency value; and determining the importance level and the emotion polarity of the sample characteristic in the source field sample set as the first attribute.

2. The method as claimed in claim 1, wherein determining the importance level of the sample characteristic in the source field sample set using the first frequency value comprises: determining a first chi-square value of the sample characteristic using the first frequency value and a sample number of the source field sample set; wherein the first chi-square value is a chi-square value of the sample characteristic in the source field sample set; and determining the importance level of the sample characteristic in the source field sample set based on the first chi-square value.

3. The method as claimed in claim 2, wherein determining the emotion polarity of the sample characteristic in the source field sample set using the first frequency value comprises: determining an emotion polarity value of the sample characteristic using the first frequency value and the sample number of the source field sample set; and determining the emotion polarity of the sample characteristic in the source field sample set using the emotion polarity value.

4. The method as claimed in claim 1, wherein determining the second attribute comprises: adding reference annotations for the target field sample set according to a preset condition; wherein the reference annotations comprises a positive annotation or a negative annotation; and determining the second attribute of the sample characteristic using the reference annotations.

5. The method as claimed in claim 4, wherein determining the second attribute of the sample characteristic using the reference annotation comprises: determining a second frequency value of the sample characteristic in the target field sample set using the reference annotations; determining a reference importance level and a reference emotion polarity of the sample characteristic in the target field sample set using a sample number of the target field sample set and the second frequency value; and determining the reference importance level and the reference emotion polarity as the second attribute.

6. The method as claimed in claim 5, wherein determining the reference importance level comprises: determining a second chi-square value and an estimated deviation value of the sample characteristic using the second frequency value and the sample number of the target field sample set; wherein the second chi-square value is a chi-square value of the sample characteristic in the target field sample set; and determining the reference importance level of the sample characteristic in the target field sample set using the second chi-square value and the estimated deviation value.

7. The method as claimed in claim 1, wherein determining the target annotated sample from the plurality of candidate annotated samples of the source field sample set according to the first attribute and the second attribute comprises: determining a target sample characteristic according to the first attribute and the second attribute; and determining the target annotated sample by performing data filtration in the source field sample set using the target sample characteristic.

8. The method as claimed in claim 7, wherein determining the target sample characteristic according to the first attribute and the second attribute comprises: selecting, in the source field sample set, at least one sample characteristic with both an importance level in the first attribute and a reference importance level in the second attribute meeting a predetermined condition as at least one candidate sample characteristic; and selecting, from the at least one candidate sample characteristic, a sample characteristic with an emotion polarity in the first attribute being same as a reference emotion polarity in the second attribute as the target sample characteristic.

9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein, the memory stores instructions executable by the at least one processor, the instructions are executed by the at least one processor to cause the at least one processor to implement a method for selecting an annotated sample, comprising: determining a first attribute and a second attribute of a sample characteristic; wherein the first attribute is a characteristic attribute of the sample characteristic in a source field sample set, and the second attribute is a characteristic attribute of the sample characteristic in a target field sample set; and determining a target annotated sample from a plurality of candidate annotated samples of the source field sample set according to the first attribute and the second attribute; wherein the target annotated sample is configured to train a classification model, the classification model comprises a model for determining an emotion polarity by analyzing an input sample to be classified; wherein determining the first attribute comprises: determining a first frequency value of the sample characteristic in the source field sample set; determining an importance level and an emotion polarity of the sample characteristic in the source field sample set using the first frequency value; and determining the importance level and the emotion polarity of the sample characteristic in the source field sample set as the first attribute.

10. The electronic device as claimed in claim 9, wherein determining the importance level of the sample characteristic in the source field sample set using the first frequency value comprises: determining a first chi-square value of the sample characteristic using the first frequency value and a sample number of the source field sample set; wherein the first chi-square value is a chi-square value of the sample characteristic in the source field sample set; and determining the importance level of the sample characteristic in the source field sample set based on the first chi-square value.

11. The electronic device as claimed in claim 10, wherein determining the emotion polarity of the sample characteristic in the source field sample set using the first frequency value comprises: determining an emotion polarity value of the sample characteristic using the first frequency value and the sample number of the source field sample set; and determining the emotion polarity of the sample characteristic in the source field sample set using the emotion polarity value.

12. The electronic device as claimed in claim 9, wherein determining the second attri
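Claims 1 to 3 derive an importance level and an emotion polarity for a characteristic from its frequency and the sample counts. The exact formulas are not reproduced on this page, so the sketch below substitutes the standard 2x2 chi-square statistic commonly used for text feature selection, plus a simple difference of occurrence rates as the polarity value. Both choices, and the names `chi_square`, `polarity_value`, and `feature_attribute`, are assumptions for illustration only.

```python
def chi_square(a, b, n_pos, n_neg):
    """Standard 2x2 chi-square statistic for one feature (assumed formula).

    a: number of positive samples that contain the feature
    b: number of negative samples that contain the feature
    n_pos, n_neg: total positive and negative samples in the set
    """
    c = n_pos - a          # positive samples without the feature
    d = n_neg - b          # negative samples without the feature
    n = n_pos + n_neg
    denom = (a + b) * (c + d) * n_pos * n_neg
    if denom == 0:
        return 0.0         # degenerate table: feature carries no signal
    return n * (a * d - b * c) ** 2 / denom


def polarity_value(a, b, n_pos, n_neg):
    """Occurrence rate in positive samples minus rate in negative samples."""
    pos_rate = a / n_pos if n_pos else 0.0
    neg_rate = b / n_neg if n_neg else 0.0
    return pos_rate - neg_rate


def feature_attribute(a, b, n_pos, n_neg):
    """Return (importance, polarity), with polarity in {+1, 0, -1}."""
    value = polarity_value(a, b, n_pos, n_neg)
    polarity = 1 if value > 0 else -1 if value < 0 else 0
    return chi_square(a, b, n_pos, n_neg), polarity


# A word seen in 40 of 50 positive samples and 10 of 50 negative samples.
importance, polarity = feature_attribute(a=40, b=10, n_pos=50, n_neg=50)
print(importance, polarity)  # 36.0 1
```

The same functions could in principle be applied to the target field once it carries reference annotations, as in claims 4 and 5. The estimated deviation value of claim 6 is not modeled here.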

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • Classification techniques · CPC title

  • G06F16/35 (Primary)

    Clustering; Classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11907668B2 cover?
The present disclosure provides a method for selecting an annotated sample. The method includes: determining a first attribute and a second attribute of a sample characteristic; in which the first attribute is a characteristic attribute of the sample characteristic in a source field sample set, and the second attribute is a characteristic attribute of the sample characteristic in a target field…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 20, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).