Model training method, apparatus, and device, and data similarity determining method, apparatus, and device

US11288599B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11288599-B2
Application number: US-202016777659-A
Country: US
Kind code: B2
Filing date: Jan 30, 2020
Priority date: Jul 19, 2017
Publication date: Mar 29, 2022
Grant date: Mar 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A model training method includes: acquiring a plurality of user data pairs, wherein data fields of two sets of user data in each user data pair have an identical part; acquiring a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair; determining, according to the user similarity corresponding to each user data pair and the plurality of user data pairs, sample data for training a preset classification model; and training the classification model based on the sample data to obtain a similarity classification model.
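
The abstract's steps (label pairs by user similarity, select samples, train a classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature vectors, the similarity scores, the 0.8 threshold, and the small logistic-regression trainer are all hypothetical choices made for the example. The patent itself only speaks of a "preset classification model" and a "predetermined similarity threshold".

```python
import math

def select_samples(pairs, threshold):
    """Split pair features into positive/negative samples by user similarity.

    Assumption: a pair is positive when its similarity exceeds the threshold.
    Both lists are trimmed to the same length (claim 6 describes equal counts).
    """
    positives = [p["features"] for p in pairs if p["similarity"] > threshold]
    negatives = [p["features"] for p in pairs if p["similarity"] <= threshold]
    n = min(len(positives), len(negatives))
    return positives[:n], negatives[:n]

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary_classifier(positives, negatives, epochs=300, lr=0.5):
    """Fit a logistic-regression binary classifier with plain SGD.

    Logistic regression is a stand-in; any binary classifier could be used.
    """
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = pred - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def predict_similarity(model, features):
    """Return the model's score in (0, 1) for one pair's features."""
    weights, bias = model
    return _sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)

# Hypothetical pair features, e.g. [name feature, household feature, social feature],
# with a similarity score assumed to come from face or speech comparison.
pairs = [
    {"features": [0.9, 1.0, 0.8], "similarity": 0.95},
    {"features": [0.8, 1.0, 0.7], "similarity": 0.90},
    {"features": [0.2, 0.0, 0.1], "similarity": 0.30},
    {"features": [0.1, 1.0, 0.0], "similarity": 0.20},
    {"features": [0.3, 0.0, 0.2], "similarity": 0.10},
]

positives, negatives = select_samples(pairs, threshold=0.8)
model = train_binary_classifier(positives, negatives)
```

Once trained, `predict_similarity(model, features)` scores a new pair from its non-biometric features alone, which appears to be the point of the method: the biometric similarity is needed only to label the training data.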

First claim

Opening claim text (preview).

The invention claimed is:

1. A model training method, comprising: acquiring a plurality of user data pairs, wherein each user data pair is acquired by comparing data fields of acquired user data to find two sets of user data corresponding to two different users, respectively, and having data fields that share an identical part to form the user data pair corresponding to the two different users; acquiring a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair; determining, according to the user similarity corresponding to each user data pair and the plurality of user data pairs, sample data for training a preset classification model, wherein the determining the sample data comprises: performing feature extraction on each user data pair in the plurality of user data pairs to obtain associated user features between the two sets of user data in each user data pair; and determining, according to the associated user features between the user data in each user data pair and the user similarity corresponding to each user data pair, the sample data for training the classification model, wherein the determining comprises: selecting positive sample features and negative sample features from user features corresponding to the plurality of user data pairs according to the user similarity corresponding to each user data pair and a predetermined similarity threshold; and using the positive sample features and the negative sample features as the sample data for training the classification model; and training the classification model based on the sample data to obtain a similarity classification model.

2. The method according to claim 1, wherein the acquiring the user similarity corresponding to each user data pair comprises: acquiring biological features of users corresponding to a first user data pair, wherein the first user data pair is any user data pair in the plurality of user data pairs; and determining a user similarity corresponding to the first user data pair according to the biological features of the users corresponding to the first user data pair.

3. The method according to claim 2, wherein the biological features comprise a facial image feature; the acquiring the biological features of the users corresponding to the first user data pair comprises: acquiring facial images of the users corresponding to the first user data pair; and performing feature extraction on the facial images to obtain facial image features of the users corresponding to the first user data pair; and the determining the user similarity corresponding to the first user data pair according to the biological features of the users corresponding to the first user data pair comprises: determining the user similarity corresponding to the first user data pair according to the facial image features of the users corresponding to the first user data pair.

4. The method according to claim 2, wherein the biological features comprise a speech feature; the acquiring biological features of users corresponding to the first user data pair comprises: acquiring speech data of the users corresponding to the first user data pair; and performing feature extraction on the speech data to obtain speech features of the users corresponding to the first user data pair; and the determining the user similarity corresponding to the first user data pair according to the biological features of the users corresponding to the first user data pair comprises: determining the user similarity corresponding to the first user data pair according to the speech features of the users corresponding to the first user data pair.

5. The method according to claim 1, wherein the associated user features comprise at least one of a household registration dimension feature, a name dimension feature, a social feature, or an interest feature, wherein the household registration dimension feature comprises a feature of user identity information, the name dimension feature comprises a feature of user name information and a feature of a degree of scarcity of a user surname, and the social feature comprises a feature of social relationship information of a user.

6. The method according to claim 1, wherein the positive sample features comprise the same quantity of features as the negative sample features.

7. The method according to claim 1, wherein the similarity classification model is a binary classifier model.

8. The method according to claim 1, further comprising: acquiring a to-be-detected user data pair, the to-be-detected user data pair including two sets of to-be-detected user data; performing feature extraction on each set of to-be-detected user data in the to-be-detected user data pair to obtain to-be-detected user features; and determining a similarity between users corresponding to the two sets of to-be-detected user data in the to-be-detected user data pair according to the to-be-detected user features and the similarity classification model.

9. The method according to claim 8, further comprising: determining to-be-detected users corresponding to the to-be-detected user data pair as twins if the similarity between the users corresponding to the two sets of to-be-detected user data in the to-be-detected user data pair is greater than a predetermined similarity classification threshold.

10. A model training device, comprising: a processor; and a memory configured to store instructions, wherein the processor is configured to execute the instructions to: acquire a plurality of user data pairs, wherein each user data pair is acquired by comparing data fields of acquired user data to find two sets of user data corresponding to two different users, respectively, and having data fields that share an identical part to form the user data pair corresponding to the two different users; acquire a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair; determine, according to the user similarity corresponding to each user data pair and the plurality of user data pairs, sample data for training a preset classification model, wherein determining the sample data comprises: performing feature extraction on each user data pair in the plurality of user data pairs to obtain associated user features between the two sets of user data in each user data pair; and determining, according to the associated user features between the user data in each user data pair and the user similarity corresponding to each user data pair, the sample data for training the classification model, wherein the determining comprises: selecting positive sample features and negative sample features from user features corresponding to the plurality of user data pairs according to the user similarity corresponding to each user data pair and a predetermined similarity threshold; and using the positive sample features and the negative sample features as the sample data for training the classification model; and train the classification model based on the sample data to obtain a similarity classification model.

11. The device according to claim 10, wherein the processor is further configured to execute the instructions to: acquire biological features of users corresponding to a first user data pair, wherein the first user data pair is any user data pair in the plurality of user data pairs; and determine a user similarity corresponding to the first user data pair according to the biological features of the users correspond
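
Two steps in the claims lend themselves to a short sketch: forming candidate pairs from records whose data fields share an identical part (claim 1), and flagging a to-be-detected pair as twins when the model's score exceeds a classification threshold (claims 8 and 9). Everything specific below is an assumption for illustration: the record layout, the use of `surname` and `address` as the shared fields, the two example features, the stand-in scoring function, and the 0.5 threshold are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from itertools import combinations

def form_user_data_pairs(records, shared_fields):
    """Pair records of different users whose chosen fields are identical."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in shared_fields)
        groups[key].append(record)
    pairs = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            if a["user_id"] != b["user_id"]:  # keep only two different users
                pairs.append((a, b))
    return pairs

def extract_pair_features(a, b):
    """Hypothetical associated features between two users' records."""
    same_birth_date = 1.0 if a["birth_date"] == b["birth_date"] else 0.0
    friends_a, friends_b = set(a["friends"]), set(b["friends"])
    union = friends_a | friends_b
    # Jaccard overlap of friend lists as a simple social feature.
    shared_friends = len(friends_a & friends_b) / len(union) if union else 0.0
    return [same_birth_date, shared_friends]

def is_twin_pair(score_fn, a, b, threshold=0.5):
    """Flag the pair as twins if the model score is greater than the threshold."""
    return score_fn(extract_pair_features(a, b)) > threshold

records = [
    {"user_id": "u1", "surname": "Zhang", "address": "12 Elm St",
     "birth_date": "1990-05-01", "friends": ["u7", "u8"]},
    {"user_id": "u2", "surname": "Zhang", "address": "12 Elm St",
     "birth_date": "1990-05-01", "friends": ["u7", "u8", "u9"]},
    {"user_id": "u3", "surname": "Li", "address": "3 Oak Ave",
     "birth_date": "1985-01-01", "friends": []},
]

pairs = form_user_data_pairs(records, ["surname", "address"])

# Stand-in for a trained similarity classification model: mean of the features.
def score_fn(features):
    return sum(features) / len(features)

flagged = [is_twin_pair(score_fn, a, b) for a, b in pairs]
```

Grouping by the shared fields first avoids comparing every record against every other one; only records inside the same group are paired.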

Assignees

Advanced New Technologies Co Ltd

Inventors

Classifications

  • G06N20/20 (Primary): Ensemble learning

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • Proximity, similarity or dissimilarity measures

  • G06N20/00 (Primary): Machine learning

  • Matching criteria, e.g. proximity measures

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11288599B2 cover?
A model training method includes: acquiring a plurality of user data pairs, wherein data fields of two sets of user data in each user data pair have an identical part; acquiring a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair; determining, according to the user sim…
Who is the assignee on this patent?
Advanced New Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).