Data processing method, apparatus, and device, computer-readable storage medium, and computer program product

US12579472B2 · US · B2

Patent metadata
Publication number: US-12579472-B2
Application number: US-202218073333-A
Country: US
Kind code: B2
Filing date: Dec 1, 2022
Priority date: Mar 10, 2021
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

Abstract

Official abstract text for this publication.

This application provides a federated-learning-based data processing method, apparatus, and device, and a computer-readable storage medium. The method includes obtaining data to be processed, the data to be processed comprising multiple object identifiers and a feature value corresponding to each object identifier; binning the data to be processed based on the feature value corresponding to each object identifier to obtain a number of bins; determining multiple target identifier sets from each bin, and transmitting each target identifier set to a label-party device; receiving each piece of set label distribution information corresponding to each target identifier set from the label-party device, and determining bin label distribution information corresponding to each bin based on each piece of set label distribution information; and merging bins based on a binning policy and each piece of bin label distribution information to obtain a final binning result.
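
The exchange described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: equal-width intervals stand in for the binning step, identifier sets are drawn at random from a whole bin rather than per feature value, both parties run in one process instead of exchanging messages, and the names `INVALID`, `r`, and `max_rounds` are hypothetical.

```python
import random
from collections import defaultdict

INVALID = None  # hypothetical marker for "invalid information"


def equal_width_bins(features, m):
    """Feature party: assign each object identifier to one of m equal-width intervals."""
    lo, hi = min(features.values()), max(features.values())
    width = (hi - lo) / m or 1.0  # fall back to 1.0 when the feature is constant
    bins = defaultdict(list)
    for obj_id, value in features.items():
        index = min(int((value - lo) / width), m - 1)  # the maximum lands in the last bin
        bins[index].append(obj_id)
    return bins


def label_party_counts(id_set, labels):
    """Label party: return (positives, negatives), or INVALID if the set is single-class."""
    pos = sum(labels[i] for i in id_set)
    neg = len(id_set) - pos
    if pos == len(id_set) or neg == len(id_set):
        return INVALID  # exact counts would reveal every label in the set
    return pos, neg


def bin_label_distribution(bin_ids, labels, r=3, rng=None, max_rounds=1000):
    """Feature party: accumulate per-bin counts from random identifier sets of size r."""
    rng = rng or random.Random(0)
    unprocessed = list(bin_ids)
    pos_total = neg_total = 0
    for _ in range(max_rounds):  # assumed cap so the sketch always terminates
        if len(unprocessed) < r:
            break
        id_set = rng.sample(unprocessed, r)
        reply = label_party_counts(id_set, labels)
        if reply is INVALID:
            continue  # identifiers stay unprocessed and may be drawn again
        pos_total += reply[0]
        neg_total += reply[1]
        chosen = set(id_set)
        unprocessed = [i for i in unprocessed if i not in chosen]
    return pos_total, neg_total, unprocessed
```

The third return value lists identifiers that could not be resolved within the round cap. How such leftovers are handled is not shown in this excerpt, so the sketch simply reports them.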

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing method, applied to a feature-party device, comprising: obtaining data to be processed, the data to be processed comprising multiple object identifiers and a feature value corresponding to each object identifier; binning the data to be processed based on the feature value corresponding to each object identifier to obtain a number of bins; determining multiple target identifier sets from each bin, and transmitting each target identifier set to a label-party device; receiving each piece of set label distribution information corresponding to each target identifier set from the label-party device, and determining bin label distribution information corresponding to each bin based on each piece of set label distribution information; and merging bins based on a binning policy and each piece of bin label distribution information to obtain a final binning result.

2. The method according to claim 1, wherein the binning the data to be processed based on the feature value corresponding to each object identifier to obtain a number of bins comprises: determining a maximum feature value and a minimum feature value based on the feature values corresponding to the object identifiers; determining (M−1) feature quantile values based on the maximum feature value, the minimum feature value, and a number M; determining M feature intervals based on the minimum feature value, the maximum feature value, and the (M−1) feature quantile values; and binning based on each feature value in the data to be processed and the M feature intervals to obtain M bins, an i-th bin comprising multiple pieces of feature data, feature values corresponding to the multiple pieces of feature data being in an i-th feature interval, and i = 1, 2, ..., M.

3. The method according to claim 2, further comprising: obtaining a partition rule and a number of partitions N; determining a partition identifier of each piece of feature data in the i-th bin based on each object identifier in the i-th bin and the partition rule, the partition identifier corresponding to one of N partitions; and allocating each piece of feature data in the i-th bin to an i-th bin in the partition corresponding to the partition identifier of each piece of feature data.

4. The method according to claim 1, wherein the determining multiple target identifier sets from each bin comprises: determining a number of feature values S in an i-th bin, i = 1, 2, ..., M, M being the number, and S being a positive integer; determining R unprocessed object identifiers corresponding to a j-th feature value randomly, R being a positive integer greater than 2, and j = 1, 2, ..., S; and determining the R unprocessed object identifiers as a target identifier set.

5. The method according to claim 1, wherein the determining bin label distribution information corresponding to each bin based on each piece of set label distribution information comprises: obtaining, when the set label distribution information corresponding to an i-th bin is not invalid information, a number of positive samples and number of negative samples in the set identifier distribution information, i = 1, 2, ..., M, and M being the number; determining the object identifier in the target identifier set corresponding to the set label distribution information as a processed object identifier; and updating the bin label distribution information corresponding to the i-th bin based on the number of positive samples and number of negative samples in the set label distribution information until there is no unprocessed object identifier in the i-th bin.

6. The method according to claim 5, further comprising: deleting the set label distribution information when the set label distribution information is the invalid information; and determining the object identifier in the target identifier set corresponding to the set label distribution information as an unprocessed object identifier.

7. The method according to claim 1, wherein the merging bins based on a binning policy and each piece of bin label distribution information to obtain a final binning result comprises: merging each bin with an adjacent bin of the bin to obtain multiple candidate bins; determining candidate label attribute information of each candidate bin, the candidate label attribute information comprising a number of positive samples, a number of negative samples, and a percentage of positive samples; determining an information value of each candidate bin in response to determining based on the candidate label attribute information that the candidate bin satisfies a merging condition; determining a target bin based on the information value of each candidate bin; and merging each target bin with an adjacent bin of the target bin again until an optimization objective is achieved, to obtain multiple final bins.

8. The method according to claim 1, further comprising: obtaining each final bin in each feature dimension; determining an information value of each final bin in each feature dimension and a total information value corresponding to each feature dimension; performing feature selection based on the information value of each final bin and each total information value to obtain multiple target final bins; and obtaining label distribution information of each target final bin, and performing modeling based on feature data in each target final bin and the label distribution information.

9. A data processing method, applied to a label-party device, comprising: receiving a target identifier set transmitted from a feature-party device, and obtaining multiple object identifiers in the target identifier set; obtaining label information corresponding to each object identifier; determining set label distribution information of the target identifier set based on each piece of label information; and transmitting the set label distribution information to the feature-party device.

10. The method according to claim 9, wherein the determining set label distribution information of the target identifier set based on each piece of label information comprises: determining a number of positive samples and a number of negative samples based on each piece of label information; and determining the number of positive samples and the number of negative samples as the set label distribution information when the number of positive samples is less than a total number of object identifiers in the target identifier set and the number of negative samples is less than the total number of object identifiers.

11. The method according to claim 10, wherein the determining set label distribution information of the target identifier set based on each piece of label information comprises: determining invalid information as the set label distribution information of the target identifier set when the number of positive samples is equal to the total number of object identifiers in the target identifier set or the number of negative samples is equal to the total number of object identifiers.

12. The method according to claim 9, further comprising: obtaining a partition rule and a number of partitions N; determining a partition identifier corresponding to each piece of label data based on each object identifier and the partition rule, the label data comprising the object identifier and the label information corresponding to the object identifier, and the partition identifier corresponding to one of N partitions; and adding each piece of label data to the corresponding partition based on the partition identifier.

13. A non-
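
Claims 7 and 8 refer to an information value per bin without giving a formula in this preview. The sketch below assumes the conventional definition used in credit scoring, where a bin with positive share p and negative share q contributes (p − q) · ln(p / q), and it assumes a simple greedy policy: at each round, merge the adjacent pair that leaves the highest total information value, stopping at a target bin count. The function names, the merging condition (a merged bin must contain both classes), and the stopping rule are all hypothetical stand-ins for the patent's unspecified binning policy and optimization objective.

```python
import math


def bin_iv(pos, neg, pos_total, neg_total):
    """Information value contribution of one bin under the assumed formula."""
    if pos == 0 or neg == 0:
        return 0.0  # assumption: skip single-class bins rather than smooth the counts
    p, q = pos / pos_total, neg / neg_total
    return (p - q) * math.log(p / q)


def total_iv(bins):
    """Sum of per-bin contributions; bins is a list of (positives, negatives)."""
    pos_total = sum(p for p, _ in bins)
    neg_total = sum(n for _, n in bins)
    return sum(bin_iv(p, n, pos_total, neg_total) for p, n in bins)


def merge_bins(bins, target_count):
    """Greedily merge adjacent bins until target_count bins remain."""
    bins = list(bins)
    while len(bins) > target_count:
        best = None
        for i in range(len(bins) - 1):
            merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
            if merged[0] == 0 or merged[1] == 0:
                continue  # assumed merging condition: both classes must be present
            candidate = bins[:i] + [merged] + bins[i + 2:]
            score = total_iv(candidate)
            if best is None or score > best[0]:
                best = (score, candidate)
        if best is None:
            break  # no adjacent pair satisfies the merging condition
        bins = best[1]
    return bins


# Four ordered bins of (positives, negatives) reduced to two.
final_bins = merge_bins([(1, 9), (2, 8), (8, 2), (9, 1)], target_count=2)
print(final_bins, round(total_iv(final_bins), 3))
```

Bins with similar positive rates are merged first because combining them loses the least information value, which is why the two low-rate bins and the two high-rate bins end up paired in this example.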

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Management thereof · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • Distributed queries · CPC title

  • Optimisations to support specific applications; Extensibility of optimisers · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12579472B2 cover?
This application provides a federated-learning-based data processing method, apparatus, and device, and a computer-readable storage medium. The method includes obtaining data to be processed, the data to be processed comprising multiple object identifiers and a feature value corresponding to each object identifier; binning the data to be processed based on the feature value corresponding to eac…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).