Distributed random binning featurization with hybrid two-level parallelism

US11080228B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11080228-B2
Application numberUS-201715457422-A
CountryUS
Kind codeB2
Filing dateMar 13, 2017
Priority dateMar 13, 2017
Publication dateAug 3, 2021
Grant dateAug 3, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A random binning featurization process method, system, and computer program product for a distributed random binning featurization process on one or more multicore systems with a hybrid two-level parallelism, the method including in a training phase, receiving a first data matrix dividing the random binning featurization process into two orthogonal levels, in a high-level generating a randomized number of high-dimension grids and evenly partitioning the grids into nodes in a parallel system, and in a low-level, evenly partitioning dimensions in each grid to construct look-up tables of index vectors and compute a local feature matrix for each node.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented random binning featurization process method for a distributed random binning featurization process on one or more multicore systems with a hybrid two-level parallelism, the method comprising: in a training phase: receiving a first data matrix; dividing the random binning featurization process into two orthogonal levels comprising a high-level and a low-level; in the high-level: generating a plurality of randomized grids; and evenly partitioning the generating of grids into nodes in a parallel system; and in the low-level: evenly partitioning dimensions in each grid to construct look-up tables of index vectors and compute a local feature matrix for each node, wherein the hybrid two-level parallelism includes random binning and kernel approximations via featurization, wherein the random binning has a decomposition of a first form, where a variable in the first form is a grid parameterized by a first function that specifies a width and a bias of a grid with respect to dimensions, and a second variable in the first form is a vector, wherein, for each grid of the first form, a number of bins is set so a third variable in the first form has only one non-zero entry, and wherein, in the low-level, a number of threads are launched to simultaneously process attributes of the first data matrix to construct the index vector look-up tables and to compute the local feature matrix, further comprising: synchronizing the local feature matrix for each of the nodes to gather global feature offsets; and generating a global feature matrix from all of the global feature offsets, wherein the plurality of randomized grids are generated by setting a different random seed to a random number generator, and wherein the training phase sets up, the hybrid two-level parallelism to achieve advantages of parallel computing. under constraints of data dependency, the advantages including a near-linear speedup in the training phase and a near-linear memory reduction for storing the local feature matrix. 2. The computer-implemented method of claim 1 , further comprising: in a testing phase: receiving a second data matrix; in a high-level of the second data matrix: generating a plurality of randomized grids; and evenly partitioning the generating of grids into nodes in a parallel system of the second data matrix; and in a low-level of the second data matrix: evenly partitioning dimensions in each grid to search the look-up tables of index vectors and compute a local feature matrix for each nod; wherein the advantages include the near-linear speedup in the testing phase. 3. The computer-implemented method of claim 2 , further comprising, in each of the training phase and the testing phase: synchronizing local feature matrices t of respective nodes to gather global feature offsets; and generating a global feature vector from all of the global feature offsets. 4. The computer-implemented method of claim 2 , wherein a global feature matrix and a global feature vector are generated by gathering different respective local feature index offsets. 5. The computer-implemented method of claim 1 , embodied in a cloud-computing environment. 6. A computer program product for random binning featurization process for a distributed random binning featurization process on one or more multicore systems with a hybrid two-level parallelism, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform: in a training phase: receiving a first data matrix; dividing the random binning featurization process into two orthogonal levels comprising a high-level and a low-level; in the high-level: generating a plurality of randomized grids; and evenly partitioning the generating of grids into nodes in a parallel system; and in the low-level: evenly partitioning dimensions in each grid to construct look-up tables of index vectors and compute a local feature matrix for each node, wherein the hybrid two-level parallelism includes random binning and kernel approximations via featurization, and wherein the training phase sets up the hybrid two-level parallelism for parallel computing under constraints of data dependency, and wherein the training phase sets up the hybrid two-level parallelism to achieve advantages of parallel computing under constraints of data dependency, the advantages, including a near-linear speedup in the training phase and a near-linear memory reduction for storing the local feature matrix. 7. The computer program product of claim 6 , wherein the program instructions are executed by the computer to cause the computer to further perform: in a testing phase: receiving a second data matrix; in a high-level of the second data matrix: generating a plurality of randomized grids; and evenly partitioning the generating of grids into nodes in a parallel system; and in a low-level of the second data matrix: evenly partitioning dimensions in each grid to search the look-up tables of index vectors and compute a local feature matrix for each nod; wherein the advantages include the near-linear speedup in the testing phase. 8. The computer program product of claim 7 , further comprising, in each of the training phase and the testing phase: synchronizing local feature matrices t of respective nodes to gather global feature offsets; and generating a global feature vector from all of the global feature offsets. 9. The computer program product of claim 7 , wherein a global feature matrix and a global feature vector are generated by gathering different respective local feature index offsets. 10. A random binning featurization process system for a distributed random binning featurization process on one or more multicore systems with a hybrid two-level parallelism, said system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform: in a training phase: receiving a first data matrix; dividing the random binning featurization process into two orthogonal levels comprising a high-level and a low-level; in the high-level: generating a plurality of randomized grids; and evenly partitioning the generating of grids into nodes in a parallel system; and in the low-level: evenly partitioning dimensions in each grid to construct look-up tables of index vectors and compute a local feature matrix for each node, wherein the hybrid two-level parallelism includes random binning and kernel approximations via featurization, wherein the random binning has a decomposition of a first form k RB (x i , x j )=∫ δ p(δ) ϕ Bδ (x i ) T ϕ Bδ (x j )dδ, where B δ is a grid parameterized by δ=(δ 1 , u 1 , . . . , δ d , u d ) that specifies a width and a bias of a grid with respect to d dimensions, and B δ (x) is a vector which has ϕ b (x i ) =1, if b=└(x i (1)−u 1 /δ 1 ┘, . . . , └(x i (d)−u d /δd┘), and ϕ b (x i )=0 otherwise for any b∈B δ , wherein, for each grid B δ , a number of bins |B δ | is set so ϕ b (x) has only 1 non-zero entry, and wherein, in the low-level, a number of threads are launched to simultaneously process attributes of the first data matrix to construct the index vector look-up tables and to compute the local feature matrix, further comprising: synchronizing the local feature matrix for each of the nodes to gather global feature offsets; and generating a global feature matrix from all of the global feature offsets, wherein the plurality of randomized grids are generated by setting a different random seed to a random number generator, wherein th

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title

  • G06N20/00Primary

    Machine learning · CPC title

  • using kernel methods, e.g. support vector machines [SVM] · CPC title

  • G06F15/80Primary

    comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080228B2 cover?
A random binning featurization process method, system, and computer program product for a distributed random binning featurization process on one or more multicore systems with a hybrid two-level parallelism, the method including in a training phase, receiving a first data matrix dividing the random binning featurization process into two orthogonal levels, in a high-level generating a randomize…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 03 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).