Distributed event prediction and machine learning object recognition system

US10127477B2 · US · B2

Patent metadata
Publication number: US-10127477-B2
Application number: US-201715686863-A
Country: US
Kind code: B2
Filing date: Aug 25, 2017
Priority date: Apr 21, 2016
Publication date: Nov 13, 2018
Grant date: Nov 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing device predicts occurrence of an event or classifies an object using distributed unlabeled data. Supervised data that includes a labeled subset of a plurality of observation vectors is identified. A total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors is determined. The identified supervised data is uploaded to each thread of the total number of threads. Unlabeled observation vectors are randomly selected from the unlabeled subset of the plurality of observation vectors to allocate to each thread of the total number of threads. The randomly selected, unlabeled observation vectors are uploaded to each thread of the total number of threads based on the allocation. The value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors is determined based on a converged classification matrix and output to a labeled dataset.
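The allocation flow in the abstract (every thread gets the full supervised data plus a random share of the unlabeled vectors, then the per-thread labels are merged) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: local Python threads stand in for the patent's threads (claims 5 and 6 also allow threads on other computing devices), all function names are hypothetical, and `nearest_label` is a swapped-in 1-nearest-neighbor placeholder for the per-thread converged classification matrix described in the claims.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def allocate_randomly(unlabeled_ids, num_threads, seed=0):
    """Shuffle unlabeled indices, then deal them round-robin into num_threads buckets."""
    ids = list(unlabeled_ids)
    random.Random(seed).shuffle(ids)
    return [ids[t::num_threads] for t in range(num_threads)]


def nearest_label(x, supervised):
    """Hypothetical placeholder labeler: copy the label of the closest supervised vector.
    The patent instead uses a converged classification matrix computed per thread."""
    best = min(supervised, key=lambda pair: sum((a - b) ** 2 for a, b in zip(x, pair[0])))
    return best[1]


def label_chunk(chunk_ids, data, supervised):
    # Each thread sees the full supervised set plus only its own unlabeled vectors.
    return {i: nearest_label(data[i], supervised) for i in chunk_ids}


def distributed_label(data, labels, num_threads, seed=0):
    """data: list of vectors; labels: dict index -> label for the labeled subset."""
    supervised = [(data[i], y) for i, y in labels.items()]
    unlabeled = [i for i in range(len(data)) if i not in labels]
    chunks = allocate_randomly(unlabeled, num_threads, seed)
    merged = dict(labels)  # start the labeled dataset from the pre-defined labels
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(label_chunk, c, data, supervised) for c in chunks]
        for f in futures:
            merged.update(f.result())  # combine the values returned by each thread
    return [merged[i] for i in range(len(data))]
```

For example, with `data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2), (4.9, 5.2)]` and labels `{0: "a", 2: "b"}`, calling `distributed_label(data, labels, 2)` labels the points near the origin `"a"` and the points near `(5, 5)` `"b"`. Round-robin dealing after a shuffle keeps the per-thread workload within one vector of each other.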

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to: read a label set, wherein the label set defines permissible values for a target variable; identify supervised data that includes a labeled subset of a plurality of observation vectors, wherein a value of the permissible values of the target variable is pre-defined for the labeled subset of the plurality of observation vectors; determine a total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors, wherein the unlabeled subset of the plurality of observation vectors have not been labeled; upload the identified supervised data to each thread of the total number of threads; randomly select unlabeled observation vectors from the unlabeled subset of the plurality of observation vectors to allocate to each thread of the total number of threads; upload the randomly selected, unlabeled observation vectors to each thread of the total number of threads based on the allocation; receive, from each thread of the total number of threads, the value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors uploaded to a respective thread, the value of the target variable selected based on a label probability defined by a thread, converged classification matrix that defines the label probability for each permissible value defined in the label set for each observation vector of the unlabeled subset of the plurality of observation vectors uploaded to the respective thread, wherein the thread, converged classification matrix is computed by the respective thread; determine the value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors based on the value of the target variable received from each thread; and output the determined value of the target variable for each observation vector of the plurality of observation vectors to a labeled dataset.

2. The non-transitory computer-readable medium of claim 1, wherein the labeled subset of the plurality of observation vectors is less than one percent of the plurality of observation vectors.

3. The non-transitory computer-readable medium of claim 1, wherein each observation vector defines an image, and the value of the target variable defines an image label determined using the converged classification matrix.

4. The non-transitory computer-readable medium of claim 1, wherein the total number of threads are all controlled by the computing device.

5. The non-transitory computer-readable medium of claim 1, wherein the total number of threads include at least one thread controlled by a different computing device than the computing device.

6. The non-transitory computer-readable medium of claim 1, wherein the total number of threads are all controlled by a different computing device than the computing device.

7. The non-transitory computer-readable medium of claim 1, wherein computing the converged classification matrix by the respective thread comprises computer-readable instructions that further cause the computing device to: compute an affinity matrix using a kernel function, the identified supervised data, and the randomly selected, unlabeled observation vectors allocated to the respective thread; compute a diagonal matrix by summing each row of the computed affinity matrix, wherein the sum of each row is stored in a diagonal of the row with zeroes in remaining positions of the row; compute a normalized distance matrix using the computed affinity matrix and the computed diagonal matrix; and define a label matrix using the value of the target variable of each observation vector of the randomly selected, unlabeled observation vectors allocated to the respective thread.

8. The non-transitory computer-readable medium of claim 7, wherein the converged classification matrix is initialized as the defined label matrix.

9. The non-transitory computer-readable medium of claim 8, wherein the converged classification matrix is converged using $F(t+1) = \alpha S F(t) + (1-\alpha) Y$, where $F(t+1)$ is a next classification matrix, $\alpha$ is a relative weighting value, $S$ is the normalized distance matrix, $F(t)$ is the classification matrix, $Y$ is the label matrix, and $t$ is an iteration number.

10. The non-transitory computer-readable medium of claim 9, wherein the classification matrix is converged when a second predefined number of iterations of computations of $F(t+1) = \alpha S F(t) + (1-\alpha) Y$ is complete.

11. The non-transitory computer-readable medium of claim 7, wherein the kernel function is a Gaussian kernel function.

12. The non-transitory computer-readable medium of claim 7, wherein the affinity matrix is defined as $W_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2s^2}\right)$ if $i \neq j$ and $W_{ii} = 0$, where $s$ is a Gaussian bandwidth parameter, $x$ is an observation vector of the randomly selected, unlabeled observation vectors allocated to the respective thread, $i = 1, \ldots, n$, $j = 1, \ldots, n$, and $n$ is a number of vectors of the randomly selected, unlabeled observation vectors allocated to the respective thread.

13. The non-transitory computer-readable medium of claim 7, wherein the diagonal matrix is defined as $D_{ii} = \sum_{j=1}^{n} W_{ij}$ and $D_{ij} = 0$ if $i \neq j$, where $W$ is the computed affinity matrix, $i = 1, \ldots, n$, and $n$ is a number of vectors of the randomly selected, unlabeled observation vectors allocated to the respective thread.

14. The non-transitory computer-readable medium of claim 7, wherein the normalized distance matrix is defined as $S = D^{-1/2} W D^{-1/2}$, where $W$ is the computed affinity matrix and $D$ is the computed diagonal matrix.

15. The non-transitory computer-readable medium of claim 7, wherein the label matrix is defined as $Y_{ik} = 1$ if $x_i$ is labeled as $y_i = k$; otherwise, $Y_{ik} = 0$, where $x_i$ is an observation vector of the randomly selected, unlabeled observation vectors allocated to the respective thread, $i = 1, \ldots, n$, $n$ is a number of vectors of the randomly selected, unlabeled observation vectors allocated to the respective thread, $k = 1, \ldots, c$, and $c$ is a number of permissible values of the label set.

16. The non-transitory computer-readable medium of claim 1, comprising computer-readable instructions that further cause the computing device to train a predictive model with the labeled dataset.

17. The non-transitory computer-readable medium of claim 1, comprising compu…
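Claims 7 through 15 describe the per-thread computation: a Gaussian affinity matrix with a zero diagonal, its row-sum diagonal matrix $D$, the normalized matrix $S = D^{-1/2} W D^{-1/2}$, a one-hot label matrix $Y$, and the update $F(t+1) = \alpha S F(t) + (1-\alpha) Y$ run for a fixed number of iterations. The pure-Python sketch below follows those steps for one thread. It assumes the thread's supervised and unlabeled vectors arrive together in one list `X`, with `None` marking unlabeled entries; the function name and the defaults for `alpha`, `s`, and `iterations` are illustrative assumptions, not values taken from the patent.

```python
import math


def label_spreading(X, y, classes, alpha=0.99, s=1.0, iterations=100):
    """X: list of vectors for one thread; y: label per vector, or None if unlabeled.
    Returns the permissible value with the highest label probability for every row."""
    n, c = len(X), len(classes)
    # Claim 12: Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / (2 s^2)), with W_ii = 0.
    W = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * s * s))
          for j in range(n)] for i in range(n)]
    # Claim 13: D_ii is the row sum of W; only the diagonal is stored.
    d = [sum(row) for row in W]
    # Claim 14: S = D^{-1/2} W D^{-1/2}, i.e. S_ij = W_ij / sqrt(D_ii * D_jj).
    # A nonzero W_ij implies D_ii > 0 and D_jj > 0, so the guard avoids 0/0.
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # Claim 15: one-hot label matrix Y; rows of unlabeled vectors are all zero.
    col = {k: idx for idx, k in enumerate(classes)}
    Y = [[1.0 if y[i] is not None and col[y[i]] == k else 0.0 for k in range(c)]
         for i in range(n)]
    # Claim 8: initialize F with Y. Claims 9-10: apply the update a fixed number of times.
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(n)) + (1 - alpha) * Y[i][k]
              for k in range(c)] for i in range(n)]
    # Select, per row, the permissible value with the largest entry of F.
    return [classes[max(range(c), key=lambda k: F[i][k])] for i in range(n)]
```

For two well-separated clusters such as `X = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]` with `y = ["a", None, "b", None]`, each unlabeled point picks up the label of its own cluster. This version is $O(n^2 c)$ per iteration, which is one reason splitting the unlabeled vectors across threads keeps each thread's matrices small.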

Assignees

SAS Institute Inc.

Inventors

Classifications

  • the supervisor being an automated module, e.g. intelligent oracle · CPC title

  • G06N20/10 (primary)

    using kernel methods, e.g. support vector machines [SVM] · CPC title

  • based on distances to training or reference patterns · CPC title

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10127477B2 cover?
A computing device predicts occurrence of an event or classifies an object using distributed unlabeled data. Supervised data that includes a labeled subset of a plurality of observation vectors is identified. A total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors is determined. The identified supervised data is uploaded to each thread…
Who is the assignee on this patent?
Sas Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 13, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).