Large-scale batch active learning using locality sensitive hashing

US2016307113A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016307113-A1
Application numberUS-201514691136-A
CountryUS
Kind codeA1
Filing dateApr 20, 2015
Priority dateApr 20, 2015
Publication dateOct 20, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for selection of a batch of objects are provided. Each object in a pool is assigned to a subset of a set of buckets. The assignment is based on signatures, generated, for example, by LSH hashing object representations of the objects in the pool. The signatures are then segmented into bands which are each assigned to a respective bucket in the set, based on the elements of the band. An entropy value is computed for each of a set of objects remaining in the pool using a current classifier model. A batch of objects for retraining the model is selected. This includes selecting objects from the set of objects based on their computed entropy values and respective assigned buckets.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method for selection of a batch of objects comprising: for each object in a pool of objects: performing Locality Sensitive Hashing on a multidimensional representation of the object to compute a signature comprising a sequence of elements; segmenting the signature into a plurality of bands, each band comprising a subset of the elements in the signature; assigning each of a plurality of bands of the signature to a respective one of a set of buckets based on values of the elements of the band; computing an entropy value for each of a set of objects remaining in the pool using a current classifier model; and selecting a batch of objects, including selecting objects from the set of objects based on their computed entropy values and respective assigned buckets, wherein at least one of the performing Locality Sensitive Hashing, segmenting the signature, assigning the bands, computing an entropy value, and selecting the batch of objects is performed with a processor. 2 . The method of claim 1 , further comprising outputting the batch for labeling. 3 . The method of claim 1 , further comprising receiving labels for the objects in the batch, and retraining the classifier model based on the received labels. 4 . The method of claim 3 , further comprising: removing the labeled objects from the pool; repeating the computing of the entropy value for each of a set of objects remaining in the pool using the current classifier model, the current classifier model being the retrained classifier model; and repeating the selecting a batch of objects from the set of objects remaining in the pool based on the computed entropy values of the objects in the pool and respective assigned buckets. 5 . The method of claim 1 , wherein the batch comprises at least 5 objects. 6 . The method of claim 1 , wherein the computing of the entropy H(d) comprises computing a function of Σ 1 c P(c|d)log c P(c|d), where c represents a class and P(c|d) represents the probability assigned for that class by the classifier model for the object d. 7 . The method of claim 1 , wherein the Locality Sensitive Hashing is performed with a family of at least 32 hash functions. 8 . The method of claim 1 , further comprising ranking the set of objects remaining in the pool based on their entropy values. 9 . The method of claim 8 , wherein the selection of objects in the batch includes drawing a new object from the pool based on its entropy value, comparing the entropy value of the new object with an entropy value of an object previously added to the batch and, if a difference in the entropy value does not exceed a threshold, comparing the assigned buckets of the new object with the assigned buckets of objects previously added to the batch and determining whether to add the new object to the batch based on the comparison. 10 . The method of claim 9 , wherein when the decision is not to add the new object to the batch, the method includes identifying the object remaining in the queue which has the highest entropy of the objects in the queue to be the next object. 11 . The method of claim 1 wherein the method includes storing a list of the buckets of the objects already added to the batch and comparing the buckets of a new object drawn from the pool with the list of the buckets, the selecting of the batch of objects being based on the comparison. 12 . The method of claim 1 , wherein the objects are documents. 13 . The method of claim 1 , wherein the document representations are at least one of bag-of-words and bag-of-n-gram based representations. 14 . The method of claim 1 , further comprising using the retrained classifier model to label objects in the pool. 15 . The method of claim 1 , further comprising partitioning the pool of objects into a set of smaller pools, the selecting of the batch of objects including for each smaller pool: selecting objects from the set of objects in the smaller pool based on their computed entropy values and respective assigned buckets; and identifying the batch of objects based on the selected objects for each smaller pool. 16 . A system for selection of a batch of objects comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions. 17 . A computer program product comprising non-transitory memory which stores instructions, which when executed by a computer, perform the method of claim 1 . 18 . A system for selection of a batch of objects comprising: a classifier model training component for training a classifier model based on labeled objects; a representation generator for providing representations of objects in a pool of objects; an indexing component which indexes the objects of the pool based on signatures of the objects in the pool, the signatures having been segmented to form a plurality of bands, each band serving as a hash key to retrieve one of a plurality of buckets, the indexing being based on the buckets for which the bands are hash keys; an entropy computation component which computes an entropy value for each of a set of objects remaining in the pool using a current classifier model; a batch selection component for selecting objects to form a batch of objects from the set of objects in the pool based on the computed entropy values of the objects and respective assigned buckets; and a processor which implements the classifier model training component, representation generator, indexing component, entropy computation component, and batch selection component. 19 . The system of claim 18 , further comprising a Locality Sensitive Hashing component for generating the signatures by Locality Sensitive Hashing. 20 . A method for training a classifier comprising: providing a current classifier model for labeling objects based on representations of the objects; providing representations of objects in a pool of objects; indexing the objects in the pool based on signatures of the objects, the signatures having been segmented to form a plurality of bands and each band assigned to one of a plurality of buckets, the indexing being based on the buckets to which the bands are assigned; computing an entropy for each of a set of objects in a pool of objects with the current classifier model, based on the representations of the objects; selecting a batch of objects, including selecting objects from the set of objects in the pool to form the batch of objects, the selection being based on the computed entropy values of the objects and respective assigned buckets; and retraining the current classifier model with labels received for the objects in the batch to generate an updated classifier model, wherein at least one of the indexing, computing an entropy value, selecting the batch of objects, and retraining the classifier model is performed with a processor.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016307113A1 cover?
A system and method for selection of a batch of objects are provided. Each object in a pool is assigned to a subset of a set of buckets. The assignment is based on signatures, generated, for example, by LSH hashing object representations of the objects in the pool. The signatures are then segmented into bands which are each assigned to a respective bucket in the set, based on the elements of th…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Oct 20 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).