Hybrid active learning for non-stationary streaming data with asynchronous labeling

US10102481B2 · US · B2

Patent metadata
Publication number: US-10102481-B2
Application number: US-201514658894-A
Country: US
Kind code: B2
Filing date: Mar 16, 2015
Priority date: Mar 16, 2015
Publication date: Oct 16, 2018
Grant date: Oct 16, 2018

Abstract

Official abstract text for this publication.

A continuous electronic data stream of unlabeled data instances is received and fed into both a stream-based selection strategy and a pool-based selection strategy. The stream-based selection strategy is continuously applied to each of the unlabeled data instances to continually select stream-based data instances that are to be annotated. Additionally, the pool-based selection strategy is periodically applied to a pool of data obtained from the unlabeled data instances, to periodically select pool-based data instances that are to be annotated. Each time the pool-based selection strategy is applied, these methods automatically replace the stream-based data instances with the pool-based data instances. Also, these methods provide, on demand, access to allow a user to annotate the stream-based data instances and the pool-based data instances.
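
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `HybridSelector`, the `confidence` callable, the fixed `threshold`, the `pool_period` trigger, and the `budget` of instances kept per pool pass are all hypothetical, and low model confidence is assumed as the selection criterion for both strategies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class HybridSelector:
    """Hypothetical sketch: stream-based picks are replaced by periodic pool-based picks."""
    confidence: Callable[[Any], float]  # assumed: model confidence in [0, 1] for one instance
    threshold: float = 0.6              # assumed stream criterion: select if confidence < threshold
    pool_period: int = 5                # assumed: run the pool strategy every N instances
    budget: int = 2                     # assumed: number of instances kept by each pool pass
    queue: List[Any] = field(default_factory=list)  # storage the annotator reads on demand
    pool: List[Any] = field(default_factory=list)   # unlabeled instances since the last pool pass
    seen: int = 0

    def observe(self, instance: Any) -> None:
        """Stream-based step: decide immediately, one instance at a time."""
        self.seen += 1
        self.pool.append(instance)
        if self.confidence(instance) < self.threshold:
            self.queue.append(instance)
        if self.seen % self.pool_period == 0:
            self._apply_pool_strategy()

    def _apply_pool_strategy(self) -> None:
        """Pool-based step: rank the whole pool, then replace the stream-based picks."""
        ranked = sorted(self.pool, key=self.confidence)  # least confident first
        self.queue = ranked[: self.budget]
        self.pool.clear()

    def fetch_for_annotation(self) -> List[Any]:
        """On-demand access: hand over whatever is queued when the user shows up."""
        items, self.queue = self.queue, []
        return items


# Toy usage: each instance is a float that doubles as its own confidence score.
selector = HybridSelector(confidence=lambda x: x)
for value in [0.9, 0.5, 0.1]:
    selector.observe(value)
queue_before_pool_pass = list(selector.queue)  # stream-based picks only
for value in [0.7, 0.3]:
    selector.observe(value)                    # fifth instance triggers the pool pass
queue_after_pool_pass = list(selector.queue)
handed_to_user = selector.fetch_for_annotation()
```

In this toy run the annotator would see the stream-based picks if they arrived after three instances, and the pool-based picks if they arrived after the fifth, which is the asynchronous-labeling behavior the abstract describes.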

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: receiving a continuous electronic data stream of unlabeled data instances; automatically feeding said unlabeled data instances into a stream-based selection strategy and a pool-based selection strategy; automatically continuously applying said stream-based selection strategy to each of said unlabeled data instances to continually select stream-based data instances by performing an incremental computerized selection processes that selects ones of said unlabeled data instances based on human annotation criteria, incrementally as each unlabeled data instance is received; automatically storing said stream-based data instances in an electronic storage element; automatically periodically applying said pool-based selection strategy to a pool of data obtained from said unlabeled data instances to periodically select pool-based data instances by performing a batch computerized selection processes that selects ones of said unlabeled data instances based on human annotation criteria, from all unlabeled data instances in said electronic storage element; each time said pool-based selection strategy is applied, automatically replacing ones of said stream-based data instances in said electronic storage element with said pool-based data instances; providing, on demand, access to said electronic storage element to annotate ones of said stream-based data instances and said pool-based data instances currently maintained by said electronic storage element at a time when a user accesses said electronic storage element; receiving annotations relating to said stream-based data instances and said pool-based data instances from said user to produce annotated data instances; and automatically training a previous model with said annotated data instances to produce an updated model by updating said previous model using labels said annotations provide.
2. The method according to claim 1, further comprising updating a classification confidence threshold used by said stream-based selection strategy based on classification confidence values produced during said applying said pool-based selection strategy.
3. The method according to claim 1, said providing, on demand, access to said electronic storage element to annotate ones of said stream-based data instances and said pool-based data instances at unpredictable times.
4. The method according to claim 1, said stream-based selection strategy and said pool-based selection strategy having independent selection criteria.
5. The method according to claim 1, said the pool-based selection strategy and said the stream-based selection strategy automatically making decisions as to whether said unlabeled data instances should be annotated by said user.
6. The method according to claim 1, said stream-based selection strategy making a selection decision on every one of said unlabeled data instances, and said pool-based selection strategy evaluating and ranking said unlabeled data instances in said pool of data before making a selection decision.
7. The method according to claim 1, said stream-based selection strategy making lower quality selection relative to said pool-based selection strategy.
8. A method comprising: receiving a continuous electronic data stream of unlabeled data instances; automatically feeding said unlabeled data instances into a stream-based selection strategy and a pool-based selection strategy; automatically continuously applying said stream-based selection strategy to each of said unlabeled data instances to continually select stream-based data instances by performing an incremental computerized selection processes that selects ones of said unlabeled data instances based on human annotation criteria, incrementally as each unlabeled data instance is received; automatically storing said stream-based data instances in an electronic storage element; automatically periodically applying said pool-based selection strategy to a pool of data obtained from said unlabeled data instances to periodically select pool-based data instances by performing a batch computerized selection processes that selects ones of said unlabeled data instances based on human annotation criteria, from all unlabeled data instances in said electronic storage element; each time said pool-based selection strategy is applied, automatically replacing ones of said stream-based data instances in said electronic storage element with said pool-based data instances; providing, on demand, access to said electronic storage element to annotate ones of said stream-based data instances and said pool-based data instances currently maintained by said electronic storage element at a time when a user accesses said electronic storage element; receiving annotations relating to said stream-based data instances and said pool-based data instances from said user to produce annotated data instances; automatically training a previous model with said annotated data instances to produce an updated model by updating said previous model using labels said annotations provide; automatically replacing said previous model with said updated model; and automatically labeling said unlabeled data instances using said updated model.
9. The method according to claim 8, further comprising updating a classification confidence threshold used by said stream-based selection strategy based on classification confidence values produced during said applying said pool-based selection strategy.
10. The method according to claim 8, said providing, on demand, access to said electronic storage element to annotate ones of said stream-based data instances and said pool-based data instances at unpredictable times.
11. The method according to claim 8, said stream-based selection strategy and said pool-based selection strategy having independent selection criteria.
12. The method according to claim 8, said the pool-based selection strategy and said the stream-based selection strategy automatically making decisions as to whether said unlabeled data instances should be annotated by said user.
13. The method according to claim 8, said stream-based selection strategy making a selection decision on every one of said unlabeled data instances, and said pool-based selection strategy evaluating and ranking said unlabeled data instances in said pool of data before making a selection decision.
14. The method according to claim 8, said stream-based selection strategy making lower quality selection relative to said pool-based selection strategy.
15. A system comprising: an input receiving a continuous electronic data stream of unlabeled data instances; a first processing element operatively connected to said input, said first processing element automatically and continuously applying a stream-based selection strategy to each of said unlabeled data instances to continually select stream-based data instances by performing an incremental computerized selection processes that selects ones of said unlabeled data instances based on human annotation criteria, incrementally as each unlabeled data instance is received; an electronic storage element operatively connected to said first processing element, said electronic storage element storing said stream-based data instances; a second processing element operatively connected to said input and said electronic storage element, said second processing element automatically and periodically applying a pool-based selection strategy to a pool of data obtained from said unlabeled data instances to periodically select pool-based data instances by performing a batch computerized selection pr
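
Claims 2 and 9 recite updating the stream-based confidence threshold from confidence values produced during the pool-based pass, but the claim text shown here does not say how. One possible rule, sketched below purely as an assumption, sets the threshold to the highest confidence among the instances the pool pass selected, so the stream strategy would then select instances whose confidence is at or below that value. The function name `updated_threshold` and the `budget` parameter are hypothetical.

```python
from typing import Sequence


def updated_threshold(pool_confidences: Sequence[float], budget: int, current: float) -> float:
    """Assumed rule: new threshold = confidence of the budget-th least confident pool instance.

    Falls back to the current threshold when the pool is empty or the budget is not positive.
    """
    if budget <= 0 or not pool_confidences:
        return current
    ranked = sorted(pool_confidences)  # least confident first
    k = min(budget, len(ranked))       # the pool may hold fewer instances than the budget
    return ranked[k - 1]


# Toy usage: five pool confidences, budget of two annotations per pool pass.
new_threshold = updated_threshold([0.9, 0.5, 0.1, 0.7, 0.3], budget=2, current=0.6)
unchanged = updated_threshold([], budget=2, current=0.6)
```

Tying the threshold to the pool ranking is only one design choice; it lets the cheaper stream-based decisions track what the more thorough pool-based pass considered worth annotating as the data distribution drifts.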

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10102481B2 cover?
A continuous electronic data stream of unlabeled data instances is received and fed into both a stream-based selection strategy and a pool-based selection strategy. The stream-based selection strategy is continuously applied to each of the unlabeled data instances to continually select stream-based data instances that are to be annotated. Additionally, the pool-based selection strategy is perio…
Who is the assignee on this patent?
Conduent Business Services LLC
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 16, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).