Feature completion in computer-human interactive learning

US9779081B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9779081-B2
Application number: US-201615135266-A
Country: US
Kind code: B2
Filing date: Apr 21, 2016
Priority date: Jul 12, 2013
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products. Creation of classifiers and schematizers is provided on large data sets. Exercising the classifiers and schematizers on hundreds of millions of items may expose value that is inherent to the data by adding usable meta-data. Some aspects include active labeling exploration, automatic regularization and cold start, scaling with the number of items and the number of classifiers, active featuring, and segmentation and schematization.

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for feature completion for machine learning, comprising: one or more processing devices that: store a first set of data items, wherein each data item includes a text stream of words; provide a dictionary, wherein the dictionary includes a list of words that define a concept usable as an input feature for training a machine-learning model to score data items with a probability of being a positive example or a negative example of a particular class of data item; provide a feature that is trained to calculate a first probability of a presence, within a stream of one or more words, of a disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary; utilize the feature to determine the first probability of the presence, within a stream of one or more words, of a disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary at a given word position in the data item; provide a machine-learning model that is trainable to calculate a second probability of the presence, within the stream of one or more words at the given word position, of the disjunction of the one or more n-grams that correspond semantically to the concept defined by the words in the dictionary, based on one or more words in the data item not utilized by the feature to determine the first probability; utilize the machine-learning model to determine the second probability of the presence, within the stream of one or more words at the given word position, of a disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary, based on the one or more words in the data item not utilized by the feature to determine the first probability; determine an actual presence or absence, at the given word position, of the disjunction of the one or more n-grams that correspond semantically to the concept defined by the words in the dictionary; and modify the machine-learning model to adjust the second probability in a positive or negative direction based on the determined actual presence or absence of the disjunction of the one or more n-grams that correspond semantically to the concept defined by the words in the dictionary; wherein the feature determines one or more of: whether any words from a given list appear at the center of a window of text around the given word position in which center words in the window of text have been removed, a presence or absence of a verb in the window, a presence or absence of a noun followed by an adjective, or a number of occurrences of a given word in the window.

2. The system of claim 1, wherein the feature determines the presence of a disjunction of one or more n-grams at each considered position in a text stream, while the machine-learning model input includes a window of text around the considered position in which center words in the window of text have been removed.

3. The system of claim 1, wherein the feature is a regular expression that operates over strings to predict semantically matching positions in text within a string at each considered position, while the machine-learning model input includes a window of text around the considered position in which center words in the window of text have been removed.

4. The system of claim 1, wherein modify the machine-learning model to adjust the calculated probability includes adjust the calculated probability in a positive direction when the disjunction of the one or more n-grams is present.

5. The system of claim 4, wherein modify the machine-learning model to adjust the calculated probability includes adjust the calculated probability in a negative direction when the disjunction of the one or more n-grams is not present.

6. The system of claim 1, wherein the window of text is a sliding window.

7. The system of claim 1, wherein utilize the one or more words in the data item not utilized by the feature includes utilize a text window that includes a number of words immediately preceding a given word position and a number of words immediately following the given word position.

8. The system of claim 1, wherein the window of text is a sliding window that includes a number of words immediately preceding a given word position and a number of words immediately following the given word position.

9. One or more hardware computer-storage media having embodied thereon computer-usable instructions that, when executed, facilitate a method of feature completion for machine learning, the method comprising: storing a first set of data items, wherein each data item includes a text stream of words; accessing a dictionary, wherein the dictionary includes a list of words that define a concept usable as an input feature for training a machine-learning model to score data items with a probability of being a positive example or a negative example of a particular class of data item; and training the machine-learning model with the dictionary as an input feature, wherein the training includes, for each data item in the first set of data items: for a first word position in the text stream within the data item, examining a window of text centered at a second word position in the text stream, wherein the window of text includes one or more words, utilizing a probability function to calculate a probability of the presence, at the first word position, of a disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary, based on the one or more words in the window of text, determining an actual presence or absence, at the first word position, of a disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary, wherein the determining comprises determining one or more of: whether any words from a given list appear at the center of a window of text around the given word position in which center words in the window of text have been removed, a presence or absence of a verb in the window, a presence or absence of a noun followed by an adjective, or a number of occurrences of a given word in the window, and modifying the probability function to adjust the probability in a positive or negative direction based on the determined actual presence or absence of the disjunction of one or more n-grams that correspond semantically to the concept defined by the words in the dictionary.

10. The media of claim 9, wherein the window of text is a sliding window that includes a number of words immediately preceding a given word position and a number of words immediately following the given word position.

11. The media of claim 9, wherein modifying the probability function to adjust the probability includes modifying the probability function to increase the probability when the disjunction of the one or more n-grams corresponds semantically to the concept defined by the words in the dictionary.

12. The media of claim 11, wherein modifying the probability function to adjust the probability includes modifying the probability function to decrease the probability when the disjunction of the one or more n-grams does not correspond semantically to the concept defined by the words in the dictionary.

13. The media of claim 9, wherein when the window of text overlaps the first word position, one or more words at the first word position are excluded from the window of text, and wherein the second word position may be different than the first word position or the same as the first word position.

14. One or more hardware comput
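The training loop described in claims 1 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a logistic model over the context words, a plain gradient step as the "adjust in a positive or negative direction" rule, and a toy dictionary of month names. All names here (context_window, dictionary_feature, ContextModel, train) are hypothetical and chosen only for illustration.

```python
import math


def context_window(words, pos, radius=2):
    """Return lowercase words within `radius` of `pos`, with the center word removed."""
    left = words[max(0, pos - radius):pos]
    right = words[pos + 1:pos + 1 + radius]
    return [w.lower() for w in left + right]


def dictionary_feature(words, pos, dictionary):
    """Actual presence (1) or absence (0) of a dictionary word at `pos`."""
    return 1 if words[pos].lower() in dictionary else 0


class ContextModel:
    """Assumed logistic model: predicts dictionary presence from context words only."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, context):
        score = self.bias + sum(self.weights.get(w, 0.0) for w in set(context))
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, context, present):
        # error > 0 when the concept is present (push the probability up),
        # error < 0 when it is absent (push the probability down).
        error = present - self.predict(context)
        step = self.learning_rate * error
        self.bias += step
        for w in set(context):
            self.weights[w] = self.weights.get(w, 0.0) + step


def train(model, documents, dictionary, epochs=20, radius=2):
    """Slide over every word position and update the model from its context."""
    for _ in range(epochs):
        for words in documents:
            for pos in range(len(words)):
                context = context_window(words, pos, radius)
                present = dictionary_feature(words, pos, dictionary)
                model.update(context, present)


if __name__ == "__main__":
    months = {"january", "march", "june"}  # toy dictionary (assumption)
    corpus = [s.split() for s in (
        "we met in january last year",
        "the deadline is in march this year",
        "she left in june last year",
        "the cat sat on the mat",
    )]
    model = ContextModel()
    train(model, corpus, months)
    probe = "he arrives in october this year".split()
    for pos, word in enumerate(probe):
        print(word, round(model.predict(context_window(probe, pos)), 3))
```

Because the model never sees the center word, any score it assigns to a position depends only on the surrounding context. That is what would let a word missing from the dictionary (here "october") be flagged as a candidate for the concept.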

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Formats for control data (H04L1/16 takes precedence; training sequences H04L25/00 and H04L27/00) · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Semantic analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9779081B2 cover?
A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, a…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).