Interactive concept editing in computer-human interactive learning

US2024135098A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024135098-A1
Application number: US-202318527104-A
Country: US
Kind code: A1
Filing date: Dec 1, 2023
Priority date: Jul 12, 2013
Publication date: Apr 25, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products. Creation of classifiers and schematizers is provided on large data sets. Exercising the classifiers and schematizers on hundreds of millions of items may expose value that is inherent to the data by adding usable meta-data. Some aspects include active labeling exploration, automatic regularization and cold start, scaling with the number of items and the number of classifiers, active featuring, and segmentation and schematization.
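
The abstract's idea of using classification scores as relevance features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `relevance_features`, the category names, and the choice to add a query-page product term are hypothetical and do not come from the patent text.

```python
def relevance_features(base_features, query_scores, page_scores):
    """Append per-category classifier scores to an existing feature vector.

    base_features: list of floats already used by a ranker (hypothetical).
    query_scores / page_scores: dicts mapping category name -> classifier score.
    For each category, the query score, the page score, and their product
    (an assumed "agreement" signal) are appended.
    """
    categories = sorted(set(query_scores) | set(page_scores))
    features = list(base_features)
    for category in categories:
        q = query_scores.get(category, 0.0)  # missing score treated as 0.0
        p = page_scores.get(category, 0.0)
        features.extend([q, p, q * p])
    return features


# Hypothetical scores for one query and one web page.
example = relevance_features(
    [0.5],
    {"recipes": 1.0},
    {"recipes": 0.5, "sports": 0.0},
)
print(example)
```

A downstream ranker would consume the extended vector; how the scores are actually combined in the disclosed system is not stated on this page.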

First claim

Opening claim text (preview).

1. One or more hardware computer-storage devices having embodied thereon computer-usable instructions that, when executed by at least one processor, cause performance of operations associated with interactively generating a machine-learning feature, the operations comprising: causing presentation of a user interface for generating a representation of a first concept by associating, to the first concept, a list of words that are specific examples of the first concept; providing, via the user interface, an input field that receives a first input comprising a plurality of characters of text that form at least one specific example of the first concept; generating a generalized concept determined at least in part from the first input; generating, based on a classifier, a set of words that are each a specific example of the generalized concept; providing, via the user interface, a first field that displays the set of words as suggested words that are selectable for inclusion as specific examples of the first concept; receiving a second input indicative of a selection of at least one suggested word of the suggested words; providing, via the user interface, a second field that displays (1) the list of words that are specific examples of the first concept and (2) the at least one suggested word of the second input; and saving (1) the list of words that are specific examples of the first concept and (2) the at least one suggested word of the second input as an updated representation of the first concept, wherein the classifier is trained based on a first machine learning feature indicating the updated representation that comprises (1) the list of words that are specific examples of the first concept and (2) the at least one suggested word of the second input.

2. The one or more hardware computer-storage devices of claim 1, wherein the set of words displayed in the first field as the suggested words are selectable for exclusion as non-examples of the first concept, wherein the classifier is further trained based on a second machine learning feature comprising at least one word of the suggested words selected for exclusion as non-examples of the first concept.

3. The one or more hardware computer-storage devices of claim 2, wherein the classifier is further trained based on a third machine learning feature comprising a computer-generated word, wherein the first machine learning feature has a higher weight than the third machine learning feature based on the first machine learning feature comprising (1) the list of words that are specific examples of the first concept and (2) the at least one suggested word of the second input.

4. The one or more hardware computer-storage devices of claim 1, the operations further comprising: receiving a first word that is an example of the first concept; and presenting a list of suggested words that represent the generalized concept generated based at least on the first word.

5. The one or more hardware computer-storage devices of claim 4, the operations further comprising: subsequent to presenting the list of suggested words, receiving a second word that is a second example of the first concept; refining the list of suggested words based at least on a combination of the first word and the second word; and presenting the refined list of suggested words that represent a refined generalized concept.

6. The one or more hardware computer-storage devices of claim 4, the operations further comprising: subsequent to presenting the list of suggested words, receiving a second word that is designated as not being an example of the first concept; refining the list of suggested words based at least on a combination of the first word and the second word; and presenting the refined list of suggested words that represent a refined generalized concept.

7. The one or more hardware computer-storage devices of claim 4, the operations further comprising: including the one or more of the suggested words associated with the second input in the input field that receives the first input that forms at least one specific examples of the first concept.

8. The one or more hardware computer-storage devices of claim 1, wherein each word in the list of words is assigned a respective weight that is scaled, based on training data and during generation of the list of words, by a function of frequency of a size of the list of words, wherein the scaled weights are related by a regularization constraint that adjusts the respective weights of the words that have less training data toward a value determined by the words that have more training data.

9. The one or more hardware computer-storage devices of claim 8, wherein the classifier comprises a neural network.

10. A method of interactively generating an updated representation of a first concept for training a classifier, the method comprising: presenting a user interface for generating a representation of the first concept by associating, to the first concept, a list of phrases that are specific examples of the first concept; providing, via the user interface, an input field that receives a first input comprising a plurality of characters of text that form at least one specific example of the first concept; generating a generalized concept determined at least in part from the first input; generating, based on the classifier, a set of phrases that are each a specific example of the generalized concept; providing, via the user interface, a first field that displays the set of phrases as suggested phrases that are selectable for inclusion as specific examples of the first concept; receiving a second input indicative of a selection of at least one suggested phrase of the suggested phrases; providing, via the user interface, a second field that displays (1) the list of phrases that are specific examples of the first concept and (2) the at least one suggested phrase of the second input; and saving (1) the list of phrases that are specific examples of the first concept and (2) the at least one suggested phrase of the second input as the updated representation of the first concept, wherein the classifier comprises a neural network that is trained based on a first machine learning feature indicating the updated representation that comprises (1) the list of phrases that are specific examples of the first concept and (2) the at least one suggested phrase of the second input.

11. The method of claim 10, wherein the set of phrases displayed in the first field as suggested phrases are selectable for exclusion as non-examples of the first concept, wherein the classifier is further trained based on a second machine learning feature comprising at least one phrase of the suggested phrases selected for exclusion as non-examples of the first concept.

12. The method of claim 11, wherein the classifier is further trained based on a third machine learning feature comprising a computer-generated phrase, wherein the first machine learning feature has a higher weight than the third machine learning feature based on the first machine learning feature comprising (1) the list of phrases that are specific examples of the first concept and (2) the at least one suggested phrase of the second input.

13. The method of claim 10, further comprising: generating a refinement of the suggested phrases that represents a refinement of the generalized concept based on at least one or more additional input phrases; and presenting the refinement of the set of phrases on the user interface.

14. The method of claim 13, further comprising: repeating steps of generating the refinem
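
The workflow recited in claims 1, 2, 5, and 6 (enter seed examples, review suggestions, accept or exclude words, save the updated list as a feature) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented implementation: the `ConceptEditor` class, its method names, and the `related` lookup table are hypothetical, and the table stands in for the classifier that the claims use to generate suggestions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptEditor:
    """Hypothetical stand-in for the state behind the claimed user interface."""

    examples: list[str] = field(default_factory=list)      # saved examples of the concept
    non_examples: list[str] = field(default_factory=list)  # words excluded (claim 2)

    def suggest(self, seeds, related, limit=5):
        """Rank candidates by how many seed words list them as related.

        `related` is an assumed toy table replacing the classifier.
        """
        counts = {}
        for seed in seeds:
            for word in related.get(seed, []):
                if word in seeds or word in self.examples or word in self.non_examples:
                    continue  # skip words the user has already handled
                counts[word] = counts.get(word, 0) + 1
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        return ranked[:limit]

    def accept(self, word):
        """Add a suggested word to the saved examples (the 'second input')."""
        if word not in self.examples:
            self.examples.append(word)

    def reject(self, word):
        """Mark a suggested word as a non-example of the concept."""
        if word not in self.non_examples:
            self.non_examples.append(word)

    def feature(self, text):
        """Binary dictionary feature: 1 if any saved example occurs in the text."""
        tokens = set(text.lower().split())
        return int(any(word in tokens for word in self.examples))


# Toy relatedness table (an assumption, not data from the patent).
related = {
    "january": ["february", "march", "month"],
    "february": ["january", "march", "valentine"],
}

editor = ConceptEditor(examples=["january"])
first = editor.suggest(editor.examples, related)
editor.accept("february")   # user selects a suggestion for inclusion
editor.reject("month")      # user marks a suggestion as a non-example
refined = editor.suggest(editor.examples, related)
print(first, refined, editor.feature("Invoice due in February"))
```

Each accepted or excluded word changes the next round of suggestions, which mirrors the refinement steps in claims 5 and 6. The weighting and regularization described in claim 8 are not modeled in this sketch.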

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06F40/242 (Primary)

    Dictionaries · CPC title

  • Interaction with lists of selectable items, e.g. menus · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Semantic analysis · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024135098A1 cover?
A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, a…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/242. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 25, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).