Training a natural language processing model with information retrieval model annotations
US-9536522-B1 · Jan 3, 2017 · US
US2016012336A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016012336-A1 |
| Application number | US-201514657343-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 13, 2015 |
| Priority date | Jul 14, 2014 |
| Publication date | Jan 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
According to an aspect, automatically linking text to concepts in a knowledge base using differential analysis includes receiving a text string and selecting, based on contents of the text string, a plurality of data sources that correspond to concepts in the knowledge base. In a further aspect, automatically linking the text to the concepts includes calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source, calculating a probability that the text string is output by a generic language model, calculating link confidence scores for each concept based on a differential analysis of the probabilities, and creating a link from the text string to one of the concepts in the knowledge base. The creating is based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
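The steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the add-one smoothed unigram language models, the token-overlap rule for selecting data sources, the log-probability difference used as the link confidence score, the toy data, and the names `train_unigram`, `log_prob`, `link_text`, `prescribed`, and `margin`. The phrase "more than a threshold value away from a prescribed threshold" is read here as the score exceeding `prescribed` by more than `margin`.

```python
import math
from collections import Counter

# Toy data sources (hypothetical): one token list per knowledge-base concept.
CONCEPT_SOURCES = {
    "Aspirin": "aspirin is a drug used to reduce pain fever and inflammation".split(),
    "Python (language)": "python is a programming language used to write software".split(),
}
# Generic data source not specific to any concept (hypothetical).
GENERIC_SOURCE = "the cat sat on the mat and the dog is used to it a lot".split()


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a shared vocabulary (assumed model type)."""
    counts = Counter(tokens)
    denom = len(tokens) + len(vocab)
    return {word: (counts[word] + 1) / denom for word in vocab}


def log_prob(tokens, model):
    """Log-probability that the model outputs the token sequence."""
    return sum(math.log(model[t]) for t in tokens)


def link_text(text, concept_sources, generic_source, prescribed=0.0, margin=1.0):
    """Return (linked concept or None, link confidence score per selected concept)."""
    tokens = text.lower().split()
    # Select data sources based on the text: here, any source sharing a token with it.
    selected = {c: s for c, s in concept_sources.items() if set(s) & set(tokens)}
    if not selected:
        return None, {}
    # Shared vocabulary so every model assigns a nonzero probability to every token.
    vocab = set(tokens) | set(generic_source)
    for source in selected.values():
        vocab |= set(source)
    generic_lp = log_prob(tokens, train_unigram(generic_source, vocab))
    # Differential analysis: concept-model log-probability minus generic log-probability.
    scores = {
        c: log_prob(tokens, train_unigram(s, vocab)) - generic_lp
        for c, s in selected.items()
    }
    best = max(scores, key=scores.get)
    linked = best if scores[best] - prescribed > margin else None
    return linked, scores
```

With the toy data above, `link_text("Aspirin is used to reduce fever", CONCEPT_SOURCES, GENERIC_SOURCE)` links the text to "Aspirin", while a string that shares no tokens with any concept source yields no link.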
Opening claim text (preview).
What is claimed is:
1. A method for automatically linking text to concepts in a knowledge base using a differential analysis, the method comprising: receiving a text string; selecting a plurality of data sources that correspond to concepts in the knowledge base, the selecting based on contents of the text string; calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source; calculating a probability that the text string is output by a generic language model; calculating link confidence scores for each concept based on a differential analysis of the probabilities; and creating a link from the text string to one of the concepts in the knowledge base, the creating based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
2. The method of claim 1, wherein the differential analysis compares the probability that the text string is output by a language model built using a data source to the probability that the text string is output by a generic language model.
3. The method of claim 1, wherein the differential analysis compares the probability that the text string is output by a language model built using a data source to a probability that the text string is output by a language model built using a competing data source.
4. The method of claim 1, wherein the generic language model is derived from a generic data source not specific to any of the concepts in the knowledge base.
5. The method of claim 1, wherein the calculating link confidence scores includes comparing the probabilities to a probability that the text string is contained in a generic data source that is not associated with any of the concepts in the knowledge base.
6. The method of claim 1, wherein the text string is linked to a second one of the concepts in the knowledge base.
7. The method of claim 1, wherein the link applies to a subset of the text string and the subset is indicated in the link.
8. The method of claim 7, wherein words in the subset are not consecutive in the text string.
9. The method of claim 1, wherein the text string is one of a collection of words, a sentence, a paragraph, and a whole document.
10. The method of claim 1, wherein each of the selected data sources includes one or more collection of names for the corresponding concept, a description for the corresponding concept, sentences referring to the corresponding concept, and paragraphs referring to the corresponding concept.
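Two of the dependent claims lend themselves to a short sketch: the comparison against a competing data source (claim 3) and a link that records a possibly non-consecutive subset of the text string (claims 7 and 8). The claims do not say how either is computed, so the rules below are assumptions: the competing differential is taken as the margin over the best rival log-probability, and the subset is taken as the tokens whose probability under the concept model exceeds their probability under the generic model. The names `Link`, `competing_differential`, `supporting_subset`, and `floor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    concept: str
    token_indices: list  # positions in the text string that the link applies to
    confidence: float


def competing_differential(log_probs):
    """Map each concept to its log-probability margin over the best competing concept.

    log_probs: concept -> log P(text | language model built from that concept's source).
    """
    if len(log_probs) < 2:
        raise ValueError("need at least two concepts to compare")
    margins = {}
    for concept, lp in log_probs.items():
        best_rival = max(v for c, v in log_probs.items() if c != concept)
        margins[concept] = lp - best_rival
    return margins


def supporting_subset(tokens, concept_probs, generic_probs, floor=1e-6):
    """Indices of tokens that are more probable under the concept model than the generic one.

    Tokens missing from a model get the small probability `floor` (an assumed fallback).
    """
    return [
        i for i, tok in enumerate(tokens)
        if concept_probs.get(tok, floor) > generic_probs.get(tok, floor)
    ]


# Usage with made-up per-token probabilities.
tokens = ["aspirin", "is", "often", "taken", "for", "fever"]
concept_probs = {"aspirin": 0.1, "fever": 0.05, "is": 0.02}
generic_probs = {"aspirin": 0.001, "fever": 0.002, "is": 0.05,
                 "often": 0.01, "taken": 0.01, "for": 0.04}
margins = competing_differential({"Aspirin": -10.0, "Ibuprofen": -12.5})
link = Link("Aspirin", supporting_subset(tokens, concept_probs, generic_probs),
            margins["Aspirin"])
```

In this example the link covers only the words "aspirin" and "fever", which are not adjacent in the text string.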
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Hyperlinking · CPC title
Selection or weighting of terms for indexing · CPC title
Selection or weighting of terms from queries, including natural language queries · CPC title
using probabilistic model · CPC title