Training a natural language processing model with information retrieval model annotations
US-9536522-B1 · Jan 3, 2017 · US
US2016012122A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016012122-A1 |
| Application number | US-201414330381-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 14, 2014 |
| Priority date | Jul 14, 2014 |
| Publication date | Jan 14, 2016 |
| Grant date | — |
Abstract
According to an aspect, automatically linking text to concepts in a knowledge base using differential analysis includes receiving a text string and selecting, based on contents of the text string, a plurality of data sources that correspond to concepts in the knowledge base. In a further aspect, automatically linking the text to the concepts includes calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source, calculating a probability that the text string is output by a generic language model, calculating link confidence scores for each concept based on a differential analysis of the probabilities, and creating a link from the text string to one of the concepts in the knowledge base. The creating is based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
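The abstract describes the pipeline without fixing a model type or scoring formula. Below is a minimal sketch under stated assumptions: every language model is a unigram model with add-one smoothing over one shared vocabulary, data sources are selected by simple word overlap with the text string, and the differential analysis is a log-likelihood ratio between a concept's source model and the generic model. The names `UnigramLM`, `select_sources`, `link_text`, the toy corpora, and the `prescribed`/`margin` parameters are hypothetical. The link rule reads "more than a threshold value away from a prescribed threshold" as "exceeds the prescribed threshold by more than the margin", which is only one possible interpretation.

```python
import math
from collections import Counter


class UnigramLM:
    """Unigram language model with add-one smoothing (an assumed model type)."""

    def __init__(self, text, vocab_size):
        self.counts = Counter(text.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def log_prob(self, text):
        # Log-probability that this model outputs the token sequence in `text`.
        return sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab_size))
            for w in text.lower().split()
        )


def select_sources(text, sources):
    # Hypothetical selection rule: keep sources sharing at least one word with the text.
    words = set(text.lower().split())
    return {c: s for c, s in sources.items() if words & set(s.lower().split())}


def link_text(text, sources, generic_text, prescribed=0.0, margin=1.0):
    # A single shared vocabulary keeps smoothed probabilities comparable across models.
    vocab = set(text.lower().split()) | set(generic_text.lower().split())
    for s in sources.values():
        vocab |= set(s.lower().split())
    base = UnigramLM(generic_text, len(vocab)).log_prob(text)
    scores = {}
    for concept, src in select_sources(text, sources).items():
        # Differential analysis: log P(text | source LM) - log P(text | generic LM).
        scores[concept] = UnigramLM(src, len(vocab)).log_prob(text) - base
    # Assumed reading of the threshold rule: exceed `prescribed` by more than `margin`.
    links = [c for c, s in scores.items() if s - prescribed > margin]
    return scores, links


# Toy data for illustration only (hypothetical concepts and corpora).
SOURCES = {
    "Python (programming language)":
        "python is a programming language with dynamic typing and a large standard library",
    "Python (snake)":
        "the python is a large snake that constricts its prey in the forest",
}
GENERIC = "the a is and of to in that it with for on as was at by large"

if __name__ == "__main__":
    scores, links = link_text("python programming language", SOURCES, GENERIC)
    print(scores, links)
```

With this toy data both sources are selected (each contains "python"), but the programming-language source scores about 2.35 against the generic model while the snake source scores about 0.97, so only the first clears the margin of 1.0 and is linked.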
Claims
1-10. (canceled)
11. A computer program product for automatically linking text to concepts in a knowledge base, the computer program product comprising: a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit to perform a method comprising: receiving a text string; selecting a plurality of data sources that correspond to concepts in the knowledge base, the selecting based on contents of the text string; calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source; calculating a probability that the text string is output by a generic language model; calculating link confidence scores for each concept based on a differential analysis of the probabilities; and creating a link from the text string to one of the concepts in the knowledge base, the creating based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
12. The computer program product of claim 11, wherein the differential analysis compares at least one of: the probability that the text string is output by a language model built using a data source to the probability that the text string is output by a generic language model; and the probability that the text string is output by a language model built using a data source to a probability that the text string is output by a language model built using a competing data source.
13. The computer program product of claim 11, wherein the generic language model is derived from a generic data source not specific to any of the concepts in the knowledge base.
14. The computer program product of claim 11, wherein the calculating link confidence scores includes comparing the probabilities to a probability that the text string is contained in a generic data source that is not associated with any of the concepts in the knowledge base.
15. The computer program product of claim 11, wherein the text string is linked to a second one of the concepts in the knowledge base.
16. The computer program product of claim 11, wherein the link applies to a subset of the text string and the subset is indicated in the link, and words in the subset are not consecutive in the text string.
17. A system for automatically linking text to concepts in a knowledge base, the system comprising: a memory having computer readable computer instructions; and a processor for executing the computer readable instructions, the computer readable instructions including: receiving a text string; selecting a plurality of data sources that correspond to concepts in the knowledge base, the selecting based on contents of the text string; calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source; calculating a probability that the text string is output by a generic language model; calculating link confidence scores for each concept based on a differential analysis of the probabilities; and creating a link from the text string to one of the concepts in the knowledge base, the creating based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
18. The system of claim 17, wherein the differential analysis compares at least one of: the probability that the text string is output by a language model built using a data source to the probability that the text string is output by a generic language model; and the probability that the text string is output by a language model built using a data source to a probability that the text string is output by a language model built using a competing data source.
19. The system of claim 17, wherein the generic language model is derived from a generic data source not specific to any of the concepts in the knowledge base.
20. The system of claim 17, wherein the calculating link confidence scores includes comparing the probabilities to a probability that the text string is contained in a generic data source that is not associated with any of the concepts in the knowledge base.
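Claims 12 and 16 add two refinements: comparing a concept's source model against a competing data source as well as against the generic model, and linking a subset of the text string whose words need not be consecutive. The sketch below is a hypothetical illustration that assumes per-source log-probabilities have already been computed and combines the two comparisons by taking the weaker one; the claims do not specify any combination rule. `Link`, `differential_scores`, `make_links`, and the log-probability values are illustrative inventions, not data from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link record; token_indices need not be consecutive (cf. claim 16)."""
    concept: str
    token_indices: tuple
    words: tuple
    confidence: float


def differential_scores(source_logp, generic_logp):
    """Map each concept to (vs_generic, vs_best_competitor) log-probability differences."""
    scores = {}
    for concept, lp in source_logp.items():
        rivals = [v for c, v in source_logp.items() if c != concept]
        # With no competing source, fall back to the generic model as the rival.
        best_rival = max(rivals) if rivals else generic_logp
        scores[concept] = (lp - generic_logp, lp - best_rival)
    return scores


def make_links(tokens, subset, source_logp, generic_logp, prescribed=0.0, margin=0.5):
    """Link the (possibly non-consecutive) token subset to each concept that clears the rule."""
    if any(i < 0 or i >= len(tokens) for i in subset):
        raise ValueError("subset index out of range")
    words = tuple(tokens[i] for i in subset)
    links = []
    for concept, (vs_generic, vs_rival) in differential_scores(source_logp, generic_logp).items():
        # Assumed combination rule: the weaker of the two comparisons sets the confidence.
        confidence = min(vs_generic, vs_rival)
        if confidence - prescribed > margin:
            links.append(Link(concept, tuple(subset), words, confidence))
    return links


# Illustrative inputs (made-up log-probabilities, not from the patent).
TOKENS = "the bank of the river flooded".split()
SUBSET = (1, 4)  # "bank" and "river": not adjacent in the text
SOURCE_LOGP = {"Bank (geography)": -9.0, "Bank (finance)": -12.0}
GENERIC_LOGP = -13.0

if __name__ == "__main__":
    for link in make_links(TOKENS, SUBSET, SOURCE_LOGP, GENERIC_LOGP):
        print(link)
```

Here "Bank (geography)" beats the generic model by 4.0 and its rival by 3.0, so its confidence is 3.0 and it is linked; "Bank (finance)" loses to its rival and is not. The resulting link records token positions 1 and 4 ("bank" and "river"), which are not adjacent in the text string.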
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Hyperlinking · CPC title
using probabilistic model · CPC title
Extracting rules from data · CPC title
Selection or weighting of terms from queries, including natural language queries · CPC title