Automatically linking text to concepts in a knowledge base

US2016012336A1 · US · A1

Patent metadata
Publication number: US-2016012336-A1
Application number: US-201514657343-A
Country: US
Kind code: A1
Filing date: Mar 13, 2015
Priority date: Jul 14, 2014
Publication date: Jan 14, 2016
Grant date: not shown


Abstract


According to an aspect, automatically linking text to concepts in a knowledge base using differential analysis includes receiving a text string and selecting, based on contents of the text string, a plurality of data sources that correspond to concepts in the knowledge base. In a further aspect, automatically linking the text to the concepts includes calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source, calculating a probability that the text string is output by a generic language model, calculating link confidence scores for each concept based on a differential analysis of the probabilities, and creating a link from the text string to one of the concepts in the knowledge base. The creating is based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
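The abstract does not say which kind of language model is used, how data sources are selected "based on contents of the text string", or how the link confidence score is normalized. The sketch below is one minimal reading of the claimed steps under stated assumptions, not the patented implementation: each data source and the generic corpus get an add-one smoothed unigram model over a shared vocabulary; a source is selected when one of its concept's names appears in the text; the differential analysis is a per-token log-likelihood ratio (source model vs. generic model); and "more than a threshold value away from a prescribed threshold" is read one-sidedly as `score - prescribed > margin`. All names (`link_text`, `UnigramLM`, `tokenize`, `prescribed`, `margin`, and the knowledge-base layout) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the patent does not specify one.
    return text.lower().split()


class UnigramLM:
    """Add-k smoothed unigram language model (an assumed stand-in for the
    unspecified language models in the abstract)."""

    def __init__(self, documents, vocab_size, k=1.0):
        self.counts = Counter(t for doc in documents for t in tokenize(doc))
        self.k = k
        # Normalizer over a vocabulary shared by all models, so scores are comparable.
        self.denom = sum(self.counts.values()) + k * vocab_size

    def log_prob(self, text):
        # log P(text | model), treating tokens as independent draws.
        return sum(math.log((self.counts[t] + self.k) / self.denom)
                   for t in tokenize(text))


def link_text(text, knowledge_base, generic_docs, prescribed=0.0, margin=0.25):
    """Hypothetical linker. knowledge_base maps a concept to
    {"names": set of name tokens, "docs": list of texts about the concept}.
    Returns (linked concept or None, per-concept link confidence scores)."""
    tokens = tokenize(text)
    # Select data sources whose concept names occur in the text (assumed rule).
    selected = {c: e for c, e in knowledge_base.items() if set(tokens) & e["names"]}

    # Shared vocabulary: text tokens plus every token in the generic and selected sources.
    vocab = set(tokens)
    for doc in generic_docs:
        vocab.update(tokenize(doc))
    for entry in selected.values():
        for doc in entry["docs"]:
            vocab.update(tokenize(doc))

    generic_lp = UnigramLM(generic_docs, len(vocab)).log_prob(text)
    n = max(len(tokens), 1)

    # Differential analysis: per-token log-likelihood ratio, source LM vs. generic LM.
    scores = {}
    for concept, entry in selected.items():
        source_lm = UnigramLM(entry["docs"], len(vocab))
        scores[concept] = (source_lm.log_prob(text) - generic_lp) / n

    # Link only if a score exceeds the prescribed threshold by more than `margin`.
    passing = {c: s for c, s in scores.items() if s - prescribed > margin}
    best = max(passing, key=passing.get) if passing else None
    return best, scores


if __name__ == "__main__":
    kb = {
        "Jaguar (animal)": {"names": {"jaguar"},
                            "docs": ["the jaguar is a big cat that hunts prey in the rainforest"]},
        "Jaguar (car)": {"names": {"jaguar"},
                         "docs": ["the jaguar sedan has a powerful engine and leather seats"]},
    }
    generic_docs = ["the weather is nice today", "people like to eat lunch in the park"]
    print(link_text("the jaguar hunts prey", kb, generic_docs))
```

On this toy input the animal source scores about 0.55 per token and the car source about 0.15, so only the animal concept clears the default margin and is linked. Text that names no concept selects no data sources and produces no link.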

First claim


What is claimed is:

1. A method for automatically linking text to concepts in a knowledge base using a differential analysis, the method comprising: receiving a text string; selecting a plurality of data sources that correspond to concepts in the knowledge base, the selecting based on contents of the text string; calculating, for each of the selected data sources, a probability that the text string is output by a language model built using the selected data source; calculating a probability that the text string is output by a generic language model; calculating link confidence scores for each concept based on a differential analysis of the probabilities; and creating a link from the text string to one of the concepts in the knowledge base, the creating based on a link confidence score of the concept being more than a threshold value away from a prescribed threshold.
2. The method of claim 1, wherein the differential analysis compares the probability that the text string is output by a language model built using a data source to the probability that the text string is output by a generic language model.
3. The method of claim 1, wherein the differential analysis compares the probability that the text string is output by a language model built using a data source to a probability that the text string is output by a language model built using a competing data source.
4. The method of claim 1, wherein the generic language model is derived from a generic data source not specific to any of the concepts in the knowledge base.
5. The method of claim 1, wherein the calculating link confidence scores includes comparing the probabilities to a probability that the text string is contained in a generic data source that is not associated with any of the concepts in the knowledge base.
6. The method of claim 1, wherein the text string is linked to a second one of the concepts in the knowledge base.
7. The method of claim 1, wherein the link applies to a subset of the text string and the subset is indicated in the link.
8. The method of claim 7, wherein words in the subset are not consecutive in the text string.
9. The method of claim 1, wherein the text string is one of a collection of words, a sentence, a paragraph, and a whole document.
10. The method of claim 1, wherein each of the selected data sources includes one or more collection of names for the corresponding concept, a description for the corresponding concept, sentences referring to the corresponding concept, and paragraphs referring to the corresponding concept.

Assignees

IBM

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Hyperlinking · CPC title

  • G06F16/313 (primary): Selection or weighting of terms for indexing · CPC title

  • Selection or weighting of terms from queries, including natural language queries · CPC title

  • using probabilistic model · CPC title


Frequently asked questions


What does patent US2016012336A1 cover?
It covers a method for automatically linking a text string to concepts in a knowledge base. Data sources corresponding to candidate concepts are selected based on the text; the probability that the text is output by each source's language model is compared with the probability that it is output by a generic language model (a differential analysis); and a link is created when a concept's link confidence score is more than a threshold value away from a prescribed threshold.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
The primary CPC classification is G06F16/313 (selection or weighting of terms for indexing). Mapped technology areas include Physics.
When was this patent published?
It was published on January 14, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).