Framework for annotated-text search using indexed parallel fields

US10083398B2 · US · B2

Patent metadata
Publication number: US-10083398-B2
Application number: US-201414569690-A
Country: US
Kind code: B2
Filing date: Dec 13, 2014
Priority date: Dec 13, 2014
Publication date: Sep 25, 2018
Grant date: Sep 25, 2018

Abstract

An approach is provided in which a knowledge manager generates term tokens from terms included in an original text stream, and generates annotation tokens with corresponding term location information. In turn, the knowledge manager generates a knowledge structure that indexes the term tokens into original text fields and indexes the annotation tokens into parallel fields that align to the original text fields based upon the term location information.
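The indexing step in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not IBM's implementation. Terms are split on whitespace, and each term's "term location" is its integer index in the stream. Annotations arrive as hypothetical (type, value, start, length) tuples. Following claim 4, each annotation type gets its own parallel field keyed by term position. The names TermToken, AnnotationToken, and build_knowledge_structure are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermToken:
    text: str
    position: int  # term location: index of the term in the original text stream


@dataclass(frozen=True)
class AnnotationToken:
    ann_type: str  # annotation type, e.g. "entity" or "synonym" (illustrative)
    value: str     # annotation value, e.g. "PERSON"
    position: int  # term location of the first annotated term
    length: int    # number of consecutive terms the annotation covers


def build_knowledge_structure(text, annotations):
    """Hypothetical sketch: index term tokens into an original-text field and
    annotation tokens into parallel fields (one per type) keyed by position.

    annotations: iterable of (ann_type, value, start, length) tuples (assumed format).
    """
    # Original-text field: whitespace tokenization is an assumption.
    original_field = [TermToken(t, i) for i, t in enumerate(text.split())]
    parallel_fields = defaultdict(lambda: defaultdict(list))
    for ann_type, value, start, length in annotations:
        token = AnnotationToken(ann_type, value, start, length)
        # The stored position aligns the annotation token with the term it covers.
        parallel_fields[ann_type][start].append(token)
    return {
        "original_text": original_field,
        "parallel": {t: dict(f) for t, f in parallel_fields.items()},
    }


ks = build_knowledge_structure(
    "Barack Obama visited Paris",
    [("entity", "PERSON", 0, 2), ("entity", "LOCATION", 3, 1),
     ("synonym", "went to", 2, 1)],
)
print(ks["parallel"]["entity"][3][0].value)  # LOCATION
print(ks["original_text"][3].text)           # Paris
```

Because an annotation token stores the position of the term it covers, looking up position 3 in the entity field and in the original-text field lands on the same word. That shared position is what "aligned" means in this sketch. All annotation tokens of one type share a single parallel field (compare claim 5).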

First claim

Opening claim text (preview).

The invention claimed is:

1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: generating a plurality of term tokens from a plurality of terms that are located at a plurality of term locations in an original text stream; generating a plurality of annotation tokens from a plurality of annotations corresponding to the plurality of terms, wherein each of the plurality of annotation tokens includes term location information based on one or more of the plurality of term locations of its corresponding one or more of the plurality of terms; generating a knowledge structure that stores the plurality of term tokens in a plurality of original text fields and stores the plurality of annotation tokens in a plurality of parallel fields, wherein each of the plurality of annotation tokens align to at least one of the plurality of original text fields based upon its corresponding term location information; receiving a search request that comprises a set of query terms, a set of query annotation types, and a relative annotation position parameter; creating a plurality of sub queries based on the set of query terms and the set of query annotation types; searching the knowledge structure using the plurality of sub queries, resulting in one or more term token matches and one or more annotation token matches; and generating search results based upon the one or more term token matches and the one or more annotation token matches, wherein the generation of the search results further comprises: determining that a first one of the plurality of annotation tokens corresponds to one of the plurality of sub queries; identifying a position increment value corresponding to the first annotation token, wherein the position increment value indicates a relative position of the first annotation token to a second annotation token; and including the first annotation token in the search results in response to determining that the position increment value adheres to the relative annotation position parameter.

2. The method of claim 1 wherein a first one of the one or more annotation token matches corresponds to a first annotation type and a second one of the one or more annotation token matches corresponds to a second annotation type.

3. The method of claim 2 wherein the first annotation type is selected from the group consisting of an entity annotation type, a synonym annotation type, an abbreviation annotation type, a concept annotation type, a sentiment annotation type, a geospatial coordinate annotation type, a syntactic-relationship structure annotation type, and a co-reference annotation type.

4. The method of claim 1 wherein the generation of the knowledge structure further comprises: adding a first set of parallel fields to the knowledge structure, the first set of parallel fields comprised in the plurality of parallel fields; indexing a first set of annotation tokens corresponding to the first annotation type into the first set of parallel fields; adding a second set of parallel fields to the knowledge structure, the second set of parallel fields comprised in the plurality of parallel fields; and indexing a second set of annotation tokens corresponding to the second annotation type into the second set of parallel fields, wherein the first set of annotation tokens and the second set of annotation tokens are included in the plurality of annotation tokens.

5. The method of claim 4 wherein a plurality of the first set of annotation tokens are indexed into a single parallel field in the first set of parallel fields.

6. The method of claim 1 further comprising: selecting an annotation token match from the one or more annotation token matches; identifying one of the plurality of term tokens that align to the selected annotation token; extracting the term from the identified term token; and including the extracted term in the generation of the search results.

7. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: generating a plurality of term tokens from a plurality of terms that are located at a plurality of term locations in an original text stream; generating a plurality of annotation tokens from a plurality of annotations corresponding to the plurality of terms, wherein each of the plurality of annotation tokens includes term location information based on one or more of the plurality of term locations of its corresponding one or more of the plurality of terms; generating a knowledge structure that stores the plurality of term tokens in a plurality of original text fields and stores the plurality of annotation tokens in a plurality of parallel fields, wherein each of the plurality of annotation tokens align to at least one of the plurality of original text fields based upon its corresponding term location information; receiving a search request that comprises a set of query terms, a set of query annotation types, and a relative annotation position parameter; creating a plurality of sub queries based on the set of query terms and the set of query annotation types; searching the knowledge structure using the plurality of sub queries, resulting in one or more term token matches and one or more annotation token matches; and generating search results based upon the one or more term token matches and the one or more annotation token matches, wherein the generation of the search results further comprises: determining that a first one of the plurality of annotation tokens corresponds to one of the plurality of sub queries; identifying a position increment value corresponding to the first annotation token, wherein the position increment value indicates a relative position of the first annotation token to a second annotation token; and including the first annotation token in the search results in response to determining that the position increment value adheres to the relative annotation position parameter.

8. The information handling system of claim 7 wherein a first one of the one or more annotation token matches corresponds to a first annotation type and a second one of the one or more annotation token matches corresponds to a second annotation type, and wherein the first annotation type is selected from the group consisting of an entity annotation type, a synonym annotation type, an abbreviation annotation type, a concept annotation type, a sentiment annotation type, a geospatial coordinate annotation type, a syntactic-relationship structure annotation type, and a co-reference annotation type.

9. The information handling system of claim 7 wherein the one or more processors perform additional actions comprising: adding a first set of parallel fields to the knowledge structure, the first set of parallel fields comprised in the plurality of parallel fields; indexing a first set of annotation tokens corresponding to the first annotation type into the first set of parallel fields; adding a second set of parallel fields to the knowledge structure, the second set of parallel fields comprised in the plurality of parallel fields; and indexing a second set of annotation tokens corresponding to the second annotation type into the second set of parallel fields, wherein the first set of annotation tokens and the second set of annotation tokens are included in the plurality of annotation tokens.

10. The information handling system of claim 7 wherein the one or more processors perform additional actions comprising: selecting an annotation token ma…
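Claim 1's search step, together with the term extraction of claim 6, can be sketched in the same way. The reading of "position increment value" here is an assumption. It is taken as the signed distance from a matched annotation token to the nearest token matched by a different annotation sub query. The "relative annotation position parameter" is modeled as a hypothetical maximum gap, max_increment. The search function, its arguments, and the flat {type: [tokens]} layout of the parallel fields are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationToken:
    ann_type: str  # annotation type, e.g. "entity"
    value: str     # annotation value, e.g. "PERSON"
    position: int  # term location of the first covered term
    length: int    # number of terms the annotation covers


def search(terms, parallel, query_terms, query_ann, max_increment):
    """Hypothetical search over an original-text field and parallel fields.

    terms:         list of term strings; the list index is the term position.
    parallel:      {ann_type: [AnnotationToken, ...]}, one parallel field per type.
    query_terms:   query terms, each run as a term sub query.
    query_ann:     (ann_type, value) pairs, each run as an annotation sub query.
    max_increment: assumed stand-in for the relative annotation position
                   parameter (largest allowed gap, in term positions).
    """
    wanted = {q.lower() for q in query_terms}
    # Term sub queries: positions in the original-text field holding a query term.
    term_matches = [pos for pos, term in enumerate(terms) if term.lower() in wanted]

    # Annotation sub queries: remember which sub query (index i) produced each match.
    ann_matches = [
        (i, tok)
        for i, (ann_type, value) in enumerate(query_ann)
        for tok in parallel.get(ann_type, [])
        if tok.value == value
    ]

    results = []
    for i, first in ann_matches:
        # Signed position increments from `first` to tokens of other sub queries.
        increments = [second.position - first.position
                      for j, second in ann_matches if j != i]
        if increments and min(abs(d) for d in increments) <= max_increment:
            # Claim 6: extract the original terms aligned to the annotation token.
            span = " ".join(terms[first.position:first.position + first.length])
            results.append((first.value, first.position, span))
    return term_matches, results


terms = "Barack Obama visited Paris in May".split()
parallel = {"entity": [AnnotationToken("entity", "PERSON", 0, 2),
                       AnnotationToken("entity", "LOCATION", 3, 1),
                       AnnotationToken("entity", "DATE", 5, 1)]}
query = [("entity", "PERSON"), ("entity", "LOCATION")]
print(search(terms, parallel, ["visited"], query, max_increment=3))
# ([2], [('PERSON', 0, 'Barack Obama'), ('LOCATION', 3, 'Paris')])
print(search(terms, parallel, ["visited"], query, max_increment=2))
# ([2], [])
```

Under this reading, a query with only one annotation sub query never returns annotation results, because there is no second annotation token to measure against. A real system might instead measure against the previous token in the same field, the way Lucene position increments are defined relative to the preceding token in a token stream.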

Assignees

IBM

Classifications

  • Annotation, e.g. comment data or footnotes

  • Version control (for software G06F8/71)

  • G06N5/022 (primary): Knowledge engineering; Knowledge acquisition

  • Information retrieval; Database structures therefor; File system structures therefor

  • Querying

Frequently asked questions

What does patent US10083398B2 cover?
An approach is provided in which a knowledge manager generates term tokens from terms included in an original text stream, and generates annotation tokens with corresponding term location information. In turn, the knowledge manager generates a knowledge structure that indexes the term tokens into original text fields and indexes the annotation tokens into parallel fields that align to the origi…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/022. Mapped technology areas include Physics.
When was this patent published?
Published Sep 25, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).