Multiple rule development support for text analytics

US9519706B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9519706-B2
Application number  US-201113306054-A
Country             US
Kind code           B2
Filing date         Nov 29, 2011
Priority date       Nov 29, 2011
Publication date    Dec 13, 2016
Grant date          Dec 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computer program products and systems are provided for applying text analytics rules to a corpus of documents. The embodiments facilitate selection of a document from the corpus within a graphical user interface (GUI), where the GUI opens the selected document to display text of the selected document and also a token parse tree that lists tokens associated with text components of the document, facilitate construction of a text analytics rule, via the GUI, by user selection of one or more tokens from the token parse tree, and, in response to a user selecting one or more tokens from the token parse tree, provide a list of hits via the GUI, the hits including a listing of text components from documents of the corpus that are associated with tokens that comply with the constructed text analytics rule.
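
The workflow in the abstract (tokens listed per document, a rule built from selected token components, and a hit list showing each match with its surrounding text) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Token` fields, the regular-expression tokenizer, the representation of a rule as a list of per-token constraints, and the two sample documents.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One entry of a flat, simplified 'parse tree' (hypothetical layout)."""
    doc_id: str
    index: int   # position of the token within its document
    text: str    # surface form as it appears in the document
    lower: str   # lower-cased form, one possible token component
    kind: str    # "word", "number", or "punct"


# Words, runs of digits, or single punctuation marks (assumed tokenizer).
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\w\s]")


def tokenize(doc_id, text):
    tokens = []
    for i, m in enumerate(TOKEN_RE.finditer(text)):
        s = m.group()
        kind = "word" if s.isalpha() else "number" if s.isdigit() else "punct"
        tokens.append(Token(doc_id, i, s, s.lower(), kind))
    return tokens


def matches(rule, window):
    """A rule is a list of dicts; each dict constrains one consecutive token."""
    return all(
        all(getattr(tok, field) == value for field, value in constraint.items())
        for constraint, tok in zip(rule, window)
    )


def find_hits(rule, corpus, context=2):
    """Apply the rule to every document; return (doc, before, match, after)."""
    hits = []
    for doc_id, text in corpus.items():
        tokens = tokenize(doc_id, text)
        for start in range(len(tokens) - len(rule) + 1):
            end = start + len(rule)
            if matches(rule, tokens[start:end]):
                before = " ".join(t.text for t in tokens[max(0, start - context):start])
                match = " ".join(t.text for t in tokens[start:end])
                after = " ".join(t.text for t in tokens[end:end + context])
                hits.append((doc_id, before, match, after))
    return hits


if __name__ == "__main__":
    corpus = {
        "doc1": "Invoice 42 was paid on time.",
        "doc2": "We sent invoice 7 and invoice 19 today.",
    }
    # Rule built by "selecting" two token components: the lower-cased
    # form of one token and the kind of the token that follows it.
    rule = [{"lower": "invoice"}, {"kind": "number"}]
    for doc_id, before, match, after in find_hits(rule, corpus):
        print(f"{doc_id}: {before} [{match}] {after}")
```

On the two sample documents this rule yields three hits, each printed on one line with the preceding and following tokens around the match. Changing the selected components (for example, dropping the `lower` constraint) produces a different rule and a different hit list, which mirrors the interactive loop the abstract describes.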

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for applying text analytics rules to a corpus of documents, the method comprising: facilitating selection of a document from the corpus of documents within a graphical user interface (GUI), wherein the GUI opens the selected document to display text of the selected document and also displays a token parse tree that lists tokens corresponding to text components of the document, wherein each token identifies an individual word, one or more numbers, or a punctuation mark and wherein each token is individually selectable via the GUI; facilitating construction of a text analytics rule, via the GUI, by user selection of at least one token and at least one corresponding token component from the token parse tree; in response to a user selecting the at least one token and the at least one corresponding token component from the token parse tree to facilitate construction of the text analytics rule, applying the constructed text analytics rule to the corpus of documents and providing a list of matching documents or hits via the GUI, the list of matching documents or hits including a listing of matching text components from of the corpus of documents that are associated with tokens that comply with the constructed text analytics rule, the listing of matching text components including, on each line, the identified text component that matches or complies with the text analytics rule as well as preceding text components and post text components in relation to a matching text component; and presenting text of the selected document within a first window of the GUI, the token parse tree for the selected document in a second window of the GUI, and the list of matching documents or hits in a third window of the GUI so as to facilitate visualization of information within the first, second and third windows simultaneously within a single display, none of the displayed windows obstructing a view of any other window of the displayed windows; wherein a change by user selection of at least one other token and at least one other corresponding token component from the token parse tree for the selected document results in the construction of a different text analytics rule and also a modified list of matching documents or hits via the GUI.

2. The method of claim 1, wherein each token of the token parse tree is expandable to display corresponding token components associated with the token, the corresponding token components for each token comprising different types of information in relation to the associated corresponding text component and its location within the selected document.

3. The method of claim 2, further comprising: in response to a user selecting a text component of the displayed text for the selected document, expanding a token associated with the selected text component, via the GUI, to initiate construction of a rule at the expanded token.

4. The method of claim 2, further comprising: facilitating, via the GUI, modification of the text analytics rule by at least one of selection of one or more new token components from the token parse tree and deletion of one or more previously selected token components from the token parse tree; and in response to the modification of the text analytics rule, providing a modified list of matching documents or hits via the GUI, the modified list of matching documents or hits comprising a list of text components from the corpus of documents that are associated with tokens that comply with the modified text analytics rule.

5. The method of claim 4, wherein the facilitation of a text analytics rule modification via the GUI further comprises: facilitating construction of a new token that combines a plurality of tokens of the token parse tree so as to create a text analytics rule that combines a plurality of text components as a single phrase.

6. The method of claim 1, further comprising: facilitating, via the GUI, selection of a hit from the list of matching documents or hits that is associated with a second selected document within the corpus of documents that is different from the selected document; and in response to selection of the hit from the list of matching documents or hits, opening the second selected document within the GUI so as to display text of the second selected document and also a token parse tree that lists tokens associated with text components of the second selected document.

7. The method of claim 6, wherein a plurality of selected documents are simultaneously open in the GUI to facilitate construction of a plurality of text analytics rules utilizing the token parse trees associated with the select documents such that a plurality of lists of matching documents or hits are provided, wherein each list matching documents or hits comprises a list of text components from of the corpus of documents that are associated with tokens that comply with the constructed text analytics rule of a corresponding selected document.

8. The method of claim 7, further comprising: applying a constructed text analytics rule to the corpus of documents utilizing a token parse tree associated with one of the selected documents, wherein the application of the constructed text analytics rule to the corpus of documents results in a modification to a number of hits in at least one list of matching documents or hits for a constructed text analytics rule associated with another selected document.

9. The method of claim 1, further comprising: providing, via the GUI, a performance indication in relation to an ability of the constructed text analytics rule to provide hits within the list of matching documents or hits that are designated as valid for the document corpus.

10. The method of claim 9, wherein the performance indication is determined by a comparison of a number of tokens in the list of matching documents or hits compared to a number of tokens from the corpus of documents that do not currently conform to any text analytics rule.

11. A system for applying text analytics rules to a corpus of documents, the system comprising a processor configured with logic to: facilitate selection of a document from the corpus of documents within a graphical user interface (GUI), wherein the selected document is opened to display text of the selected document within the GUI and to further display a token parse tree that lists tokens corresponding to text components of the document, wherein each token identifies an individual word, one or more numbers, or a punctuation mark and wherein each token is individually selectable via the GUI; facilitate construction of a text analytics rule, via interaction with the GUI, by user selection of at least one token and at least one corresponding token component from the token parse tree; in response to a user selecting the at least one token and the at least one corresponding token component from the token parse tree to facilitate construction of the text analytics rule, apply the constructed text analytics rule to the corpus of documents and provide a list of matching documents or hits via the GUI, the list of matching documents or hits including a listing of matching text components from the corpus of documents that are associated with tokens that comply with the constructed text analytics rule, the listing of matching text components including, on each line, the identified text component that matches or complies with the text analytics rule as well as preceding text components and post text components in relation to the matching text component; and present text of the selected document within a first window of the GUI, the token parse tree for the selected document in a second window of the GUI
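
Claim 10 describes the performance indication only as a comparison between the number of tokens in the hit list and the number of corpus tokens that do not yet conform to any rule. The claim preview does not give a formula, so the sketch below assumes one possible reading: a simple ratio of the two counts. The function name, the use of `(doc_id, index)` pairs to identify tokens, and the sample numbers are all hypothetical.

```python
def performance_indication(rule_hits, all_tokens, covered_by_any_rule):
    """Assumed metric: tokens hit by this rule, relative to those hits
    plus the tokens that no rule currently accounts for.

    Each argument is a set of (doc_id, token_index) pairs.
    """
    unmatched = all_tokens - covered_by_any_rule
    total = len(rule_hits) + len(unmatched)
    return len(rule_hits) / total if total else 0.0


if __name__ == "__main__":
    # Ten tokens in one document; this rule hits three of them and
    # some other rule already covers two more, leaving five unmatched.
    all_tokens = {("doc1", i) for i in range(10)}
    rule_hits = {("doc1", 0), ("doc1", 1), ("doc1", 4)}
    other_rule_hits = {("doc1", 7), ("doc1", 8)}
    score = performance_indication(
        rule_hits, all_tokens, rule_hits | other_rule_hits
    )
    print(score)  # 3 / (3 + 5) = 0.375
```

Other comparisons, such as a raw difference or a percentage of the whole corpus, would fit the claim wording equally well. The ratio is used here only because it stays between 0 and 1.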

Assignees

Inventors

Classifications

  • G06F16/34 · Primary

    Browsing; Visualisation therefor (browsing or visualisation for clustering or classification G06F16/358) · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Parsing · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9519706B2 cover?
Methods, computer program products and systems are provided for applying text analytics rules to a corpus of documents. The embodiments facilitate selection of a document from the corpus within a graphical user interface (GUI), where the GUI opens the selected document to display text of the selected document and also a token parse tree that lists tokens associated with text components of the d…
Who is the assignee on this patent?
Luke James S, IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/34. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).