Identifying unchecked criteria in unstructured and semi-structured data

US9430464B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9430464-B2
Application number · US-201314136314-A
Country · US
Kind code · B2
Filing date · Dec 20, 2013
Priority date · Dec 20, 2013
Publication date · Aug 30, 2016
Grant date · Aug 30, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system and computer-usable medium are disclosed for identifying unchecked criteria in unstructured and semi-structured data within a form. Text spans representing unchecked criteria within unstructured text in a form are detected and classified to facilitate accurate interpretation of the text. Section identification and annotation operations are then performed to identify and categorize sections within the form. Checklist sections within the form, along with associated checkmarks and boxes, are then identified, followed by the identification of checked item, criteria scope, and previously undetected checklist sections. Once all checklist sections and checked criteria have been identified, remaining text spans within a checklist section are annotated as unchecked criteria.
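
The order of operations in the abstract (identify sections, find checklist sections and checked items, then annotate the remaining spans as unchecked) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `[x]` / `[ ]` checkbox syntax, the "line ending in a colon" heading rule, and the names `split_sections`, `annotate`, and `find_unchecked` do not come from the patent, which describes detection and classification of text spans rather than regular expressions.

```python
import re

# Assumed checkbox syntax: "[x]" for checked, "[ ]" or "[]" for empty.
BOX = re.compile(r"^\[(?P<mark>[xX ]?)\]\s*(?P<text>.+)$")
# Assumed heading syntax: a capitalised line ending in a colon.
HEADING = re.compile(r"^(?P<title>[A-Z][^:]*):$")

SAMPLE_FORM = """\
Patient history:
[x] Diabetes
[ ] Hypertension
Asthma
Notes:
Patient reports mild symptoms.
"""


def split_sections(text):
    """Section identification: group non-empty lines under their heading.

    Lines that appear before the first heading are ignored in this sketch.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            sections.append({"title": heading.group("title"), "lines": []})
        elif sections:
            sections[-1]["lines"].append(line)
    return sections


def annotate(section):
    """Mark checked items, then treat every remaining span as unchecked."""
    matches = [BOX.match(line) for line in section["lines"]]
    # Assumption: a section is a checklist if any line carries a box.
    is_checklist = any(matches)
    checked, unchecked = [], []
    if is_checklist:
        for line, m in zip(section["lines"], matches):
            if m and m.group("mark").lower() == "x":
                checked.append(m.group("text"))
            else:
                # Empty boxes and box-less lines are both "remaining spans".
                unchecked.append(m.group("text") if m else line)
    return {
        "title": section["title"],
        "is_checklist": is_checklist,
        "checked": checked,
        "unchecked": unchecked,
    }


def find_unchecked(text):
    """Run the whole sketch over a form given as plain text."""
    return [annotate(section) for section in split_sections(text)]
```

Calling `find_unchecked(SAMPLE_FORM)` reports "Diabetes" as checked and both "Hypertension" and the box-less line "Asthma" as unchecked, while the "Notes" section is left alone because it contains no boxes. Treating box-less lines inside a checklist section as unchecked mirrors the abstract's final step, but it is only one possible reading.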

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a processor; a data bus coupled to the processor; and a computer-usable medium embodying computer program code, the computer-usable medium being coupled to the data bus, the computer program code used for identifying unchecked criteria in unstructured data within a form and comprising instructions executable by the processor and configured for: identifying checked data in a form; identifying a first set of unstructured data as pertinent to the checked data in the form; and, identifying unchecked criteria in unstructured data within the form based upon the identifying the first set of unstructured data, the identifying comprising detecting and classifying text spans representing the unchecked criteria within the form to facilitate accurate interpretation of the text spans, the unchecked criteria comprising a discrete item that represents a question in the form lacking a response.
2. The system of claim 1, further comprising: identifying a first checklist section in the form, the first checklist section containing a first set of checklist data pertinent to the checked data in the form.
3. The system of claim 2, further comprising: re-categorizing a non-checklist section as a second checklist section based upon a criteria, the re-categorizing performed after identifying a second set of unstructured data contained in the non-checklist section as pertinent to the checked data in the form.
4. The system of claim 3, further comprising: processing the second set of unstructured data to generate a second set of checklist data; and associating the second set of checklist data with the second checklist section.
5. The system of claim 4, further comprising: processing the non-checklist section, the first checklist section, and the second checklist section to identify unchecked criteria.
6. The system of claim 5, further comprising: using a first natural language process on the non-checklist section and a second natural language process on the first and second checklist sections of the form to identify the unchecked criteria.
7. A non-transitory, computer-readable storage medium embodying computer program code, the computer program code comprising computer executable instructions configured for: identifying checked data in a form; identifying a first set of unstructured data as pertinent to the checked data in the form; and, identifying unchecked criteria in unstructured data within the form based upon the identifying the first set of unstructured data, the identifying comprising detecting and classifying text spans representing the unchecked criteria within the form to facilitate accurate interpretation of the text spans, the unchecked criteria comprising a discrete item that represents a question in the form lacking a response.
8. The non-transitory, computer-readable storage medium of claim 7, further comprising: identifying a first checklist section in the form, the first checklist section containing a first set of checklist data pertinent to the checked data in the form.
9. The non-transitory, computer-readable storage medium of claim 8, further comprising: re-categorizing a non-checklist section as a second checklist section based upon a criteria, the re-categorizing performed after identifying a second set of unstructured data contained in the non-checklist section as pertinent to the checked data in the form.
10. The non-transitory, computer-readable storage medium of claim 9, further comprising: processing the second set of unstructured data to generate a second set of checklist data; and associating the second set of checklist data with the second checklist section.
11. The non-transitory, computer-readable storage medium of claim 10, further comprising: processing the non-checklist section, the first checklist section, and the second checklist section to identify unchecked criteria.
12. The non-transitory, computer-readable storage medium of claim 11, further comprising: using a first natural language process on the non-checklist section and a second natural language process on the first and second checklist sections of the form to identify the unchecked criteria.
13. The non-transitory, computer-readable storage medium of claim 7, wherein the computer executable instructions are deployable to a client system from a server system at a remote location.
14. The non-transitory, computer-readable storage medium of claim 7, wherein the computer executable instructions are provided by a service provider to a user on an on-demand basis.
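
Claims 3 and 4 (and their counterparts 9 and 10) describe re-categorizing a non-checklist section as a checklist section once its unstructured text is found pertinent to the checked data, then turning that text into checklist data. The sketch below shows one way such a step could look. The claims do not say how pertinence is measured, so the word-overlap score, the `0.5` threshold, the semicolon/newline splitting rule, and the names `pertinence` and `recategorize` are all hypothetical choices made for illustration.

```python
import re


def tokens(text):
    """Lower-cased alphabetic words of a text span."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pertinence(section_text, checked_items):
    """Share of the checked items' vocabulary that recurs in the section.

    This overlap score is an assumed stand-in for the pertinence test,
    which the claims leave unspecified.
    """
    vocab = set().union(*(tokens(item) for item in checked_items))
    if not vocab:
        return 0.0
    return len(vocab & tokens(section_text)) / len(vocab)


def recategorize(section, checked_items, threshold=0.5):
    """Promote a pertinent non-checklist section to a checklist section.

    `section` is a dict with "title", "kind", and "text" keys (an assumed
    layout). The returned dict gains an "items" list: the second set of
    checklist data, associated with the re-categorized section.
    """
    if section["kind"] != "non-checklist":
        return section
    if pertinence(section["text"], checked_items) < threshold:
        return section
    # Assumed segmentation: one candidate criterion per semicolon or line.
    items = [part.strip() for part in re.split(r"[;\n]", section["text"])]
    items = [part for part in items if part]
    return {**section, "kind": "checklist", "items": items}
```

With checked data `["Diabetes type 2"]`, a free-text section reading "diabetes type 2 diagnosed in 2010; hypertension" would be promoted and split into two items, while a section reading "Signed by the attending physician" would stay a non-checklist section because it shares no vocabulary with the checked data.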

Assignees

Inventors

Classifications

  • G06F40/40 · Primary

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Ensuring data consistency and integrity · CPC title

  • Clustering; Classification · CPC title

  • Presentation of query results · CPC title

  • G06F17/28 · Primary

    Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9430464B2 cover?
A method, system and computer-usable medium are disclosed for identifying unchecked criteria in unstructured and semi-structured data within a form. Text spans representing unchecked criteria within unstructured text in a form are detected and classified to facilitate accurate interpretation of the text. Section identification and annotation operations are then performed to identify and categor…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 30, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).