Segmenting and interpreting a document, and relocating document fragments to corresponding sections

US10176889B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10176889-B2
Application number  US-201715428429-A
Country             US
Kind code           B2
Filing date         Feb 9, 2017
Priority date       Feb 9, 2017
Publication date    Jan 8, 2019
Grant date          Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to receive a document having multiple section headers, segment the document into at least first and second sections based on the section headers, segment items in the first section into fragments and identify a section type for each of the fragments, determine that the identified section type for at least one of the fragments better matches a type of the second section than it matches a type of the first section, and re-locate the at least one of the fragments to the second section.
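The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the lexicon contents, the rule that a header is a line ending in a colon, the splitting of items on ";" and ".", and the word-overlap score are all assumptions made for the example, and every function name is hypothetical.

```python
import re
from collections import OrderedDict

# Hypothetical section type-specific lexicons (illustrative terms only).
LEXICONS = {
    "Medications": {"mg", "tablet", "daily", "dose"},
    "Allergies": {"allergy", "allergic", "rash", "hives"},
}


def split_sections(text):
    """Split text into {header: [items]}. Assumes a header is a line ending in ':'."""
    sections = OrderedDict()
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = line[:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def split_fragments(item):
    """Split an item into fragments. Assumes ';' and '.' separate fragments."""
    return [f.strip() for f in re.split(r"[;.]", item) if f.strip()]


def best_type(fragment, home):
    """Return the section type whose lexicon overlaps the fragment most.

    The fragment stays in its home section unless another section's
    lexicon matches strictly better.
    """
    words = set(re.findall(r"[a-z]+", fragment.lower()))
    scores = {name: len(words & lex) for name, lex in LEXICONS.items()}
    top = max(scores, key=scores.get)
    return top if scores[top] > scores.get(home, 0) else home


def relocate(text):
    """Segment the document, classify each fragment, and move mismatched ones."""
    sections = split_sections(text)
    result = OrderedDict((name, []) for name in sections)
    for name, items in sections.items():
        for item in items:
            for fragment in split_fragments(item):
                target = best_type(fragment, name)
                # Only move a fragment to a section that exists in this document.
                if target not in result:
                    target = name
                result[target].append(fragment)
    return result


doc = (
    "Medications:\n"
    "Aspirin 81 mg daily; allergic to penicillin with rash\n"
    "Allergies:\n"
    "Hives after sulfa\n"
)
for header, fragments in relocate(doc).items():
    print(header, fragments)
```

In this example the fragment "allergic to penicillin with rash" starts under Medications but overlaps the Allergies lexicon more strongly, so it is moved there. A production system would need a more careful fragment splitter (for instance, "2.5 mg" would be split at the decimal point here).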

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: receive a document having section headers; segment the document into at least first and second sections based on the section headers; segment items in the first section into fragments including a first fragment and a second fragment; identify a section type for each of the fragments using multiple section type-specific lexicons that include a first section type-specific lexicon that corresponds to a section type of the first section and a second section type-specific lexicon that corresponds to a section type of the second section, wherein the first fragment is identified as corresponding to a different section type than the second fragment; determine a first quantity of first fragments of the multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document; determine that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and based on exceeding the predetermined quantity, re-locate the second fragments to the second section in the document or reclassify the second fragments to correspond to the first section type.

2. The computer program product of claim 1, wherein the document comprises an electronic health record.

3. The computer program product of claim 1, wherein the items include non-alphabetical and non-numerical symbols.

4. The computer program product of claim 1, wherein, to segment the items into fragments, the program instructions are executable by the processor to cause the processor to identify a portion of the items that match one of the multiple section type-specific lexicons.

5. The computer program product of claim 1, wherein, to segment the items into fragments, the program instructions are executable by the processor to identify individual words in the first section that correspond to one of the multiple section type-specific lexicons.

6. The computer program product of claim 1, wherein, to segment the items into fragments, the program instructions are executable by the processor to identify phrases and sentences in the first section that correspond to one of the multiple section type-specific lexicons.

7. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: segment items in a first section of a document into multiple fragments; determine a section type of each of the multiple fragments; determine a first quantity of first fragments of the multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document; determine that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and based on exceeding the predetermined quantity, re-locate the second fragments to the second section in the document or reclassify the second fragments to correspond to the first section type.

8. The computer program product of claim 7, wherein the program instructions executable by the processor cause the processor to reclassify the second fragments to correspond to the first section type when the first quantity of the first fragments exceeds the second quantity of the second fragments by the predetermined quantity, and wherein the program instructions are further executable by the processor to cause the processor to update a section type-specific lexicon corresponding to the first section type to reflect the reclassification.

9. The computer program product of claim 7, wherein the program instructions are executable to cause the processor to generate the second section based on exceeding the predetermined quantity.

10. The computer program product of claim 7, wherein the document comprises a written representation of an oral document.

11. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: segment the document into multiple sections, wherein each of the multiple sections corresponds to a respective section type of multiple section types; segment items in a first section of the multiple sections into multiple fragments, wherein the first section corresponds to a first section type of the multiple section types; determine a section type of each of the multiple fragments in the first section; determine whether the multiple fragments include fragments that correspond to different section types and that are interspersed among each other in even proportions; and based on the multiple fragments in the first section including fragments that correspond to different section types and that are interspersed among each other in even proportions: determine that the fragments that correspond to different section types and that are interspersed among each other in even proportions do not belong in the first section; generate a new section corresponding to a section type that corresponds to a section type that is different than the multiple section types; and re-locate the fragments that correspond to different section types and that are interspersed among each other in even proportions to the new section.

12. The computer program product of claim 11, wherein the program instructions are executable by the processor to cause the processor to determine that the multiple fragments include fragments that correspond to different section types and that are interspersed among each other in even proportions when the fragments that correspond to different section types and that are interspersed among each other in even proportions are in an alternating fashion.
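The two decision rules in the independent claims (the quantity comparison of claims 1 and 7, and the alternation test of claim 12) can be illustrated with a short Python sketch. It is a simplified reading under stated assumptions, not the claimed implementation: "exceeds by a predetermined quantity" is interpreted as a difference of at least `threshold`, the choice between relocating and reclassifying is passed in as a flag because the claims leave it open, and both function names are hypothetical.

```python
from collections import Counter


def resolve_minority(fragment_types, first_type, second_type, threshold, relocate=True):
    """Decide what to do with second-type fragments found in the first section.

    fragment_types lists the section type assigned to each fragment of the
    first section. Returns "relocate", "reclassify", or None when the
    first-type fragments do not outnumber the second-type ones by at least
    `threshold` (an assumed reading of "by a predetermined quantity").
    """
    counts = Counter(fragment_types)
    first_qty = counts[first_type]
    second_qty = counts[second_type]
    if first_qty - second_qty >= threshold:
        return "relocate" if relocate else "reclassify"
    return None


def is_alternating(fragment_types):
    """Simplified alternation test: at least two distinct section types are
    present and no two neighbouring fragments share a type."""
    if len(set(fragment_types)) < 2:
        return False
    return all(a != b for a, b in zip(fragment_types, fragment_types[1:]))


# Four "Medications" fragments against one "Allergies" fragment, threshold 3.
types = ["Medications", "Medications", "Allergies", "Medications", "Medications"]
print(resolve_minority(types, "Medications", "Allergies", threshold=3))

# Strictly alternating fragments would be candidates for a new section.
print(is_alternating(["Medications", "Allergies", "Medications", "Allergies"]))
```

When `is_alternating` returns True, claim 11 calls for generating a new section and moving the interspersed fragments into it; that step is omitted here because the claims do not say how the new section type is chosen.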

Assignees

Inventors

Classifications

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Parsing · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Display of layout of documents; Previewing · CPC title

  • Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10176889B2 cover?
A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to receive a document having multiple section headers, segment the document into at least first and second sections based on the section headers, segment items in the first section into fragments and id…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16H10/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 8, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).