Ingestion plan based on table uniqueness

US9720945B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9720945-B2
Application number: US-201615140976-A
Country: US
Kind code: B2
Filing date: Apr 28, 2016
Priority date: Nov 5, 2015
Publication date: Aug 1, 2017
Grant date: Aug 1, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention disclose a method for processing tabular data. In various embodiments, an electronic document is received through a network, along with associated metadata. A plurality of table markers, or tabular data markers, are identified, in response to analyzing the received electronic document for said markers. References and citations associated with the plurality of tabular data markers are identified. A graphical representation of the relationship between identified tabular data markers and the identified references is generated. A uniqueness score is calculated, based on the generated graph and an ingestion plan is generated for the received electronic documents based on the calculated uniqueness score value.
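The first two steps of the abstract, identifying table markers and then the references associated with them, can be illustrated with a small sketch. The patent does not state how markers are detected, and the claim mentions natural language analysis for references; the regular expressions below are a simple stand-in, and the names `CAPTION`, `MENTION`, `find_table_markers`, and `count_references` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a table marker is a caption line such as "Table 3: ..." or "Table 3. ...".
CAPTION = re.compile(r"^\s*Table\s+(\d+)\s*[:.]", re.IGNORECASE | re.MULTILINE)
# Assumption: a reference is any other in-text mention of "Table N".
MENTION = re.compile(r"\bTable\s+(\d+)\b", re.IGNORECASE)


def find_table_markers(text: str) -> list[str]:
    """Return the labels of captioned tables, in order of first appearance."""
    markers: list[str] = []
    for match in CAPTION.finditer(text):
        label = f"Table {int(match.group(1))}"
        if label not in markers:
            markers.append(label)
    return markers


def count_references(text: str, markers: list[str]) -> Counter:
    """Count mentions of each known marker, skipping the caption lines themselves."""
    counts = Counter({label: 0 for label in markers})
    for line in text.splitlines():
        if CAPTION.match(line):
            continue  # a caption defines the table; it is not a reference to it
        for match in MENTION.finditer(line):
            label = f"Table {int(match.group(1))}"
            if label in counts:
                counts[label] += 1
    return counts


sample = (
    "Results are summarized in Table 1, and Table 2 lists the outliers.\n"
    "Table 1: Quarterly revenue\n"
    "As Table 1 shows, revenue grew.\n"
    "Table 2. Outliers\n"
)
markers = find_table_markers(sample)
print(markers)
print(dict(count_references(sample, markers)))
```

The per-table reference counts produced here are the kind of input from which a reference graph could then be built.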

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for processing tabular data, the method comprising:

    receiving a plurality of electronic documents on a computer through a network, the plurality of electronic documents being stored on a remote server, the network being an internet connection;

    receiving a plurality of metadata from the remote server through the network, the metadata being a plurality of identifying information associated with the received plurality of electronic documents;

    indexing the received plurality of electronic documents and the received plurality of metadata in a data store;

    identifying a plurality of tabular data markers, in response to analyzing the received electronic document and associated metadata;

    identifying references for association with the identified plurality of tabular data markers by natural language analysis;

    generating a graphical representation of the relationship between the identified tabular data markers and identified references, the graphical representation comprising a plurality of inbound directional edges and a plurality of vertices, wherein the directional edges are based on the identified references having an amplitude based on a count of identified references, and the vertices of the plurality of vertices are tabular data of the identified references;

    calculating a uniqueness score value based on the generated graphical representation, the uniqueness score comprising a first value based on the plurality of inbound directional edges and a second value based on the plurality of vertices;

    modifying the calculated uniqueness score based on one or more of:

        a first count based on the directional edges to a vertex in the graphical representation;

        multiplying the uniqueness score by zero, in response to the count of direction edges not exceeding a threshold; and

        in response to input by a user, a second count based on the of vertexes in the graphical representation exceeding a threshold; and

    generating an ingestion plan for the received electronic documents for display based on the calculated uniqueness score value, the ingestion plan comprising an ordered list of the received plurality of electronic documents.
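The graph, score, and plan steps of the claim can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the claim says only that the score has one value based on inbound edges and one based on vertices, so the ratio used here is an assumed formula, and the choice to list higher-scoring documents first is also an assumption. The names `TableGraph`, `uniqueness_score`, `ingestion_plan`, and `min_edges` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableGraph:
    """One document's graph: vertices are tables, inbound edges are references to them."""
    doc_id: str
    # inbound[table] is the edge amplitude, i.e. the count of references to that table
    inbound: dict[str, int] = field(default_factory=dict)

    def add_reference(self, table: str, count: int = 1) -> None:
        # A count of 0 registers the table as a vertex with no inbound references.
        self.inbound[table] = self.inbound.get(table, 0) + count


def uniqueness_score(graph: TableGraph, min_edges: int = 1) -> float:
    """Assumed formula: total inbound amplitude divided by the number of vertices."""
    vertices = len(graph.inbound)
    edges = sum(graph.inbound.values())
    if vertices == 0:
        return 0.0
    score = edges / vertices
    # Modification named in the claim: multiply by zero when the edge count
    # does not exceed a threshold.
    if edges <= min_edges:
        score *= 0
    return score


def ingestion_plan(graphs: list[TableGraph], min_edges: int = 1) -> list[str]:
    """Ordered list of document ids; assumed order is highest score first, ties by id."""
    ranked = sorted(graphs, key=lambda g: (-uniqueness_score(g, min_edges), g.doc_id))
    return [g.doc_id for g in ranked]


doc_a = TableGraph("doc-a")
doc_a.add_reference("Table 1", 4)
doc_a.add_reference("Table 2", 2)

doc_b = TableGraph("doc-b")
doc_b.add_reference("Table 1", 1)

doc_c = TableGraph("doc-c")
doc_c.add_reference("Table 1", 2)
doc_c.add_reference("Table 2", 0)

print(ingestion_plan([doc_b, doc_c, doc_a]))
```

With the default threshold of 1, `doc-b` has a single inbound reference, so its score is zeroed and it falls to the end of the plan.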

Assignees

IBM

Inventors

Classifications

  • Indexing structures · CPC title

  • G06V30/412 · Primary

    Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title

  • based on graph theory, e.g. minimum spanning trees [MST] or graph cuts · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9720945B2 cover?
Embodiments of the present invention disclose a method for processing tabular data. In various embodiments, an electronic document is received through a network, along with associated metadata. A plurality of table markers, or tabular data markers, are identified, in response to analyzing the received electronic document for said markers. References and citations associated with the plurality o…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V30/412. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 1, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).