US2020104414A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020104414-A1 |
| Application number | US-201816146698-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 28, 2018 |
| Priority date | Sep 28, 2018 |
| Publication date | Apr 2, 2020 |
| Grant date | — |
Official abstract text for this publication.
A question answering (QA) system comprising memory for storing instructions, and a processor configured to execute the instructions to ingest source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; create table annotations to represent the table data; store the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determine answers to questions using the knowledge base.
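The abstract's "table annotations" can be pictured as small records, one per table column, that tie a table identifier to a column identifier, an annotation type, a canonical header name, and the column's cell contents (the linkage the claims recite). The following is a minimal sketch under stated assumptions, not the applicant's implementation: the `TableAnnotation` class, the `annotate_columns` helper, the identifier format, and the rule for deriving a canonical name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableAnnotation:
    """Hypothetical record linking one table column to its metadata."""
    table_id: str
    column_id: str
    annotation_type: str   # assumed here to be the cleaned header text
    canonical_name: str    # assumed normalization: lowercase, underscores
    cells: List[str]       # all cell contents of the column, in row order


def annotate_columns(table_id: str,
                     headers: List[str],
                     rows: List[List[str]]) -> List[TableAnnotation]:
    """Build one annotation per column of an already-parsed table."""
    annotations = []
    for col_index, header in enumerate(headers):
        cleaned = header.strip()
        annotations.append(
            TableAnnotation(
                table_id=table_id,
                column_id=f"{table_id}:col{col_index}",  # illustrative ID scheme
                annotation_type=cleaned,
                canonical_name=cleaned.lower().replace(" ", "_"),
                cells=[row[col_index] for row in rows],
            )
        )
    return annotations


if __name__ == "__main__":
    parsed = annotate_columns(
        "t1",
        ["Year", "Net Revenue"],
        [["2016", "10"], ["2017", "12"]],
    )
    for annotation in parsed:
        print(annotation)
```

This sketch stores all cells of a column in a single annotation. The claims also cover the alternative of one annotation per cell, and row-oriented annotations, which would change only how `cells` is populated.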
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for utilizing table data in a question answering (QA) system, the computer-implemented method comprising: ingesting, by the QA system, source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; creating, by the QA system, table annotations to represent the table data; storing, by the QA system, the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determining, by the QA system, answers to questions using the knowledge base.
2. The computer-implemented method of claim 1, wherein determining, by the QA system, answers to questions using the knowledge base comprises performing a looping cells position mapping and folding method that loops through each cell data of a first table annotation until a keynote words search match is found, recording a cell position number of a cell matching the keynote words search, and retrieving data in a corresponding cell position number from a second table annotation.
3. The computer-implemented method of claim 1, wherein an answer to a question is not found directly in the knowledge base.
4. The computer-implemented method of claim 1, further comprising: extracting the table data found in the source documents; parsing a table structure of a table that is part of the table data found in the source documents to identify table headers and content of table cells of the table; and determining annotation types of the table headers.
5. The computer-implemented method of claim 2, wherein the looping cells position mapping and folding method further comprises retrieving data in the corresponding cell position number from a third table annotation.
6. The computer-implemented method of claim 3, wherein the answer is determined, by the QA system, by performing a curve fitting with graph axes intersection and folding method that plots one of a data cell position or a data cell content value to determine a function that is used to determine the answer.
7. The computer-implemented method of claim 4, further comprising identifying units of measurement associated with the content of the table cells.
8. The computer-implemented method of claim 4, wherein the table annotations links a table identifier of the table with a table column identifier associated with a table column of the table, an annotation type of a table header of the table column, a canonical name of the table header of the table column, and the content of the table cells of the table column.
9. The computer-implemented method of claim 4, wherein the content of all the table cells of a table column are linked in a single table annotation.
10. The computer-implemented method of claim 4, wherein the content of the table cells of a table column are each linked in a separate table annotation.
11. The computer-implemented method of claim 4, wherein the table annotations links a table identifier of the table with a table row identifier associated with a table row of the table, an annotation type of a table header of the table row, a canonical name of the table header of the table row, and the content of the table cells of the table row.
12. The computer-implemented method of claim 4, wherein the content of all the table cells of a table row are linked in a single table annotation.
13. The computer-implemented method of claim 4, wherein the content of the table cells of a table row are each linked in a separate table annotation.
14. A question answering (QA) system comprising memory for storing instructions, and a processor configured to execute the instructions to: ingest source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; create table annotations to represent the table data; store the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determine answers to questions using the knowledge base.
15. The QA system of claim 14, wherein determining answers to questions using the knowledge base comprises performing a looping cells position mapping and folding method that loops through each cell data of a first table annotation until a keynote words search match is found, recording a cell position number of a cell matching the keynote words search, and retrieving data in a corresponding cell position number from at least one additional table annotation.
16. The QA system of claim 14, wherein an answer to a question is not found directly in the knowledge base, and wherein the answer is determined by performing a curve fitting with graph axes intersection and folding method that plots one of a data cell position or a data cell content value to determine a function that is used to determine the answer.
17. The QA system of claim 14, wherein the creating table annotations to represent the table data comprises: extracting the table data found in the source documents; parsing a table structure of a table that is part of the table data found in the source documents to identify table headers and content of table cells of the table; determining annotation types of the table headers; and identifying units of measurement associated with the content of the table cells.
18. The QA system of claim 14, wherein the table annotations links a table identifier of the table with a table column identifier associated with a table column of the table, an annotation type of a table header of the table column, a canonical name of the table header of the table column, and the content of the table cells of the table column.
19. A computer program product for utilizing table data in a question answering (QA) system, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: ingest source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; extract the table data found in the source documents; parse a table structure of a table that is part of the table data found in the source documents to identify table headers and content of table cells of the table; determine annotation types of the table headers; create table annotations to represent the table data by linking a table identifier of the table with a table column identifier associated with a table column of the table, an annotation type of a table header of the table column, a canonical name of the table header of the table column, and the content of the table cells of the table column; store the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determine answers to questions using the knowledge base.
20. The computer program product of claim 19, wherein the program instructions for determining an answer comprises: a looping cells position mapping and folding method that loops through each cell data of a first table annotation until a keynote words search match is found, recording a cell position number of a cell matching the keynote words search, and retrieving data in a corresponding cell position number from at least one additional table annotation; and a curve fitting with graph axes intersection and folding method that plots one of a
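
The two answer-finding techniques named in the claims can be illustrated with a short sketch. The first function loops through the cells of one column annotation until a keyword matches, records the matching position, and reads the same position from other column annotations. The second covers the case where the answer is not directly in the table by fitting a function to the plotted cell values. The claims do not state which function family is fitted, so the ordinary least-squares straight line used here is purely an assumption, and the names `lookup_by_position` and `fit_line` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def lookup_by_position(key_cells: Sequence[str],
                       keyword: str,
                       *other_columns: Sequence[str]
                       ) -> Optional[Tuple[int, List[str]]]:
    """Scan key_cells for the keyword, then read the same position elsewhere.

    Returns (position, values from each other column), or None if no cell
    matches. Matching is a case-insensitive substring test (an assumption).
    """
    for position, cell in enumerate(key_cells):
        if keyword.lower() in cell.lower():
            return position, [column[position] for column in other_columns]
    return None


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares straight line through (xs, ys); returns (slope, intercept).

    Assumes at least two distinct x values. A linear model is an illustrative
    choice only; the source does not specify the fitted function.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    years = ["2015", "2016", "2017"]
    revenue = ["10", "12", "14"]

    # Direct lookup: the answer is present in the table.
    print(lookup_by_position(years, "2016", revenue))  # (1, ['12'])

    # No row for 2018, so fall back to the fitted function.
    slope, intercept = fit_line([float(y) for y in years],
                                [float(r) for r in revenue])
    print(slope * 2018 + intercept)  # 16.0
```

Passing more than one column after the keyword corresponds to retrieving the same cell position from a third or further annotation.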
Translation of natural language queries to structured queries · CPC title
of tables; using ruled lines · CPC title
Annotation, e.g. comment data or footnotes · CPC title
Computing arrangements using knowledge-based models · CPC title
Indexing structures · CPC title