Method and apparatus for managing recommendation models
US-9218605-B2 · Dec 22, 2015 · US
US2016104077A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016104077-A1 |
| Application number | US-201514879349-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 9, 2015 |
| Priority date | Oct 10, 2014 |
| Publication date | Apr 14, 2016 |
| Grant date | — |
Systems and methods for extracting table data from text documents using machine learning are provided. The systems and methods comprise electronically receiving at a computer system a document having one or more tables, each table having one or more whitespace features, processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row, processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables, and generating an output of the classified whitespace features and storing the output in a digital file.
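The abstract describes a two-stage flow. Every row is labeled first. Each whitespace feature is then classified with that row label as an input, and the results are written to a file. The Python sketch below illustrates only that control flow, under loudly stated assumptions:

- A "whitespace feature" is taken to be a run of two or more spaces between non-space characters.
- A digit-based rule stands in for the first (trained) model.
- A header-column rule stands in for the second (trained) model.
- JSON is used as the "digital file".

All names (`extract`, `classify_row`, `classify_gap`, `WhitespaceFeature`, `SAMPLE`) are hypothetical and do not come from the patent.

```python
import json
import os
import re
import tempfile
from dataclasses import asdict, dataclass

# Hypothetical definitions for this sketch (not taken from the patent):
# a "whitespace feature" is a run of two or more spaces between non-space
# characters, and column positions come from the tokens of the header row.
GAP = re.compile(r"(?<=\S) {2,}(?=\S)")
TOKEN = re.compile(r"\S+(?: \S+)*")


@dataclass
class WhitespaceFeature:
    row: int
    start: int
    end: int
    label: str = ""


def classify_row(line: str) -> str:
    """Stand-in for the first model: a row containing no digits is a header."""
    return "data" if any(ch.isdigit() for ch in line) else "header"


def classify_gap(gap: WhitespaceFeature, row_label: str, column_starts: list[int]) -> str:
    """Stand-in for the second model, conditioned on the row label.

    In a data row, a gap that spans the start of a header column is taken
    to mark an empty (missing) cell; every other gap is a plain separator.
    """
    if row_label == "data" and any(gap.start < s < gap.end for s in column_starts):
        return "missing_cell"
    return "separator"


def extract(text: str, out_path: str) -> list[dict]:
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    row_labels = [classify_row(ln) for ln in lines]  # stage 1: label every row
    header = next((ln for ln, lab in zip(lines, row_labels) if lab == "header"), "")
    column_starts = [m.start() for m in TOKEN.finditer(header)]

    output = []
    for i, line in enumerate(lines):  # stage 2: label every whitespace feature
        for m in GAP.finditer(line):
            gap = WhitespaceFeature(i, m.start(), m.end())
            gap.label = classify_gap(gap, row_labels[i], column_starts)
            output.append(asdict(gap) | {"row_label": row_labels[i]})

    with open(out_path, "w", encoding="utf-8") as fh:  # store the output in a file
        json.dump(output, fh, indent=2)
    return output


# Fixed-width sample built with ljust so the columns line up exactly.
SAMPLE = "\n".join([
    "Name".ljust(8) + "Qty".ljust(7) + "Price",
    "Apple".ljust(8) + "3".ljust(7) + "1.20",
    "Pear".ljust(8) + "".ljust(7) + "0.80",  # Qty cell left empty
])

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "table_features.json")
    for feature in extract(SAMPLE, path):
        print(feature)
```

On the sample, only the wide gap in the "Pear" row, where the Qty value is absent, is labeled `missing_cell`. The patented approach replaces both hand-written rules with trained models.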
What is claimed is:

1. A method for electronically extracting table data from text documents using machine learning, comprising: electronically receiving at a computer system a document having one or more tables, each table having one or more whitespace features; processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row; processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables; and generating an output of the classified whitespace features and storing the output in a digital file.
2. The method of claim 1, wherein the first computer model comprises a random fields classifier.
3. The method of claim 2, wherein the random fields classifier is trained using a set of training tables.
4. The method of claim 1, wherein the second computer model comprises a multinomial logistic classifier.
5. The method of claim 4, wherein the multinomial logistic classifier is trained using a set of training tables.
6. The method of claim 1, wherein the information missing comprises a missing cell.
7. A non-transitory computer-readable medium having computer-readable instructions stored thereon which, when executed by a computer system, cause the computer system to perform the steps of: electronically receiving at a computer system a document having one or more tables, each table having one or more whitespace features; processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row; processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables; and generating an output of the classified whitespace features and storing the output in a digital file.
8. The non-transitory computer-readable medium of claim 7, wherein the first computer model comprises a random fields classifier.
9. The non-transitory computer-readable medium of claim 8, wherein the random fields classifier is trained using a set of training tables.
10. The non-transitory computer-readable medium of claim 7, wherein the second computer model comprises a multinomial logistic classifier.
11. The non-transitory computer-readable medium of claim 10, wherein the multinomial logistic classifier is trained using a set of training tables.
12. The non-transitory computer-readable medium of claim 7, wherein the information missing comprises a missing cell.
13. A system for electronically extracting table data from text documents using machine learning, comprising: a computer system for electronically receiving a document having one or more tables, each table having one or more whitespace features; an engine executed by the computer system, the engine: processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row; processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables; and generating an output of the classified whitespace features and storing the output in a digital file.
14. The system of claim 13, wherein the first computer model comprises a random fields classifier.
15. The system of claim 14, wherein the random fields classifier is trained using a set of training tables.
16. The system of claim 13, wherein the second computer model comprises a multinomial logistic classifier.
17. The system of claim 16, wherein the multinomial logistic classifier is trained using a set of training tables.
18. The system of claim 13, wherein the information missing comprises a missing cell.
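Dependent claims 2 to 5 (and their counterparts in claims 8 to 11 and 14 to 17) name the model families. Rows are handled by a random fields classifier, whitespace features by a multinomial logistic classifier, and each is trained on a set of training tables.

The Python sketch below assumes the random fields classifier is a linear-chain CRF, which the claims do not specify. It decodes the CRF with the Viterbi algorithm using hand-set weights. It then fits a softmax (multinomial logistic) model by stochastic gradient ascent on a few labeled gaps. Passing the row label in as a feature mirrors the "conditional on classification of each row" language. The feature names, weights, and training examples (`EMIT`, `TRANS`, `START`, `TRAIN`) are hypothetical placeholders, not values from the patent.

```python
import math

ROW_LABELS = ("header", "data")
GAP_LABELS = ("separator", "missing_cell")

# --- Row model: linear-chain CRF decoded with Viterbi ------------------------
# Hypothetical hand-set weights; the claims describe a random fields
# classifier trained on a set of training tables instead.
EMIT = {"header": {"bias": 1.0, "digit_ratio": -4.0},
        "data": {"bias": 0.0, "digit_ratio": 4.0}}
TRANS = {("header", "header"): 0.0, ("header", "data"): 1.0,
         ("data", "header"): -2.0, ("data", "data"): 1.0}
START = {"header": 1.0, "data": 0.0}


def dot(weights: dict[str, float], feats: dict[str, float]) -> float:
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def row_features(line: str) -> dict[str, float]:
    chars = line.replace(" ", "")
    digits = sum(ch.isdigit() for ch in chars)
    return {"bias": 1.0, "digit_ratio": digits / max(len(chars), 1)}


def viterbi(lines: list[str]) -> list[str]:
    """Return the highest-scoring header/data label sequence for the rows."""
    if not lines:
        return []
    feats = [row_features(ln) for ln in lines]
    score = {y: START[y] + dot(EMIT[y], feats[0]) for y in ROW_LABELS}
    backpointers = []
    for f in feats[1:]:
        best_prev, new_score = {}, {}
        for y in ROW_LABELS:
            p = max(ROW_LABELS, key=lambda q: score[q] + TRANS[(q, y)])
            best_prev[y] = p
            new_score[y] = score[p] + TRANS[(p, y)] + dot(EMIT[y], f)
        backpointers.append(best_prev)
        score = new_score
    path = [max(ROW_LABELS, key=score.get)]
    for bp in reversed(backpointers):  # follow backpointers from the end
        path.append(bp[path[-1]])
    return path[::-1]


# --- Whitespace model: multinomial logistic (softmax) regression -------------
def gap_features(width: int, covers_column: bool, row_label: str) -> dict[str, float]:
    # The row label predicted by the row model is an input feature here.
    return {"bias": 1.0, "width": width / 10.0,
            "covers_column": float(covers_column),
            "in_data_row": float(row_label == "data")}


def softmax_probs(w, feats):
    logits = {y: dot(w[y], feats) for y in GAP_LABELS}
    top = max(logits.values())
    exps = {y: math.exp(z - top) for y, z in logits.items()}  # numerically stable
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_gap_model(examples, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the multinomial log-likelihood."""
    w = {y: {} for y in GAP_LABELS}
    for _ in range(epochs):
        for feats, gold in examples:
            probs = softmax_probs(w, feats)
            for y in GAP_LABELS:
                g = (1.0 if y == gold else 0.0) - probs[y]  # d logL / d logit_y
                for k, v in feats.items():
                    w[y][k] = w[y].get(k, 0.0) + lr * g * v
    return w


# Toy labeled gaps standing in for features derived from training tables.
TRAIN = [
    (gap_features(4, False, "header"), "separator"),
    (gap_features(5, False, "header"), "separator"),
    (gap_features(3, False, "data"), "separator"),
    (gap_features(6, False, "data"), "separator"),
    (gap_features(11, True, "data"), "missing_cell"),
    (gap_features(12, True, "data"), "missing_cell"),
]

if __name__ == "__main__":
    rows = ["Name    Qty    Price", "Apple   3      1.20", "Pear           0.80"]
    print(viterbi(rows))  # ['header', 'data', 'data']
    model = train_gap_model(TRAIN)
    print(softmax_probs(model, gap_features(11, True, "data")))
```

A trained CRF would learn `EMIT`, `TRANS`, and `START` from labeled tables rather than fixing them by hand. Only the Viterbi decoding step is shown here.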
Probabilistic graphical models, e.g. probabilistic networks · CPC title
of tables; using ruled lines · CPC title
Handling of whitespace · CPC title
Physics · mapped topic