Processing structured documents using convolutional neural networks
US-10387531-B1 · Aug 20, 2019 · US
US11550871B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11550871-B1 |
| Application number | US-201916544717-A |
| Country | US |
| Kind code | B1 |
| Filing date | Aug 19, 2019 |
| Priority date | Aug 18, 2015 |
| Publication date | Jan 10, 2023 |
| Grant date | Jan 10, 2023 |
Official abstract text for this publication.
Structured documents are processed using convolutional neural networks. For example, the processing can include receiving a rendered form of a structured document; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, mapping the identified content to a numeric embedding for the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers.
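The flow in the abstract (grid of cells, one embedding per cell, matrix representation, convolutional layers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the content of each cell is assumed to be already identified and given as a string key, `table` is a hypothetical lookup of randomly initialized embeddings, and `embed_grid` and `conv2d_relu` are illustrative names. A single untrained convolution with a ReLU stands in for the convolutional subnetwork.

```python
import numpy as np

def embed_grid(cell_content, table, dim):
    """Build a (dim, rows, cols) array; empty cells keep an all-zero embedding."""
    rows, cols = len(cell_content), len(cell_content[0])
    matrix = np.zeros((dim, rows, cols))
    for r in range(rows):
        for c in range(cols):
            key = cell_content[r][c]
            if key is not None:
                # The embedding lands at the cell's own grid position,
                # so the matrix preserves the rendered layout.
                matrix[:, r, c] = table[key]
    return matrix

def conv2d_relu(x, kernels):
    """Valid 2D cross-correlation followed by ReLU.

    x: (channels, H, W); kernels: (filters, channels, k, k).
    """
    f, _, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.zeros((f, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
DIM = 4  # assumed embedding size
# Hypothetical 4x4 grid laid over a rendered page; None marks an empty cell.
cell_content = [
    ["logo", "logo", None, "title"],
    ["logo", "logo", None, "title"],
    ["body", "body", "body", "ad"],
    ["body", "body", "body", "ad"],
]
table = {key: rng.normal(size=DIM) for key in ("logo", "title", "body", "ad")}

matrix = embed_grid(cell_content, table, DIM)
kernels = rng.normal(size=(3, DIM, 2, 2))  # 3 untrained 2x2 filters
features = conv2d_relu(matrix, kernels)
print(matrix.shape, features.shape)  # (4, 4, 4) (3, 3, 3)
```

The channel-first layout, with one rows x cols matrix per embedding dimension, is one possible reading of a matrix representation built from N-valued embeddings. Because each filter slides over neighboring cells, the resulting features depend on where content sits relative to other content in the rendered page.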
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by one or more computers, a rendered form of a structured document, wherein the structured document comprises image content and text content; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, identifying, from a plurality of types of content that includes at least one image content type and at least one text content type, a type of content associated with the identified content; mapping the identified content to a numeric embedding for the identified content using an embedding function that is specific to the type of content associated with the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids, wherein the matrix representation reflects the relative location of content from the structured document when the structured document is in the rendered form, wherein the matrix representation comprises a plurality of entries, each entry corresponding to a different location in the rendered form of the structured document, and the generating comprising: mapping each numeric embedding to an entry that corresponds to the location of the corresponding identified content in the rendered form of the structure document; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers, wherein the neural network features reflect the relative location of content from the structured document when the structured document is in the rendered form.
2. The method of claim 1, further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document.
3. The method of claim 1, wherein identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell comprises: identifying one or more pieces of content that are at least partially displayed in the portion of the rendered form that is mapped to the cell and that are of one of the plurality of content types; and selecting a piece of content of the one or more pieces of content that makes up a largest proportion of the portion of the rendered form.
4. The method of claim 3, wherein mapping the identified content to a numeric embedding for the identified content comprises: selecting an embedding function from a plurality of embedding functions that are each specific to content of a corresponding content type, wherein each embedding function is configured to receive content of the corresponding content type or an identifier for the content of the corresponding content type and to map the content of the corresponding content type to a numeric embedding for the content of the corresponding content type in accordance with current values of a set of parameters for the embedding function; and mapping the identified content to the numeric embedding for the identified content by applying the selected embedding function to the identified content.
5. The method of claim 3, wherein the structured document is associated with a known classification, the method further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document; determining an error between the generated classification data and the known classification; backpropagating the error through all of the additional neural network layers and all of the layers of the convolutional subnetwork; and adjusting the current values of the parameters of the embedding function using the backpropagated error.
6. The method of claim 1, wherein each numeric embedding comprises N values, wherein N is an integer greater than one.
7. The method of claim 6, wherein the matrix representation comprises N matrices, and wherein each matrix includes a respective value from each numeric embedding, and wherein the respective values are each in the same position in the numeric embeddings.
8. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: receiving, by one or more computers, a rendered form of a structured document, wherein the structured document comprises image content and text content; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, identifying, from a plurality of types of content that includes at least one image content type and at least one text content type, a type of content associated with the identified content; mapping the identified content to a numeric embedding for the identified content using an embedding function that is specific to the type of content associated with the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids, wherein the matrix representation reflects the relative location of content from the structured document when the structured document is in the rendered form, wherein the matrix representation comprises a plurality of entries, each entry corresponding to a different location in the rendered form of the structured document, and the generating comprising: mapping each numeric embedding to an entry that corresponds to the location of the corresponding identified content in the rendered form of the structure document; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers, wherein the neural network features reflect the relative location of content from the structured document when the structured document is in the rendered form.
9. The system of claim 8, the operations further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document.
10. The system of claim 8, wherein identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell comprises: identifying one or more pieces of content that are at least partially displayed in the portion of the rendered form that is mapped to the cell and that are of one of the plurality of content types; and selecting a piece of content of the one or more pieces of content that makes up a largest proportion of the portion of the rendered form.
11. The system of claim 10, wherein mapping the identified content to a numeric embedding for the identified content comprises: selecting an embedding function from a plurality of embedding functions that are each specific to content of a corresponding content type, wherein each embedding function is configured to receive content of the corresponding content type or an identifier for the content of the correspond
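Claims 3 and 4 (mirrored in claims 10 and 11) describe choosing, for each cell, the piece of content that makes up the largest proportion of the cell, and then applying an embedding function specific to that content's type. The sketch below illustrates one way this could look, assuming axis-aligned bounding boxes in page units. `Item`, `dominant_item`, `map_grid`, `embed`, and the lookup tables are hypothetical names introduced here, and the fixed tables stand in for the parameterized (trainable) embedding functions the claims describe.

```python
from dataclasses import dataclass

@dataclass
class Item:
    kind: str    # content type: "text" or "image" (assumed type names)
    value: str   # the content itself or an identifier for it
    box: tuple   # (x0, y0, x1, y1) position in the rendered form

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)

def dominant_item(cell_box, items):
    """Pick the item that covers the largest share of the cell, or None."""
    best, best_area = None, 0
    for item in items:
        area = overlap_area(cell_box, item.box)
        if area > best_area:
            best, best_area = item, area
    return best

def map_grid(page_w, page_h, rows, cols, items):
    """Lay a rows x cols grid over the page and pick one item per cell."""
    cw, ch = page_w / cols, page_h / rows
    return [
        [dominant_item((c * cw, r * ch, (c + 1) * cw, (r + 1) * ch), items)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical fixed tables standing in for trainable embedding functions.
TEXT_TABLE = {"Headline": [1.0, 0.0], "Body": [0.0, 1.0]}
IMAGE_TABLE = {"logo.png": [0.5, 0.5]}
EMBEDDERS = {
    "text": lambda value: TEXT_TABLE[value],
    "image": lambda value: IMAGE_TABLE[value],
}

def embed(item, dim=2):
    """Apply the embedding function for the item's type; empty cells get zeros."""
    if item is None:
        return [0.0] * dim
    return EMBEDDERS[item.kind](item.value)

# A 100 x 100 page: a logo overlapping a headline on top, body text below.
items = [
    Item("image", "logo.png", (0, 0, 60, 50)),
    Item("text", "Headline", (40, 0, 100, 50)),
    Item("text", "Body", (0, 50, 100, 100)),
]
grid = map_grid(100, 100, rows=2, cols=2, items=items)
embeddings = [[embed(cell) for cell in row] for row in grid]
print([[cell.value for cell in row] for row in grid])
# [['logo.png', 'Headline'], ['Body', 'Body']]
```

In the top-left cell the logo and the headline both appear, but the logo covers more of the cell, so it is the one selected and the image embedding function is applied. Ties and cells with no content are handled here by simple conventions (first item wins, zero vector), which are assumptions of this sketch rather than anything stated in the claims.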
Backpropagation, e.g. using gradient descent · CPC title
Classification techniques · CPC title
using statistical methods · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Document-oriented image-based pattern recognition · CPC title