Processing structured documents using convolutional neural networks

US11550871B1 · US · B1

Patent metadata
Publication number: US-11550871-B1
Application number: US-201916544717-A
Country: US
Kind code: B1
Filing date: Aug 19, 2019
Priority date: Aug 18, 2015
Publication date: Jan 10, 2023
Grant date: Jan 10, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Structured documents are processed using convolutional neural networks. For example, the processing can include receiving a rendered form of a structured document; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, mapping the identified content to a numeric embedding for the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers.
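The steps in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: the element tuples, the `table` lookup, the cell-centre assignment rule, the single-value embeddings, and the hand-written kernel are all assumptions chosen to keep the example short. It only shows how a grid laid over a rendered page can become a matrix that a convolution then scans.

```python
from typing import Dict, List, Tuple

# Hypothetical rendered element: (content_id, x0, y0, x1, y1) in page units.
Element = Tuple[str, float, float, float, float]


def grid_matrix(elements: List[Element], width: float, height: float,
                rows: int, cols: int,
                table: Dict[str, float]) -> List[List[float]]:
    """Assign each grid cell the embedding of the element under its centre.

    Assumption: one scalar per content item, looked up in `table`.
    Cells with no content keep the value 0.0.
    """
    cell_w, cell_h = width / cols, height / rows
    matrix = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = (c + 0.5) * cell_w, (r + 0.5) * cell_h
            for content_id, x0, y0, x1, y1 in elements:
                if x0 <= cx < x1 and y0 <= cy < y1:
                    matrix[r][c] = table[content_id]
                    break  # first matching element wins in this sketch
    return matrix


def conv2d_valid(matrix: List[List[float]],
                 kernel: List[List[float]]) -> List[List[float]]:
    """Plain 2-D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [[sum(matrix[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# A toy page: an image banner across the top, text underneath.
elements = [("header_image", 0, 0, 4, 1), ("body_text", 0, 1, 4, 4)]
table = {"header_image": 1.0, "body_text": -1.0}  # made-up embeddings

matrix = grid_matrix(elements, width=4, height=4, rows=4, cols=4, table=table)
# This kernel responds where the content changes between vertically adjacent rows.
features = conv2d_valid(matrix, kernel=[[1.0, 1.0], [-1.0, -1.0]])
```

Because the matrix keeps each embedding at the position its content occupies on the page, the kernel responds only along the boundary between the banner and the text, which is the sense in which the features reflect the rendered layout. In a trained network the kernel values would be learned rather than fixed by hand.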

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by one or more computers, a rendered form of a structured document, wherein the structured document comprises image content and text content; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, identifying, from a plurality of types of content that includes at least one image content type and at least one text content type, a type of content associated with the identified content; mapping the identified content to a numeric embedding for the identified content using an embedding function that is specific to the type of content associated with the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids, wherein the matrix representation reflects the relative location of content from the structured document when the structured document is in the rendered form, wherein the matrix representation comprises a plurality of entries, each entry corresponding to a different location in the rendered form of the structured document, and the generating comprising: mapping each numeric embedding to an entry that corresponds to the location of the corresponding identified content in the rendered form of the structure document; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers, wherein the neural network features reflect the relative location of content from the structured document when the structured document is in the rendered form.

2. The method of claim 1, further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document.

3. The method of claim 1, wherein identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell comprises: identifying one or more pieces of content that are at least partially displayed in the portion of the rendered form that is mapped to the cell and that are of one of the plurality of content types; and selecting a piece of content of the one or more pieces of content that makes up a largest proportion of the portion of the rendered form.

4. The method of claim 3, wherein mapping the identified content to a numeric embedding for the identified content comprises: selecting an embedding function from a plurality of embedding functions that are each specific to content of a corresponding content type, wherein each embedding function is configured to receive content of the corresponding content type or an identifier for the content of the corresponding content type and to map the content of the corresponding content type to a numeric embedding for the content of the corresponding content type in accordance with current values of a set of parameters for the embedding function; and mapping the identified content to the numeric embedding for the identified content by applying the selected embedding function to the identified content.

5. The method of claim 3, wherein the structured document is associated with a known classification, the method further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document; determining an error between the generated classification data and the known classification; backpropagating the error through all of the additional neural network layers and all of the layers of the convolutional subnetwork; and adjusting the current values of the parameters of the embedding function using the backpropagated error.

6. The method of claim 1, wherein each numeric embedding comprises N values, wherein N is an integer greater than one.

7. The method of claim 6, wherein the matrix representation comprises N matrices, and wherein each matrix includes a respective value from each numeric embedding, and wherein the respective values are each in the same position in the numeric embeddings.

8. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: receiving, by one or more computers, a rendered form of a structured document, wherein the structured document comprises image content and text content; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, identifying, from a plurality of types of content that includes at least one image content type and at least one text content type, a type of content associated with the identified content; mapping the identified content to a numeric embedding for the identified content using an embedding function that is specific to the type of content associated with the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids, wherein the matrix representation reflects the relative location of content from the structured document when the structured document is in the rendered form, wherein the matrix representation comprises a plurality of entries, each entry corresponding to a different location in the rendered form of the structured document, and the generating comprising: mapping each numeric embedding to an entry that corresponds to the location of the corresponding identified content in the rendered form of the structure document; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers, wherein the neural network features reflect the relative location of content from the structured document when the structured document is in the rendered form.

9. The system of claim 8, the operations further comprising: processing the neural network features through one or more additional neural network layers to generate classification data for the structured document.

10. The system of claim 8, wherein identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell comprises: identifying one or more pieces of content that are at least partially displayed in the portion of the rendered form that is mapped to the cell and that are of one of the plurality of content types; and selecting a piece of content of the one or more pieces of content that makes up a largest proportion of the portion of the rendered form.

11. The system of claim 10, wherein mapping the identified content to a numeric embedding for the identified content comprises: selecting an embedding function from a plurality of embedding functions that are each specific to content of a corresponding content type, wherein each embedding function is configured to receive content of the corresponding content type or an identifier for the content of the correspond
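Three of the dependent claims describe mechanics that are easier to follow in code: choosing the piece of content that fills the largest share of a cell (claim 3), applying an embedding function specific to the content type (claim 4), and storing N-value embeddings as N matrices (claim 7). The Python sketch below is illustrative only. The `Content` tuple layout, the two toy embedding functions, the `EMBEDDERS` table, and the choice of N = 2 are assumptions; in the claims the embedding functions have parameters that are learned, which this sketch replaces with fixed values.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical content record: (content_type, payload, x0, y0, x1, y1).
Content = Tuple[str, str, float, float, float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) of a grid cell
N = 2  # values per embedding (assumed; claim 6 only requires N > 1)


def overlap(cell: Box, item: Content) -> float:
    """Area of the intersection between a cell and a content bounding box."""
    cx0, cy0, cx1, cy1 = cell
    _, _, x0, y0, x1, y1 = item
    w = min(cx1, x1) - max(cx0, x0)
    h = min(cy1, y1) - max(cy0, y0)
    return max(w, 0.0) * max(h, 0.0)


def dominant_content(cell: Box, items: List[Content]) -> Optional[Content]:
    """Pick the item that covers the largest share of the cell, if any."""
    best = max(items, key=lambda it: overlap(cell, it), default=None)
    if best is None or overlap(cell, best) == 0.0:
        return None
    return best


# Toy stand-ins for type-specific embedding functions (fixed, not learned).
def embed_text(payload: str) -> List[float]:
    return [1.0, len(payload) / 10.0]


def embed_image(payload: str) -> List[float]:
    return [-1.0, 0.0]


EMBEDDERS: Dict[str, Callable[[str], List[float]]] = {
    "text": embed_text,
    "image": embed_image,
}


def embed_grid(cells: List[List[Box]],
               items: List[Content]) -> List[List[List[float]]]:
    """Return a rows x cols grid whose entries are N-value embeddings."""
    grid = []
    for row in cells:
        out_row = []
        for cell in row:
            item = dominant_content(cell, items)
            if item is None:
                out_row.append([0.0] * N)  # empty cell
            else:
                out_row.append(EMBEDDERS[item[0]](item[1]))
        grid.append(out_row)
    return grid


def to_channels(grid: List[List[List[float]]]) -> List[List[List[float]]]:
    """Split the grid into N matrices; matrix k holds value k of every embedding."""
    return [[[vec[k] for vec in row] for row in grid] for k in range(N)]


# One row of two cells; the image spills slightly into the second cell.
cells = [[(0, 0, 1, 1), (1, 0, 2, 1)]]
items = [("image", "logo.png", 0, 0, 1.2, 1), ("text", "hello", 1.2, 0, 2, 1)]
channels = to_channels(embed_grid(cells, items))
```

In the second cell the image covers about 20% of the area and the text about 80%, so the text embedding is assigned. The resulting `channels` list has the same shape as a multi-channel image, which is the input layout that convolutional layers conventionally expect.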

Assignees

Inventors

Classifications

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Classification techniques

  • using statistical methods

  • Lexical analysis, e.g. tokenisation or collocates

  • Document-oriented image-based pattern recognition

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11550871B1 cover?
Structured documents are processed using convolutional neural networks. For example, the processing can include receiving a rendered form of a structured document; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 10, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).