Enhanced document visual question answering system via hierarchical attention
US-2023153531-A1 · May 18, 2023 · US
US12443790B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12443790-B2 |
| Application number | US-202318446765-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 9, 2023 |
| Priority date | Aug 9, 2023 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Official abstract text for this publication.
Embodiments are disclosed for reflowing an infographic image for display in a mobile device using machine learning models. In particular, in one or more embodiments, the method may include receiving a document for display in a user device, the document including an infographic image. The method may further include identifying, using a convolutional neural network, visual components of the infographic image. The method may further include determining, using an encoder-decoder network, an ordered sequence of the identified visual components. A generative adversarial network then generates a modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components. The modified visual representation of the infographic image is then presented for display in a viewing pane of a user device in place of the infographic image.
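As a rough illustration of the final reflow step, the sketch below stacks already-identified components top to bottom in their reading order and shrinks any component wider than the viewing pane. It is not the generative adversarial network the abstract describes; it swaps in a simple vertical-stacking heuristic of the kind the claims also mention. The `Component` and `Placed` types, the `reflow_vertically` function, and the `gap` parameter are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    width: float   # original width in pixels, assumed > 0
    height: float  # original height in pixels
    order: int     # reading-order index (assumed to come from an ordering model)


@dataclass
class Placed:
    name: str
    x: float
    y: float
    width: float
    height: float


def reflow_vertically(components: List[Component],
                      pane_width: float,
                      gap: float = 8.0) -> List[Placed]:
    """Stack components in reading order, fitting each to the pane width."""
    placed: List[Placed] = []
    y = 0.0
    for comp in sorted(components, key=lambda c: c.order):
        # Shrink components that are too wide; never enlarge small ones.
        scale = min(1.0, pane_width / comp.width)
        w, h = comp.width * scale, comp.height * scale
        x = (pane_width - w) / 2.0  # center horizontally in the pane
        placed.append(Placed(comp.name, x, y, w, h))
        y += h + gap
    return placed


if __name__ == "__main__":
    demo = [Component("chart", 800, 400, 1), Component("title", 400, 50, 0)]
    for p in reflow_vertically(demo, pane_width=400):
        print(p)
```

Scaling each component independently keeps aspect ratios intact, so the layout never exceeds the pane width and the reader scrolls vertically only.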
Opening claim text (preview).
We claim:
1. A method comprising: receiving, by a processing device, a document for display in a user device, the document including an infographic image; identifying, using a component extraction module, visual components of the infographic image, wherein the component extraction module includes an object detection model that generates bounding box data for candidate elements in the infographic image and an image segmentation algorithm that analyzes pixels of the infographic image to identify candidate regions of the infographic image, and wherein for each candidate region of the candidate regions: determining a candidate region maximally overlapping with one or more of the candidate elements of the infographic image, and identifying the candidate region as a visual component of the infographic image; determining, using an encoder-decoder network, an ordered sequence of the identified visual components; rendering a modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components; and presenting the document, including the modified visual representation of the infographic image in place of the infographic image, for display in a viewing pane of a user device, wherein the modified visual representation of the infographic image is resized to fit a width of the viewing pane of the user device.
2. The method of claim 1, wherein identifying the candidate regions of the infographic image comprises: segmenting the infographic image into a number of components equal to a number of pixels in the infographic image; and iteratively merging components when a distance between a first component and a second component of a pair of components is less than a minimum of an internal distance of each of the first component and the second component.
3. The method of claim 1, wherein determining the ordered sequence of the identified visual components comprises: generating, by encoders, feature embeddings for the identified visual components; concatenating the generated feature embeddings; receiving, by a transformer, the concatenated feature embeddings and coordinates for the identified visual components and the candidate elements of the infographic image; and generating sequence indices for each of the identified visual components of the infographic image based on the concatenated feature embeddings and the coordinates for the identified visual components and the candidate elements of the infographic image.
4. The method of claim 1, wherein rendering the modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components comprises: generating, using heuristics, a candidate layout for the modified visual representation of the infographic image by placing the identified visual components into a layout template based on the determined ordered sequence of the identified visual components.
5. The method of claim 1, wherein rendering the modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components comprises: providing, to a generative adversarial network, coordinates for the identified visual components and the determined ordered sequence of the identified visual components; generating, by the generative adversarial network, one or more candidate layouts for the modified visual representation of the infographic image based on the identified visual components, the determined ordered sequence of the identified visual components, and a set of constraints; and providing the one or more candidate layouts.
6. The method of claim 5, wherein the set of constraints include alignment constraint, non-overlap constraint, relational constraint, and a reading order constraint.
7. The method of claim 1, wherein rendering the modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components comprises: vertically distributing the identified visual components based on the determined ordered sequence.
8. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving, by a processing device, a document for display in a user device, the document including an infographic image; identifying, using a component extraction module, visual components of the infographic image, wherein the component extraction module includes an object detection model that generates bounding box data and object type data for candidate elements in the infographic image and an image segmentation algorithm that analyzes pixels of the infographic image to identify candidate regions of the infographic image; determining, using an encoder-decoder network, an ordered sequence of the identified visual components; rendering a modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components; and presenting the document, including the modified visual representation of the infographic image in place of the infographic image, for display in a viewing pane of a user device, wherein the modified visual representation of the infographic image is resized to fit a width of the viewing pane of the user device.
9. The non-transitory computer-readable medium of claim 8, wherein the instructions to identify the candidate regions of the infographic image further cause the processing device to perform operations comprising: segmenting the infographic image into a number of components equal to a number of pixels in the infographic image; and iteratively merging components when a distance between a first component and a second component of a pair of components is less than a minimum of an internal distance of each of the first component and the second component.
10. The non-transitory computer-readable medium of claim 8, wherein the instructions to determine the ordered sequence of the identified visual components further cause the processing device to perform operations comprising: generating, by encoders, feature embeddings for the identified visual components; concatenating the generated feature embeddings; receiving, by a transformer, the concatenated feature embeddings and coordinates for the identified visual components and the candidate elements of the infographic image; and generating sequence indices for each of the identified visual components of the infographic image based on the concatenated feature embeddings and the coordinates for the identified visual components and the candidate elements of the infographic image.
11. The non-transitory computer-readable medium of claim 8, wherein the instructions to render the modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components further cause the processing device to perform operations comprising: generating, using heuristics, a candidate layout for the modified visual representation of the infographic image by placing the identified visual components into a layout template based on the determined ordered sequence of the identified visual components.
12. The non-transitory computer-readable medium of claim 8, wherein the instructions to render the modified visual representation of the infographic image based on the ide
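The overlap test recited in claim 1 can be pictured with a small sketch. It assumes that both the detector's candidate elements and the segmenter's candidate regions are reduced to axis-aligned boxes, and that "maximally overlapping" is measured by intersection-over-union (IoU); the claim text fixes neither choice, so this is one possible reading, not the patented implementation. The names `Box`, `iou`, and `select_components` are hypothetical.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) with x0 < x1 and y0 < y1; a simplifying assumption,
# since segmentation regions need not be rectangular in practice.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_components(elements: List[Box], regions: List[Box]) -> List[int]:
    """Return indices of regions that best overlap at least one element.

    For every detected element, the region with the highest IoU is kept
    as a visual component; regions that overlap nothing are dropped.
    """
    chosen = set()
    for element in elements:
        scores = [iou(element, region) for region in regions]
        if not scores:
            continue
        best = max(range(len(regions)), key=scores.__getitem__)
        if scores[best] > 0.0:
            chosen.add(best)
    return sorted(chosen)


if __name__ == "__main__":
    elements = [(0, 0, 10, 10), (20, 20, 30, 30)]
    regions = [(1, 1, 11, 11), (100, 100, 110, 110),
               (19, 19, 31, 31), (0, 0, 5, 5)]
    print(select_components(elements, regions))
```

Pairing the two sources this way lets the detector decide which parts of the image matter while the segmenter supplies the region boundaries.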
Segmentation of character regions · CPC title
Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation · CPC title
Display of layout of documents; Previewing · CPC title
based on markings or identifiers characterising the document or the area · CPC title
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title