Identifying image aesthetics using region composition graphs
US-2020151546-A1 · May 14, 2020 · US
US11657629B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657629-B2 |
| Application number | US-202017077211-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 22, 2020 |
| Priority date | Oct 22, 2020 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Official abstract text for this publication.
Methods and systems are presented for extracting categorizable information from an image using a graph that models data within the image. Upon receiving an image, a data extraction system identifies characters in the image. The data extraction system then generates bounding boxes that enclose adjacent characters that are related to each other in the image. The data extraction system also creates connections between the bounding boxes based on locations of the bounding boxes. A graph is generated based on the bounding boxes and the connections such that the graph can accurately represent the data in the image. The graph is provided to a graph neural network that is configured to analyze the graph and produce an output. The data extraction system may categorize the data in the image based on the output.
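The graph-building step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Box` class, the `node_features`, `nearest`, and `build_graph` helpers, the choice of node features, and the rule "link each box to its nearest neighbor to the right and below" are all hypothetical. The graph neural network that would consume the graph is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box around one text group (hypothetical structure)."""
    text: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def node_features(box: Box) -> List[float]:
    """Assumed node attributes: location, size, and a crude text arrangement."""
    digits = sum(ch.isdigit() for ch in box.text)
    letters = sum(ch.isalpha() for ch in box.text)
    return [box.x, box.y, box.w, box.h, float(digits), float(letters)]


def nearest(boxes: List[Box], i: int, axis: int) -> Optional[int]:
    """Closest box lying mainly to the right (axis=0) or below (axis=1) of box i."""
    ci = boxes[i].center
    best, best_dist = None, float("inf")
    for j, other in enumerate(boxes):
        if j == i:
            continue
        cj = other.center
        along = cj[axis] - ci[axis]                # offset in the search direction
        across = abs(cj[1 - axis] - ci[1 - axis])  # offset perpendicular to it
        if along <= 0 or across > along:
            continue  # behind box i, or more off-axis than on-axis
        if along + across < best_dist:
            best, best_dist = j, along + across
    return best


def build_graph(boxes: List[Box]) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Return (node feature vectors, undirected edges as index pairs)."""
    nodes = [node_features(b) for b in boxes]
    edges = set()
    for i in range(len(boxes)):
        for axis in (0, 1):  # one horizontal and one vertical neighbor
            j = nearest(boxes, i, axis)
            if j is not None:
                edges.add((min(i, j), max(i, j)))
    return nodes, sorted(edges)


# Four text groups laid out in a 2 x 2 grid on an invented label.
label = [
    Box("SHIP TO", 0, 0, 100, 20),
    Box("UPS GROUND", 200, 0, 100, 20),
    Box("TRACKING #", 0, 100, 100, 20),
    Box("1Z999AA10123456784", 200, 100, 100, 20),
]
nodes, edges = build_graph(label)
print(edges)
```

Because each box looks for one horizontal and one vertical neighbor, the edges leaving a node run in roughly perpendicular directions. In a fuller pipeline, the node and edge lists would be converted to tensors and passed to a graph neural network that scores each node as a candidate tracking number.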
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a non-transitory memory; and one or more hardware processors coupled with the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising: deriving text data from an image of a shipment label; analyzing text characteristics of the text data in the image; determining, from the image, a plurality of bounding boxes corresponding to a plurality of distinct text groups in the image based on the analyzing the text characteristics of the text data; determining, for each of the plurality of bounding boxes, features comprising at least a location of a corresponding bounding box, a size of the corresponding bounding box, and a text arrangement within the corresponding bounding box; constructing a graph comprising a plurality of nodes and a plurality of edges based on the features determined for each of the plurality of bounding boxes, wherein the constructing comprises generating a node for each bounding box in the plurality of bounding boxes and generating an edge between a pair of nodes based on locations of a corresponding pair of bounding boxes; determining, from the plurality of bounding boxes, a particular bounding box corresponding to a tracking number of a shipment by providing attributes of the graph as inputs to a graph neural network; and performing an action in association with a user account based on a shipment status associated with the tracking number.
2. The system of claim 1, wherein the constructing the graph further comprises generating, for each node in the graph, at least two edges connecting to two neighboring nodes based on locations of the corresponding bounding boxes in the image.
3. The system of claim 2, wherein the at least two edges are substantially perpendicular to each other in the graph.
4. The system of claim 1, wherein the determining the features for each of the plurality of bounding boxes comprises determining an alpha-numeric arrangement associated with a text within a corresponding bounding box.
5. The system of claim 1, wherein the text characteristics comprise at least one of a spacing characteristic or a font characteristic.
6. The system of claim 1, wherein the determining the features for each of the plurality of bounding boxes comprises determining a size of a corresponding bounding box and a location of the corresponding bounding box within the image.
7. The system of claim 1, wherein the performing the action comprises transmitting the shipment status to a user device.
8. The system of claim 1, wherein the operations further comprise determining whether the tracking number conforms with a set of formatting rules associated with a courier.
9. A method, comprising: identifying, by one or more hardware processors, a plurality of text characters in an image; dividing, by the one or more hardware processors, the plurality of text characters into groups of text characters based on text characteristics of the plurality of text characters; determining, by the one or more hardware processors, connections between the groups of text characters based on locations of the groups of text characters in the image; generating, by the one or more hardware processors, a graph based on the groups of text characters in the image and the connections between the groups of text characters, wherein the generating comprises creating a node based on attributes associated with each group in the groups of text characters and creating an edge between two nodes based on attributes of a connection between two groups of text characters corresponding to the two nodes; determining, by the one or more hardware processors based on providing the graph as an input to a graph neural network, a particular node in the graph corresponding to a tracking number of a shipment; extracting the tracking number corresponding to the particular node from the image; and performing, by the one or more hardware processors, an action in association with a user account based on shipment data associated with the tracking number.
10. The method of claim 9, further comprising determining an identity of the courier based on a text arrangement of the tracking number.
11. The method of claim 9, further comprising verifying that the tracking number satisfies a particular alpha-numerical arrangement corresponding to the courier.
12. The method of claim 9, wherein the image is received in connection with a purchase transaction, wherein the performing the action comprises: transmitting the shipment data to a user device associated with a buyer of the purchase transaction.
13. The method of claim 12, further comprising: determining whether the shipment data is consistent with requirements of the purchase transaction, wherein the action is performed in response to determining that the shipment data is inconsistent with the requirements of the purchase transaction.
14. The method of claim 13, wherein the action comprises at least one of initiating a dispute for the purchase transaction or performing a refund transaction for the purchase transaction.
15. The method of claim 9, further comprising: obtaining a plurality of images, wherein each image in the plurality of images comprises a corresponding tracking number; labeling, for each of the plurality of images, a location of the corresponding tracking number on a corresponding image; and training the neural network based on the plurality of labeled images.
16. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: receiving an image of a shipping label in association with a purchase transaction; deriving text data from the image; analyzing text characteristics associated with the text data, wherein the text characteristics comprise at least one of a font type, a font size, or a text arrangement; determining a plurality of bounding boxes on the image based on the text characteristics associated with the text data, wherein each of the plurality of bounding boxes enclose a group of related text in the image; determining connections between pairs of bounding boxes from the plurality of bounding boxes based on locations of the plurality of bounding boxes within the image; generating, a graph based on the plurality of bounding boxes and the connections, wherein the generating comprises creating a node based on attributes associated with each bounding box in the image and creating an edge between two nodes based on attributes of a connection between two corresponding bounding boxes; determining, based on feeding the graph as an input to a graph neural network, a particular node in the graph corresponding to a tracking number associated with the shipping label; extracting the tracking number from a particular bounding box in the image that corresponds to the particular node; and performing an action in association with a user account based on shipment data associated with the tracking number.
17. The non-transitory machine-readable medium of claim 16, wherein the determining the plurality of bounding boxes comprises: determining a plurality of groups of adjacent characters based on the text data in the image, wherein each group in the plurality of groups of adjacent characters include characters having a common text characteristic.
18. The non-transitory machine-readable medium of claim 16, wherein the shipment data comprises a shipment status and an estimated delivery
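The courier format checks recited in claims 8, 10, and 11 could look roughly like the sketch below. The courier names and regular expressions are hypothetical placeholders chosen for illustration. They are not taken from the patent and should not be read as the official format of any real carrier.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical formatting rules keyed by a made-up courier name.
COURIER_PATTERNS: Dict[str, Pattern[str]] = {
    "courier_a": re.compile(r"1Z[0-9A-Z]{16}"),  # assumed: "1Z" + 16 alphanumerics
    "courier_b": re.compile(r"\d{12}"),          # assumed: exactly 12 digits
}


def normalize(text: str) -> str:
    """Drop spaces and hyphens that OCR or label layout may insert, then upper-case."""
    return re.sub(r"[\s-]", "", text).upper()


def identify_courier(text: str) -> Optional[str]:
    """Return the first courier whose pattern the text fully matches, else None."""
    candidate = normalize(text)
    for name, pattern in COURIER_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return name
    return None


def conforms(text: str, courier: str) -> bool:
    """Check the extracted text against one courier's formatting rules."""
    pattern = COURIER_PATTERNS.get(courier)
    return pattern is not None and pattern.fullmatch(normalize(text)) is not None


print(identify_courier("1z 999 aa1 01 2345 6784"))
print(conforms("1234-5678-9012", "courier_b"))
```

A check of this kind could run after the graph neural network selects a candidate node, so that text which matches no known arrangement is rejected before any shipment lookup is attempted.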
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Historical data · CPC title
Tracking · CPC title
Graphical representation, e.g. directed attributed graph · CPC title