Polar relative distance transformer

US12159478B2 · US · B2

Patent metadata
Field | Value
Publication number | US-12159478-B2
Application number | US-202117547680-A
Country | US
Kind code | B2
Filing date | Dec 10, 2021
Priority date | Dec 10, 2021
Publication date | Dec 3, 2024
Grant date | Dec 3, 2024

Abstract

Official abstract text for this publication.

A system can comprise a processor that can facilitate performance of operations, comprising accessing a document comprising a plurality of text bounding boxes, wherein each respective text bounding box of the plurality of text bounding boxes comprises respective text, for each respective text bounding box, determining respective text bounding box coordinates and respective text bounding box input embeddings, based on the respective text bounding box coordinates, determining respective text bounding box positional encodings for each respective text bounding box, based on a transformer-based deep learning model applied to the respective text bounding box input embeddings, respective text bounding box coordinates, respective text bounding box positional encodings, and bias information representative of a modification to an attention weight of the transformer-based deep learning model, determining respective output embeddings for each respective text bounding box, and based on the respective output embeddings, generating respective bounding box labels for each respective bounding box.
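The abstract describes "bias information representative of a modification to an attention weight" without giving a formula on this page. The following is a minimal sketch of one way such a bias could enter self-attention, assuming the bias is a penalty proportional to the polar distance between box centers (the title and claims 6 and 8 mention polar coordinates and polar relative distances). The function names, the `alpha` parameter, the `(x0, y0, x1, y1)` box layout, and the use of raw embeddings as queries, keys, and values (no learned projections) are all illustrative assumptions, not the patented method.

```python
import math

def box_center(box):
    """Center of a box given as (x0, y0, x1, y1); this layout is an assumption."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def polar_relative(box_a, box_b):
    """Polar offset (radius, angle) from the center of box_a to the center of box_b."""
    ax, ay = box_center(box_a)
    bx, by = box_center(box_b)
    dx, dy = bx - ax, by - ay
    return math.hypot(dx, dy), math.atan2(dy, dx)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(embeddings, boxes, alpha=1.0):
    """Single-head self-attention with an additive bias on the attention scores.

    Hypothetical bias: -alpha * r, where r is the polar distance between box
    centers, so nearby boxes receive more attention. The angle is computed but
    unused here; a fuller model could condition the bias on it as well.
    """
    d = len(embeddings[0])
    outputs, weights = [], []
    for i, query in enumerate(embeddings):
        scores = []
        for j, key in enumerate(embeddings):
            dot = sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            r, _theta = polar_relative(boxes[i], boxes[j])
            scores.append(dot - alpha * r)  # bias modifies the attention score
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(w[j] * embeddings[j][c] for j in range(len(embeddings)))
            for c in range(d)
        ])
    return outputs, weights
```

With identical embeddings, the bias alone decides the attention pattern: each box attends most to itself, then to its nearest neighbors.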

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor; and a non-transitory computer-readable medium having stored thereon computer-executable instructions that are executable by the system to cause the system to perform operations comprising: accessing a document comprising a plurality of text bounding boxes, wherein each respective text bounding box of the plurality of text bounding boxes comprises respective text; for each respective text bounding box, determining respective text bounding box coordinates and respective text bounding box input embeddings; based on the respective text bounding box coordinates, determining respective text bounding box positional encodings for each respective text bounding box; based on a transformer-based deep learning model applied to the respective text bounding box input embeddings, respective text bounding box coordinates, respective text bounding box positional encodings, and bias information representative of a modification to an attention weight of the transformer-based deep learning model, determining respective output embeddings for each respective text bounding box; and based on the respective output embeddings, generating respective bounding box labels for each respective bounding box.

2. The system of claim 1, wherein determining respective bounding box labels for each respective bounding box comprises determining, using a neural network, one or more bounding box label probabilities for each respective text bounding box, and wherein the operations further comprise: based on the one or more bounding box label probabilities for each respective text bounding box, generating the respective bounding box labels.

3. The system of claim 2, wherein the respective bounding box labels are generated using Hungarian matching applied to the one or more bounding box label probabilities for each respective text bounding box.

4. The system of claim 2, wherein the neural network is trained based on one or more template documents associated with the document.

5. The system of claim 1, wherein the bias information comprises a conditional attention bias based on a graph structure representative of relative positions of the plurality of text bounding boxes.

6. The system of claim 1, wherein the bias information comprises a conditional attention bias based on polar coordinates of the plurality of text bounding boxes.

7. The system of claim 1, wherein the operations further comprise: determining angle information representative of angular relationships between the plurality of text bounding boxes, wherein the determining respective output embeddings for each respective text bounding box is further based on the transformer-based deep learning model applied to the angle information.

8. The system of claim 1, wherein the operations further comprise: determining polar relative distances between the plurality of text bounding boxes, wherein the determining respective output embeddings for each respective text bounding box is further based on the transformer-based deep learning model applied to the polar relative distances.

9. The system of claim 1, wherein the transformer-based deep learning model is generated based on using machine learning applied to past documents comprising a past plurality of text bounding boxes, other than the document.

10. The system of claim 1, wherein the respective text is identified using electronic optical character recognition.

11. The system of claim 1, wherein the document comprises a proof of identity document.

12. A computer-implemented method, comprising: generating, by a computer system comprising a processor, input embeddings representative of text bounding box coordinates of a document comprising a plurality of text bounding boxes and respective text identified using electronic optical character recognition; based on the input embeddings, determining, by the computer system, text bounding box positional encodings for each respective text bounding box; determining, by the computer system, bias information representative of a modification to an attention weight of a transformer-based machine learning model; determining, by the computer system, output embeddings for each text bounding box using the transformer-based machine learning model applied to the input embeddings, text bounding box coordinates, text bounding box positional encodings, and the bias information; generating, by the computer system and using machine learning applied to the output embeddings, a plurality of bounding box label probabilities for each text bounding box; and generating, by the computer system and based on the plurality of bounding box label probabilities for each text bounding box, a respective bounding box label for each text bounding box, wherein each bounding box label comprises a respective type of field in the document.

13. The computer-implemented method of claim 12, wherein the plurality of bounding box label probabilities for each text bounding box are representative of a probability of each possible bounding box label for each text bounding box.

14. The computer-implemented method of claim 12, wherein the respective bounding box label for each text bounding box are generated, by the computer system, using Hungarian matching applied to the plurality of bounding box label probabilities for each text bounding box of the plurality of text bounding boxes.

15. The computer-implemented method of claim 12, wherein the bias information comprises a conditional attention bias based on a graph structure representative of relative positions of the plurality of text bounding boxes.

16. The computer-implemented method of claim 15, wherein the relative positions of the plurality of text bounding boxes comprise neighbor information representative of a degree to which each text bounding box of the plurality of text bounding boxes is a neighbor to another text bounding box of the plurality of text bounding boxes.

17. The computer-implemented method of claim 12, wherein the bias information comprises a conditional attention bias based on polar coordinates of the plurality of text bounding boxes.

18. The computer-implemented method of claim 12, wherein the transformer-based machine learning model is generated, by the computer system, based on machine learning applied to past documents comprising a past plurality of text bounding boxes, other than the document, and to one or more template documents associated with the document.

19. A system comprising: a processor that executes computer executable components stored in memory; an optical character recognition component that determines bounding boxes and respective text of a proof of identity document using electronic optical character recognition applied to the proof of identity document; an input vector component that generates input embeddings for each bounding box and respective text of the proof of identity document; a position component that determines absolute coordinates of the bounding boxes; a positional encoding component that determines positional encodings for each bounding box; a graph component that determines a degree to which degree bounding box pairs of the bounding boxes are neighbors; a distance component that determines polar relative distances for the bounding box pairs; a transformer component that determines output embeddings for each bounding box based on a transformer-based deep learning model applied to the input embeddings, bounding box coordinates, positional encodings, and bias information representa
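
Claims 3 and 14 recite Hungarian matching applied to the per-box label probabilities. As a rough illustration of why that differs from picking the most probable label per box, here is a minimal sketch, assuming each label may be used by at most one box, that there are at least as many labels as boxes, and that the cost of an assignment is the negative probability. The function name `assign_labels` and the example probabilities are hypothetical; the page does not show how the patent sets up the cost matrix.

```python
def assign_labels(probs):
    """One-to-one label assignment that maximizes the summed probability.

    probs[i][j] is the probability that box i has label j. Returns a list
    where entry i is the label index chosen for box i. Implements the
    O(n^2 * m) Hungarian (Kuhn-Munkres) algorithm with potentials on the
    cost matrix cost[i][j] = -probs[i][j] (an illustrative choice).
    """
    n, m = len(probs), len(probs[0])
    if n > m:
        raise ValueError("this sketch assumes at most as many boxes as labels")
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -probs[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip matches along the augmenting path back to the virtual column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    labels = [None] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            labels[p[j] - 1] = j - 1
    return labels

# Hypothetical probabilities: boxes 0 and 1 both prefer label 0.
example = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
print(assign_labels(example))  # [0, 1, 2]
```

A per-box argmax would give label 0 to both of the first two boxes; the matching resolves the conflict by giving box 1 its second choice, which yields the highest total probability among one-to-one assignments.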

Assignees

Paypal Inc

Inventors

Classifications

  • using recognition of characters or words · CPC title

  • G06F40/106 (Primary)

    Display of layout of documents; Previewing · CPC title

  • Pagination · CPC title

  • G06V30/414 (Primary)

    Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title

  • Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12159478B2 cover?
A system can comprise a processor that can facilitate performance of operations, comprising accessing a document comprising a plurality of text bounding boxes, wherein each respective text bounding box of the plurality of text bounding boxes comprises respective text, for each respective text bounding box, determining respective text bounding box coordinates and respective text bounding box inp…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/106. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 3, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).