Presenting Search Results in a Dynamically Formatted Graphical User Interface
US-2024420206-A1 · Dec 19, 2024 · US
US12159478B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12159478-B2 |
| Application number | US-202117547680-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 10, 2021 |
| Priority date | Dec 10, 2021 |
| Publication date | Dec 3, 2024 |
| Grant date | Dec 3, 2024 |
Abstract
A system can comprise a processor that can facilitate performance of operations, comprising: accessing a document comprising a plurality of text bounding boxes, wherein each respective text bounding box of the plurality of text bounding boxes comprises respective text; for each respective text bounding box, determining respective text bounding box coordinates and respective text bounding box input embeddings; based on the respective text bounding box coordinates, determining respective text bounding box positional encodings for each respective text bounding box; based on a transformer-based deep learning model applied to the respective text bounding box input embeddings, respective text bounding box coordinates, respective text bounding box positional encodings, and bias information representative of a modification to an attention weight of the transformer-based deep learning model, determining respective output embeddings for each respective text bounding box; and based on the respective output embeddings, generating respective bounding box labels for each respective bounding box.
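The abstract describes positional encodings derived from box coordinates and "bias information" that modifies the transformer's attention weights. As a rough illustration only, the sketch below assumes standard sine/cosine encodings of each box coordinate, a single self-attention head with no learned projections, and a hypothetical bias that penalizes the polar distance between box centers and rewards each box's nearest neighbor (a loose stand-in for the graph- and polar-coordinate-based biases named in the dependent claims). None of these specifics come from the patent; `attention_bias` and its `strength`, `k`, and `neighbor_bonus` parameters are invented for illustration.

```python
import math

def box_center(box):
    """Center (cx, cy) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def sinusoidal_encoding(value, dim):
    """Transformer-style sine/cosine encoding of one scalar (dim must be even)."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.extend([math.sin(value * freq), math.cos(value * freq)])
    return enc

def positional_encoding(box, dim_per_coord=4):
    """Concatenate the encodings of the four box coordinates (assumed scheme)."""
    return [x for coord in box for x in sinusoidal_encoding(coord, dim_per_coord)]

def polar_offset(ci, cj):
    """Polar relative position (r, theta) of center cj as seen from ci.

    Only r is used by attention_bias below; theta is returned for completeness.
    """
    dx, dy = cj[0] - ci[0], cj[1] - ci[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def attention_bias(boxes, strength=0.01, k=1, neighbor_bonus=0.5):
    """Hypothetical bias: penalize polar distance, reward the k nearest neighbors."""
    centers = [box_center(b) for b in boxes]
    n = len(boxes)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = [polar_offset(centers[i], centers[j])[0] for j in range(n)]
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dists[j])[:k]
        for j in range(n):
            bonus = neighbor_bonus if j in nearest else 0.0
            bias[i][j] = -strength * dists[j] + bonus
    return bias

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_self_attention(x, bias):
    """softmax(x x^T / sqrt(d) + bias) x, with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for i in range(len(x)):
        # The bias is added to the scaled dot-product logits before the softmax.
        logits = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) + bias[i][j]
                  for j in range(len(x))]
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights

# Usage: three OCR boxes; constant vectors stand in for learned text embeddings.
boxes = [(10, 10, 60, 20), (70, 10, 150, 20), (10, 200, 60, 210)]
text_embeddings = [[0.1] * 16, [0.2] * 16, [0.3] * 16]
inputs = [[t + p for t, p in zip(te, positional_encoding(b))]
          for te, b in zip(text_embeddings, boxes)]
outputs, weights = biased_self_attention(inputs, attention_bias(boxes))
```

Because the bias enters the logits additively, a negative entry lowers the weight one box gives another (here, the far-away third box) without altering the embeddings themselves; a real model would also learn query, key, and value projections.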
Claims (preview)
What is claimed is:
1. A system, comprising: a processor; and a non-transitory computer-readable medium having stored thereon computer-executable instructions that are executable by the system to cause the system to perform operations comprising: accessing a document comprising a plurality of text bounding boxes, wherein each respective text bounding box of the plurality of text bounding boxes comprises respective text; for each respective text bounding box, determining respective text bounding box coordinates and respective text bounding box input embeddings; based on the respective text bounding box coordinates, determining respective text bounding box positional encodings for each respective text bounding box; based on a transformer-based deep learning model applied to the respective text bounding box input embeddings, respective text bounding box coordinates, respective text bounding box positional encodings, and bias information representative of a modification to an attention weight of the transformer-based deep learning model, determining respective output embeddings for each respective text bounding box; and based on the respective output embeddings, generating respective bounding box labels for each respective bounding box.
2. The system of claim 1, wherein determining respective bounding box labels for each respective bounding box comprises determining, using a neural network, one or more bounding box label probabilities for each respective text bounding box, and wherein the operations further comprise: based on the one or more bounding box label probabilities for each respective text bounding box, generating the respective bounding box labels.
3. The system of claim 2, wherein the respective bounding box labels are generated using Hungarian matching applied to the one or more bounding box label probabilities for each respective text bounding box.
4. The system of claim 2, wherein the neural network is trained based on one or more template documents associated with the document.
5. The system of claim 1, wherein the bias information comprises a conditional attention bias based on a graph structure representative of relative positions of the plurality of text bounding boxes.
6. The system of claim 1, wherein the bias information comprises a conditional attention bias based on polar coordinates of the plurality of text bounding boxes.
7. The system of claim 1, wherein the operations further comprise: determining angle information representative of angular relationships between the plurality of text bounding boxes, wherein the determining respective output embeddings for each respective text bounding box is further based on the transformer-based deep learning model applied to the angle information.
8. The system of claim 1, wherein the operations further comprise: determining polar relative distances between the plurality of text bounding boxes, wherein the determining respective output embeddings for each respective text bounding box is further based on the transformer-based deep learning model applied to the polar relative distances.
9. The system of claim 1, wherein the transformer-based deep learning model is generated based on using machine learning applied to past documents comprising a past plurality of text bounding boxes, other than the document.
10. The system of claim 1, wherein the respective text is identified using electronic optical character recognition.
11. The system of claim 1, wherein the document comprises a proof of identity document.
12. A computer-implemented method, comprising: generating, by a computer system comprising a processor, input embeddings representative of text bounding box coordinates of a document comprising a plurality of text bounding boxes and respective text identified using electronic optical character recognition; based on the input embeddings, determining, by the computer system, text bounding box positional encodings for each respective text bounding box; determining, by the computer system, bias information representative of a modification to an attention weight of a transformer-based machine learning model; determining, by the computer system, output embeddings for each text bounding box using the transformer-based machine learning model applied to the input embeddings, text bounding box coordinates, text bounding box positional encodings, and the bias information; generating, by the computer system and using machine learning applied to the output embeddings, a plurality of bounding box label probabilities for each text bounding box; and generating, by the computer system and based on the plurality of bounding box label probabilities for each text bounding box, a respective bounding box label for each text bounding box, wherein each bounding box label comprises a respective type of field in the document.
13. The computer-implemented method of claim 12, wherein the plurality of bounding box label probabilities for each text bounding box are representative of a probability of each possible bounding box label for each text bounding box.
14. The computer-implemented method of claim 12, wherein the respective bounding box label for each text bounding box is generated, by the computer system, using Hungarian matching applied to the plurality of bounding box label probabilities for each text bounding box of the plurality of text bounding boxes.
15. The computer-implemented method of claim 12, wherein the bias information comprises a conditional attention bias based on a graph structure representative of relative positions of the plurality of text bounding boxes.
16. The computer-implemented method of claim 15, wherein the relative positions of the plurality of text bounding boxes comprise neighbor information representative of a degree to which each text bounding box of the plurality of text bounding boxes is a neighbor to another text bounding box of the plurality of text bounding boxes.
17. The computer-implemented method of claim 12, wherein the bias information comprises a conditional attention bias based on polar coordinates of the plurality of text bounding boxes.
18. The computer-implemented method of claim 12, wherein the transformer-based machine learning model is generated, by the computer system, based on machine learning applied to past documents comprising a past plurality of text bounding boxes, other than the document, and to one or more template documents associated with the document.
19. A system comprising: a processor that executes computer executable components stored in memory; an optical character recognition component that determines bounding boxes and respective text of a proof of identity document using electronic optical character recognition applied to the proof of identity document; an input vector component that generates input embeddings for each bounding box and respective text of the proof of identity document; a position component that determines absolute coordinates of the bounding boxes; a positional encoding component that determines positional encodings for each bounding box; a graph component that determines a degree to which bounding box pairs of the bounding boxes are neighbors; a distance component that determines polar relative distances for the bounding box pairs; a transformer component that determines output embeddings for each bounding box based on a transformer-based deep learning model applied to the input embeddings, bounding box coordinates, positional encodings, and bias information representa…
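Claims 3 and 14 state that the final labels come from Hungarian matching applied to per-box label probabilities. The claim text shown does not say how probabilities become matching costs, so the sketch below assumes a one-to-one assignment (each field label used at most once, at least as many labels as boxes) with cost -log p, which maximizes the product of the chosen probabilities. The solver is the classic Kuhn-Munkres algorithm with row and column potentials, O(n²·m), in plain Python; in practice `scipy.optimize.linear_sum_assignment` is the usual library choice. The helper `assign_labels` and the example labels and numbers are hypothetical.

```python
import math

def hungarian(cost):
    """Kuhn-Munkres with potentials: give each row a distinct column at minimum total cost.

    Requires len(cost) <= len(cost[0]). Returns assignment[i] = column for row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials (1-indexed)
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (0 = free)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):            # shift potentials by delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:                     # reached a free column
                break
        while j0:                              # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def assign_labels(probs, labels, eps=1e-12):
    """Hypothetical helper: one distinct label per box, maximizing the product of probabilities."""
    cost = [[-math.log(max(p, eps)) for p in row] for row in probs]
    return [labels[k] for k in hungarian(cost)]

# Usage: rows are boxes, columns are candidate field labels (made-up numbers).
labels = ["name", "date_of_birth", "document_number"]
probs = [[0.6, 0.3, 0.1],
         [0.7, 0.2, 0.1],
         [0.1, 0.2, 0.7]]
greedy = [labels[max(range(len(row)), key=row.__getitem__)] for row in probs]
matched = assign_labels(probs, labels)
# greedy  -> ['name', 'name', 'document_number']  (two boxes claim "name")
# matched -> ['date_of_birth', 'name', 'document_number']
```

Per-box argmax gives two boxes the same field; the matching resolves the conflict by choosing the jointly most probable assignment. Real documents may repeat a field or contain more boxes than labels, which this assumed formulation would need to handle with extra dummy "other" columns or a different cost design.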
using recognition of characters or words · CPC title
Display of layout of documents; Previewing · CPC title
Pagination · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation · CPC title