Machine-learning models for image processing

US12525048B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12525048-B2
Application number  US-202519217461-A
Country             US
Kind code           B2
Filing date         May 23, 2025
Priority date       Apr 8, 2024
Publication date    Jan 13, 2026
Grant date          Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Presented herein are systems and methods for the employment of machine learning models for image processing. A mobile application for client-side image processing and validation, which interacts with and leverages native image processing software of the client device, where the image processing software and the mobile application include any number of machine-learning models for identifying a document and attributes of the document for recognition and validation. This mobile application uses the image processing software from a client operating system to control the camera. The image processing software generates various types of information about a video frame and the document, and the mobile application invokes APIs or software libraries of the image processing software to access the information and validate the frame and document.
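The abstract describes an application that asks the device's native image processing software for information about a video frame and then validates the frame and document. The following is a minimal sketch of such a validation step, not the patented implementation. The `FrameInfo` fields, the `validate_frame` function, and the sharpness threshold are all hypothetical names and values chosen for illustration; the text on this page does not specify them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FrameInfo:
    """Hypothetical summary of what native imaging software might report for one frame."""
    document_detected: bool  # whether a document was found in the frame
    sharpness: float         # assumed score in [0.0, 1.0]; not taken from the patent
    text: str                # text recognized in the frame


def validate_frame(info: FrameInfo, min_sharpness: float = 0.5) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a frame, using illustrative checks only."""
    if not info.document_detected:
        return False, "no document in frame"
    if info.sharpness < min_sharpness:
        return False, "frame too blurry"
    if not info.text.strip():
        return False, "no readable text"
    return True, "ok"
```

In a real mobile application the `FrameInfo` values would come from the operating system's imaging APIs rather than being constructed by hand.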

First claim

Opening claim text (preview).

What is claimed is:

1. A method for client-side processing and validation of document imagery, the method comprising: obtaining, by a computing device associated with an end-user, video data comprising a plurality of frames having image data containing a document from a camera of the computing device using an imaging software program locally executed on the computing device; obtaining, by the computing device from the imaging software program, first textual content of the document from a first of the plurality of frames and second textual content of the document in a second of the plurality of frames; identifying, by the computing device, a front of the document based on the first textual content received from the imaging software program and a back of the document based on the second textual content received from the imaging software program; generating, by the computing device, a first annotation label for first image data of the first frame to indicate an identification of the front of the document in response to identifying the front of the document in the first image data using the first textual content; and generating, by the computing device, a second annotation label to indicate an identification of the back of the document for second image data of the second frame in response to identifying the back of the document in the second image data using the second textual content.

2. The method of claim 1, wherein the imaging software program is native to an operating system of a mobile device of the computing device.

3. The method of claim 1, further comprising: generating, by the computing device, a first prompt indicating the front of the document has been identified; and generating, by the computing device, a second prompt to capture the back of the document in response to identifying the front of the document, wherein a sequence of a presentation of the first prompt and the second prompt is based on a sequence of the identification of the front of the document and the back of the document.

4. The method of claim 1, further comprising generating, by the computing device, a halt instruction to halt a capture of the video data upon a receipt of an integer number of predetermined intervals including multiple of the plurality of frames.

5. The method of claim 1, further comprising generating, by the computing device, a start instruction to start a capture of the video data upon a receipt of an indication of a presence of the document from a graphical user interface of the computing device.

6. The method of claim 1, further comprising presenting, by the computing device, a confirmation prompt having an indication that the first image data of the first frame includes the front of the document or the second image data of the second frame includes the back of the document.

7. The method of claim 1, further comprising presenting, by the computing device, an adjustment prompt having an indication to adjust the document in a field of view of the camera.

8. The method of claim 7, further comprising entering, by the computing device, a manual entry mode in response to determining that a quantity of the adjustment prompts satisfies an adjustment threshold.

9. The method of claim 1, further comprising presenting, by the computing device, a capture prompt having user instructions to capture one of the front of the document or the back of the document.

10. The method of claim 1, further comprising transmitting, by the computing device to a back-end server, the first image data with the first annotation label, the second image data with the second annotation label, and device metadata identifying the computing device.

11. A system comprising: a computing device associated with an end-user comprising at least one processor, configured to: obtain video data comprising a plurality of frames having image data containing a document from a camera of the computing device using an imaging software program locally executed on the computing device; obtain, from the imaging software program, first textual content of the document from a first of the plurality of frames and second textual content of the document in a second of the plurality of frames; identify a front of the document based on the first textual content received from the imaging software program and a back of the document based on the second textual content received from the imaging software program; generate a first annotation label for first image data of the first frame to indicate an identification of the front of the document in response to identifying the front of the document in the first image data using the first textual content; and generate a second annotation label for second image data of the second frame to indicate an identification of the back of the document in response to identifying the back of the document in the second image data using the second textual content.

12. The system of claim 11, wherein the imaging software program is native to an operating system of a mobile device of the computing device.

13. The system of claim 12, wherein the at least one processor is configured to: generate, in response to identifying the front of the document: a first prompt indicating the front of the document has been identified in the first image data, and a second prompt to capture the back of the document; and generate, in response to identifying the back of the document: a third prompt to indicate that the back of the document has been identified in the first image data; and a fourth prompt to capture the front of the document.

14. The system of claim 11, wherein the at least one processor is configured to generate a halt instruction to halt a capture of the video data upon a receipt of an integer number of predetermined intervals including multiple of the plurality of frames.

15. The system of claim 14, wherein the at least one processor is configured to generate a start instruction to start a capture of the video data upon a receipt of an indication of a presence of the document from a graphical user interface of the computing device.

16. The system of claim 11, wherein the at least one processor is configured to present a confirmation prompt having an indication that the first image data of the first frame includes the front of the document or the second image data of the second frame includes the back of the document.

17. The system of claim 11, wherein the at least one processor is configured to present an adjustment prompt having an indication to adjust the document in a field of view of the camera.

18. The system of claim 17, wherein the at least one processor is configured to enter a manual entry mode in response to determining that a quantity of the adjustment prompts satisfies an adjustment threshold.

19. The system of claim 11, wherein the at least one processor is configured to present a capture prompt having user instructions to capture one of the front of the document or the back of the document.

20. The system of claim 11, wherein the at least one processor is configured to transmit, to a back-end server, the first image data with the first annotation label, the second image data with the second annotation label, and device metadata identifying the computing device.
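The independent claims label one frame as the front and another as the back of the document based on text obtained from the imaging software, and claims 7, 8, 17, and 18 fall back to a manual entry mode once enough adjustment prompts have been shown. The sketch below illustrates that control flow under stated assumptions and is not the claimed implementation. The keyword lists, the `AnnotatedFrame` type, the function names, the default threshold of 3, and the choice to treat "satisfies" as "greater than or equal to" are all hypothetical; the claims do not specify which text marks each side or how the threshold is compared.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword lists; the claims do not say which text identifies each side.
FRONT_KEYWORDS = ("pay to the order of",)
BACK_KEYWORDS = ("endorse here",)


@dataclass
class AnnotatedFrame:
    frame_index: int  # position of the frame in the captured video data
    label: str        # annotation label: "front" or "back"


def identify_side(text: str) -> Optional[str]:
    """Guess the document side from recognized text, or None if undecided."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in FRONT_KEYWORDS):
        return "front"
    if any(keyword in lowered for keyword in BACK_KEYWORDS):
        return "back"
    return None


def label_frames(
    frame_texts: List[str], adjustment_threshold: int = 3
) -> Tuple[List[AnnotatedFrame], bool]:
    """Return (annotations, manual_entry) for a sequence of per-frame texts."""
    annotations: Dict[str, AnnotatedFrame] = {}
    adjustment_prompts = 0
    for index, text in enumerate(frame_texts):
        side = identify_side(text)
        if side is None:
            # Stands in for presenting an adjustment prompt to the user.
            adjustment_prompts += 1
            if adjustment_prompts >= adjustment_threshold:
                return list(annotations.values()), True  # enter manual entry mode
            continue
        # Keep the first frame identified for each side.
        annotations.setdefault(side, AnnotatedFrame(index, side))
        if len(annotations) == 2:
            break  # both sides labeled; no further frames are needed
    return list(annotations.values()), False
```

For example, `label_frames(["", "Pay to the Order of Jane", "ENDORSE HERE"])` labels frame 1 as the front and frame 2 as the back without entering manual entry mode, while three consecutive unreadable frames trigger the manual fallback.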

Assignees

Inventors

Classifications

  • Classification of content, e.g. text, photographs or tables · CPC title

  • Aligning or centring of the image pick-up or image-field · CPC title

  • based on the type of document · CPC title

  • G06V30/416 · Primary

    Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title

  • Target detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12525048B2 cover?
Presented herein are systems and methods for the employment of machine learning models for image processing. A mobile application for client-side image processing and validation, which interacts with and leverages native image processing software of the client device, where the image processing software and the mobile application include any number of machine-learning models for identifying a d…
Who is the assignee on this patent?
Citibank NA
What technology area does this patent fall under?
Primary CPC classification G06V30/416. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).