Robotic process automation using enhanced object detection to provide resilient playback capabilities

US12423118B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12423118-B2
Application number  US-202017139842-A
Country             US
Kind code           B2
Filing date         Dec 31, 2020
Priority date       Aug 3, 2020
Publication date    Sep 23, 2025
Grant date          Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Robotic process automation (RPA) systems with improved playback capabilities. Certain embodiments can provide resilient playback of software automation processes by providing the capability to record, compute and store parameters for user interface controls detected from a screen image of a user interface. These parameters can be used to assist in locating correct corresponding user interface controls within a screen image presented at playback of a software automation process. Other embodiments can, additionally or alternatively, provide resilient playback of software automation processes by providing enhanced capability to locate user interface controls within a screen image of a user interface. In some embodiments, one or more of the user interface controls located within the screen image of the user interface can be used to manipulate the user interface so that other user interface controls become visible within the screen image. Advantageously, embodiments disclosed herein allow software automation processes to operate with greater reliability and flexibility.
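
The abstract's idea of manipulating the user interface until a wanted control becomes visible can be pictured as a simple capture, detect, scroll loop. The following is a minimal sketch under stated assumptions, not the patented implementation: `locate_with_scrolling`, `FakeScreen`, and the `capture`, `detect`, and `scroll` callables are hypothetical names introduced only for illustration, and a detected control is reduced to a (label text, boundary box) pair.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BoundaryBox = Tuple[int, int, int, int]   # x, y, width, height (assumed layout)
Detection = Tuple[str, BoundaryBox]       # (associated label text, boundary box)


def locate_with_scrolling(
    wanted_label: str,
    capture: Callable[[], object],
    detect: Callable[[object], Sequence[Detection]],
    scroll: Callable[[], bool],
    max_scrolls: int = 10,
) -> Optional[BoundaryBox]:
    """Look for a control by label; scroll and retry while it is not visible."""
    for _ in range(max_scrolls + 1):
        image = capture()                  # screen image of the current UI
        for label, box in detect(image):   # controls found in that image
            if label == wanted_label:
                return box
        if not scroll():                   # nothing left to reveal
            return None
    return None


class FakeScreen:
    """Stand-in for a real UI: each 'page' is what one screen capture shows."""

    def __init__(self, pages: List[List[Detection]]) -> None:
        self.pages = pages
        self.index = 0
        self.scrolls = 0

    def capture(self) -> List[Detection]:
        return self.pages[self.index]

    def scroll(self) -> bool:
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        self.scrolls += 1
        return True


ui = FakeScreen([
    [("Name", (0, 0, 10, 10))],
    [("Email", (0, 0, 10, 10))],
    [("Submit", (5, 5, 40, 20))],
])
# In this fake, the "image" already is the list of detections.
found = locate_with_scrolling("Submit", ui.capture, lambda image: image, ui.scroll)
```

In a real system the `detect` step would be an object-detection and text-recognition pass over the captured image; here it is stubbed out so that only the control flow is shown.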

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for facilitating execution of a robotic process automation, the method comprising:
    capturing an initial image of an initial user interface (UI) presented on a display device by an application program operating on a computing device;
    detecting a plurality of UI controls of the initial UI within the captured initial image by programmatic examination of the captured initial image;
    detecting positioning data and sizing data for each of the UI controls of the initial UI;
    detecting a plurality of UI text labels of the initial UI within the captured initial image by programmatic examination of the captured initial image;
    detecting positioning data and sizing data for each of the UI text labels;
    associating the UI text labels to the UI controls based on the positioning data and the sizing data for the UI text labels and for UI controls of the initial UI;
    recording a series of user interactions with a plurality of different ones of the UI controls of the initial UI;
    subsequently capturing a subsequent image of a subsequent UI presented on a display device, the subsequent UI being generated by the same application program as the application program generating the initial UI, with the same application program operating on a computing device and presenting the subsequent UI on a display device;
    detecting a plurality of UI controls of the subsequent UI within the captured subsequent image by programmatic examination of the captured subsequent image;
    detecting positioning data and sizing data for each of the UI controls of the subsequent UI;
    detecting a plurality of UI text labels of the subsequent UI within the captured subsequent image by programmatic examination of the captured subsequent image;
    detecting positioning data and sizing data for each of the UI text labels of the subsequent UI;
    associating the UI text labels to the UI controls based on the positioning data and the sizing data for the UI text labels and for UI controls of the subsequent UI;
    matching the UI controls of the subsequent UI to the UI controls of the initial UI, the matching using the UI text labels that are associated with the respective UI controls of the initial UI and the subsequent UI; and
    programmatically causing one or more interactions of the series of user interactions previously recorded with respect to at least one of the UI controls of the initial UI to be provided to and induced on at least one of the UI controls of the subsequent UI that has been matched to the at least one of the UI controls of the initial UI,
    wherein the detecting positioning data and sizing data for the UI controls of the subsequent UI identifies at least a boundary box for each of the UI controls,
    wherein the associating of the UI text labels to the UI controls is based on a separation distance and a direction from each of one or more of the UI controls to each of one or more of the UI text labels, and
    wherein at least one of the UI text labels being associated with at least one of the UI controls is positioned external to the boundary box for each of the corresponding one or more of the UI controls.

2. The computer-implemented method as recited in claim 1, wherein the associating is based on a separation distance between the UI text labels and the UI controls.

3. The computer-implemented method as recited in claim 1, wherein the method comprises: recording a series of user interactions with a plurality of different ones of the UI controls of the UI.

4. The computer-implemented method as recited in claim 1, wherein a plurality of the UI controls are chosen from the group consisting of: TEXTBOX, COMBOBOX, TEXT BUTTON, CHECKBOX, and RADIO BUTTON.

5. The computer-implemented method as recited in claim 1, wherein the detecting positioning data and sizing data for each of the UI controls is by programmatic examination of the captured image.

6. The computer-implemented method as recited in claim 1, wherein the detecting positioning data and sizing data for each of the UI text labels is by programmatic examination of the captured image.

7. The computer-implemented method as recited in claim 1, wherein the detecting positioning data and sizing data for each of the UI controls comprises geometry data.

8. The computer-implemented method as recited in claim 7, wherein the detecting positioning data and sizing data for each of the UI text labels comprises geometry data.

9. The computer-implemented method as recited in claim 7, wherein the geometry data for at least a plurality of the UI controls includes at least coordinates defining a boundary box.

10. The computer-implemented method as recited in claim 1, wherein the captured image is a screen image.

11. The computer-implemented method as recited in claim 1, wherein the method comprises: manipulating the subsequent UI if none of the UI controls of the subsequent UI match the at least one of the UI controls of the initial UI.

12. The computer-implemented method as recited in claim 11, wherein the manipulating induces a scrolling action to the subsequent UI.

13. The computer-implemented method as recited in claim 1, wherein the matching of the UI controls of the subsequent UI to the UI controls of the initial UI is facilitated by the associated UI text labels respectively associated to the UI controls of the subsequent UI to the UI controls of the initial UI.

14. The computer-implemented method as recited in claim 1, wherein the matching comprises:
    determining a plurality of candidate matching UI controls in the subsequent UI that potentially match a particular UI control of the UI controls of the initial UI used by a particular interaction of the series of user interactions previously recorded; and
    selecting one of the candidate matching UI controls as a matching UI control.

15. The computer-implemented method as recited in claim 1, wherein the matching comprises:
    determining a plurality of candidate matching UI controls in the subsequent UI that potentially match a particular UI control of the UI controls of the initial UI used by a particular interaction of the series of user interactions previously recorded;
    determining a confidence value for each of the candidate matching UI controls; and
    selecting one of the candidate matching UI controls as a matching UI control based on the confidence value.

16. The computer-implemented method as recited in claim 1, wherein the matching comprises:
    determining a plurality of candidate matching UI controls in the subsequent UI that potentially match a particular UI control of the UI controls of the initial UI used by a particular interaction of the series of user interactions previously recorded;
    determining a confidence value for each of the candidate matching UI controls;
    eliminating those of the candidate matching UI controls that have a confidence value less than a predetermined threshold; and
    selecting one of the candidate matching UI controls that remain after the eliminating as a matching UI control.

17. A non-transitory computer readable medium including at least computer program code tangibly stored thereon for facilitating execution of a robotic process automation, the computer readable medium comprising:
    computer program code for capturing an image of a user interface (UI) presented on a display device by an application program operating on a computing device;
    computer program code for detecting a plurality of UI controls within the captured image of the UI by programmatic examination of the captured image, the detecting of the UI controls
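
Two steps in the claims lend themselves to a small illustration: associating a text label with a control using separation distance and direction (claim 1), and choosing among candidate controls with a confidence threshold (claim 16). The sketch below is illustrative only. The claims do not specify how distance, direction, or confidence are computed, so the edge-to-edge gap, the preference for labels to the left of or above a control, the 150-pixel limit, the string-similarity score from `difflib`, and the 0.8 threshold are all assumptions, as are the names `Box`, `Label`, `Control`, `associate`, and `match_control`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Boundary box in screen pixels: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Label:
    text: str
    box: Box


@dataclass(frozen=True)
class Control:
    kind: str  # e.g. "TEXTBOX", "CHECKBOX"
    box: Box


def direction(control: Box, label: Box) -> str:
    """Coarse direction from the control to a label outside its box."""
    if label.x + label.w <= control.x:
        return "left"
    if label.y + label.h <= control.y:
        return "top"
    if label.x >= control.x + control.w:
        return "right"
    if label.y >= control.y + control.h:
        return "bottom"
    return "overlap"


def separation(control: Box, label: Box) -> float:
    """Edge-to-edge gap between the boxes (0 if they touch or overlap)."""
    dx = max(control.x - (label.x + label.w), label.x - (control.x + control.w), 0)
    dy = max(control.y - (label.y + label.h), label.y - (control.y + control.h), 0)
    return hypot(dx, dy)


def associate(control: Control, labels: Sequence[Label],
              allowed: Tuple[str, ...] = ("left", "top"),
              max_gap: float = 150.0) -> Optional[Label]:
    """Pick the nearest label lying in an allowed direction (assumed policy)."""
    nearby = [lab for lab in labels
              if direction(control.box, lab.box) in allowed
              and separation(control.box, lab.box) <= max_gap]
    return min(nearby, key=lambda lab: separation(control.box, lab.box), default=None)


def match_control(recorded_kind: str, recorded_text: str,
                  controls: Sequence[Control], labels: Sequence[Label],
                  threshold: float = 0.8) -> Optional[Control]:
    """Score candidates by label similarity, drop low scores, keep the best."""
    scored = []
    for control in controls:
        label = associate(control, labels)
        if control.kind != recorded_kind or label is None:
            continue
        confidence = SequenceMatcher(
            None, recorded_text.lower(), label.text.lower()).ratio()
        if confidence >= threshold:        # eliminate weak candidates
            scored.append((confidence, control))
    best = max(scored, key=lambda pair: pair[0], default=None)
    return best[1] if best else None


# Subsequent UI where the labels now sit above their text boxes.
labels = [Label("Username", Box(20, 100, 80, 20)),
          Label("Password", Box(20, 160, 80, 20))]
controls = [Control("TEXTBOX", Box(20, 124, 200, 24)),
            Control("TEXTBOX", Box(20, 184, 200, 24))]
target = match_control("TEXTBOX", "Username", controls, labels)
```

Because the recorded interaction is keyed on the label text rather than on pixel coordinates, the lookup in this sketch still resolves to the first text box even though the layout differs from the one seen at recording time.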

Assignees

Inventors

Classifications

  • Robot · CPC title

  • GUI graphical user interface · CPC title

  • Control stands, e.g. consoles, switchboards · CPC title

  • Scrolling or panning · CPC title

  • for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12423118B2 cover?
Robotic process automation (RPA) systems with improved playback capabilities. Certain embodiments can provide resilient playback of software automation processes by providing the capability to record, compute and store parameters for user interface controls detected from a screen image of a user interface. These parameters can be used to assist in locating correct corresponding user interface c…
Who is the assignee on this patent?
Automation Anywhere Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/451. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 23, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).