Detection of cyber attacks driven by compromised large language model applications

US12591673B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12591673-B2
Application numberUS-202318478939-A
CountryUS
Kind codeB2
Filing dateSep 29, 2023
Priority dateSep 29, 2023
Publication dateMar 31, 2026
Grant dateMar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method including receiving, at a large language model, a prompt injection cyberattack. The method also includes executing the large language model. The large language model takes, as input, the prompt injection cyberattack and generates a first output. The method also includes receiving, by a guardian controller, the first output of the large language model. The guardian controller includes a machine learning model and a security application. The method also includes determining a probability that the first output of the large language model is poisoned by the prompt injection cyberattack. The method also includes determining whether the probability satisfies a threshold. The method also includes enforcing, by the guardian controller and responsive to the probability satisfying the threshold, a security scheme on use of the first output of the large language model by a control application. Enforcing the security scheme mitigates the prompt injection cyberattack.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method comprising: receiving, at a large language model, a prompt injection cyberattack; executing the large language model, wherein the large language model takes, as input, the prompt injection cyberattack and generates a first output; receiving, by a guardian controller, the first output of the large language model, wherein the guardian controller comprises a classification machine learning model and a security application; determining a probability that the first output of the large language model is poisoned by the prompt injection cyberattack, wherein determining the probability comprises: providing the first output of the large language model as input to the classification machine learning model, and executing the classification machine learning model to generate the probability; determining whether the probability satisfies a threshold; enforcing, by the security application and responsive to the probability satisfying the threshold, a security scheme on use of the first output of the large language model by a control application, wherein enforcing the security scheme mitigates the prompt injection cyberattack; generating a second output of a controlled application by executing the controlled application using the first output of the large language model; and returning, after enforcing the security scheme, the second output of the controlled application to the control application. 2 . The method of claim 1 , further comprising: coordinating, prior to receiving the first output and by the control application, a first input and the first output of the large language model. 3 . The method of claim 2 , further comprising: receiving, by the control application, a user request; converting, by the control application, the user request into the first input of the large language model; and executing the large language model on the first input, together with the prompt injection cyberattack, to generate the first output of the large language model. 4 . The method of claim 1 , wherein returning comprises transmitting the first output of the large language model to the control application via the guardian controller. 5 . The method of claim 1 wherein enforcing the security scheme comprises: limiting the use of the first output of the large language model by the control application. 6 . The method of claim 5 , wherein limiting comprises at least one of: preventing the control application from receiving the first output of the large language model; modifying the first output of the classification machine learning model to comply with an enforcement profile; forcing the control application to use a hard-coded list of actions or recipients; and requiring user authentication to execute the control application. 7 . The method of claim 1 , wherein enforcing comprises a step related to executing, by the guardian controller, a computer executed action performed at least in part by the controlled application, wherein the step is selected from the group consisting of: permitting transmission of an email, or blocking transmission of the email, by the guardian controller, wherein the controlled application comprises an email generation program and wherein the computer executed action comprises generating the email; granting or denying the control application access to a network, wherein the network comprises the controlled application; granting or denying the control application access to a database, wherein the database comprises the controlled application; providing or denying access to executable code, wherein the executable code is generated by or retrieved from the controlled application; granting or restricting access of the control application to network content, wherein the network content is generated by or retrieved from the controlled application; and enforcing at least one of a whitelist and a blacklist on the first output of the large language model. 8 . A system comprising: a processor; a data repository in communication with the processor; a large language model which, when executed by the processor, generates a first output from a first input comprising at least a prompt injection cyberattack; a control application which, when executed by the processor, is programmed to coordinate the first input and the first output of the large language model; a controlled application which, when executed by the processor, is programmed to receive, as a second input, the first output from the large language model and to generate a second output using the first output from the large language model; and a guardian controller comprising: a classification machine learning model which, when executed by the processor, is programmed to determine a probability that the first input comprises the prompt injection cyberattack, and a security application which, when executed by the processor, enforces a security scheme, wherein, when executed by the processor, the guardian controller is programmed to: monitor the first output of the large language model; determine the probability that the first output of the large language model is poisoned by the prompt injection cyberattack by providing the first output of the large language model to the classification machine learning model, and executing the classification machine learning model to generate the probability; determine whether the probability satisfies a threshold; enforce, responsive to the probability satisfying the threshold, the security scheme on use of the first output of the large language model by the control application, wherein enforcing the security scheme mitigates the prompt injection cyberattack; generate a third output of the controlled application by executing the controlled application using the first output of the large language model; and return, after enforcing the security scheme, a third output of the controlled application to the control application. 9 . The system of claim 8 , further comprising: a training controller which, when executed by the processor, is configured to train the classification machine learning model. 10 . The system of claim 8 , wherein the control application is programmed, when executed by the processor, to create the first input of the large language model based on a user query submitted by a user. 11 . The system of claim 8 , wherein a second third output of the controlled application is transmitted to at least one of the large language model and the guardian controller. 12 . The system of claim 8 , wherein the controlled application comprises at least one of: an email application executable by the processor; a connection to an external network; a second data repository readable by the processor; a code generation application executable by the processor; and a network content regulation application executable by the processor. 13 . A method of training a classification machine learning model comprising: generating, by a control application, a plurality of queries to a large language model, wherein at least some of the plurality of queries comprise known prompt injection cyberattacks; generating a plurality of first outputs of the large language model by executing the large language model on at least a subset of the plurality of queries; training, iteratively, the classification machine learning model using the plurality of first outputs and a second subset of the plurality of queries until convergence to generate a trained classification machine learning model which, when executed, is trained to detect unknown prompt injection cyberattacks in a plurality of mon

Assignees

Inventors

Classifications

  • by adding security routines or objects to programs · CPC title

  • G06F21/554Primary

    involving event detection and direct action · CPC title

  • G06F21/566Primary

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12591673B2 cover?
A method including receiving, at a large language model, a prompt injection cyberattack. The method also includes executing the large language model. The large language model takes, as input, the prompt injection cyberattack and generates a first output. The method also includes receiving, by a guardian controller, the first output of the large language model. The guardian controller includes a…
Who is the assignee on this patent?
Intuit Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/554. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 31 2026 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).