Machine learning model based ranking of generated code

US12566593B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12566593-B2
Application number: US-202318464536-A
Country: US
Kind code: B2
Filing date: Sep 11, 2023
Priority date: Jun 28, 2023
Publication date: Mar 3, 2026
Grant date: Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A generative AI based pipeline has been created that ranks generated responses that are candidate software patches. The ranking is based on predicted quality measures of code fragments within a corresponding prompt to a generated AI model. The predicted quality measures are generated by a machine learning model that has been trained based on features that are values/measures of similarity metrics between code fragments, between code fragment changes, between code structures, and/or between changes of code structures.
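The ranking step summarized above can be illustrated with a minimal sketch. It assumes a pluggable scoring function; `placeholder_quality` and `rank_candidates` are hypothetical names, and the placeholder scorer is a plain text-similarity ratio from Python's standard library, not the trained machine learning model the abstract describes.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def placeholder_quality(prompt_code: str, candidate: str) -> float:
    """Stand-in for the trained model: plain text similarity in [0, 1].

    Assumption: the patent's model is trained on several similarity
    features; this single ratio is only an illustrative substitute.
    """
    return SequenceMatcher(None, prompt_code, candidate).ratio()


def rank_candidates(
    prompt_code: str,
    candidates: List[str],
    predict: Callable[[str, str], float] = placeholder_quality,
) -> List[Tuple[float, str]]:
    """Score each candidate patch and return (score, patch) pairs, best first."""
    scored = [(predict(prompt_code, c), c) for c in candidates]
    # Sort on the score only, so tied candidates keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    flawed = "query = 'SELECT * FROM users WHERE id = ' + user_id"
    patches = [
        "print('hello')",
        "query = 'SELECT * FROM users WHERE id = ?'",
    ]
    for score, patch in rank_candidates(flawed, patches):
        print(f"{score:.2f}  {patch}")
```

Passing the scorer as a parameter keeps the ranking logic independent of the model, so a trained regressor could be swapped in without changing `rank_candidates`.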

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: obtaining a plurality of code fragments generated from a generative artificial intelligence (AI) model, wherein the plurality of code fragments corresponds to a set of one or more prompts input into the generative AI model; determining values of features for input to a machine learning model trained to predict values of generated code fragments, wherein the features are metrics of similarity among code fragments in the set of one or more prompts and the generated code fragments and the metrics of similarity measure at least one of similarity of code fragments and similarity of changes to code fragments and wherein the metrics of similarity correspond to at least two of ratios of lengths among parts of a prompt, ratio of lengths between a generated code fragment and a part of a corresponding prompt, ratio of edit distances between different parts of a prompt and between a part of a prompt and a corresponding generated code fragment, ratio of edit operations between different parts of a prompt and between a part of a prompt and a corresponding generated code fragment, and measure of correlation of positions of edits between different parts of a prompt; and ranking the generated code fragments based on the predicted values from the machine learning model.

2. The method of claim 1, wherein determining the values of features for input to the machine learning model comprises calculating, for each prompt, at least two of similarity of text of a code fragment in the prompt and text of a corresponding one of the generated code fragments, similarity of code structure between the code fragment in the prompt and the corresponding one of the generated code fragment, and similarity of lengths of the code fragment in the prompt and the corresponding one of the generated code fragment.

3. The method of claim 1, wherein determining the values of features for input to the machine learning model comprises calculating, for each prompt, at least two of similarity of textual changes between a pair of reference code fragments in the prompt and textual changes between a code fragment in the prompt and a corresponding one of the generated code fragments, similarity of structural changes between the pair of reference code fragments and structural changes between the code fragment in the prompt and the corresponding one of the generated code fragments, similarity of text of the code fragment in the prompt and a first of the pair of reference code fragments, similarity of text of a second of the pair of reference code fragments and the generated code fragment, similarity of code structure of the code fragment in the prompt and the first of the pair of reference code fragments, and similarity of code structure of the second of the pair of reference code fragments and the corresponding one of the generated code fragment.

4. The method of claim 1, further comprising, for each prompt, generating a code structure signature for each code fragment in the prompt and for the corresponding one of the generated code fragments, wherein determining the values of features is based, at least in part, on the code structure signatures.

5. The method of claim 4, wherein generating the code structure signature comprises generating a representation of a code fragment without variability of names.

6. The method of claim 4, wherein generating the code structure signature comprises generating a representation of a code fragment that replaces each identifier name with a representative token for identifiers and each variable name with a representative token for variables.

7. The method of claim 4, wherein determining the values of features based, at least in part, on the code structure signatures comprises calculating values for a subset of the similarity metrics of code fragments as represented by the code structure signatures.

8. The method of claim 1, wherein the machine learning model is an ensemble of weak prediction models.

9. The method of claim 1, wherein the machine learning model is one or more regression models.

10. The method of claim 1, wherein the generative AI model is a language model with a transformer architecture.

11. A non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to: obtain a plurality of code fragments generated from a generative artificial intelligence (AI) model, wherein the plurality of code fragments corresponds to a set of one or more prompts input into the generative AI model; determine values of features for input to a machine learning model trained to predict values of the generated code fragments, wherein the features are metrics of similarity among the code fragments in the set of one or more prompts and the generated code fragments and the metrics of similarity measure at least one of similarity of code fragments and similarity of changes to code fragments and wherein the metrics of similarity correspond to at least two of ratios of lengths among parts of a prompt, ratio of lengths between a generated code fragment and a part of a corresponding prompt, ratio of edit distances between different parts of a prompt and between a part of a prompt and a corresponding generated code fragment, ratio of edit operations between different parts of a prompt and between a part of a prompt and a corresponding generated code fragment, and measure of correlation of positions of edits between different parts of a prompt; and rank the generated code fragments based on the predicted values output from the machine learning model.

12. The non-transitory, machine-readable medium of claim 11, wherein the instructions to determine the values of features for input to the machine learning model comprise instructions to calculate, for each prompt, at least two of similarity of text of a code fragment in the prompt and text of a corresponding one of the generated code fragments which corresponds to the prompt, similarity of code structure between the code fragment in the prompt and the corresponding one of the generated code fragments, similarity of lengths of the code fragment in the prompt and the corresponding one of the generated code fragment.

13. The non-transitory, machine-readable medium of claim 11, wherein the program code further has stored thereon instructions to, for each prompt, generate a code structure signature for each code fragment in the prompt and for the corresponding one of the generated code fragments, wherein the instructions to determine the values of features is based, at least in part, on the code structure signatures.

14. The non-transitory, machine-readable medium of claim 13, wherein the instructions to generate the code structure signature comprise instructions to generate a representation of a code fragment without variability of names.

15. The non-transitory, machine-readable medium of claim 13, wherein the instructions to generate the code structure signature comprise instructions to generate a representation of a code fragment that replaces each identifier name with a representative token for identifiers and each variable name with a representative token for variables.

16. The non-transitory, machine-readable medium of claim 13, wherein the instructions to determine the values of features based, at least in part, on the code structure signatures comprise instructions to calculate values for a subset of the similarity metrics of code fragments as represented by the code structure signatures.

17. The non-transitory, machine-
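To make the claimed feature types more concrete, here is a hedged sketch of three of them: length ratios, edit-distance ratios (claim 1), and a name-free code structure signature (claims 4 to 6). All function names, the prompt layout (a reference before/after pair plus a target fragment), and the tokenization heuristic are assumptions for illustration; the claims do not specify an implementation.

```python
import keyword
import re
from typing import Dict

NAME_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def structure_signature(code: str) -> str:
    """Replace names with placeholder tokens so only the code shape remains.

    Heuristic (an assumption): a name directly followed by "(" becomes the
    generic identifier token ID; any other non-keyword name becomes VAR.
    String literals are not treated specially in this sketch.
    """
    tokens = TOKEN_RE.findall(code)
    out = []
    for i, tok in enumerate(tokens):
        if NAME_RE.fullmatch(tok) and not keyword.iskeyword(tok):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            out.append("ID" if nxt == "(" else "VAR")
        else:
            out.append(tok)
    return " ".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def similarity_features(
    ref_before: str, ref_after: str, target: str, generated: str
) -> Dict[str, float]:
    """Compute a few similarity features for one prompt and one candidate."""
    sig = structure_signature
    return {
        # Ratio of lengths among parts of the prompt.
        "length_ratio_within_prompt": safe_ratio(len(ref_after), len(ref_before)),
        # Ratio of lengths between the generated fragment and a prompt part.
        "length_ratio_generated_to_target": safe_ratio(len(generated), len(target)),
        # Edit distance target->generated relative to the reference change.
        "edit_distance_ratio": safe_ratio(
            edit_distance(target, generated), edit_distance(ref_before, ref_after)
        ),
        # The same ratio computed on name-free structure signatures.
        "signature_edit_distance_ratio": safe_ratio(
            edit_distance(sig(target), sig(generated)),
            edit_distance(sig(ref_before), sig(ref_after)),
        ),
    }


if __name__ == "__main__":
    feats = similarity_features(
        ref_before="run(cmd)",
        ref_after="run(quote(cmd))",
        target="execute(line)",
        generated="execute(quote(line))",
    )
    for name, value in feats.items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this would be the input row for the trained model; which features matter, and how they are weighted, is left to that model.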

Assignees

Veracode Inc

Inventors

Classifications

  • Intelligent editors

  • Software maintenance or management

  • Creation or generation of source code

  • Program documentation

  • G06F8/35 (Primary): model driven

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Veracode Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 3, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).