Programmatically generating evaluation data sets for code generation models

US12141553B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12141553-B2
Application number  US-202217847113-A
Country             US
Kind code           B2
Filing date         Jun 22, 2022
Priority date       Jun 22, 2022
Publication date    Nov 12, 2024
Grant date          Nov 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Evaluation data sets may be programmatically generated for code generation models. An evaluation data set is obtained that includes items that correspond to different evaluation tests for a code generation system. The individual items of the evaluation data set may be converted, including the conversion of a function signature for the items, the test statements for the items and using a code generation system to generate the body of the function.
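The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `SourceItem` and `ConvertedItem` classes, the JavaScript target, the toy conversion rules, and the `stub_generator` standing in for a code generation system are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class SourceItem:
    """Hypothetical item of the original evaluation data set."""
    prompt: str
    name: str
    params: List[str]
    tests: List[Tuple[List[Any], Any]]  # (call arguments, expected result)


@dataclass
class ConvertedItem:
    """Hypothetical item of the new evaluation data set."""
    prompt: str
    signature: str
    tests: List[str]
    body: str


def convert_signature(item: SourceItem) -> str:
    # Toy rule: emit a JavaScript function header from the name and parameters.
    return f"function {item.name}({', '.join(item.params)}) {{"


def convert_tests(item: SourceItem) -> List[str]:
    # Toy rule: render each (arguments, expected) pair as a JavaScript assertion.
    statements = []
    for args, expected in item.tests:
        arg_list = ", ".join(json.dumps(a) for a in args)
        call = f"{item.name}({arg_list})"
        statements.append(
            f"console.assert(JSON.stringify({call}) === "
            f"JSON.stringify({json.dumps(expected)}));"
        )
    return statements


def convert_data_set(
    items: List[SourceItem],
    generate_body: Callable[[str, str], str],
) -> List[ConvertedItem]:
    """Convert every item and ask the code generator for each function body."""
    new_data_set = []
    for item in items:
        signature = convert_signature(item)
        tests = convert_tests(item)
        body = generate_body(item.prompt, signature)
        new_data_set.append(ConvertedItem(item.prompt, signature, tests, body))
    return new_data_set


def stub_generator(prompt: str, signature: str) -> str:
    # Stand-in for a real code generation system; always returns the same body.
    return "  return a + b;\n}"


items = [SourceItem("Return the sum of a and b.", "add", ["a", "b"], [([1, 2], 3)])]
new_items = convert_data_set(items, stub_generator)
```

Passing the generator in as a callable keeps the conversion logic independent of whichever code generation system is used; in practice the stub would be replaced by a request to that system.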

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one processor; and a memory, storing program instructions that when executed by the at least one processor, cause the at least one processor to implement a programming language conversion system, configured to: receive, via an interface of the programming language conversion system, a request to convert an evaluation data set specified in a first programming language, wherein different items of the evaluation data set correspond to different respective evaluation tests for a code generation system; convert individual ones of the different items of the data set into a second programming language: convert a function signature of the item in the first programming language to the second programming language; convert one or more test statements of the item in the first programming language to the second programming language; send a request to the code generation system to generate a body of the converted function signature in the second programming language according to a prompt in the item; and receive the body of the function signature in the second programming language from the code generation system; and store the converted individual ones of the different items of the evaluation data set as part of a new evaluation data set.

2. The system of claim 1, wherein the second programming language is specified in the request to convert the evaluation data set.

3. The system of claim 1, wherein to convert the function signature of the item in the first programming language to the second programming language, the programming language conversion system is further configured to identify respective types in the first programming language of one or more parameters in the function signature that are mapped to corresponding types in the second programming language.

4. The system of claim 1, wherein to convert the one or more test statements of the item in the first programming language to the second programming language, the programming language conversion system is further configured to identify respective types in the first programming language of one or more parameters in the one or more test statements that are mapped to corresponding types in the second programming language.

5. A method, comprising: receiving, via an interface of a programming language conversion system, an evaluation data set specified in a first programming language, wherein different items of the evaluation data set correspond to different respective evaluation tests for a code generation system; converting individual ones of the different items of the data set into a second programming language: converting, by the programming language conversion system, a function signature of the item in the first programming language to the second programming language; converting, by the programming language conversion system, one or more test statements of the item in the first programming language to the second programming language; and causing, by the programming language conversion system, a body of the converted function signature to be generated in the second programming language according to a prompt in the item used as input to a machine learning model trained to generate code in the second programming language; and storing the converted individual ones of the different items of the evaluation data set as part of a new evaluation data set.

6. The method of claim 5, further comprising receiving a request to convert the evaluation data set that specifies the second programming language.

7. The method of claim 5, further comprising: converting the individual ones of the different items of the data set into a third programming language: converting, by the programming language conversion system, the function signature of the item in the first programming language to the third programming language; converting, by the programming language conversion system, the one or more test statements of the item in the first programming language to the third programming language; and causing, by the programming language conversion system, the body of the converted function signature in the third programming language to be generated in the third programming language according to the prompt in the item used as input to a second machine learning model trained to generate code in the third programming language; and storing the converted individual ones of the different items of the evaluation data set in the third programming language as part of a second new evaluation data set.

8. The method of claim 5, wherein converting the function signature of the item in the first programming language to the second programming language comprises identifying respective types in the first programming language of one or more parameters in the function signature that are mapped to corresponding types in the second programming language.

9. The method of claim 5, wherein converting the one or more test statements of the item in the first programming language to the second programming language comprises identifying respective types in the first programming language of one or more parameters in the test statements that are mapped to corresponding types in the second programming language.

10. The method of claim 5, wherein the programming language conversion system is implemented as part of a code development service offered by a provider network and wherein a request to perform the conversion is received from a client of the provider network.

11. The method of claim 5, further comprising performing natural language conversion on a portion of the prompt according to the second programming language.

12. The method of claim 5, wherein causing the body of the converted function signature to be generated in the second programming language comprises sending a request to a code generation system implemented as part of a code development service offered by a provider network.

13. One or more non-transitory, computer-readable storage media, storing program instructions that when executed on or across one or more computing devices cause the one or more computing devices to implement: receiving, via an interface of a programming language conversion system, an evaluation data set specified in a first programming language, wherein different items of the evaluation data set correspond to different respective evaluation tests for a code generation system; converting individual ones of the different items of the data set into a second programming language: converting, by the programming language conversion system, a function signature of the item in the first programming language to the second programming language; converting, by the programming language conversion system, one or more test statements of the item in the first programming language to the second programming language; and causing, by the programming language conversion system, a body of the converted function signature to be generated in the second programming language according to a prompt in the item used as input to a machine learning model trained to generate code in the second programming language; and storing the converted individual ones of the different items of the evaluation data set as part of a new evaluation data set.

14. The one or more non-transitory, computer-readable storage media of claim 13, storing further programming instructions that when executed on or across the one or more computing devices, cause the one or more computing devices to further implement receiving a request to convert the evaluation data set that specifies the second programming language.
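Claims 3, 4, 8, and 9 describe mapping parameter types in the first programming language to corresponding types in the second. One way this might look is sketched below, assuming Python as the first language and Java as the second. The `TYPE_MAP` table, the function names, and the rendering rules are hypothetical choices for illustration and are not taken from the patent. The sketch relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast
import json

# Hypothetical type table from Python annotations to Java types (an assumption).
TYPE_MAP = {"int": "int", "float": "double", "str": "String", "bool": "boolean"}


def map_type(annotation) -> str:
    """Look up the target-language type for one source-language annotation node."""
    if annotation is None:
        raise ValueError("parameter or return value has no type annotation")
    source_type = ast.unparse(annotation)
    if source_type not in TYPE_MAP:
        raise ValueError(f"no mapping for type {source_type!r}")
    return TYPE_MAP[source_type]


def convert_signature(py_signature: str) -> str:
    """Convert e.g. 'def add(a: int, b: int) -> int' into a Java method signature."""
    # Append a stub body so the bare signature parses as a complete function.
    func = ast.parse(py_signature.strip().rstrip(":") + ": pass").body[0]
    params = ", ".join(f"{map_type(p.annotation)} {p.arg}" for p in func.args.args)
    return f"public static {map_type(func.returns)} {func.name}({params})"


def convert_literal(value) -> str:
    """Render a Python test value as a Java literal of the mapped type."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted string with escapes
    raise ValueError(f"no literal conversion for {type(value).__name__}")


def convert_test(name: str, args: list, expected) -> str:
    """Render one test statement as a Java assert on the converted function."""
    call = f"{name}({', '.join(convert_literal(a) for a in args)})"
    if isinstance(expected, str):
        # Java strings are compared with equals(), not ==.
        return f"assert {call}.equals({convert_literal(expected)});"
    return f"assert {call} == {convert_literal(expected)};"


java_signature = convert_signature("def add(a: int, b: int) -> int")
java_test = convert_test("add", [1, 2], 3)
```

Raising an error on an unmapped type, rather than guessing, means an item that cannot be converted is reported explicitly instead of silently producing an invalid test.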

Assignees

Amazon Tech Inc

Inventors

Classifications

  • using formal methods, e.g. model checking, abstract interpretation (theorem proving G06N5/013) · CPC title

  • Target code generation · CPC title

  • Source to source · CPC title

  • for test version control, e.g. updating test cases to a new software version · CPC title

  • G06F8/33 · Primary

    Intelligent editors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12141553B2 cover?
Evaluation data sets may be programmatically generated for code generation models. An evaluation data set is obtained that includes items that correspond to different evaluation tests for a code generation system. The individual items of the evaluation data set may be converted, including the conversion of a function signature for the items, the test statements for the items and using a code gen…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/33. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 12, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).