Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06F8/33. Mapped technology areas include Physics.

When was this patent published?

Publication date June 13, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Code generation through reinforcement learning using code-quality rewards

US2024192927A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2024192927-A1
Application number	US-202418582248-A
Country	US
Kind code	A1
Filing date	Feb 20, 2024
Priority date	Dec 17, 2021
Publication date	Jun 13, 2024
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep learning model trained to learn to predict source code is tuned for a target source code generation task through reinforcement learning using a reward score that considers the quality of the source code predicted during the tuning process. The reward score is adjusted to consider code-quality factors and source code metrics. The code-quality factors account for the predicted source code having syntactic correctness, successful compilation, successful execution, successful invocation, readability, functional correctness, and coverage. The source code metrics generate a score based on how close the predicted source code is to a ground truth code.
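The reward described in the abstract can be illustrated with a minimal Python sketch. This is not the patent's implementation: the helper names (`quality_checks`, `ngram_precision`, `code_quality_reward`), the equal weighting of the four checks, and the `metric_weight` parameter are all assumptions made for illustration. The n-gram precision is a simplified stand-in for a closeness-to-ground-truth metric, not BLEU or ROUGE, and "successful invocation" is interpreted here as calling a named entry point without an exception.

```python
import ast
from collections import Counter


def quality_checks(snippet: str, entry_point: str, args: tuple = ()) -> dict:
    """Run four pass/fail checks on a Python snippet (hypothetical helper)."""
    results = {"syntax": False, "compiles": False, "executes": False, "invokes": False}
    try:
        ast.parse(snippet)                       # syntactic correctness
        results["syntax"] = True
        code = compile(snippet, "<predicted>", "exec")
        results["compiles"] = True
        namespace: dict = {}
        exec(code, namespace)                    # assumption: run in a sandbox in practice
        results["executes"] = True
        namespace[entry_point](*args)            # assumption: invocation = calling the entry point
        results["invokes"] = True
    except Exception:
        pass                                     # later checks stay False after the first failure
    return results


def ngram_precision(predicted: str, reference: str, n: int = 2) -> float:
    """Fraction of predicted n-grams that also occur in the reference."""
    pred, ref = predicted.split(), reference.split()
    pred_ngrams = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(pred_ngrams.values())
    if total == 0:
        return 0.0
    return sum((pred_ngrams & ref_ngrams).values()) / total


def code_quality_reward(snippet: str, reference: str, entry_point: str,
                        args: tuple = (), metric_weight: float = 1.0) -> float:
    """Combine the pass rate of the checks with the similarity score (assumed weighting)."""
    checks = quality_checks(snippet, entry_point, args)
    pass_rate = sum(checks.values()) / len(checks)
    return pass_rate + metric_weight * ngram_precision(snippet, reference)
```

For example, `code_quality_reward("def add(a, b):\n    return a + b\n", reference, "add", (1, 2))` scores a snippet that defines `add` against a ground-truth string `reference`. A real setup would also need timeouts and isolation around `exec`, which this sketch omits.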

First claim

Opening claim text (preview).

What is claimed:

1. A system comprising: a processor; and a memory that stores a program configured to be executed by the processor, the program comprising instructions to perform actions that: access a first deep learning model previously trained to generate source code for a first source code task, wherein the first deep learning model comprises parameters learned through cross-entropy loss; tune the parameters of the first deep learning model to train a second deep learning model to learn to generate source code for a second source code task, wherein tune the parameters of the first deep learning model to train the second deep learning model comprises instructions to perform actions that: input a training sample to the first deep learning model and to the second deep learning model, wherein the first deep learning model predicts a first predicted source code snippet over T timesteps, wherein the second deep learning model predicts a second predicted source code snippet over T timesteps; compute a code-quality reward for the second predicted source code snippet, wherein the code-quality reward is based on syntax correctness of the second predicted source code snippet, successful execution of the second predicted source code snippet, successful compilation of the second predicted source code snippet, and successful invocation of the second predicted source code snippet; compute a reward for the second predicted source code snippet at each timestep t based on a divergence between an output distribution from the first deep learning model at each time step t and an output distribution from the second deep learning model at each time step t; add the code-quality reward to the reward of the last timestep; compute a policy loss based on the rewards of each timestep t; and backpropagate the policy loss to the second deep learning model to adjust the parameters of the second deep learning model; and deploy the second deep learning model in an inference system to generate source code for the second source code task.

2. The system of claim 1, wherein the code-quality reward further comprises a metric score based on a similarity between the second predicted source code snippet and a ground truth source code snippet associated with the training sample.

3. The system of claim 2, wherein the metric score is based on a Bilingual Evaluation Understudy (BLEU) score and/or a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score.

4. The system of claim 1, wherein the code-quality reward further comprises a score for functional correctness and readability of the second predicted source code snippet.

5. The system of claim 1, wherein compute the policy loss based on the rewards of each timestep t further comprises instructions to perform actions that: compute a generalized advantage estimation at each timestep t based on the reward at each respective timestep t and a value function output at the respective timestep t from the second deep learning model.

6. The system of claim 5, wherein the program comprises further instructions to perform actions that: compute a state-value function at each timestep t based on the generalized advantage estimation at each respective timestep t and the value function output at each respective timestep t.

7. The system of claim 6, wherein the program comprises further instructions to perform actions that: apply a clipped surrogate objective function to the generalized advantage estimation at each timestep t; and compute a value estimate error loss for each value function output at each timestep t.

8. The system of claim 6, wherein the program comprises further instructions to perform actions that: compute the policy loss as a sum of the clipped surrogate objective function overall T timesteps and the value estimate error loss overall T timesteps.

9. The system of claim 1, wherein the first deep learning model is a neural transformer model with attention.

10. The system of claim 1, wherein the first deep learning model is a decoder-only neural transformer model with attention.

11. A computer-implemented method, comprising: selecting a first deep learning model trained to generate source code for a first source code generation task, wherein parameters of the first deep learning model are determined from a cross-entropy loss; generating a second deep learning model having parameters of the first deep learning model; updating the parameters of the second deep learning model for the second deep leaning model to learn to generate source code for a second code generation task, wherein updating the parameters of the second deep learning model further comprises: applying a training sample to the first deep learning model for the first deep learning model to predict a first source code snippet over T timesteps; applying the training sample to the second deep learning model for the second deep learning model to predict a second source code snippet over T timesteps; generating a code reward score for the second source code snippet based on syntax correctness of the second predicted source code snippet, successful execution of the second predicted source code snippet, successful compilation of the second predicted source code snippet, and/or successful invocation of the second predicted source code snippet; determining a reward at each timestep t of the T timesteps, wherein at each timestep t of the T timesteps, the second deep learning model predicts a token of the second predicted source code snippet; augmenting the reward at a last timestep with the code reward score; computing a policy loss based on the rewards for each timestep t of the T timesteps; and backpropagating the policy loss to the second deep learning model to adjust the parameters of the second deep learning model; and deploying the second deep learning model to perform the second code generation task.

12. The computer-implemented method of claim 11, comprising: computing a similarity score for the second predicted source code snippet with respect to a ground truth code snippet of the training sample; and incorporating the similarity score into the code reward score.

13. The computer-implemented method of claim 11, wherein the similarity score is based on a BiLingual Evaluation Understudy (BLEU) metric or a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric.

14. The computer-implemented method of claim 11, further comprising: computing a generalized advantage estimation at each timestep t of the T timesteps based on a reward at each respective timestep t of the T timesteps and a value function output at each respective timestep t of the T timesteps from the second deep learning model.

15. The computer-implemented method of claim 11, further comprising: computing a state-value function at each timestep t of the T timesteps based on the generalized advantage estimation at each respective timestep t of the T timesteps and the value function output at each respective timestep t of the T timesteps.

16. The computer-implemented method of claim 12, further comprising: applying a clipped surrogate objective function to the generalized advantage estimation at each respective timestep t of the T timesteps; and computing a value estimate error loss for each value function output at each respective timestep t of the T timesteps.

17. The computer-implemented method of claim 16, further comprising: computing the policy loss as a sum of the clipped surrogate objective function overall the
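The tuning steps recited in the claims (per-timestep divergence reward, code-quality reward added at the last timestep, generalized advantage estimation, clipped surrogate objective, value estimate error loss) resemble a PPO-style update. The sketch below shows one way those pieces could fit together in plain Python. It is an illustration under assumptions, not the claimed method: the use of KL divergence as a negative penalty scaled by `beta`, the squared-error value loss, and the `gamma`, `lam`, `clip`, and `value_coef` parameters are choices the claims do not specify, and every function name is hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes every q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def timestep_rewards(tuned_dists, frozen_dists, code_quality_reward, beta=0.1):
    """Divergence-based reward per timestep, plus the code-quality reward at the end."""
    # Assumption: the divergence acts as a penalty for drifting from the first model.
    rewards = [-beta * kl_divergence(p, q) for p, q in zip(tuned_dists, frozen_dists)]
    rewards[-1] += code_quality_reward
    return rewards


def generalized_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimation computed backwards over T timesteps."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def policy_loss(new_logprobs, old_logprobs, advantages, values, returns,
                clip=0.2, value_coef=0.5):
    """Sum over timesteps of the clipped surrogate term and the value error term."""
    surrogate, value_error = 0.0, 0.0
    for lp_new, lp_old, adv, v, ret in zip(new_logprobs, old_logprobs,
                                           advantages, values, returns):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        surrogate += -min(ratio * adv, clipped * adv)   # negated: the loss is minimized
        value_error += (v - ret) ** 2                   # assumption: squared error
    return surrogate + value_coef * value_error
```

In this sketch the state-value targets would be `returns = [a + v for a, v in zip(advantages, values)]`, matching the claims' description of a state-value function derived from the advantage estimate and the value output. A real training loop would compute these quantities on tensors and backpropagate the loss through the second model.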

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

Code	CPC title
G06N3/04	Architecture, e.g. interconnection topology
G06F18/217	Validation; Performance evaluation; Active pattern learning techniques
G06F8/447	Target code generation
G06F8/77	Software metrics
G06N3/088	Non-supervised learning, e.g. competitive learning

Patent family

Related publications grouped by family.

View patent family 83995546

Related patents

Training data augmentation via program simplification

Provisional selection drives edit suggestion generation

Artificial intelligence engine for mixing and enhancing features from one or more trained pre-existing machine-learning models

Transfer learning system for automated software engineering tasks