Automated program repair tool

US11977474B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11977474-B2
Application number  US-202217994185-A
Country             US
Kind code           B2
Filing date         Nov 25, 2022
Priority date       May 15, 2020
Publication date    May 7, 2024
Grant date          May 7, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An automated program repair tool utilizes a neural transformer model with attention to predict the contents of a bug repair in the context of source code having a bug of an identified bug type. The neural transformer model is trained on a large unsupervised corpus of source code using a span-masking denoising optimization objective, and fine-tuned on a large supervised dataset of triplets containing a bug-type annotation, software bug, and repair. The bug-type annotation is derived from an interprocedural static code analyzer. A bug type edit centroid is computed for each bug type and used in the inference decoding phase to generate the bug repair.
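The span-masking denoising objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the masking format, so the single sentinel token `<MASK_0>`, the helper name `mask_span`, and the whitespace tokenization below are all assumptions made for illustration, not the patent's implementation.

```python
from typing import List, Tuple


def mask_span(tokens: List[str], start: int, length: int,
              sentinel: str = "<MASK_0>") -> Tuple[List[str], List[str]]:
    """Replace tokens[start:start+length] with one sentinel token.

    Returns (corrupted_input, target). A denoising model would see the
    corrupted input and be trained to reconstruct the target span.
    The sentinel format is an assumption, not taken from the patent.
    """
    if length <= 0 or start < 0 or start + length > len(tokens):
        raise ValueError("span must lie inside the token sequence")
    corrupted = tokens[:start] + [sentinel] + tokens[start + length:]
    target = [sentinel] + tokens[start:start + length]
    return corrupted, target


# Hypothetical snippet, split on whitespace for simplicity.
tokens = "if ( x != null ) { x . run ( ) ; }".split()
corrupted, target = mask_span(tokens, start=2, length=3)
print(" ".join(corrupted))
print(" ".join(target))
```

In practice the span position and length would be sampled at random over a large corpus; they are fixed here so the example is deterministic.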

First claim

Opening claim text (preview).

What is claimed:

1. A computer-implemented method, comprising: accessing a neural transformer model with attention having an edit encoder, a context encoder and a decoder; obtaining a training dataset including a plurality of triplets, a triplet comprising a source code snippet with a software bug, a repaired source code, and a bug type of the software bug; training the neural transformer model with attention with the training dataset to learn to predict repaired source code for a given buggy source code snippet and a given bug type, wherein during the training, the edit encoder learns to predict an encoding for each bug type of the training dataset; and upon completion of the training of the neural transformer model with attention, transforming the encodings of each bug type into a bug-type edit centroid.

2. The computer-implemented method of claim 1, further comprising deploying the neural transformer model with attention in an inference system to predict repaired code for a given source code snippet having a software bug of a specified bug type, wherein the edit encoder receives the bug-type edit centroid of the specified bug type.

3. The computer-implemented method of claim 1, further comprising: forming a cluster for each bug type, wherein the cluster includes encodings having a same bug type; and computing the bug-type edit centroid for each cluster from the encodings of each bug type in a respective cluster.

4. The computer-implemented method of claim 1, wherein during the training, the context encoder learns to predict an encoding of a context, and wherein the context includes a source code snippet with a software bug and a corresponding bug type.

5. The computer-implemented method of claim 4, wherein during the training, the decoder receives a concatenation of the encoding of the context and the encoding of the bug-type edit centroid.

6. The computer-implemented method of claim 1, further comprising: prior to training the neural transformer model with attention with the training dataset to learn to predict repaired source code for a given buggy source code snippet and a given bug type, pre-training the neural transformer model with attention on unsupervised source code snippets.

7. The computer-implemented method of claim 1, wherein the bug type includes null dereference, immutable cast, empty vector access, memory leak and/or thread-safety violation.

8. A computer-implemented method, comprising: accessing a source code snippet with a software bug, wherein the software bug is associated with a bug type; obtaining a bug-type edit centroid of the bug type; and performing a beam search to generate repaired source code for the source code snippet with the software bug, wherein the repaired source code comprises a sequence of source code tokens, wherein the beam search generates the repaired source code using a neural transformer model with attention given the source code snippet with the software bug, the bug type, and the bug-type edit centroid, to determine each token of the sequence of source code tokens based on an output probability distribution.

9. The computer-implemented method of claim 8, further comprising: performing an interprocedural static analysis on the source code snippet to detect the software bug and the bug type.

10. The computer-implemented method of claim 8, wherein the bug type includes null dereference, immutable cast, empty vector access, memory leak and/or thread-safety violation.

11. The computer-implemented method of claim 8, wherein the bug-type centroid is a vector representation of edits that transform the source code with the software bug of the bug type into source code without the software bug.

12. A system comprising: a processor; and a memory that stores a program configured to be executed by the processor; wherein the program comprises instructions to perform actions that: obtain a source code snippet with a software bug associated with a bug type; obtain a bug-type edit centroid for the bug type; access a neural transformer model with attention having at least one edit encoder block, at least one context encoder block, and at least one decoder block; and perform a beam search to generate a repaired source code snippet for the source code snippet with the software bug, wherein the beam search invokes the neural transformer model with attention given the bug-type edit centroid and the source code snippet with the software bug to predict each token of the repaired source code snippet autoregressively, wherein the at least one edit encoder block generates an edit embedding for the bug-type edit centroid, wherein the at least one context encoder generates a context embedding for the source code snippet with the software bug, wherein the at least one decoder block generates an output probability distribution given the edit embedding and the context embedding, wherein the output probability distribution associates a probability of a token following a preceding sequence of tokens.

13. The system of claim 12, wherein the program comprises instructions to perform actions that: analyze statically the source code snippet with the software bug to determine the bug type of the software bug.

14. The system of claim 13, wherein the program comprises instructions to perform actions that: concatenate the edit embedding and the context embedding; and wherein the at least one decoder block receives the concatenated embeddings.

15. The system of claim 12, wherein the bug-type edit centroid is a vector representation of edits associated with the bug type.

16. The system of claim 12, wherein the obtained source code snippet is extracted from a source code program of an integrated development environment (IDE); and wherein the program comprises instructions to perform actions that: execute the beam search within the IDE to generate the repaired source code for the obtained source code snippet with the software bug.

17. The system of claim 12, wherein the bug type includes null dereference, immutable cast, empty vector access, memory leak and/or thread-safety violation.
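The claims describe two mechanics that a small sketch may help make concrete: forming one cluster of edit encodings per bug type and reducing it to a centroid (claim 3), and running a beam search that picks tokens from an output probability distribution conditioned on that centroid (claim 8). The claims do not say how the centroid is computed, so the arithmetic mean below is an assumption. The `toy_model` function is a hypothetical hand-written stand-in for the neural transformer, and the encodings and token strings are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Prefix = Tuple[str, ...]
NextTokenFn = Callable[[Prefix], Dict[str, float]]


def bug_type_centroids(encodings: Sequence[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Cluster edit encodings by bug type and average each cluster.

    Using the mean as the centroid is an assumption; the claims only say
    a centroid is computed from the encodings in each cluster.
    """
    clusters: Dict[str, List[Vector]] = defaultdict(list)
    for bug_type, vec in encodings:
        clusters[bug_type].append(vec)
    return {
        bug_type: [sum(col) / len(vecs) for col in zip(*vecs)]
        for bug_type, vecs in clusters.items()
    }


def toy_model(centroid: Vector) -> NextTokenFn:
    """Hypothetical stand-in for the transformer: the first centroid
    component shifts probability toward emitting a null check."""
    guard = min(max(centroid[0], 0.05), 0.95)
    table: Dict[Prefix, Dict[str, float]] = {
        (): {"if": guard, "x.run()": 1.0 - guard},
        ("if",): {"(x != null)": 0.9, "<EOS>": 0.1},
        ("if", "(x != null)"): {"x.run()": 1.0},
    }
    return lambda prefix: table.get(prefix, {"<EOS>": 1.0})


def beam_search(next_token_probs: NextTokenFn, beam_width: int,
                max_len: int, eos: str = "<EOS>") -> List[str]:
    """Keep the beam_width best prefixes by summed log-probability."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        candidates = [
            (prefix + (tok,), score + math.log(p))
            for prefix, score in beams
            for tok, p in next_token_probs(prefix).items()
            if p > 0
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Sequences ending in the end-of-sequence token stop expanding.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[1])
    return list(best[0])


# Made-up two-dimensional encodings, tagged with bug types named in claim 7.
encodings = [
    ("null_dereference", [0.8, 0.2]),
    ("null_dereference", [0.6, 0.4]),
    ("memory_leak", [0.1, 0.9]),
]
centroids = bug_type_centroids(encodings)
repair = beam_search(toy_model(centroids["null_dereference"]),
                     beam_width=2, max_len=6)
print(repair)
```

A real system would replace `toy_model` with the decoder's output distribution over the full vocabulary and would typically add length normalization when comparing finished hypotheses; both are omitted here to keep the sketch short.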

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Transfer learning · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Supervised learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • G06F11/362 · Primary

    Debugging of software · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11977474B2 cover?
An automated program repair tool utilizes a neural transformer model with attention to predict the contents of a bug repair in the context of source code having a bug of an identified bug type. The neural transformer model is trained on a large unsupervised corpus of source code using a span-masking denoising optimization objective, and fine-tuned on a large supervised dataset of triplets conta…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/362. Mapped technology areas include Physics.
When was this patent published?
Publication date May 7, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).