Refactoring and/or rearchitecting source code using machine learning

US11893384B2 · US · B2

Patent metadata
Publication number: US-11893384-B2
Application number: US-202217668974-A
Country: US
Kind code: B2
Filing date: Feb 10, 2022
Priority date: Feb 10, 2022
Publication date: Feb 6, 2024
Grant date: Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine learning model to predict one or more candidate boundaries for reintroduction into the one or more boundary-less source code files. The one or more ground truth boundaries may be compared with the one or more predicted candidate boundaries. The machine learning model may be trained based on the comparing.
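The abstract does not say how a ground truth boundary is removed. Claim 2 names one option: inlining shorthand snippets, such as function calls, into longhand code. The following is a minimal Python sketch of that data-preparation step. It assumes Python 3.9+ (for `ast.unparse`) and treats only top-level helpers whose body is a single `return <expr>` as removable boundaries. The names `make_training_pair`, `_Inliner`, and `_Substitute` are hypothetical and do not come from the patent. The naive substitution also copies each argument expression into every use site, so it is only safe when arguments have no side effects.

```python
import ast
import copy


def _is_simple_helper(fn):
    """True for a function whose body is a single `return <expr>` and whose
    parameters are plain positional ones (no defaults, *args, **kwargs)."""
    a = fn.args
    return (
        not fn.decorator_list
        and not (a.posonlyargs or a.vararg or a.kwonlyargs or a.kwarg or a.defaults)
        and len(fn.body) == 1
        and isinstance(fn.body[0], ast.Return)
        and fn.body[0].value is not None
    )


class _Substitute(ast.NodeTransformer):
    """Replace parameter names in a helper body with the call's arguments."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class _Inliner(ast.NodeTransformer):
    """Inline calls to simple helpers, recording each removed boundary."""

    def __init__(self, helpers):
        self.helpers = helpers  # name -> (parameter names, return expression)
        self.removed = []       # ground-truth labels, in visit order

    def visit_Call(self, node):
        self.generic_visit(node)  # inline inside the arguments first
        target = node.func
        if isinstance(target, ast.Name) and target.id in self.helpers:
            params, body = self.helpers[target.id]
            plain_args = not any(isinstance(x, ast.Starred) for x in node.args)
            if plain_args and len(params) == len(node.args) and not node.keywords:
                self.removed.append(target.id)
                mapping = dict(zip(params, node.args))
                return _Substitute(mapping).visit(copy.deepcopy(body))
        return node


def make_training_pair(source):
    """Return (boundary_less_source, removed_boundaries) for one file."""
    tree = ast.parse(source)
    # Hypothetical boundary definition: simple top-level helper functions.
    helpers = {
        stmt.name: ([p.arg for p in stmt.args.args],
                    copy.deepcopy(stmt.body[0].value))
        for stmt in tree.body
        if isinstance(stmt, ast.FunctionDef) and _is_simple_helper(stmt)
    }
    inliner = _Inliner(helpers)
    tree = inliner.visit(tree)
    # Drop a helper's definition once it was inlined and nothing refers to it.
    still_used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    tree.body = [
        stmt for stmt in tree.body
        if not (isinstance(stmt, ast.FunctionDef)
                and stmt.name in inliner.removed
                and stmt.name not in still_used)
    ]
    return ast.unparse(ast.fix_missing_locations(tree)), inliner.removed
```

Consider a file that defines `square(x)` and a caller `total(a, b)` containing `square(a + 1)`. The function returns the caller with `(a + 1) * (a + 1)` in place of the call, together with the label `['square']`. That result is one (boundary-less input, ground-truth boundary) pair of the kind the abstract describes.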

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented using one or more processors and comprising: creating a training set that includes boundaried source code files and corresponding boundary-less source code files, wherein the creating includes removing ground truth boundaries from the boundaried source code files to produce the boundary-less source code files; processing the training set using a machine learning model to predict candidate boundaries for reintroduction into the boundary-less source code files; comparing the ground truth boundaries of the boundaried source code files with the predicted candidate boundaries to determine one or more errors; and training the machine learning model based on the one or more errors to minimize a loss function of the machine learning model.

2. The method of claim 1, wherein the removing comprises inlining one or more shorthand source code snippets contained in one or more of the boundaried source code files to generate one or more longhand source code snippets.

3. The method of claim 2, wherein the one or more predicted candidate boundaries comprise a candidate micro service application programming interface (API) to replace one or more of the longhand source code snippets.

4. The method of claim 2, wherein the one or more shorthand source code snippets include a function call, a preprocessor macro, or a template function call.

5. The method of claim 2, wherein the one or more predicted candidate boundaries comprise, as a replacement of one or more of the longhand source code snippets, a candidate function call, a candidate preprocessor macro, or a candidate template function call.

6. The method of claim 1, further comprising: executing a binary compiled from one or more original source code files to generate one or more execution traces; based on one or more of the execution traces, identifying lines of one or more of the original source code files that are suitable for synthetic boundary creation; and replacing the lines that are suitable for boundary creation with, as one or more of the ground truth boundaries, one or more synthetic boundaries to create one or more of the boundaried source code files.

7. The method of claim 1, wherein the machine learning model comprises a transformer machine learning model.

8. The method of claim 1, wherein the method further comprises generating a graph from one or more of the boundary-less source code files, and the processing includes processing the graph using a graph neural network.

9. A method for predicting one or more candidate boundaries for incorporation into source code, the method implemented using one or more processors and comprising: processing one or more boundary-deficient source code files using a machine learning model to predict the one or more candidate boundaries for introduction into the one or more boundary-deficient source code files, wherein the machine learning model comprises an encoder-decoder model that was trained previously using training examples comprising source code with boundaries removed, wherein the removed boundaries were used during training of the machine learning model as labels to determine error(s) and wherein the error(s) was used to train the machine learning model to minimize a loss function associated with the encoder-decoder model; and providing output indicative of one or more of the predicted candidate boundaries.

10. The method of claim 9, wherein the one or more predicted candidate boundaries comprise a candidate function call, a candidate preprocessor macro, or a candidate template function call.

11. The method of claim 9, wherein the encoder-decoder model comprises a transformer machine learning model.

12. The method of claim 9, wherein the method further comprises generating a graph from one or more of the boundary-deficient source code files, and the processing includes processing the graph using a graph neural network as the encoder-decoder model.

13. A system comprising one or more processors and memory storing instructions that, in response to execution of the instructions, cause the one or more processors to: create a training set that includes boundaried source code files and corresponding boundary-less source code files, wherein the instructions to create include instructions to remove ground truth boundaries from the more boundaried source code files to produce the boundary-less source code files; process the training set using a machine learning model to predict candidate boundaries for reintroduction into the boundary-less source code files; compare the ground truth boundaries of the boundaried source code files with the predicted candidate boundaries to determine an error; and train the machine learning model based on the error to minimize a loss function of the machine learning model.

14. The system of claim 13, wherein the removing comprises inlining one or more shorthand source code snippets contained in one or more of the boundaried source code files to generate one or more longhand source code snippets.

15. The system of claim 14, wherein the one or more predicted candidate boundaries comprise a candidate micro service application programming interface (API) to replace one or more of the longhand source code snippets.

16. The system of claim 14, wherein the one or more shorthand source code snippets include a function call, a preprocessor macro, or a template function call.

17. The system of claim 14, wherein the one or more predicted candidate boundaries comprise, as a replacement of one or more of the longhand source code snippets, a candidate function call, a candidate preprocessor macro, or a candidate template function call.

18. The system of claim 13, further comprising instructions to: execute a binary compiled from one or more original source code files to generate one or more execution traces; based on one or more of the execution traces, identify lines of one or more of the original source code files that are suitable for synthetic boundary creation; and replace the lines that are suitable for boundary creation with, as one or more of the ground truth boundaries, one or more synthetic boundaries to create one or more of the boundaried source code files.

19. The system of claim 13, wherein the machine learning model comprises a transformer machine learning model or a graph neural network.
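Claims 6 and 18 derive synthetic boundaries from execution traces, but they do not say how a line is judged "suitable for synthetic boundary creation." The sketch below assumes one possible heuristic, which is not taken from the patent: adjacent lines that ran the same number of times behave like a single straight-line block, so they are candidates for extraction into a synthetic function. It traces one Python function with `sys.settrace` rather than a compiled binary. The names `trace_line_counts`, `straight_line_runs`, and `sample`, and the `min_len` threshold, are illustrative assumptions.

```python
import sys
from collections import Counter


def trace_line_counts(func, *args, **kwargs):
    """Run func once and count how often each of its own lines executes."""
    code = func.__code__
    counts = Counter()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # skip lines of any other function
        if event == "line":
            counts[frame.f_lineno] += 1
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any debugger/coverage tracer
    return counts


def straight_line_runs(counts, min_len=2):
    """Return (first_line, last_line) spans of adjacent lines that executed
    equally often; under the assumed heuristic, boundary candidates."""
    runs, current = [], []
    for line in sorted(counts):
        if current and line == current[-1] + 1 and counts[line] == counts[current[-1]]:
            current.append(line)
            continue
        if len(current) >= min_len:
            runs.append((current[0], current[-1]))
        current = [line]
    if len(current) >= min_len:
        runs.append((current[0], current[-1]))
    return runs


# Toy function to trace (illustrative only).
def sample(n):
    total = 0
    for i in range(n):
        sq = i * i
        total += sq
    scaled = total * 2
    shifted = scaled + 1
    return shifted
```

For `sample(3)`, the two loop-body lines form one run and the three lines after the loop form another. The loop body executes three times, while the surrounding lines execute once. A real system following claim 6 would then replace such a span with a synthetic boundary, for example a call to a newly extracted function, and use it as a ground truth label.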

Assignees

Google LLC

Inventors

Classifications

  • G06F8/72 (primary): Code refactoring

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks

  • Backpropagation, e.g. using gradient descent

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11893384B2 cover?
Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine lear…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F8/72. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 6, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.