What technology area does this patent fall under?

Primary CPC classification G06F8/72 (code refactoring). Mapped technology areas include Physics (CPC section G).

When was this patent published?

Publication date: Feb 6, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Refactoring and/or rearchitecting source code using machine learning

US11893384B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11893384-B2
Application number	US-202217668974-A
Country	US
Kind code	B2
Filing date	Feb 10, 2022
Priority date	Feb 10, 2022
Publication date	Feb 6, 2024
Grant date	Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine learning model to predict one or more candidate boundaries for reintroduction into the one or more boundary-less source code files. The one or more ground truth boundaries may be compared with the one or more predicted candidate boundaries. The machine learning model may be trained based on the comparing.
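
The boundary-removal step in the abstract can be pictured with a toy example. The following Python sketch is an illustration under stated assumptions, not the patented implementation: it treats a bare helper call such as `log_start()` as the shorthand snippet, inlines the helper body to produce longhand lines, and records which lines came from the removed boundary so they can later serve as labels. The names `TrainingExample`, `remove_boundaries`, and the `helpers` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingExample:
    boundary_less: List[str]  # longhand lines after inlining
    labels: List[int]         # 1 = line came from a removed (ground truth) boundary


def remove_boundaries(lines: List[str], helpers: Dict[str, List[str]]) -> TrainingExample:
    """Inline bare helper calls like `name()`; a hypothetical stand-in for boundary removal."""
    out: List[str] = []
    labels: List[int] = []
    for line in lines:
        stripped = line.strip()
        # Only a bare call of the form `name()` counts as a shorthand snippet here.
        name = stripped[:-2] if stripped.endswith("()") else None
        if name in helpers:
            indent = line[: len(line) - len(line.lstrip())]
            for body_line in helpers[name]:
                out.append(indent + body_line)
                labels.append(1)
        else:
            out.append(line)
            labels.append(0)
    return TrainingExample(out, labels)


# Toy "boundaried" file: the call to log_start() is the ground truth boundary.
helpers = {"log_start": ['print("start")', "counter = 0"]}
boundaried = ["log_start()", "x = 1"]
example = remove_boundaries(boundaried, helpers)
```

Here `example.boundary_less` holds the boundary-less file and `example.labels` marks the lines where a model would be expected to propose a boundary again. Real inlining of functions, macros, or templates would need a parser for the language in question; plain string matching is used only to keep the sketch short.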

First claim

Opening claim text (preview).

What is claimed is:
1. A method implemented using one or more processors and comprising: creating a training set that includes boundaried source code files and corresponding boundary-less source code files, wherein the creating includes removing ground truth boundaries from the boundaried source code files to produce the boundary-less source code files; processing the training set using a machine learning model to predict candidate boundaries for reintroduction into the boundary-less source code files; comparing the ground truth boundaries of the boundaried source code files with the predicted candidate boundaries to determine one or more errors; and training the machine learning model based on the one or more errors to minimize a loss function of the machine learning model.
2. The method of claim 1, wherein the removing comprises inlining one or more shorthand source code snippets contained in one or more of the boundaried source code files to generate one or more longhand source code snippets.
3. The method of claim 2, wherein the one or more predicted candidate boundaries comprise a candidate micro service application programming interface (API) to replace one or more of the longhand source code snippets.
4. The method of claim 2, wherein the one or more shorthand source code snippets include a function call, a preprocessor macro, or a template function call.
5. The method of claim 2, wherein the one or more predicted candidate boundaries comprise, as a replacement of one or more of the longhand source code snippets, a candidate function call, a candidate preprocessor macro, or a candidate template function call.
6. The method of claim 1, further comprising: executing a binary compiled from one or more original source code files to generate one or more execution traces; based on one or more of the execution traces, identifying lines of one or more of the original source code files that are suitable for synthetic boundary creation; and replacing the lines that are suitable for boundary creation with, as one or more of the ground truth boundaries, one or more synthetic boundaries to create one or more of the boundaried source code files.
7. The method of claim 1, wherein the machine learning model comprises a transformer machine learning model.
8. The method of claim 1, wherein the method further comprises generating a graph from one or more of the boundary-less source code files, and the processing includes processing the graph using a graph neural network.
9. A method for predicting one or more candidate boundaries for incorporation into source code, the method implemented using one or more processors and comprising: processing one or more boundary-deficient source code files using a machine learning model to predict the one or more candidate boundaries for introduction into the one or more boundary-deficient source code files, wherein the machine learning model comprises an encoder-decoder model that was trained previously using training examples comprising source code with boundaries removed, wherein the removed boundaries were used during training of the machine learning model as labels to determine error(s) and wherein the error(s) was used to train the machine learning model to minimize a loss function associated with the encoder-decoder model; and providing output indicative of one or more of the predicted candidate boundaries.
10. The method of claim 9, wherein the one or more predicted candidate boundaries comprise a candidate function call, a candidate preprocessor macro, or a candidate template function call.
11. The method of claim 9, wherein the encoder-decoder model comprises a transformer machine learning model.
12. The method of claim 9, wherein the method further comprises generating a graph from one or more of the boundary-deficient source code files, and the processing includes processing the graph using a graph neural network as the encoder-decoder model.
13. A system comprising one or more processors and memory storing instructions that, in response to execution of the instructions, cause the one or more processors to: create a training set that includes boundaried source code files and corresponding boundary-less source code files, wherein the instructions to create include instructions to remove ground truth boundaries from the more boundaried source code files to produce the boundary-less source code files; process the training set using a machine learning model to predict candidate boundaries for reintroduction into the boundary-less source code files; compare the ground truth boundaries of the boundaried source code files with the predicted candidate boundaries to determine an error; and train the machine learning model based on the error to minimize a loss function of the machine learning model.
14. The system of claim 13, wherein the removing comprises inlining one or more shorthand source code snippets contained in one or more of the boundaried source code files to generate one or more longhand source code snippets.
15. The system of claim 14, wherein the one or more predicted candidate boundaries comprise a candidate micro service application programming interface (API) to replace one or more of the longhand source code snippets.
16. The system of claim 14, wherein the one or more shorthand source code snippets include a function call, a preprocessor macro, or a template function call.
17. The system of claim 14, wherein the one or more predicted candidate boundaries comprise, as a replacement of one or more of the longhand source code snippets, a candidate function call, a candidate preprocessor macro, or a candidate template function call.
18. The system of claim 13, further comprising instructions to: execute a binary compiled from one or more original source code files to generate one or more execution traces; based on one or more of the execution traces, identify lines of one or more of the original source code files that are suitable for synthetic boundary creation; and replace the lines that are suitable for boundary creation with, as one or more of the ground truth boundaries, one or more synthetic boundaries to create one or more of the boundaried source code files.
19. The system of claim 13, wherein the machine learning model comprises a transformer machine learning model or a graph neural network.
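
The compare-and-train steps recited in claims 1 and 13 follow the usual supervised-learning pattern: predictions are compared with the removed boundaries, and the resulting error drives a parameter update that lowers a loss. The Python sketch below is a minimal illustration under stated assumptions. It assumes per-line binary labels (1 = the line sat inside a removed boundary) and two made-up per-line features; a small logistic regression stands in for the transformer or graph neural network named in the claims, and binary cross-entropy is an assumed choice of loss function. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(features: List[List[float]], w: List[float], b: float) -> List[float]:
    """Per-line probability that the line belongs inside a candidate boundary."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]


def bce_loss(probs: List[float], labels: List[int]) -> float:
    """Binary cross-entropy between predicted and ground truth boundary labels."""
    eps = 1e-12
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))
    return -total / len(labels)


def train(features: List[List[float]], labels: List[int],
          lr: float = 0.5, steps: int = 200) -> Tuple[List[float], float, List[float]]:
    w = [0.0] * len(features[0])
    b = 0.0
    history: List[float] = []
    for _ in range(steps):
        probs = predict(features, w, b)
        history.append(bce_loss(probs, labels))
        # Error per line: predicted minus ground truth (gradient of the loss w.r.t. the logit).
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, features)) / len(labels)
        b -= lr * sum(errors) / len(labels)
    return w, b, history


# Made-up features per line; labels come from the removed ground truth boundaries.
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
labels = [1, 1, 0, 0]
w, b, history = train(features, labels)
```

The `history` list records the loss at each step, so a falling sequence indicates that the predicted boundaries are moving toward the ground truth ones. In a realistic setting the features would be learned from tokens or a code graph rather than supplied by hand.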

Assignees

Google LLC

Inventors

Classifications

G06F8/72 · Primary
Code refactoring · CPC title
G06N3/10
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

View patent family 87520930

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11893384B2 cover?: Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine lear…
Who is the assignee on this patent?: Google LLC

Related patents

Method and device for encoding/decoding video, and recording medium for storing bitstream

Translating text encodings of machine learning models to executable code

Systems and methods for modernizing and optimizing legacy source code

Software refactoring systems and methods

Semantic-aware and self-corrective re-architecting system

Systems and methods for automatically generating code for deep learning systems
