Mechanisms for Continuous Improvement of Automated Machine Learning
US-2021304055-A1 · Sep 30, 2021 · US
US11604640B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11604640-B2 |
| Application number | US-202017119769-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 11, 2020 |
| Priority date | Dec 11, 2020 |
| Publication date | Mar 14, 2023 |
| Grant date | Mar 14, 2023 |
Official abstract text for this publication.
An approach to code refactor renaming may be provided. Source code with a naming convention for functions and classes can be presented to a machine learning model. The model may identify the names for functions and classes. The identified names may be tokenized. Docstrings associated with functions and classes may be identified. Code for the identified functions and classes and the associated docstrings may be input into a feature vector generation mechanism. A model may be trained, via regression, to map the generated feature vectors to the tokenized identified names. The model can then be used to analyze input code that follows the same naming convention and predict names for its functions and classes, allowing function and class names to be recommended in accordance with the programming code naming convention.
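The first steps of the abstract (identifying function and class names, collecting their docstrings, and tokenizing the names) can be illustrated with a minimal sketch. It assumes Python source as input and uses the standard-library `ast` module. The helper names `extract_named_units` and `tokenize_name` and the `SAMPLE` snippet are hypothetical, and the simple snake_case/CamelCase splitter stands in for the n-gram generator model mentioned in the claims; it is not the patent's tokenizer.

```python
import ast
import re

# Hypothetical sample input following a snake_case / CamelCase convention.
SAMPLE = '''
class UserAccount:
    """Stores account details."""

    def load_user_data(self):
        """Load data for the user."""
        return {}
'''


def extract_named_units(source):
    """Return (kind, name, docstring) for every function and class in source."""
    units = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            units.append(("function", node.name, ast.get_docstring(node)))
        elif isinstance(node, ast.ClassDef):
            units.append(("class", node.name, ast.get_docstring(node)))
    return units


def tokenize_name(name):
    """Split a snake_case or CamelCase identifier into lowercase sub-tokens.

    Assumption: a convention-based split, used here only as a stand-in
    for the n-gram generator model named in the claims.
    """
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)
    return [part.lower() for part in parts]


if __name__ == "__main__":
    for kind, name, doc in extract_named_units(SAMPLE):
        print(kind, name, tokenize_name(name), doc)
```

Each resulting (name tokens, code, docstring) triple would then serve as one training example: the code and docstring feed the feature vector generators, and the tokenized name is the regression target.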
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for code refactor renaming, the method comprising: receiving, by one or more processors, a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identifying, by the one or more processors, function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenizing, by the one or more processors, each of the identified function names and identified class names; generating, by the one or more processors, feature vectors for the source code of the plurality of functions and the plurality of classes; generating, by the one or more processors, feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combining, by the one or more processors, each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; training, by the one or more processors, a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; generating, by the one or more processors, a name recommendation for one or more functions with the trained machine learning model; generating, by the one or more processors, a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, automatically renaming, by the one or more processors, the one or more functions with the name recommendation.
2. The computer-implemented method of claim 1, wherein identifying function names and class names is based on an abstract syntax tree.
3. The computer-implemented method of claim 1, wherein tokenizing function names and class names is performed by an n-gram generator model.
4. The computer-implemented method of claim 1, wherein the machine learning model is a sequence to sequence model.
5. The computer-implemented method of claim 1, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.
6. The computer-implemented method of claim 1, generating feature vectors for the docstrings associated with the plurality of functions and the plurality of classes is performed by a sentence encoder.
7. The computer-implemented method of claim 1, wherein combining each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class is taking the average of the corresponding feature vector.
8. A system for code refactor renaming, the system comprising: one or more computer processors; one or more computer readable storage media; and computer program instructions to: receive a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identify function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenize each of the identified function and identified class names; generate feature vectors for the source code of the plurality of functions and the plurality of classes; generate feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combine each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; train a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; and generate a name recommendation for one or more functions with the trained machine learning model; calculate a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, instructions to automatically rename the one or more functions with the name recommendation.
9. The system of claim 8, wherein identifying function names and class names is based on an abstract syntax tree.
10. The system of claim 8, wherein tokenizing function names and class names is performed by an n-gram generator model.
11. The system of claim 8, wherein the machine learning model is a sequence to sequence model.
12. The system of claim 8, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.
13. The system of claim 8, wherein generating feature vectors for the docstrings associated with the plurality of functions and the plurality of classes is performed by a sentence encoder.
14. The system of claim 8, wherein combining each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class is taking the average of the corresponding feature vector.
15. A computer program product for code refactor renaming comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processors to perform a function, the function comprising: receive a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identify function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenize each of the identified function and class names; generate feature vectors for the source code of the plurality of functions and the plurality of classes; generate feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combine each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; train a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; and generate a name recommendation for one or more functions with the trained machine learning model; calculate a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, instructions to automatically rename the one or more functions with the name recommendation.
16. The computer program product of claim 15, wherein identifying function names and class names is based on an abstract syntax tree.
17. The computer program product of claim 15, wherein tokenizing function names and class names is performed by an n-gram generator model.
18. The computer program product of claim 15, wherein the machine learning model is a sequence-to-sequence model.
19. The computer program product of claim 15, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.
20. The computer program product of claim 15, wherein generating featur
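Two of the claimed steps lend themselves to a short illustration: combining the docstring and source code feature vectors by averaging (claims 7 and 14), and renaming a function automatically only when the confidence score exceeds a predetermined threshold (claims 1, 8, and 15). The sketch below is a hedged example, not the patented implementation. The function names, the toy vectors, the recommendation dictionary, and the 0.8 threshold are all assumptions made for illustration; in practice the vectors would come from a code encoder and a sentence encoder, and the scores from the trained model.

```python
def average_vectors(code_vec, doc_vec):
    """Combine two equal-length feature vectors by element-wise averaging."""
    if len(code_vec) != len(doc_vec):
        raise ValueError("feature vectors must have the same length")
    return [(c + d) / 2.0 for c, d in zip(code_vec, doc_vec)]


def apply_renames(current_names, recommendations, threshold=0.8):
    """Map each current name to its final name.

    recommendations: dict of current_name -> (recommended_name, confidence).
    A name is replaced only when its confidence is strictly above the
    threshold; otherwise the current name is kept. The 0.8 default is an
    illustrative assumption, not a value taken from the patent.
    """
    final_names = {}
    for name in current_names:
        recommended, confidence = recommendations.get(name, (name, 0.0))
        final_names[name] = recommended if confidence > threshold else name
    return final_names


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs.
    print(average_vectors([1.0, 3.0], [3.0, 5.0]))

    # Hypothetical model output: recommended name and confidence score.
    model_output = {
        "f1": ("load_user_data", 0.93),
        "proc": ("parse_config", 0.41),
    }
    print(apply_renames(["f1", "proc"], model_output))
```

Keeping the threshold as an explicit parameter mirrors the claim language: recommendations below it are left for a human to review rather than applied automatically.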
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Learning methods · CPC title
Inference or reasoning models · CPC title
Code refactoring · CPC title