Code refactor renaming recommender

US11604640B2 · US · B2

Patent metadata
Publication number: US-11604640-B2
Application number: US-202017119769-A
Country: US
Kind code: B2
Filing date: Dec 11, 2020
Priority date: Dec 11, 2020
Publication date: Mar 14, 2023
Grant date: Mar 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach to code refactor renaming may be provided. Source code with a naming convention for functions and classes can be presented to a machine learning model. The model may identify the names of the functions and classes, and the identified names may be tokenized. Docstrings associated with the functions and classes may be identified. The code for the identified functions and classes, together with the associated docstrings, may be input into a feature vector generation mechanism. A model may be trained, via regression, to map the generated feature vectors to the tokenized identified names. The model can then be used to analyze input code that follows the same naming convention and predict names for its functions and classes, so that function and class names can be recommended in accordance with the programming code naming convention.
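The first steps described in the abstract (identifying function and class names, tokenizing them, and pairing them with their docstrings) can be sketched in Python. This is a minimal illustration, not the patent's implementation. It assumes Python source as input, uses the standard-library `ast` module in the spirit of claim 2's abstract-syntax-tree identification, and splits snake_case and CamelCase names with a regular expression. The helper names `extract_names`, `tokenize_name`, and `ngrams` are hypothetical, and the simple n-gram helper only stands in for the claimed n-gram generator model.

```python
import ast
import re
from typing import List, Tuple


def extract_names(source: str) -> List[Tuple[str, str, str]]:
    """Return (kind, name, docstring) for each function and class found in the AST."""
    items: List[Tuple[str, str, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            items.append(("function", node.name, ast.get_docstring(node) or ""))
        elif isinstance(node, ast.ClassDef):
            items.append(("class", node.name, ast.get_docstring(node) or ""))
    return items


# Uppercase runs (acronyms), capitalized or lowercase words, and digit runs.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z]|\d|$)|[A-Z]?[a-z]+|\d+")


def tokenize_name(name: str) -> List[str]:
    """Split snake_case and CamelCase identifiers into lowercase subword tokens."""
    tokens: List[str] = []
    for part in name.split("_"):
        tokens.extend(_WORD.findall(part))
    return [t.lower() for t in tokens]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams over a token list (an empty list if n > len(tokens))."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    sample = '''
class HTTPServerRequest:
    """An incoming HTTP request."""

    def parse_headers(self):
        """Parse raw header lines into a dict."""
'''
    for kind, name, doc in extract_names(sample):
        tokens = tokenize_name(name)
        print(kind, name, tokens, ngrams(tokens, 2), repr(doc))
```

In the claimed method, these name tokens become the regression targets, while the code and docstring of each function or class are passed to a code encoder and a sentence encoder (claims 5 and 6).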

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for code refactor renaming, the method comprising: receiving, by one or more processors, a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identifying, by the one or more processors, function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenizing, by the one or more processors, each of the identified function names and identified class names; generating, by the one or more processors, feature vectors for the source code of the plurality of functions and the plurality of classes; generating, by the one or more processors, feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combining, by the one or more processors, each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; training, by the one or more processors, a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; generating, by the one or more processors, a name recommendation for one or more functions with the trained machine learning model; generating, by the one or more processors, a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, automatically renaming, by the one or more processors, the one or more functions with the name recommendation.

2. The computer-implemented method of claim 1, wherein identifying function names and class names is based on an abstract syntax tree.

3. The computer-implemented method of claim 1, wherein tokenizing function names and class names is performed by an n-gram generator model.

4. The computer-implemented method of claim 1, wherein the machine learning model is a sequence to sequence model.

5. The computer-implemented method of claim 1, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.

6. The computer-implemented method of claim 1, generating feature vectors for the docstrings associated with the plurality of functions and the plurality of classes is performed by a sentence encoder.

7. The computer-implemented method of claim 1, wherein combining each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class is taking the average of the corresponding feature vector.

8. A system for code refactor renaming, the system comprising: one or more computer processors; one or more computer readable storage media; and computer program instructions to: receive a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identify function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenize each of the identified function and identified class names; generate feature vectors for the source code of the plurality of functions and the plurality of classes; generate feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combine each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; train a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; and generate a name recommendation for one or more functions with the trained machine learning model; calculate a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, instructions to automatically rename the one or more functions with the name recommendation.

9. The system of claim 8, wherein identifying function names and class names is based on an abstract syntax tree.

10. The system of claim 8, wherein tokenizing function names and class names is performed by an n-gram generator model.

11. The system of claim 8, wherein the machine learning model is a sequence to sequence model.

12. The system of claim 8, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.

13. The system of claim 8, wherein generating feature vectors for the docstrings associated with the plurality of functions and the plurality of classes is performed by a sentence encoder.

14. The system of claim 8, wherein combining each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class is taking the average of the corresponding feature vector.

15. A computer program product for code refactor renaming comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processors to perform a function, the function comprising: receive a source code dataset, wherein the dataset is comprised of a plurality of functions and a plurality of classes; identify function names and class names from the plurality of functions and the plurality of classes of the source code dataset; tokenize each of the identified function and class names; generate feature vectors for the source code of the plurality of functions and the plurality of classes; generate feature vectors for the docstrings associated with the plurality of functions and the plurality of classes; combine each feature vector of the docstrings with the corresponding feature vector of the source code in the at least one of the functions and the class; train a machine learning model through regression to map the combined feature vectors to the corresponding tokenized function names; receiving, by the one or more processors, a programming code with the same naming convention as the determined naming convention at the trained machine learning model; and generate a name recommendation for one or more functions with the trained machine learning model; calculate a confidence score for the name recommendation, and responsive to the confidence score being above a predetermined threshold, instructions to automatically rename the one or more functions with the name recommendation.

16. The computer program product of claim 15, wherein identifying function names and class names is based on an abstract syntax tree.

17. The computer program product of claim 15, wherein tokenizing function names and class names is performed by an n-gram generator model.

18. The computer program product of claim 15, wherein the machine learning model is a sequence-to-sequence model.

19. The computer program product of claim 15, wherein generating feature vectors for the source code of the plurality of functions and the plurality of classes is performed by a code encoder.

20. The computer program product of claim 15, wherein generating featur
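Claims 1 and 7 combine each docstring vector with its code vector by averaging, then allow automatic renaming only when the name recommendation's confidence score is above a predetermined threshold. The Python sketch below illustrates just those two steps under stated assumptions. Embeddings are plain lists of floats. The trained model is replaced by a hypothetical `predict` callable that returns a (name, confidence) pair. Renaming uses `ast.NodeTransformer` and `ast.unparse` (Python 3.9+), which substitutes bare-name references naively and drops comments. The names `combine`, `maybe_rename`, and `fake_predict`, and the 0.8 threshold, are illustrative and do not come from the patent.

```python
import ast
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical model interface: features in, (suggested name, confidence) out.
Predictor = Callable[[Vector], Tuple[str, float]]


def combine(code_vec: Sequence[float], doc_vec: Sequence[float]) -> Vector:
    """Element-wise average of a code embedding and a docstring embedding (cf. claim 7)."""
    if len(code_vec) != len(doc_vec):
        raise ValueError("embeddings must have the same dimension")
    return [(c + d) / 2.0 for c, d in zip(code_vec, doc_vec)]


class _Renamer(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)
        if node.name == self.old:
            node.name = self.new
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Naive: scopes are not tracked, so a shadowing local would also be renamed.
        if node.id == self.old:
            node.id = self.new
        return node


def maybe_rename(
    source: str,
    old_name: str,
    predict: Predictor,
    features: Vector,
    threshold: float = 0.8,
) -> Tuple[str, Optional[str]]:
    """Rename `old_name` only if the model's confidence is above `threshold`."""
    suggestion, confidence = predict(features)
    if confidence <= threshold:
        return source, None  # not confident enough: leave the code untouched
    tree = _Renamer(old_name, suggestion).visit(ast.parse(source))
    return ast.unparse(tree), suggestion


if __name__ == "__main__":
    def fake_predict(features: Vector) -> Tuple[str, float]:
        # Hypothetical stand-in for the trained sequence-to-sequence model.
        return "add_one", 0.93

    src = "def f(x):\n    return x + 1\n\ny = f(2)\n"
    features = combine([1.0, 0.5], [0.0, 0.25])  # -> [0.5, 0.375]
    new_src, name = maybe_rename(src, "f", fake_predict, features)
    print(name)
    print(new_src)
```

Returning the source unchanged at or below the threshold mirrors the claim language, which triggers renaming only when the confidence score is above the threshold. A production refactoring tool would need scope-aware renaming rather than this name-for-name substitution.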

Assignees

IBM

Inventors

Classifications

  • Supervised learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Learning methods · CPC title

  • Inference or reasoning models · CPC title

  • G06F8/72 · Primary

    Code refactoring · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11604640B2 cover?
A machine-learning approach to code refactor renaming. A model is trained, via regression, on the tokenized function and class names of a source code dataset together with feature vectors of their code and docstrings. It then recommends names for new code that follows the same naming convention and, per claim 1, automatically renames functions when the recommendation's confidence score is above a predetermined threshold.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F8/72. Mapped technology areas include Physics.
When was this patent published?
Published March 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).