Systems and methods for auto-captioning repositories from source code

US12039296B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12039296-B2
Application number   US-202218054684-A
Country              US
Kind code            B2
Filing date          Nov 11, 2022
Priority date        Nov 11, 2022
Publication date     Jul 16, 2024
Grant date           Jul 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for auto-captioning repositories from source code are disclosed. A method for code repository embedding may include a computer program executed by an electronic device: (1) extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; (2) applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; (3) concatenating the vectors into an output embedding vector; (4) weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; (5) generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; (6) applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and (7) outputting the caption.
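
Steps (2) through (4) of the abstract can be illustrated with a minimal sketch. This is not the patented implementation: mean pooling stands in for the per-modality machine learning algorithm (the claims mention a GRU), the attention scores are supplied by hand rather than trained, and every name here (`mean_pool`, `repository_representation`, `MODALITIES`) is hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
MODALITIES = ("docstring", "code", "dependency")


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Assumed stand-in for the per-modality model: collapse a sequence
    of embeddings into a single fixed-size vector by averaging."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def repository_representation(
    embeddings: Dict[str, List[Vector]],
    attention_scores: Dict[str, float],
) -> Vector:
    # Step 2: one vector per modality.
    vectors = [mean_pool(embeddings[name]) for name in MODALITIES]
    # Step 3: concatenate the vectors into one output embedding vector.
    concatenated = [value for vector in vectors for value in vector]
    # Step 4: weight each modality's segment by its attention weight.
    weights = softmax([attention_scores[name] for name in MODALITIES])
    weighted: Vector = []
    offset = 0
    for weight, vector in zip(weights, vectors):
        segment = concatenated[offset:offset + len(vector)]
        weighted.extend(weight * value for value in segment)
        offset += len(vector)
    return weighted
```

Calling `repository_representation` with a dictionary of per-modality embedding lists and a dictionary of scores returns one flat vector whose length is the sum of the three modality vector lengths. In a trained system the scores would be learned parameters rather than inputs.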

First claim

Opening claim text (preview).

What is claimed is:

1. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating, by the computer program, a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.

2. The method of claim 1, further comprising: identifying, by the computer program, the scripts in a code repositories database.

3. The method of claim 1, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.

4. The method of claim 1, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).

5. The method of claim 1, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.

6. The method of claim 1, wherein the repository representation comprises a numerical representation of the repository.

7. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; selecting, by the computer program, a plurality of tags for the repository representation using a reinforcement learning agent; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.

8. The method of claim 7, further comprising: identifying, by the computer program, the scripts in a code repositories database.

9. The method of claim 7, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.

10. The method of claim 7, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).

11. The method of claim 7, wherein the reinforcement learning agent comprises a deep neural network.

12. The method of claim 7, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.

13. The method of claim 7, wherein the repository representation comprises a numerical representation of the repository.

14. The method of claim 7, further comprising: reward shaping the reinforcement learning agent with feedback on the tag selection.

15. A non-transitory computer readable storage medium, including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising: extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating the vectors into an output embedding vector; weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting the caption.

16. The non-transitory computer readable storage medium of claim 15, wherein the docstring embeddings, the code embeddings, and the dependency embeddings are identified using a Bi-directional Encoder Representations from Transformers.

17. The non-transitory computer readable storage medium of claim 15, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).

18. The non-transitory computer readable storage medium of claim 15, wherein tags are generated using reinforcement learning agent to select the tags.

19. The non-transitory computer readable storage medium of claim 18, wherein the reinforcement learning agent comprises a deep neural network.

20. The non-transitory computer readable storage medium of claim 18, further including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to reward shape the reinforcement learning agent with feedback on the tag selection.
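
The tag selection and feedback steps recited in claims 7, 14, 18, and 20 can be pictured with a small sketch. It rests on loud assumptions: the claims say the agent may comprise a deep neural network, whereas this example swaps in a tabular epsilon-greedy agent for brevity, and it reads "feedback on the tag selection" as a +1/-1 reward per tag. The class `TagSelectionAgent` and its parameters are hypothetical and do not come from the patent.

```python
import random
from typing import Dict, List, Set


class TagSelectionAgent:
    """Tabular epsilon-greedy agent (an assumed simplification of the
    reinforcement learning agent named in the claims)."""

    def __init__(self, tags: List[str], epsilon: float = 0.1,
                 learning_rate: float = 0.5, seed: int = 0) -> None:
        # Estimated value of proposing each candidate tag.
        self.values: Dict[str, float] = {tag: 0.0 for tag in tags}
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def select(self, k: int) -> List[str]:
        """Pick k tags: explore at random with probability epsilon,
        otherwise take the k highest-valued tags (ties broken by name)."""
        if self.rng.random() < self.epsilon:
            return self.rng.sample(sorted(self.values), k)
        ranked = sorted(self.values, key=lambda tag: (-self.values[tag], tag))
        return ranked[:k]

    def update(self, selected: List[str], accepted: Set[str]) -> None:
        """Apply feedback: +1 for each accepted tag, -1 for each rejected
        tag, moving the stored value toward the observed reward."""
        for tag in selected:
            reward = 1.0 if tag in accepted else -1.0
            self.values[tag] += self.learning_rate * (reward - self.values[tag])
```

A typical loop would call `select(k)`, show the proposed tags to a reviewer, and pass the accepted subset to `update`. Tags that are repeatedly rejected drift toward a negative value and stop being proposed during greedy selection.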

Assignees

Inventors

Classifications

  • Functional or applicative languages; Rewrite languages · CPC title

  • Command shells · CPC title

  • Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

  • G06F8/33 · Primary

    Intelligent editors · CPC title

  • G06F8/30 · Primary

    Creation or generation of source code · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12039296B2 cover?
Systems and methods for auto-captioning repositories from source code are disclosed. A method for code repository embedding may include a computer program executed by an electronic device: (1) extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; (2) applying a machine learning algorithm to each of the docstring embeddings, the code embeddings…
Who is the assignee on this patent?
JPMorgan Chase Bank, N.A.
What technology area does this patent fall under?
Primary CPC classification G06F8/33. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 16, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).