What technology area does this patent fall under?

Primary CPC classification G06F8/33. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 16, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for auto-captioning repositories from source code

US12039296B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12039296-B2
Application number	US-202218054684-A
Country	US
Kind code	B2
Filing date	Nov 11, 2022
Priority date	Nov 11, 2022
Publication date	Jul 16, 2024
Grant date	Jul 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for auto-captioning repositories from source code are disclosed. A method for code repository embedding may include a computer program executed by an electronic device: (1) extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; (2) applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; (3) concatenating the vectors into an output embedding vector; (4) weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; (5) generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; (6) applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and (7) outputting the caption.
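
Steps (3) and (4) of the abstract can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions, not the patented implementation: the function names, the `query` vector, and the softmax-over-dot-product scoring are all hypothetical, because the abstract does not specify the form of the attention mechanism. It concatenates three per-modality vectors and then scales each segment of the concatenated vector by an attention weight.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def repository_representation(docstring_vec, code_vec, dependency_vec, query):
    """Concatenate three vectors, then weight each segment by attention.

    `query` is a hypothetical learned vector (an assumption of this sketch);
    all four inputs are assumed to have the same length.
    """
    parts = [docstring_vec, code_vec, dependency_vec]

    # Step (3): concatenate the per-modality vectors into one output embedding.
    output_embedding = [x for part in parts for x in part]

    # Step (4): score each segment against the query and normalise the scores.
    scores = [sum(a * b for a, b in zip(part, query)) for part in parts]
    weights = softmax(scores)

    # Expand the three segment weights so each element gets its segment's weight.
    per_element = [w for part, w in zip(parts, weights) for _ in part]
    representation = [w * x for w, x in zip(per_element, output_embedding)]
    return representation, weights


if __name__ == "__main__":
    rep, weights = repository_representation(
        [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], query=[0.5, 0.5]
    )
    print(weights)  # three attention weights that sum to 1
    print(rep)      # six-element weighted representation
```

The sketch keeps the order given in the abstract (concatenate first, weight second). In a trained system the scoring parameters would be learned rather than supplied by hand.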

First claim

Opening claim text (preview).

What is claimed is:
1. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating, by the computer program, a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.
2. The method of claim 1, further comprising: identifying, by the computer program, the scripts in a code repositories database.
3. The method of claim 1, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.
4. The method of claim 1, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
5. The method of claim 1, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.
6. The method of claim 1, wherein the repository representation comprises a numerical representation of the repository.
7. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; selecting, by the computer program, a plurality of tags for the repository representation using a reinforcement learning agent; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.
8. The method of claim 7, further comprising: identifying, by the computer program, the scripts in a code repositories database.
9. The method of claim 7, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.
10. The method of claim 7, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
11. The method of claim 7, wherein the reinforcement learning agent comprises a deep neural network.
12. The method of claim 7, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.
13. The method of claim 7, wherein the repository representation comprises a numerical representation of the repository.
14. The method of claim 7, further comprising: reward shaping the reinforcement learning agent with feedback on the tag selection.
15. A non-transitory computer readable storage medium, including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising: extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating the vectors into an output embedding vector; weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting the caption.
16. The non-transitory computer readable storage medium of claim 15, wherein the docstring embeddings, the code embeddings, and the dependency embeddings are identified using a Bi-directional Encoder Representations from Transformers.
17. The non-transitory computer readable storage medium of claim 15, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
18. The non-transitory computer readable storage medium of claim 15, wherein tags are generated using reinforcement learning agent to select the tags.
19. The non-transitory computer readable storage medium of claim 18, wherein the reinforcement learning agent comprises a deep neural network.
20. The non-transitory computer readable storage medium of claim 18, further including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to reward shape the reinforcement learning agent with feedback on the tag selection.
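
Claims 4, 10, and 17 recite a gated recurrent unit (GRU) as the machine learning algorithm whose output is a vector. As a rough illustration of how a GRU can reduce a sequence of embeddings to a single vector, the Python sketch below uses the standard textbook GRU update. The parameter names (`Wz`, `Uz`, `bz`, and so on), the dictionary layout, and the zero initial state are assumptions made for illustration and are not taken from the patent.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gate(W, U, b, x, h, activation):
    """Compute activation(W x + U h + b) element by element."""
    return [
        activation(wx + uh + bias)
        for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)
    ]


def gru_step(x, h, p):
    """One standard GRU update; `p` is a hypothetical parameter dictionary."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h, sigmoid)   # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h, sigmoid)   # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)
    # Blend the previous state and the candidate state using the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def encode_sequence(embeddings, p, hidden_size):
    """Run the GRU over a sequence and return the final hidden state vector."""
    h = [0.0] * hidden_size  # assumed zero initial state
    for x in embeddings:
        h = gru_step(x, h, p)
    return h


def zero_params(input_size, hidden_size):
    """Illustrative all-zero parameters; a real model would learn these."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    params = {}
    for name in ("z", "r", "h"):
        params["W" + name] = zeros(hidden_size, input_size)
        params["U" + name] = zeros(hidden_size, hidden_size)
        params["b" + name] = [0.0] * hidden_size
    return params
```

Calling `encode_sequence` once per embedding type (docstring, code, dependency) would yield the three vectors that the independent claims then concatenate.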

Assignees

JPMorgan Chase Bank, N.A.

Inventors

Classifications

G06F8/311
Functional or applicative languages; Rewrite languages
G06F9/45512
Command shells
G06F8/71
Version control (security arrangements therefor G06F21/57); Configuration management
G06F8/33 (Primary)
Intelligent editors
G06F8/30 (Primary)
Creation or generation of source code

Patent family

Related publications grouped by family.

View patent family 91028093

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12039296B2 cover?: Systems and methods for auto-captioning repositories from source code are disclosed. A method for code repository embedding may include a computer program executed by an electronic device: (1) extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; (2) applying a machine learning algorithm to each of the docstring embeddings, the code embeddings…
Who is the assignee on this patent?: JPMorgan Chase Bank, N.A.

Related patents

Dynamic code snippet promotion

Mitigating software-update risks for end users

Context-based metadata generation and automatic annotation of electronic media in a computer network
