Dynamic code snippet promotion
US-2022391180-A1 · Dec 8, 2022 · US
US12039296B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12039296-B2 |
| Application number | US-202218054684-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 11, 2022 |
| Priority date | Nov 11, 2022 |
| Publication date | Jul 16, 2024 |
| Grant date | Jul 16, 2024 |
Official abstract text for this publication.
Systems and methods for auto-captioning repositories from source code are disclosed. A method for code repository embedding may include a computer program executed by an electronic device: (1) extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; (2) applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; (3) concatenating the vectors into an output embedding vector; (4) weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; (5) generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; (6) applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and (7) outputting the caption.
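Steps (3) and (4) of the abstract, concatenation followed by attention weighting, can be illustrated with a minimal sketch. It is not the patented implementation. The three input vectors, the attention scores, and the function names `concatenate`, `softmax`, and `attention_weight` are all hypothetical, and a simple element-wise softmax weighting stands in for whatever attention mechanism the disclosure actually uses.

```python
import math
from typing import List, Sequence


def concatenate(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Join the per-source vectors end to end into one output embedding vector."""
    return [x for vec in vectors for x in vec]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weight(embedding: Sequence[float], scores: Sequence[float]) -> List[float]:
    """Scale each component of the embedding by its softmax attention weight."""
    if len(embedding) != len(scores):
        raise ValueError("embedding and scores must have the same length")
    weights = softmax(scores)
    return [w * x for w, x in zip(weights, embedding)]


# Hypothetical encoder outputs, one vector per source (assumed values).
docstring_vec = [0.2, 0.8]
code_vec = [0.5, 0.1]
dependency_vec = [0.9, 0.4]

# Step (3): concatenate the vectors into an output embedding vector.
output_embedding = concatenate([docstring_vec, code_vec, dependency_vec])

# Step (4): weight the output embedding. These scores are made up for
# illustration; in the claims the attention weights are learned from data.
attention_scores = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]
repository_representation = attention_weight(output_embedding, attention_scores)
```

The result, `repository_representation`, plays the role of the "abstract vector" in the abstract: a purely numerical summary of the repository that later stages (tag generation and captioning) would consume.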
Full claim text for this publication.
What is claimed is:
1. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating, by the computer program, a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.
2. The method of claim 1, further comprising: identifying, by the computer program, the scripts in a code repositories database.
3. The method of claim 1, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.
4. The method of claim 1, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
5. The method of claim 1, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.
6. The method of claim 1, wherein the repository representation comprises a numerical representation of the repository.
7. A method for code repository embedding, comprising: extracting, by a computer program executed by an electronic device, docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying, by the computer program, a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating, by the computer program, the vectors into an output embedding vector; weighting, by the computer program, the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; selecting, by the computer program, a plurality of tags for the repository representation using a reinforcement learning agent; applying, by the computer program, a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting, by the computer program, the caption.
8. The method of claim 7, further comprising: identifying, by the computer program, the scripts in a code repositories database.
9. The method of claim 7, wherein the computer program identifies the docstring embeddings, the code embeddings, and the dependency embeddings using a Bi-directional Encoder Representations from Transformers.
10. The method of claim 7, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
11. The method of claim 7, wherein the reinforcement learning agent comprises a deep neural network.
12. The method of claim 7, wherein the attention weights are trained from a curated dataset collected from a labelled dataset.
13. The method of claim 7, wherein the repository representation comprises a numerical representation of the repository.
14. The method of claim 7, further comprising: reward shaping the reinforcement learning agent with feedback on the tag selection.
15. A non-transitory computer readable storage medium, including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising: extracting docstring embeddings, code embeddings, and dependency embeddings from scripts in a repository; applying a machine learning algorithm to each of the docstring embeddings, the code embeddings, and the dependency embeddings, wherein outputs of each of the machine learning algorithms comprises a vector; concatenating the vectors into an output embedding vector; weighting the output embedding vector using an attention mechanism, resulting in a repository representation comprising an abstract vector; generating a plurality of tags for the repository representation or the output embedding vector representing the weights of the tags, using a trained neural network; applying a tags-to-caption transformer to the tags or the output embedding vector, resulting in a caption; and outputting the caption.
16. The non-transitory computer readable storage medium of claim 15, wherein the docstring embeddings, the code embeddings, and the dependency embeddings are identified using a Bi-directional Encoder Representations from Transformers.
17. The non-transitory computer readable storage medium of claim 15, wherein the machine learning algorithm comprises a gated recurrent unit (GRU) that is part of a Recurrent Neural Network (RNN).
18. The non-transitory computer readable storage medium of claim 15, wherein tags are generated using reinforcement learning agent to select the tags.
19. The non-transitory computer readable storage medium of claim 18, wherein the reinforcement learning agent comprises a deep neural network.
20. The non-transitory computer readable storage medium of claim 18, further including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to reward shape the reinforcement learning agent with feedback on the tag selection.
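Claims 7, 14, 18, and 20 recite a reinforcement learning agent that selects tags and is shaped by feedback on its selections. The sketch below is one simple way to picture that loop, not the claimed design. It swaps in a tabular epsilon-greedy agent where claims 11 and 19 recite a deep neural network, and it treats the feedback directly as the reward signal, which is only one possible reading of "reward shaping". The class `TagSelectionAgent`, the candidate tags, and the feedback values are all hypothetical.

```python
import random
from typing import Dict, List


class TagSelectionAgent:
    """Tabular epsilon-greedy agent (a stand-in, not the claimed deep network)."""

    def __init__(self, tags: List[str], epsilon: float = 0.1,
                 lr: float = 0.5, seed: int = 0) -> None:
        # One running value estimate per candidate tag.
        self.values: Dict[str, float] = {tag: 0.0 for tag in tags}
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, k: int) -> List[str]:
        """Pick k tags: random with probability epsilon, else the k top-valued."""
        tags = list(self.values)
        if self.rng.random() < self.epsilon:
            return self.rng.sample(tags, k)
        return sorted(tags, key=lambda t: self.values[t], reverse=True)[:k]

    def update(self, selected: List[str], feedback: Dict[str, float]) -> None:
        """Move each selected tag's value toward the feedback it received."""
        for tag in selected:
            reward = feedback.get(tag, 0.0)
            self.values[tag] += self.lr * (reward - self.values[tag])


# Hypothetical candidate tags and feedback (+1 = good tag, -1 = bad tag).
agent = TagSelectionAgent(["nlp", "finance", "web", "vision"], epsilon=0.2, seed=42)
feedback = {"nlp": 1.0, "vision": 1.0, "finance": -1.0, "web": -1.0}

for _ in range(200):
    chosen = agent.select(k=2)
    agent.update(chosen, feedback)

agent.epsilon = 0.0          # switch off exploration for the final pick
best_tags = agent.select(k=2)
```

After training, the agent's greedy selection favors the tags that received positive feedback. A deep-network agent, as recited in claims 11 and 19, would replace the value table with a learned function of the repository representation.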