Type inference in dynamic languages
US-2023029250-A1 · Jan 26, 2023 · US
US12008341B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12008341-B2 |
| Application number | US-202318321921-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 23, 2023 |
| Priority date | Jul 14, 2022 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
Disclosed herein are methods, systems, and computer-readable media for generating natural language based on computer code input. In an embodiment, a method may comprise one or more of: accessing a docstring generation model configured to generate docstrings from computer code; receiving one or more computer code samples; generating, using the docstring generation model and based on the received one or more computer code samples, one or more candidate docstrings representing natural language text, each of the one or more candidate docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate docstrings that provides an intent of the at least a portion of the one or more computer code samples; and/or outputting, via a user interface, the at least one identified docstring with the at least a portion of the one or more computer code samples.
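The generate, identify, and output steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `pick_docstring`, `ScoredDocstring`, `fake_generate`, and `fake_score` are hypothetical names, the generator is a hard-coded stub standing in for a docstring generation model, and the scoring heuristic is a toy assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredDocstring:
    code: str        # the code sample the docstring describes
    docstring: str   # candidate natural language text
    score: float     # how well the candidate captures the code's intent


def pick_docstring(
    code: str,
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    num_candidates: int = 3,
) -> ScoredDocstring:
    """Generate candidates for `code`, score each one, and return the best."""
    candidates = generate(code, num_candidates)
    if not candidates:
        raise ValueError("the generator returned no candidates")
    scored = [ScoredDocstring(code, d, score(code, d)) for d in candidates]
    return max(scored, key=lambda c: c.score)


def fake_generate(code: str, n: int) -> List[str]:
    # Hypothetical stand-in for a model call: returns fixed candidates.
    options = ["Does something.", "Return the sum of a and b.", "Add numbers."]
    return options[:n]


def fake_score(code: str, docstring: str) -> float:
    # Toy heuristic (an assumption): count expected terms the docstring mentions.
    words = set(docstring.lower().replace(".", "").split())
    return float(sum(1 for name in ("a", "b", "sum") if name in words))


sample = "def add(a, b):\n    return a + b"
best = pick_docstring(sample, fake_generate, fake_score)
print(best.docstring)
print(best.code)
```

Passing the generator and scorer in as callables keeps the selection step independent of any particular model or scoring method.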
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
2. The method of claim 1, wherein the machine learning model further generates a similarity between the intent and an additional natural language text.
3. The method of claim 2, wherein the machine learning model is further trained using the outputted at least one identified natural language docstring in association with the at least a portion of the one or more computer code samples.
4. The method of claim 1, wherein the machine learning model is trained using concatenated strings, each concatenated string comprising at least two of a function signature, a reference solution, or a docstring.
5. The method of claim 1, wherein identifying at least one of the one or more candidate natural language docstrings is based on a correctness score computed for at least one natural language candidate docstring, the correctness score indicating a proportion of correctly classified instances compared to a total number of candidate natural language docstrings generated.
6. The method of claim 1, further comprising verifying each of the one or more natural language candidate docstrings, wherein verifying includes determining a correctness score for each of the one or more candidate natural language docstrings, wherein the identifying at least one of the one or more candidate natural language docstrings is based on the determined correctness score.
7. The method of claim 6, wherein the machine learning model is fine-tuned based on verified candidate natural language docstrings.
8. The method of claim 6, further comprising ranking the one or more candidate natural language docstrings based on the determined correctness score, wherein identifying one of the one or more candidate natural language docstrings is based on selecting a top-k candidate natural language docstring.
9. The method of claim 1, wherein the at least one identified natural language docstring and the at least a portion of the one or more computer code samples are output via an application programming interface (API).
10. The method of claim 1, wherein the trained machine learning model has between 10 billion and 14 billion parameters.
11. The method of claim 1, wherein the trained machine learning model comprises a plurality of layers, at least one of the layers having a transformer decoder architecture.
12. The method of claim 1, wherein the machine learning model further suggests a change to improve existing code within the received one or more computer code samples.
13. The method of claim 1, wherein the machine learning model is fine-tuned based on at least one public web source or software repository.
14. The method of claim 13, wherein the machine learning model is fine-tuned based on a set of training data constructed from examples within the at least one public web source or software repository.
15. The method of claim 1, wherein identifying at least one of the one or more candidate natural language docstrings is further based on a mean-log probability.
16. The method of claim 1, wherein the machine learning model is developed by applying training data comprising annotated computer code to a precursor model comprising a machine learning model trained on natural language prompts.
17. The method of claim 1, further comprising training a machine learning model used for generating computer code based on natural language input using training data comprising the outputted at least one identified natural language docstring in association with the at least a portion of the one or more computer code samples.
18. A system comprising: at least one memory storing instructions; at least one processor configured to execute the instructions to perform operations comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
19. A non-transitory computer-readable medium including instructions that are executable by one or more processors to perform operations comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
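Claims 4, 8, and 15 mention concatenated training strings, top-k selection, and a mean-log probability. The sketch below shows one plausible reading of each, under stated assumptions: the function names are hypothetical, the newline separator and part ordering are not specified by the claims, and the ranking here uses mean log-probability as the key, whereas claim 8 ranks on a correctness score that could be substituted for it.

```python
from typing import List, Optional, Sequence, Tuple


def build_training_string(
    signature: Optional[str] = None,
    solution: Optional[str] = None,
    docstring: Optional[str] = None,
) -> str:
    """Join at least two of signature, solution, docstring (order is an assumption)."""
    parts = [p for p in (signature, solution, docstring) if p]
    if len(parts) < 2:
        raise ValueError("need at least two of signature, solution, docstring")
    return "\n".join(parts)


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability, which normalizes for candidate length."""
    if not token_logprobs:
        raise ValueError("candidate has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def top_k(candidates: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Rank (text, token_logprobs) pairs by mean log-probability; keep the best k."""
    ranked = sorted(candidates, key=lambda c: mean_logprob(c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Illustrative values only; these log-probabilities are made up.
candidates = [
    ("Return the sum of a and b.", [-0.1, -0.2]),
    ("Does something.", [-2.0, -1.0]),
    ("Add two numbers.", [-0.5]),
]
selected = top_k(candidates, k=2)
example = build_training_string("def add(a, b):", "    return a + b")
print(selected)
print(example)
```

Averaging rather than summing the token log-probabilities avoids systematically favoring shorter candidates, which is one common reason to use a mean in this kind of ranking.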
Program documentation · CPC title
Intelligent editors · CPC title
Combinations of networks · CPC title
Templates · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title