Automated notebook completion using sequence-to-sequence transformer
US-2023177261-A1 · Jun 8, 2023 · US
US12061880B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12061880-B2 |
| Application number | US-202318321852-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 23, 2023 |
| Priority date | Jul 14, 2022 |
| Publication date | Aug 13, 2024 |
| Grant date | Aug 13, 2024 |
Official abstract text for this publication.
Disclosed herein are methods, systems, and computer-readable media for generating computer code based on natural language input. In an embodiment, a method may comprise one or more of: receiving a docstring representing natural language text specifying a digital programming result; generating, using a trained machine learning model, and based on the docstring, a computer code sample configured to produce respective candidate results; causing the computer code sample to be executed; identifying, based on the executing, a computer code sample configured to produce a particular candidate result associated with the digital programming result; performing at least one of outputting, via a user interface, the identified computer code sample, compiling the identified computer code sample, transmitting the identified computer code sample to a recipient device, storing the identified computer code sample, and/or re-executing the identified computer code sample.
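The flow in the abstract (receive a docstring, generate code samples, execute them, identify a sample that produces the desired result) can be illustrated with a minimal sketch. This is not the patented implementation. `fake_model`, `run_sample`, `identify_passing`, and `check_add` are hypothetical names introduced here, the "model" is a hard-coded stub, and plain `exec` is used only for brevity; it is not an isolated testing environment.

```python
from typing import Callable, Dict, List


def fake_model(docstring: str) -> List[str]:
    """Hypothetical stand-in for a trained model: returns candidate code samples."""
    return [
        "def add(a, b):\n    return a - b\n",  # runs, but gives the wrong result
        "def add(a, b):\n    return a + b\n",  # runs and gives the right result
        "def add(a, b):\n    return a +\n",    # does not even parse
    ]


def run_sample(source: str, check: Callable[[Dict], bool]) -> bool:
    """Execute one sample in a fresh namespace and apply a unit-test-like check.

    Assumption: exec() is used for illustration only and provides no sandboxing.
    """
    namespace: Dict = {}
    try:
        exec(source, namespace)
        return bool(check(namespace))
    except Exception:
        # Syntax errors and runtime errors both count as a failed sample.
        return False


def identify_passing(
    docstring: str,
    generate: Callable[[str], List[str]],
    check: Callable[[Dict], bool],
) -> List[str]:
    """Generate samples from the docstring and keep those whose execution passes."""
    return [s for s in generate(docstring) if run_sample(s, check)]


def check_add(namespace: Dict) -> bool:
    """Hypothetical unit test for the docstring 'Return the sum of a and b.'"""
    add = namespace["add"]
    return add(2, 3) == 5 and add(-1, 1) == 0


passing = identify_passing("Return the sum of a and b.", fake_model, check_add)
print(passing[0])
```

Only the second stub sample survives: the first fails the check and the third raises a `SyntaxError` when executed.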
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: receiving a docstring representing natural language text specifying a digital programming result; generating, using a trained machine learning model and based on the docstring, one or more computer code samples configured to produce respective candidate results; causing each of the one or more computer code samples to be executed in a testing environment associated with the trained machine learning model, wherein each of the one or more computer code samples are evaluated based on at least one unit test, the at least one unit test being generated by the machine learning model; identifying, based on a result of the executing in the testing environment, at least one of the computer code samples which produces a particular candidate result associated with the digital programming result; generating, using the trained machine learning model, natural language text associated with the at least one identified computer code sample; verifying each of the one or more executed computer code samples; and outputting the at least one identified computer code sample and the natural language text associated with the at least one identified computer code sample; wherein: verifying includes computing a functional correctness score for each of the executed one or more computer code samples; the identifying at least one of the computer code samples is based on the functional correctness score; and the trained machine learning model is fine-tuned based on verified computer code samples.

2. The method of claim 1, wherein each of the one or more computer code samples are evaluated based further on a time-related threshold associated with the at least one unit test.

3. The method of claim 1, wherein identifying at least one of the computer code samples comprises identifying at least one of the computer code samples that passes the at least one unit test and discarding at least one of the computer code samples that fails the at least one unit test.

4. The method of claim 1, wherein the trained machine learning model is fine-tuned based on the evaluated computer code samples.

5. The method of claim 2, wherein the time-related threshold is used to classify each of the one or more computer code samples into different categories.

6. The method of claim 1, wherein each of the one or more generated computer code samples is associated with at least one text token.

7. The method of claim 6, wherein each of the one or more generated computer code samples is further associated with at least one whitespace token.

8. The method of claim 1, further comprising outputting, via the user interface, the particular candidate result of the at least one identified computer code sample.

9. The method of claim 1, wherein the trained machine learning model is fine-tuned based on at least one of a public web source or software repository.

10. The method of claim 9, wherein the trained machine learning model is fine-tuned based on a set of training problems constructed from examples within the at least one public web source or software repository.

11. The method of claim 1, wherein identifying at least one of the computer code samples is further based on a mean-log probability.

12. The method of claim 1, further comprising: compiling the at least one identified computer code sample; transmitting the at least one identified computer code sample to a recipient device; storing the at least one identified computer code sample; and re-executing the at least one identified computer code sample.

13. The method of claim 1, wherein the natural language text associated with the at least one identified computer code sample includes a definition of a function, method, class, or module associated with the outputted at least one identified computer code sample.

14. The method of claim 1, wherein the trained machine learning model is developed by applying training data comprising annotated computer code to a precursor model comprising a machine learning model trained on natural language prompts.

15. The method of claim 1, wherein the trained machine learning model generates training data based on the result of the executing, wherein the trained machine learning model is further trained using the generated training data.

16. The method of claim 1, wherein the trained machine learning model comprises a plurality of layers, at least one of the layers having a transformer decoder architecture.

17. A system comprising: at least one memory storing instructions; at least one processor configured to execute the instructions to perform operations comprising: receiving a docstring representing natural language text specifying a digital programming result; generating, using a trained machine learning model and based on the docstring, one or more computer code samples configured to produce respective candidate results; causing each of the one or more computer code samples to be executed in a testing environment associated with the trained machine learning model, wherein each of the one or more computer code samples are evaluated based on at least one unit test, the at least one unit test being generated by the machine learning model; identifying, based on a result of the executing in the testing environment, at least one of the computer code samples which produces a particular candidate result associated with the digital programming result; generating, using the trained machine learning model, a natural language text associated with the at least one identified computer code sample; verifying each of the one or more executed computer code samples; and outputting the at least one identified computer code sample and the natural language text associated with the at least one identified computer code sample; wherein: verifying includes computing a functional correctness score for each of the executed one or more computer code samples; the identifying at least one of the computer code samples is based on the functional correctness score; and the trained machine learning model is fine-tuned based on verified computer code samples.

18. A non-transitory computer-readable medium including instructions that are executable by one or more processors to perform operations comprising: receiving a docstring representing natural language text specifying a digital programming result; generating, using a trained machine learning model and based on the docstring, one or more computer code samples configured to produce respective candidate results; causing each of the one or more computer code samples to be executed in a testing environment associated with the trained machine learning model, wherein each of the one or more computer code samples are evaluated based on at least one unit test, the at least one unit test being generated by the machine learning model; identifying, based on a result of the executing in the testing environment, at least one of the computer code samples which produces a particular candidate result associated with the digital programming result; generating, using the trained machine learning model, a natural language text associated with the at least one identified computer code sample; verifying each of the one or more executed computer code samples; and outputting the at least one identified computer code sample and the natural language text associated with the at least one identified computer code sample; wherein: verifying includes computing a functional correctness score for each of the executed one
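
The claims refer to a functional correctness score, a time-related threshold on unit tests, and selection based further on a mean-log probability, without the preview text defining how each is computed. The sketch below shows one possible reading, under loud assumptions: the score is taken to be the fraction of unit tests passed within a time limit, the mean log probability is the arithmetic mean of per-token log probabilities, and ties on score are broken by that mean. `Candidate`, `correctness_score`, `rank`, and the sample data are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Candidate:
    source: str                       # generated code sample
    token_logprobs: Sequence[float]   # assumed per-token log probabilities


def mean_logprob(c: Candidate) -> float:
    """Arithmetic mean of the per-token log probabilities."""
    return sum(c.token_logprobs) / len(c.token_logprobs)


def correctness_score(
    c: Candidate, tests: List[Callable[[Dict], bool]], time_limit_s: float
) -> float:
    """Assumed scoring rule: fraction of tests that pass within the time limit.

    The elapsed time is measured after each test returns, so this sketch does
    not interrupt a runaway sample; a real harness would need isolation.
    """
    namespace: Dict = {}
    try:
        exec(c.source, namespace)  # illustration only, not a sandbox
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        start = time.perf_counter()
        try:
            ok = bool(test(namespace))
        except Exception:
            ok = False
        if ok and (time.perf_counter() - start) <= time_limit_s:
            passed += 1
    return passed / len(tests)


def rank(
    cands: List[Candidate], tests: List[Callable[[Dict], bool]], time_limit_s: float
) -> List[Candidate]:
    """Sort by correctness score first, then by mean log probability."""
    return sorted(
        cands,
        key=lambda c: (correctness_score(c, tests, time_limit_s), mean_logprob(c)),
        reverse=True,
    )


# Hypothetical unit tests for a docstring such as "Return x squared."
tests = [lambda ns: ns["square"](3) == 9, lambda ns: ns["square"](2) == 4]

candidates = [
    Candidate("def square(x):\n    return x * x\n", [-0.2, -0.4]),
    Candidate("def square(x):\n    return x ** 2\n", [-0.1, -0.1]),
    Candidate("def square(x):\n    return x + x\n", [-0.05, -0.05]),
]

ranked = rank(candidates, tests, time_limit_s=1.0)
print(ranked[0].source)
```

In this made-up data the `x + x` sample has the highest mean log probability but passes only one of the two tests, so it ranks last; the two fully passing samples are ordered by mean log probability.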
Program documentation · CPC title
Intelligent editors · CPC title
Combinations of networks · CPC title
Templates · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title