What technology area does this patent fall under?

Primary CPC classification G06F8/30. Mapped technology areas include Physics.

When was this patent published?

Publication date: June 11, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for generating natural language using language models trained on computer code

US12008341B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12008341-B2
Application number	US-202318321921-A
Country	US
Kind code	B2
Filing date	May 23, 2023
Priority date	Jul 14, 2022
Publication date	Jun 11, 2024
Grant date	Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are methods, systems, and computer-readable media for generating natural language based on computer code input. In an embodiment, a method may comprise one or more of: accessing a docstring generation model configured to generate docstrings from computer code; receiving one or more computer code samples; generating, using the docstring generation model and based on the received one or more computer code samples, one or more candidate docstrings representing natural language text, each of the one or more candidate docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate docstrings that provides an intent of the at least a portion of the one or more computer code samples; and/or outputting, via a user interface, the at least one identified docstring with the at least a portion of the one or more computer code samples.
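
The flow the abstract describes (generate candidate docstrings, identify one that conveys the code's intent, output it with the code) can be sketched roughly as follows. This is an illustration only, not the patented implementation: `generate_candidates` is a hypothetical stand-in for the docstring generation model, the word-overlap heuristic in `identify_best` is an assumed placeholder for the identification step, and `attach_docstring` assumes the sample is a single Python function with a one-line signature.

```python
import re
from typing import List


def generate_candidates(code: str) -> List[str]:
    """Hypothetical stand-in for a docstring generation model.

    A real system would sample these from a trained model; here they
    are fixed strings so the sketch runs without any model.
    """
    return [
        "Does something.",
        "Return the sum of two numbers a and b.",
        "Add.",
    ]


def _words(text: str) -> set:
    """Lowercased alphabetic tokens found in the text."""
    return set(re.findall(r"[A-Za-z_]+", text.lower()))


def identify_best(code: str, candidates: List[str]) -> str:
    """Assumed placeholder heuristic: pick the candidate that shares
    the most tokens with the code sample."""
    code_tokens = _words(code)
    return max(candidates, key=lambda c: len(code_tokens & _words(c)))


def attach_docstring(code: str, docstring: str) -> str:
    """Output the identified docstring together with the code by
    inserting it directly under the first (signature) line."""
    header, _, body = code.partition("\n")
    return f'{header}\n    """{docstring}"""\n{body}'


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b"
    best = identify_best(sample, generate_candidates(sample))
    print(attach_docstring(sample, best))
```

Keeping generation, identification, and output as separate functions mirrors the separate steps listed in the abstract, so any one of them could be swapped for a real model call or a different selection rule.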

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method, comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
2. The method of claim 1, wherein the machine learning model further generates a similarity between the intent and an additional natural language text.
3. The method of claim 2, wherein the machine learning model is further trained using the outputted at least one identified natural language docstring in association with the at least a portion of the one or more computer code samples.
4. The method of claim 1, wherein the machine learning model is trained using concatenated strings, each concatenated string comprising at least two of a function signature, a reference solution, or a docstring.
5. The method of claim 1, wherein identifying at least one of the one or more candidate natural language docstrings is based on a correctness score computed for at least one natural language candidate docstring, the correctness score indicating a proportion of correctly classified instances compared to a total number of candidate natural language docstrings generated.
6. The method of claim 1, further comprising verifying each of the one or more natural language candidate docstrings, wherein verifying includes determining a correctness score for each of the one or more candidate natural language docstrings, wherein the identifying at least one of the one or more candidate natural language docstrings is based on the determined correctness score.
7. The method of claim 6, wherein the machine learning model is fine-tuned based on verified candidate natural language docstrings.
8. The method of claim 6, further comprising ranking the one or more candidate natural language docstrings based on the determined correctness score, wherein identifying one of the one or more candidate natural language docstrings is based on selecting a top-k candidate natural language docstring.
9. The method of claim 1, wherein the at least one identified natural language docstring and the at least a portion of the one or more computer code samples are output via an application programming interface (API).
10. The method of claim 1, wherein the trained machine learning model has between 10 billion and 14 billion parameters.
11. The method of claim 1, wherein the trained machine learning model comprises a plurality of layers, at least one of the layers having a transformer decoder architecture.
12. The method of claim 1, wherein the machine learning model further suggests a change to improve existing code within the received one or more computer code samples.
13. The method of claim 1, wherein the machine learning model is fine-tuned based on at least one public web source or software repository.
14. The method of claim 13, wherein the machine learning model is fine-tuned based on a set of training data constructed from examples within the at least one public web source or software repository.
15. The method of claim 1, wherein identifying at least one of the one or more candidate natural language docstrings is further based on a mean-log probability.
16. The method of claim 1, wherein the machine learning model is developed by applying training data comprising annotated computer code to a precursor model comprising a machine learning model trained on natural language prompts.
17. The method of claim 1, further comprising training a machine learning model used for generating computer code based on natural language input using training data comprising the outputted at least one identified natural language docstring in association with the at least a portion of the one or more computer code samples.
18. A system comprising: at least one memory storing instructions; at least one processor configured to execute the instructions to perform operations comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
19. A non-transitory computer-readable medium including instructions that are executable by one or more processors to perform operations comprising: training a machine learning model to generate natural language docstrings from computer code; receiving one or more computer code samples at the trained machine learning model; generating, via the trained machine learning model and based on the received one or more computer code samples, one or more candidate natural language docstrings representing natural language text, each of the one or more candidate natural language docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate natural language docstrings that provides an intent of the at least a portion of the one or more computer code samples; outputting from the trained machine learning model the at least one identified natural language docstring with the at least a portion of the one or more computer code samples; and receiving, at the machine learning model, a selection of the one or more computer code samples, wherein the machine learning model provides an automatic description of the selection and generates a template for building an additional machine learning model.
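
Two of the dependent claims lend themselves to a small illustration: the concatenated training strings of claim 4, and candidate selection by mean log-probability with a top-k cut (claims 8 and 15). The sketch below is a hedged reading of that claim language, not the patentee's code. All function names are hypothetical, the ordering of parts inside the concatenated string is an assumption (the claim only requires at least two of the three parts), and the per-token probabilities are assumed to be supplied by whatever model produced the candidates.

```python
import math
from typing import List, Sequence, Tuple


def build_training_string(signature: str, solution: str, docstring: str) -> str:
    """Concatenate a function signature, a reference solution, and a
    docstring into one training string.

    Assumption: the docstring is placed last; the claim does not fix
    an order or a separator.
    """
    return "\n".join([signature, solution, f'"""{docstring}"""'])


def mean_log_prob(token_probs: Sequence[float]) -> float:
    """Average of the natural log of each token probability."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def top_k(candidates: List[Tuple[str, Sequence[float]]], k: int = 1) -> List[str]:
    """Rank (docstring, token_probs) pairs by mean log-probability,
    highest first, and return the k best docstrings."""
    ranked = sorted(candidates, key=lambda c: mean_log_prob(c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up probabilities, purely for illustration.
    candidates = [
        ("Add two numbers.", [0.9, 0.8]),
        ("Does stuff.", [0.5, 0.4]),
        ("Return the sum of a and b.", [0.99, 0.95]),
    ]
    print(top_k(candidates, k=2))
    print(build_training_string("def add(a, b):", "    return a + b",
                                "Return the sum of a and b."))
```

Averaging the log-probabilities, rather than summing them, keeps longer candidates from being penalized simply for having more tokens. A correctness score of the kind described in claims 5 and 6 could be substituted for `mean_log_prob` as the sort key.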

Assignees

OpenAI OpCo, LLC

Inventors

Classifications

CPC code	CPC title
G06F8/73	Program documentation
G06F8/33	Intelligent editors
G06N3/045	Combinations of networks
G06F40/186	Templates
G06F40/284	Lexical analysis, e.g. tokenisation or collocates

Patent family

Related publications grouped by family.

View patent family 89509811

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12008341B2 cover?: Disclosed herein are methods, systems, and computer-readable media for generating natural language based on computer code input. In an embodiment, a method may comprise one or more of: accessing a docstring generation model configured to generate docstrings from computer code; receiving one or more computer code samples; generating, using the docstring generation model and based on the received…
Who is the assignee on this patent?: OpenAI OpCo, LLC

Related patents

Type inference in dynamic languages

Utilizing machine learning models for automated software code modification

Source code generation using code templates with neural transformers

Neural method completion based on natural language and source code

Automatic generation of code documentation
