Semantic code search based on augmented programming language corpus

US11609748B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11609748-B2
Application number	US-202117161545-A
Country	US
Kind code	B2
Filing date	Jan 28, 2021
Priority date	Jan 28, 2021
Publication date	Mar 21, 2023
Grant date	Mar 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method may include obtaining machine-readable source code. The method may include parsing the source code for one or more code descriptions and identifying a section of the source code corresponding to each of the code descriptions. The method may include determining a description-code pair including a first element representing the code description and a second element representing the section of the source code corresponding to the code description. The method may include generating an augmented programming language corpus based on the description-code pair, the one or more code descriptions, and the source code. The method may include receiving a natural language search query for source-code recommendations, identifying source code from the augmented programming language corpus responsive to the natural language search query, and responding to the natural language search query with the identified source code.
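The pairing step described in the abstract can be illustrated with a minimal sketch. It assumes the source is Python, treats docstrings and standalone `#` comments as the "code descriptions", and uses one hypothetical location heuristic: a comment on the line directly above a `def` describes that function. The names `DescriptionCodePair` and `extract_pairs` are illustrative and do not come from the patent.

```python
import ast
import io
import tokenize
from typing import List, NamedTuple


class DescriptionCodePair(NamedTuple):
    description: str  # first element: the code description
    code: str         # second element: the matching section of source code


def extract_pairs(source: str) -> List[DescriptionCodePair]:
    """Pair each function with its docstring or the comment directly above it."""
    # Map line number -> text of a standalone comment on that line.
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.line.strip().startswith("#"):
            comments[tok.start[0]] = tok.string.lstrip("#").strip()

    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        code = ast.get_source_segment(source, node)
        # Assumed heuristic: prefer the docstring, otherwise fall back to
        # the comment on the line immediately above the `def`.
        description = ast.get_docstring(node) or comments.get(node.lineno - 1)
        if description and code:
            pairs.append(DescriptionCodePair(description, code))
    return pairs


SAMPLE = '''\
# Add two numbers.
def add(a, b):
    return a + b


def greet(name):
    """Return a greeting for name."""
    return f"Hello, {name}!"
'''

for pair in extract_pairs(SAMPLE):
    print(pair.description, "->", pair.code.splitlines()[0])
```

A list of such pairs, kept together with the raw descriptions and source, is one plausible shape for the "augmented programming language corpus"; the patent itself does not prescribe this data structure.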

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining, by a processor, machine-readable source code; parsing, by the processor, the source code for one or more code descriptions; identifying, by the processor, a section of the source code corresponding to a code description of the one or more code descriptions; determining, by the processor, a description-code pair, the description-code pair including a first element representing the code description and a second element representing the section of the source code corresponding to the code description; generating, by the processor, an augmented programming language corpus using the description-code pair, the one or more code descriptions, and the source code; training, by the processor, a machine learning model to provide source-code recommendations based on the augmented programming language corpus; receiving, by the processor, a natural language search query for a source-code recommendation; identifying, by the processor using the machine learning model, the source code responsive to the search query; and responding, by the processor, to the natural language search query with the source code identified from the augmented programming language corpus.

2. The method of claim 1, wherein the one or more code descriptions are code comments and identifying the section of the source code corresponding to a code description of the one or more code descriptions comprises: determining, by the processor, one or more heuristics relating a location of the code comment in a piece of source code to the section of the source code; determining, by the processor, the location of the code comment in the piece of source code; and locating, by the processor, the section of the source code to which the code comment corresponds based on the one or more heuristics and the location of the code comment in the piece of source code.

3. The method of claim 1, wherein the natural language search query is received via a text-input field in an integrated development environment (IDE), the IDE including an interface for software development.

4. The method of claim 1, wherein obtaining source code comprises: obtaining, by the processor, a source-code package; parsing, by the processor, the source-code package to identify one or more files, each file of the one or more files including at least a portion of the source code; and parsing, by the processor, the one or more files to identify files written in a target programming language.

5. The method of claim 1, further comprising: generating, by the processor, a negatively classified example based on the description-code pair; and training, by the processor, the machine learning model to provide source-code recommendations based on the augmented programming language corpus and the negatively classified example.

6. The method of claim 1, wherein responding to the natural language search query with the source code identified from the augmented programming language corpus comprises: mapping, by the processor, the natural language search query to a search vector; comparing, by the processor, the search vector to each description-code pair; determining, by the processor, a similarity score between the search vector and each description-code pair based on a cosine similarity between the search vector and each description-code pair; and returning, by the processor, the source code corresponding to the description-code pair based on the similarity score between the search vector and each description-code pair.

7. The method of claim 6, wherein returning the source code corresponding to the description-code pair based on the similarity score between the search vector and each description-code pair comprises: ranking, by the processor, description-code pairs based on the similarity score between the search vector and each description-code pair; and returning, by the processor, one or more pieces of the source code corresponding to the description-code pairs based on the ranking.

8. One or more non-transitory computer-readable storage media configured to store instructions that, in response to being executed, cause a system to perform operations, the operations comprising: obtaining machine-readable source code; parsing the source code for one or more code descriptions; identifying a section of the source code corresponding to a code description of the one or more code descriptions; determining a description-code pair, the description-code pair including a first element representing the code description and a second element representing the section of the source code corresponding to the code description; generating an augmented programming language corpus using the description-code pair, the one or more code descriptions, and the source code; training a machine learning model to provide source-code recommendations based on the augmented programming language corpus; receiving a natural language search query for a source-code recommendation; identifying, by the machine learning model, the source code from the augmented programming language corpus responsive to the natural language search query; and responding to the natural language search query with the source code identified from the augmented programming language corpus.

9. The one or more non-transitory computer-readable storage media of claim 8, wherein the one or more code descriptions are code comments and identifying the section of the source code corresponding to a code description of the one or more code descriptions comprises: determining one or more heuristics relating a location of the code comment in a piece of source code to the section of the source code; determining the location of the code comment in the piece of source code; and locating the section of the source code to which the code comment corresponds based on the one or more heuristics and the location of the code comment in the piece of source code.

10. The one or more non-transitory computer-readable storage media of claim 8, wherein the natural language search query is received via a text-input field in an integrated development environment (IDE), the IDE including an interface for software development.

11. The one or more non-transitory computer-readable storage media of claim 8, wherein obtaining source code comprises: obtaining a source-code package; parsing the source-code package to identify one or more files, each file of the one or more files including at least a portion of the source code; and parsing the one or more files to identify files written in a target programming language.

12. The one or more non-transitory computer-readable storage media of claim 8, further comprising: generating a negatively classified example based on the description-code pair; and training the machine learning model to provide source-code recommendations based on the augmented programming language corpus and the negatively classified example.

13. The one or more non-transitory computer-readable storage media of claim 8, wherein responding to the natural language search query with the source code identified from the augmented programming language corpus comprises: mapping the natural language search query to a search vector; comparing the search vector to each description-code pair; determining a similarity score between the search vector and each description-code pair based on a cosine similarity between the search vector and each description-code pair; and returning the source code corresponding to the description-code pair based on the similarity score between the search vector and each description-code pair.
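The retrieval steps in claims 6, 7, and 13 (map the query to a search vector, score each description-code pair by cosine similarity, rank, and return code) can be sketched as follows. This is only an illustration under stated assumptions: `embed` is a bag-of-words stand-in for the trained machine learning model, only the description half of each pair is vectorized, and the names `embed`, `cosine`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (not the patent's trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: str, pairs: List[Tuple[str, str]], top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every (description, code) pair against the query and rank by similarity."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(desc)), code) for desc, code in pairs]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest similarity first
    return scored[:top_k]


corpus = [
    ("read a text file into a string", "open(path).read()"),
    ("sort a list of numbers", "sorted(xs)"),
]
for score, code in search("sort numbers", corpus):
    print(f"{score:.3f}  {code}")
```

Swapping `embed` for a learned encoder that places descriptions, code, and queries in the same vector space would leave the scoring and ranking logic unchanged.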

Assignees

Fujitsu Ltd

Inventors

Classifications

G06F8/36
Software reuse · CPC title
G06N20/00
Machine learning · CPC title
G06F8/75 · Primary
Structural analysis for program understanding · CPC title
G06F8/33 · Primary
Intelligent editors · CPC title
G06F16/90332
Natural language query formulation or dialogue systems · CPC title

Patent family

Related publications grouped by family.

View patent family 82494156

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11609748B2 cover?: A method may include obtaining machine-readable source code. The method may include parsing the source code for one or more code descriptions and identifying a section of the source code corresponding to each of the code descriptions. The method may include determining a description-code pair including a first element representing the code description and a second element representing the secti…
Who is the assignee on this patent?: Fujitsu Ltd
What technology area does this patent fall under?: Primary CPC classification G06F8/75. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).