Representing source code in vector space to detect errors

US11334467B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11334467-B2
Application number	US-201916402965-A
Country	US
Kind code	B2
Filing date	May 3, 2019
Priority date	May 3, 2019
Publication date	May 17, 2022
Grant date	May 17, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method, system and computer program product for representing source code in vector space. The source code is parsed into an abstract syntax tree, which is then traversed to produce a sequence of tokens. Token embeddings may then be constructed for a subset of the sequence of tokens, which are inputted into an encoder artificial neural network (“encoder”) for encoding the token embeddings. A decoder artificial neural network (“decoder”) is initialized with a final internal cell state of the encoder. The decoder is run the same number of steps as the encoding performed by the encoder. After running the decoder and completing the training of the decoder to learn the inputted token embeddings, the final internal cell state of the encoder is used as the code representation vector which may be used to detect errors in the source code.
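The first half of the pipeline in the abstract (parse, traverse, select a subset of tokens, build embeddings) can be illustrated with a minimal sketch. This is not the patent's implementation: it assumes Python source parsed with the standard `ast` module, uses AST node type names as tokens, and the helper names `dfs_tokens`, `frequent_subset`, and `random_embeddings`, the `min_count` threshold, and the embedding size are all hypothetical choices made for illustration.

```python
import ast
import random
from collections import Counter


def dfs_tokens(node):
    """Pre-order depth-first traversal yielding AST node type names as tokens."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from dfs_tokens(child)


def frequent_subset(tokens, min_count):
    """Keep only tokens that occur at least min_count times (assumed threshold)."""
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] >= min_count]


def random_embeddings(tokens, dim, seed=0):
    """Give each distinct token one random vector and map the sequence to vectors."""
    rng = random.Random(seed)
    table = {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for t in sorted(set(tokens))}
    return [table[t] for t in tokens]


source = "def add(a, b):\n    return a + b\n"
tokens = list(dfs_tokens(ast.parse(source)))   # sequence of tokens from the AST
kept = frequent_subset(tokens, min_count=2)    # subset of the sequence
vectors = random_embeddings(kept, dim=4)       # one embedding per kept token
print(tokens)
print(kept)
```

Using node type names as tokens is only one possible tokenization; the abstract does not say which parts of the tree become tokens, so a real system might also include identifiers or literals.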

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for representing source code in vector space, the method comprising: parsing source code into an abstract syntax tree; traversing said abstract syntax tree to produce a sequence of tokens; constructing token embeddings for a subset of said sequence of tokens; inputting said token embeddings into an encoder artificial neural network for encoding said token embeddings; initializing a decoder artificial neural network with a final internal cell state of said encoder artificial neural network when encoding said token embeddings; running said decoder artificial neural network a same number of steps as encoding performed by said encoder artificial neural network; using said final internal cell state of said encoder artificial neural network as a code representation vector in response to completing said running of said decoder artificial neural network; and using said code representation vector to detect errors in said source code.

2. The computer-implemented method as recited in claim 1, wherein said abstract syntax tree is traversed using a depth-first traversal.

3. The computer-implemented method as recited in claim 1, wherein said abstract syntax tree is traversed using a structure-based traversal.

4. The computer-implemented method as recited in claim 1 further comprising: constructing a list of frequently occurring tokens found in said abstract syntax tree; and removing tokens from said sequence of tokens with a frequency below a frequency threshold to form said subset of said sequence of tokens.

5. The computer-implemented method as recited in claim 1, wherein said token embeddings are randomly constructed.

6. The computer-implemented method as recited in claim 1, wherein pretrained embeddings are used to construct said token embeddings.

7. The computer-implemented method as recited in claim 1 further comprising: computing a loss function based on a quality of reconstruction from running said decoder artificial neural network; updating internal parameters of said encoder artificial neural network and said decoder artificial neural network based on said computed loss function; and using said final internal cell state of said encoder artificial neural network as said code representation vector in response to completing said running of said decoder artificial neural network and in response to convergence of said updated internal parameters of said encoder artificial neural network and said decoder artificial neural network.

8. The computer-implemented method as recited in claim 1, wherein said artificial neural network is a recurrent neural network.

9. A computer program product for representing source code in vector space, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code comprising the programming instructions for: parsing source code into an abstract syntax tree; traversing said abstract syntax tree to produce a sequence of tokens; constructing token embeddings for a subset of said sequence of tokens; inputting said token embeddings into an encoder artificial neural network for encoding said token embeddings; initializing a decoder artificial neural network with a final internal cell state of said encoder artificial neural network when encoding said token embeddings; running said decoder artificial neural network a same number of steps as encoding performed by said encoder artificial neural network; using said final internal cell state of said encoder artificial neural network as a code representation vector in response to completing said running of said decoder artificial neural network; and using said code representation vector to detect errors in said source code.

10. The computer program product as recited in claim 9, wherein said abstract syntax tree is traversed using a depth-first traversal.

11. The computer program product as recited in claim 9, wherein said abstract syntax tree is traversed using a structure-based traversal.

12. The computer program product as recited in claim 9, wherein the program code further comprises the programming instructions for: constructing a list of frequently occurring tokens found in said abstract syntax tree; and removing tokens from said sequence of tokens with a frequency below a frequency threshold to form said subset of said sequence of tokens.

13. The computer program product as recited in claim 9, wherein said token embeddings are randomly constructed.

14. The computer program product as recited in claim 9, wherein pretrained embeddings are used to construct said token embeddings.

15. The computer program product as recited in claim 9, wherein the program code further comprises the programming instructions for: computing a loss function based on a quality of reconstruction from running said decoder artificial neural network; updating internal parameters of said encoder artificial neural network and said decoder artificial neural network based on said computed loss function; and using said final internal cell state of said encoder artificial neural network as said code representation vector in response to completing said running of said decoder artificial neural network and in response to convergence of said updated internal parameters of said encoder artificial neural network and said decoder artificial neural network.

16. The computer program product as recited in claim 9, wherein said artificial neural network is a recurrent neural network.

17. A system, comprising: a memory for storing a computer program for representing source code in vector space; and a processor connected to said memory, wherein said processor is configured to execute the program instructions of the computer program comprising: parsing source code into an abstract syntax tree; traversing said abstract syntax tree to produce a sequence of tokens; constructing token embeddings for a subset of said sequence of tokens; inputting said token embeddings into an encoder artificial neural network for encoding said token embeddings; initializing a decoder artificial neural network with a final internal cell state of said encoder artificial neural network when encoding said token embeddings; running said decoder artificial neural network a same number of steps as encoding performed by said encoder artificial neural network; using said final internal cell state of said encoder artificial neural network as a code representation vector in response to completing said running of said decoder artificial neural network; and using said code representation vector to detect errors in said source code.

18. The system as recited in claim 17, wherein said abstract syntax tree is traversed using a depth-first traversal.

19. The system as recited in claim 17, wherein said abstract syntax tree is traversed using a structure-based traversal.

20. The system as recited in claim 17, wherein the program instructions of the computer program further comprise: constructing a list of frequently occurring tokens found in said abstract syntax tree; and removing tokens from said sequence of tokens with a frequency below a frequency threshold to form said subset of said sequence of tokens.
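The encoder and decoder steps recited in the independent claims, together with the reconstruction loss of claims 7 and 15, can be sketched as a forward pass in plain Python. Several things here are assumptions rather than details from the patent: the networks are tiny LSTM cells with random, untrained weights; the hidden size equals the embedding size so the decoder's hidden state can stand in for the reconstructed embedding; the decoder starts from the encoder's final cell state with a zeroed hidden state; and mean squared error is used as the loss. Parameter updates and the convergence check are omitted, and all function names are hypothetical.

```python
import math
import random


def make_lstm(input_dim, hidden_dim, rng):
    """Random weights for the input, forget, output, and candidate gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: mat(hidden_dim, input_dim + hidden_dim + 1) for g in "ifoc"}


def lstm_step(params, x, h, c):
    """One LSTM step; returns the new hidden state and the new cell state."""
    z = x + h + [1.0]  # input, previous hidden state, constant bias term

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z))) for row in params[name]]

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("c", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(params, embeddings):
    """Run the encoder over all embeddings; return final hidden and cell state."""
    hidden_dim = len(params["i"])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in embeddings:
        h, c = lstm_step(params, x, h, c)
    return h, c


def decode(params, cell_state, steps):
    """Start from the encoder's final cell state and run `steps` decoder steps."""
    dim = len(cell_state)
    h, c, x, outputs = [0.0] * dim, list(cell_state), [0.0] * dim, []
    for _ in range(steps):
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
        x = h  # feed each output back in as the next decoder input
    return outputs


def reconstruction_loss(inputs, outputs):
    """Mean squared error between input embeddings and decoder outputs."""
    total = sum((a - b) ** 2 for x, y in zip(inputs, outputs) for a, b in zip(x, y))
    return total / (len(inputs) * len(inputs[0]))


rng = random.Random(0)
dim = 4
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
encoder, decoder = make_lstm(dim, dim, rng), make_lstm(dim, dim, rng)
_, code_vector = encode(encoder, embeddings)           # final internal cell state
reconstruction = decode(decoder, code_vector, steps=len(embeddings))
loss = reconstruction_loss(embeddings, reconstruction)
print(code_vector, loss)
```

In a trained system the loss would drive updates to both networks, and `code_vector` would be read out only after those updates converge. With untrained weights, as here, the vector carries no useful information about the input.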

Assignees

IBM

Inventors

Classifications

CPC code	CPC title
G06N3/045	Combinations of networks
G06N3/044	Recurrent networks, e.g. Hopfield networks
G06N3/0442	characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/0455	Auto-encoder networks; Encoder-decoder networks
G06N3/0895	Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Patent family

Related publications grouped by family.

View patent family 73016510

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11334467B2 cover?: A computer-implemented method, system and computer program product for representing source code in vector space. The source code is parsed into an abstract syntax tree, which is then traversed to produce a sequence of tokens. Token embeddings may then be constructed for a subset of the sequence of tokens, which are inputted into an encoder artificial neural network (“encoder”) for encoding the …
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06F11/3608. Mapped technology areas include Physics.
When was this patent published?: Publication date May 17, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Identification and traceability of application programming interface (API) functionality in a distributed computing environment

Source code bug prediction

Quasi-recurrent neural network based encoder-decoder model

Parallel compilation of software application

Obtaining correct compile results by absorbing mismatches between data types representations

Mining application repositories
