Source code generation, completion, checking, correction

US9928040B2 · US · B2

Patent metadata
Field | Value
Publication number | US-9928040-B2
Application number | US-201414190221-A
Country | US
Kind code | B2
Filing date | Feb 26, 2014
Priority date | Nov 12, 2013
Publication date | Mar 27, 2018
Grant date | Mar 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automated generation, or completion, or checking of source code is described whereby a probabilistic model having been trained using a corpus of natural source code examples is used. In various examples the probabilistic model comprises probability distributions describing belief about structure of natural source code and takes into account source code analysis from a compiler or other source code analyzer. In various examples, source code analysis may comprise syntactic structure, type information and other data about source code. In various examples, the trained probabilistic model is used to predict sequences of source code elements. For example, to generate source code, to auto-complete source code, to error check source code, to error correct source code or for other purposes.
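To make the abstract's idea of a model "trained using a corpus of natural source code examples" concrete, here is a minimal sketch that estimates how likely each child node type is given its parent node type in a syntax tree. It is an illustration only, not the model disclosed in the patent: it assumes Python's standard `ast` module as the analyzer, uses plain relative frequencies instead of a learned model, and the names `parent_child_pairs`, `train`, `child_probability`, `flag_unlikely`, the toy corpus, and the 0.05 threshold are all hypothetical.

```python
import ast
from collections import Counter, defaultdict


def parent_child_pairs(source):
    """Yield (parent type, child type) name pairs from the AST of `source`."""
    tree = ast.parse(source)
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            yield type(parent).__name__, type(child).__name__


def train(corpus):
    """Count how often each child node type appears under each parent type."""
    counts = defaultdict(Counter)
    for source in corpus:
        for parent, child in parent_child_pairs(source):
            counts[parent][child] += 1
    return counts


def child_probability(counts, parent, child):
    """Estimate P(child type | parent type); 0.0 for an unseen parent."""
    total = sum(counts[parent].values()) if parent in counts else 0
    return counts[parent][child] / total if total else 0.0


def flag_unlikely(counts, source, threshold=0.05):
    """Return (parent, child) pairs whose estimated probability is below threshold."""
    return [
        (parent, child)
        for parent, child in parent_child_pairs(source)
        if child_probability(counts, parent, child) < threshold
    ]


# Hypothetical two-program training corpus (an assumption for illustration).
corpus = [
    "x = 1\ny = x + 2\n",
    "total = 0\nfor i in range(3):\n    total = total + i\n",
]
counts = train(corpus)
print(child_probability(counts, "Module", "Assign"))
print(flag_unlikely(counts, "while True:\n    pass\n"))
```

In this sketch, structures the corpus never contained (such as a top-level `while` loop) receive probability zero and are flagged. A practical system would smooth the estimates and condition on richer analysis output such as types and scope, as the abstract describes.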

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer implemented method comprising: accessing, from a memory, a trained probabilistic model of natural source code written in a programming language, the trained probabilistic model arranged to take into account source code analysis output of an analyzer operable with the programming language; operating the analyzer to obtain source code analysis of at least part of a sequence of source code elements, the source code analysis comprising a graph of syntactic structure of the sequence of source code elements; operating the analyzer to determine which variables are in scope at any point in the program, wherein: scope is represented as a set of feature vectors; at least one feature vector comprises a string identifier corresponding to a feature vector variable; the at least one feature vector further comprises a data related to at least one of: how recently the feature vector variable was declared, or how recently the feature vector variable was assigned; and the determination comprises determining whether a string identifier of a given variable is the same as a string identifier corresponding to a feature vector variable in the set of scope feature vector variables; and at a processor: calculating the trained probabilistic model and the source code analysis, one or more predictions of elements of the sequence of source code elements; identifying source code errors by comparing the one or more predicted elements with source code previously generated; and correcting the identified source code errors using the one or more predicted elements.

2. A method as claimed in claim 1 comprising calculating the predictions by using the source code analysis to add to or reduce a number of possible source code elements from which the predictions are calculated.

3. A method as claimed in claim 1 comprising adding the one or more predicted elements to the sequence of source code elements.

4. A method as claimed in claim 1 wherein the source code is previously generated by a human.

5. A method as claimed in claim 4 comprising generating graphical user interface output suggesting corrections to the identified source code errors on the basis of the one or more predicted elements.

6. A method as claimed in claim 4 comprising automatically correcting the identified source code errors using the one or more predicted elements.

7. A method as claimed in claim 1 comprising generating a graphical user interface display suggesting the one or more predicted elements to a user and providing certainty information associated with the predicted elements.

8. A method as claimed in claim 1 wherein the source code analysis comprises any one or more of: static analysis of source code, extracting syntactic elements from source code, type checking source code, associating variables in source code with definitions of the variables in the source code, associating functions in source code with definitions of the functions in the source code, carrying out data flow analysis of source code, carrying out dependence analysis of source code, carrying out alias analysis of source code, carrying out pointer analysis of source code, carrying out escape analysis of source code.

9. A method as claimed in claim 1 wherein the graph is any of: a control flow graph, a flat sequence of node, a tree.

10. A method as claimed in claim 1 wherein the graph comprises a plurality of nodes populated with data from an abstract syntax tree calculated by the compiler.

11. A method as claimed in claim 10 wherein at least some of the nodes are annotated with a type of an expression associated with the node.

12. A method as claimed in claim 10, the trained probabilistic model comprising a number of parameters, the number of parameters being less than a number of parameters fully describing the output from the source code analysis.

13. A method as claimed in claim 10, the trained probabilistic model comprising at least a probability distribution over child nodes of the graph conditioned on associated parent nodes of the graph.

14. A method as claimed in claim 1, the trained probabilistic model comprising at least a probability distribution over latent variables that evolve sequentially over an ordering of the source code elements.

15. One or more device-readable computer storage media comprising: device-executable instructions to access a corpus of examples of natural source code written in a programming language; device-executable instructions to access source code analysis for the examples from an analyzer operable with the programming language, the source code analysis comprising a graph of syntactic structure of the sequence of source code elements and an indication of which variables are in scope at any point in the program, wherein: scope is represented as a set of feature vectors; at least one feature vector comprises a string identifier corresponding to a feature vector variable; the at least one feature vector further comprises data related to at least one of: how recently the feature vector variable was declared, or how recently the feature vector variable was assigned; and the determination comprises determining whether a string identifier of a given variable is the same as a string identifier corresponding to a feature vector variable in the variable was assigned; device-executable instructions to calculate, from the corpus of examples and the source code analysis, one or more predictions of elements of the sequence of source code elements; device-executable instructions to identify source code errors by comparing the one or more predicted elements with source code previously generated; and device-executable instructions to correct the identified source code errors using the one or more predicted elements.

16. One or more device-readable storage media as claimed in claim 15 wherein the corpus of examples is from a programmer such that the probabilistic model is trained to learn a programming style of the programmer.

17. A computing apparatus comprising: a memory storing a trained probabilistic model of natural source code written in a programming language, the trained probabilistic model arranged to take into account source code analysis output of an analyzer; and a processor coupled to the memory and configured to execute the analyzer to compute analysis of at least part of a sequence of source code elements, the source code analysis comprising an abstract syntax tree of the sequence of source code elements, the processor executing the analyzer further configured to determine which variables are in scope at any point in the program, wherein: scope is represented as a set of feature vectors; at least one feature vector comprises a string identifier corresponding to a feature vector variable; and the determination comprises determining whether a string identifier of a given variable is the same as a string identifier corresponding to a feature vector variable in the set of scope feature vector variables; and the processor further configured to: calculate, from the trained probabilistic model and the source code analysis, one or more predictions of elements of the sequence of source code elements, the calculation comprising a depth-first traversal of the abstract syntax tree to produce a sequence of internal nodes, traversal variables, and tokens; identify source code errors by comparing the one or more predicted elements with source code previously generated; and correct the identified source code errors using the one or more predicted elements.
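The scope representation recited in the independent claims (a set of feature vectors, each holding a string identifier plus data on how recently the variable was declared or assigned) can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: Python's standard `ast` module stands in for the analyzer, only top-level statements are considered, the first assignment to a name is treated as its declaration, recency is measured in statements, and `ScopeFeature`, `scope_at`, and `in_scope` are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeFeature:
    """One in-scope variable: its identifier plus two recency features."""
    identifier: str
    statements_since_declared: int
    statements_since_assigned: int


def scope_at(source, index):
    """Build the scope set seen just before top-level statement `index`.

    Simplifying assumption: the first assignment to a name is its declaration.
    """
    first, last = {}, {}
    for position, statement in enumerate(ast.parse(source).body[:index]):
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                first.setdefault(node.id, position)  # earliest assignment
                last[node.id] = position             # most recent assignment
    return {
        ScopeFeature(name, index - first[name], index - last[name])
        for name in first
    }


def in_scope(name, scope):
    """Compare a variable's string identifier with each identifier in the scope set."""
    return any(feature.identifier == name for feature in scope)


# Hypothetical input: `c` is used in the last statement but never assigned.
source = "a = 1\nb = 2\na = a + b\nprint(a, c)\n"
scope = scope_at(source, 3)
print(sorted(scope, key=lambda feature: feature.identifier))
print(in_scope("a", scope), in_scope("c", scope))
```

Here the identifier comparison in `in_scope` mirrors the claimed determination step, and the two recency fields are the kind of data a model could use to prefer recently declared or assigned variables when predicting the next element.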

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Analysis of software for verifying properties of programs (testing of software G06F11/3668) · CPC title

  • Structural analysis for program understanding · CPC title

  • G06F8/30 (Primary)

    Creation or generation of source code · CPC title

  • G06F8/34 (Primary)

    Graphical or visual programming · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9928040B2 cover?
Automated generation, or completion, or checking of source code is described whereby a probabilistic model having been trained using a corpus of natural source code examples is used. In various examples the probabilistic model comprises probability distributions describing belief about structure of natural source code and takes into account source code analysis from a compiler or other source c…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F8/30. Mapped technology areas include Physics.
When was this patent published?
Published Mar 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).