Code completion of custom classes with machine learning

US2019303108A1 · US · A1

Patent metadata
Publication number: US-2019303108-A1
Application number: US-201816207952-A
Country: US
Kind code: A1
Filing date: Dec 3, 2018
Priority date: Mar 29, 2018
Publication date: Oct 3, 2019
Grant date: (none listed)

Abstract

Official abstract text for this publication.

A code completion tool uses machine learning models generated for custom or proprietary classes associated with a custom library of classes of a programming language and for overlapping classes associated with a standard library of classes for the programming language. The machine learning models are trained with features from usage patterns of the custom classes and overlapping classes found in two different sources of training data. An n-order Markov chain model is trained for each custom class and each overlapping class from the usage patterns to generate probabilities to predict a method invocation more likely to follow a sequence of method invocations for a custom class and for an overlapping class.
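The per-class model described in the abstract can be illustrated with a minimal sketch. It assumes training data has already been reduced to ordered lists of method names invoked on one class; the class name `MarkovCompletionModel`, the method names `Open`, `Read`, `Write`, and `Close`, and the plain count-over-total normalization are all hypothetical choices for illustration, not details taken from the patent.

```python
from collections import Counter, defaultdict


class MarkovCompletionModel:
    """Illustrative n-order Markov chain over method invocations of one class."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        # Maps a tuple of n preceding method names to counts of the next method.
        self.counts = defaultdict(Counter)

    def train(self, sequences):
        """Count which method follows each run of n consecutive invocations."""
        for seq in sequences:
            for i in range(self.n, len(seq)):
                state = tuple(seq[i - self.n:i])
                self.counts[state][seq[i]] += 1

    def predict(self, recent_calls, top_k=3):
        """Return up to top_k (method, probability) pairs, most likely first."""
        state = tuple(recent_calls[-self.n:])
        followers = self.counts.get(state)
        if not followers:
            return []
        total = sum(followers.values())
        # Sort by descending count, then by name so ties are deterministic.
        ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(method, count / total) for method, count in ranked[:top_k]]


# Hypothetical usage patterns for a single custom class.
model = MarkovCompletionModel(n=1)
model.train([
    ["Open", "Read", "Close"],
    ["Open", "Read", "Read", "Close"],
    ["Open", "Write", "Close"],
])
suggestions = model.predict(["Open"])
```

With these toy sequences, `Read` follows `Open` in two of three cases, so it is ranked ahead of `Write`. A state never seen in training yields an empty list rather than a guess.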

First claim

Opening claim text (preview).

What is claimed:

1. A system comprising:
    at least one processor and a memory;
    at least one program configured to be executed by the one or more processors, the at least one program including instructions that:
    train a first sequential model for at least one custom class with features from a plurality of source code programs using methods associated with the at least one custom class, the at least one custom class associated with a custom library of classes for a programming language that differs from a standard library of classes for the programming language, the first sequential model including a sequence of n custom method invocations, at least one custom candidate method succeeding the sequence of n custom method invocations, and a probability that the at least one custom candidate method succeeds the sequence of n custom method invocations;
    train a second sequential model for at least one overlapping class with features from a plurality of source code programs using methods associated with the at least one overlapping class, the at least one overlapping class associated with a standard library of classes for the programming language that are present in the custom library, the second sequential model including a sequence of n overlapping method invocations, at least one overlapping candidate method succeeding the sequence of n overlapping method invocations, and a probability that the at least one overlapping candidate method succeeds the sequence of n overlapping method invocations; and
    employ the first sequential model and the second sequential model in a code completion system to provide at least one candidate to complete a method invocation for a source code program having the custom class.

2. The system of claim 1, wherein the first sequential model and the second sequential model are n-order Markov chain models.

3. The system of claim 1, wherein the probability that at least one overlapping candidate method succeeds the sequence of n overlapping method invocations is based on a combined frequency derived from training data using only the standard library of classes and training data using the custom library of classes.

4. The system of claim 1, wherein the probability that the at least one overlapping candidate method succeeds the sequence of n overlapping method invocations is based on a combined frequency derived from a frequency of the overlapping candidate method in a sequence in training data using only standard classes and a frequency of the overlapping candidate method in a sequence in training data using custom classes.

5. The system of claim 4, wherein the frequency of the overlapping candidate method in the sequence in training data using custom classes is weighted by an oversample ratio.

6. The system of claim 5, wherein the oversample ratio is based on a ratio of a total frequency of the overlapping candidate method in the sequence in training data using standard classes over the frequency of the overlapping candidate method in the sequence in training data using custom classes.

7. The system of claim 1, wherein the probability that a custom candidate method succeeds the sequence of n custom method invocations is based on a frequency of the custom candidate method occurring in a sequence in training data using custom classes over a frequency of other methods in the class occurring after the sequence in training data using custom classes.

8. The system of claim 1, wherein the at least one program includes further instructions that rank candidates based on highest probability.

9. The system of claim 1, wherein the at least one program includes further instructions that train a third sequential model for at least one standard class with features from a plurality of source code programs using methods associated with the at least one standard class, the first sequential model including a sequence of n standard method invocations, at least one standard candidate method succeeding the sequence of n custom method invocations, and a probability that the at least one standard candidate method succeeds the sequence of n custom method invocations.

10. The system of claim 9, wherein the at least one program includes further instructions that employ the third sequential model into the code completion system.

11. A method, comprising:
    generating, on a computing device having at least one processor and a memory, at least one n-state sequence of method invocations of a custom class of a programming language, the n-state sequence of method invocations of the custom class associated with at least one custom-class method candidate;
    assigning the at least one custom-class method candidate with a probability of succeeding the at least one n-state sequence of method invocations of the custom class;
    generating at least one n-state sequence of method invocations of an overlapping class of the programming language, the n-state sequence of method invocations of the overlapping class associated with at least one overlapping-class method candidate;
    assigning the at least one overlapping-class method candidate with a probability of succeeding the at least one n-state sequence of method invocations of the overlapping class; and
    formatting the at least one n-state sequence of method invocations of the overlapping class and the at least one n-state sequence of method invocations of the custom class for use in a code completion system to predict a method to complete a method invocation,
    wherein the at least one overlapping class is part of a standard library of classes for the programming language, wherein the at least one custom class is part of a custom library of classes for the programming language, wherein the custom class is not part of the standard library of classes.

12. The method of claim 11, wherein the probability of the at least one n+1 state overlapping-class method candidate is based on a combined frequency derived from frequencies of methods of the overlapping class in training data that only uses standard classes and from frequencies of methods of the overlapping class in training data uses the custom classes.

13. The method of claim 12, wherein the frequencies of methods of the overlapping class in training data the uses custom classes is weighted by an oversample ratio.

14. The method of claim 13, wherein the oversample ratio is based on a ratio of a total frequency of the at least one overlapping-class method candidate in training data using standard classes over the frequency of the at least one overlapping-class method candidate in training data using custom classes.

15. The method of claim 11, wherein at least one n-state sequence of method invocations of a custom class and the at least one custom-class method candidate are represented by an n-order Markov model.

16. The method of claim 11, wherein the at least one n-state sequence of method invocations of an overlapping class and the at least one overlapping-class method candidate are represented by an n-order Markov model.

17. The method of claim 11, further comprising:
    extracting features from a plurality of source code programs using classes from the custom library to generate the at least one n-state sequence of method invocations of the custom class; and
    extracting features from a plurality of source code programs using classes from the standard library and the custom library to generate the at least one n-state sequence of method invocations of the overlapping class.

18. The method of claim 11, further comprising: gene
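The combined frequency and oversample ratio recited in claims 4 to 6 (and 12 to 14), together with the ranking in claim 8, can be sketched as follows. The claims say only that the ratio is "based on" standard-data frequency over custom-data frequency, so the exact formula here (total standard count divided by total custom count for one state) is an assumption, as are the function names and the sample method names `Add`, `Remove`, and `Clear`.

```python
from collections import Counter


def combine_frequencies(standard_counts, custom_counts):
    """Merge next-method counts for one n-call state of an overlapping class.

    standard_counts: counts from training data that uses only standard classes.
    custom_counts: counts from training data that uses custom classes.
    """
    total_standard = sum(standard_counts.values())
    total_custom = sum(custom_counts.values())
    # Assumed formula: scale custom-data counts so that, in aggregate, they
    # carry the same weight as the (usually much larger) standard-data counts.
    if total_standard and total_custom:
        ratio = total_standard / total_custom
    else:
        ratio = 1.0  # Nothing to balance when one source is empty.
    combined = Counter()
    for method, count in standard_counts.items():
        combined[method] += count
    for method, count in custom_counts.items():
        combined[method] += ratio * count
    return combined, ratio


def rank_candidates(combined):
    """Turn combined frequencies into probabilities, highest first."""
    total = sum(combined.values())
    if total == 0:
        return []
    probabilities = ((method, freq / total) for method, freq in combined.items())
    return sorted(probabilities, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical counts of the method that follows one fixed call sequence.
standard_counts = {"Add": 90, "Remove": 10}
custom_counts = {"Add": 1, "Clear": 4}
combined, ratio = combine_frequencies(standard_counts, custom_counts)
ranked = rank_candidates(combined)
```

In this toy data the custom corpus is 20 times smaller than the standard corpus, so each custom observation is counted 20 times. `Clear`, which appears only in code that uses custom classes, then ranks second instead of being drowned out by the standard-only counts.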

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks (CPC title)

  • G06F8/33 (Primary): Intelligent editors (CPC title)

  • G06N20/00 (Primary): Machine learning (CPC title)

  • Physics (mapped topic)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019303108A1 cover?
A code completion tool that trains an n-order Markov chain model for each custom (proprietary) class and for each overlapping standard-library class, using usage patterns from two different sources of training data. The models predict the method invocation most likely to follow a given sequence of method invocations.
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F8/33. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 3, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).