Software component defect prediction using classification models that generate hierarchical component classifications

US12061874B2 · US · B2

Patent metadata
Publication number: US-12061874-B2
Application number: US-202217929267-A
Country: US
Kind code: B2
Filing date: Sep 1, 2022
Priority date: Jan 28, 2019
Publication date: Aug 13, 2024
Grant date: Aug 13, 2024


Abstract


Systems and methods for facilitating updates to software programs via machine-learning techniques are disclosed. In an example, an application generates a feature vector from a textual description of a software defect by applying a topic model to the textual description. The application uses the feature vector and one or more machine-learning models configured to predict classifications and sub-classifications of the textual description. The application integrates the classifications and the sub-classifications into a final classification of the textual description that indicates a software component responsible for causing the software defect. The final classification is usable for correcting the software defect.
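The abstract does not say how the topic model is built, although claim 8 names Latent Dirichlet Allocation (LDA). The sketch below shows one minimal, standard-library-only way to turn a defect description into a topic-proportion feature vector. It fits LDA with collapsed Gibbs sampling, then folds a new description into the fitted topics. The function names, the hyperparameters (`alpha`, `beta`, iteration counts), the whitespace tokenization, and the example descriptions are illustrative assumptions. They are not the patent's implementation.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA to tokenized documents via collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(num_topics)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in topics]             # times word v is assigned to topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k
    z = []                                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], z[d][n]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][v] + beta)
                           / (topic_total[j] + V * beta) for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # phi[k][v]: smoothed probability of word v under topic k (each row sums to 1).
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in topics]
    return word_id, phi


def topic_features(tokens, word_id, phi, alpha=0.1, iterations=50, seed=0):
    """Hypothetical helper: fold a new description into fixed topics; return topic proportions."""
    rng = random.Random(seed)
    topics = range(len(phi))
    ids = [word_id[w] for w in tokens if w in word_id]  # out-of-vocabulary words are skipped
    z = [rng.randrange(len(phi)) for _ in ids]
    counts = Counter(z)
    for _ in range(iterations):
        for n, v in enumerate(ids):
            counts[z[n]] -= 1
            # phi stays fixed; only this description's token assignments are resampled.
            weights = [(counts[k] + alpha) * phi[k][v] for k in topics]
            z[n] = rng.choices(topics, weights=weights)[0]
            counts[z[n]] += 1
    total = len(ids) + len(phi) * alpha
    return [(counts[k] + alpha) / total for k in topics]


docs = [
    "crash when rendering pdf page".split(),
    "pdf export crash on large page".split(),
    "login button unresponsive on signin form".split(),
    "form validation rejects valid login".split(),
]
word_id, phi = train_lda(docs, num_topics=2, iterations=100)
vec = topic_features("pdf page crash".split(), word_id, phi)
print([round(x, 3) for x in vec])
```

Each entry of `vec` estimates the share of the description attributed to one topic, so the vector can feed any downstream classifier. A production system would more likely rely on a library implementation, such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`, than on a hand-written sampler.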

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: accessing training data comprising training pairs, each training pair comprising a software defect description and a corresponding classification of a set of classifications, wherein each classification of the set of classifications comprises one or more of: (i) a component label that indicates a type of software component that caused a defect; or (ii) a sub-component label that indicates a sub-type of the type of software component that caused the defect; generating a first subset of training data comprising a first set of training pairs, wherein each training pair of the first set of training pairs includes a classification for which a component label is available and sub-component labels are removed; training a first classification model with a first set of feature vectors generated from the first subset of training data to obtain a first set of learned parameters for the first classification model; generating a second subset of training data comprising a second set of training pairs, wherein each training pair of the second set of training pairs includes a classification for which a sub-component label is available; training a second classification model with a second set of feature vectors generated from the second subset of training data to obtain a second set of learned parameters for the second classification model, wherein the second set of learned parameters are distinct from the first set of learned parameters; generating a set of classifications by applying the first classification model to a feature vector of a textual description of a software defect and a set of sub-classifications by applying the second classification model to the feature vector; integrating the set of classifications and the set of sub-classifications into a final classification; identifying one or more software components that caused the software defect based on the final classification; and modifying the one or more software components that caused the software defect to correct the software defect.

2. The method of claim 1, wherein: training the first classification model comprises, iteratively, for each training pair in the first subset of training data: generating a first feature vector for the training pair; obtaining a first classification by applying the first classification model to the first feature vector; and adjusting internal parameters of the first classification model to minimize a first loss generated by a first loss function associated with the first classification model, and wherein training the second classification model comprises, iteratively, for each training pair in the second subset of training data: generating a second feature vector for the training pair; obtaining a second classification by applying the second classification model to the second feature vector; and adjusting internal parameters of the second classification model to minimize a second loss generated by a second loss function associated with the second classification model.

3. The method of claim 1, further comprising: normalizing the training data over time or classification to generate normalized training data, wherein the first subset of training data and the second subset of training data are generated from the normalized training data.

4. The method of claim 3, wherein each training pair of the training data includes a timestamp, and normalizing the training data over time comprises: determining, for a particular classification of the set of classifications, a non-uniformity of a distribution of the particular classification over time; and removing one or more training pairs of the training data to increase uniformity of the distribution of the particular classification.

5. The method of claim 3, wherein normalizing the training data by classification comprises: determining that a difference between a first number of training pairs of a first subset of training data that correspond to a first classification and a second number of training pairs of a second subset of training data pairs that correspond to a second classification is greater than a first threshold; and adjusting the training pairs of the first subset of training data such that the difference is below a second threshold.

6. The method of claim 1, further comprising: accessing validation data comprising validation pairs, each validation pair comprising a textual description of a software defect and a corresponding classification; generating a feature vector from a textual description of a validation pair of the validation data; determining a set of classifications by providing the feature vector to the first classification model, each classification of the set of classifications comprising a probability that the textual description is represented by the respective classification; determining a set of sub-classifications by providing the feature vector to the second classification model, each sub-classification comprising a probability that the textual description is represented by the respective sub-classification, wherein the set of sub-classifications is associated with a particular classification of the classifications; and calculating a weight that reflects a relative accuracy of the set of classifications and the set of sub-classifications.

7. The method of claim 1, further comprising: constructing a language model by identifying a set of words and associated frequencies of occurrence in software defect descriptions of the training pairs; and constructing, from the language model, a topic model comprising a set of topics derived from the language model, wherein the first set of feature vectors are generated by applying the topic model to the software defect descriptions of the first subset of training data, and wherein the second set of feature vectors are generated by applying the topic model to the software defect descriptions of the second subset of training data.

8. The method of claim 7, wherein determining the set of topics comprises applying Latent Dirichlet Allocation to the training data.

9. A system comprising: one or more processors; and a memory having stored thereon instructions that, upon execution by the one or more processors, cause the one or more processors to perform operations comprising: accessing training data comprising training pairs, each training pair comprising a software defect description and a corresponding classification of a set of classifications, wherein each classification of the set of classifications comprises one or more of: (i) a component label that indicates a type of software component that caused a defect; or (ii) a sub-component label that indicates a sub-type of the type of software component that caused the defect; generating a first subset of training data comprising a first set of training pairs, wherein each training pair of the first set of training pairs includes a classification for which a component label is available and sub-component labels are removed; training a first classification model with a first set of feature vectors generated from the first subset of training data to obtain a first set of learned parameters for the first classification model; generating a second subset of training data comprising a second set of training pairs, wherein each training pair of the second set of training pairs includes a classification for which a sub-component label is available; training a second classification model with a second set of feature vectors generated from the second subset of training data to obtain a second set of learned parameters for the second classification model, wherein the second set of learned parameters ar…
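Claim 1 trains two models on different slices of the same labeled data. Claim 5 rebalances label counts, and claim 6 derives a weight from the two models' relative accuracy. The standard-library-only sketch below illustrates these data-handling steps under several assumptions that the claims do not specify. The `TrainingPair` record, the downsample-to-a-cap balancing rule, the accuracy-ratio weight, and the linear score used to integrate component and sub-component probabilities are all hypothetical choices made for illustration. Model training is omitted, and the probabilities passed to `integrate` stand in for classifier outputs.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingPair:  # hypothetical record for one labeled defect report
    description: str
    component: Optional[str]       # component label, if available
    sub_component: Optional[str]   # sub-component label, if available


def component_subset(pairs):
    """First subset: pairs with a component label, sub-component labels removed."""
    return [TrainingPair(p.description, p.component, None)
            for p in pairs if p.component is not None]


def sub_component_subset(pairs):
    """Second subset: pairs for which a sub-component label is available."""
    return [p for p in pairs if p.sub_component is not None]


def balance_by_label(pairs, label_of, max_gap, seed=0):
    """Assumed balancing rule: downsample so that no label has more than
    max_gap more pairs than the rarest label."""
    groups = defaultdict(list)
    for p in pairs:
        groups[label_of(p)].append(p)
    if not groups:
        return []
    rng = random.Random(seed)
    cap = min(len(g) for g in groups.values()) + max_gap
    balanced = []
    for label in sorted(groups):
        g = groups[label]
        balanced.extend(rng.sample(g, cap) if len(g) > cap else g)
    return balanced


def accuracy_weight(component_acc, sub_component_acc):
    """Assumed weight: the component model's share of total validation accuracy."""
    return component_acc / (component_acc + sub_component_acc)


def integrate(class_probs, sub_probs, parent_of, weight):
    """Assumed linear integration: score each sub-component together with its
    parent component and return the best (component, sub-component) pair."""
    best, best_score = None, float("-inf")
    for sub, p_sub in sub_probs.items():
        comp = parent_of[sub]
        score = weight * class_probs.get(comp, 0.0) + (1 - weight) * p_sub
        if score > best_score:
            best, best_score = (comp, sub), score
    return best


pairs = [
    TrainingPair("PDF export crashes on save", "rendering", "pdf"),
    TrainingPair("Page renders blank after zoom", "rendering", None),
    TrainingPair("Fonts blurry in exported PDF", "rendering", "fonts"),
    TrainingPair("Login session times out early", "auth", "session"),
]
first = component_subset(pairs)       # 4 pairs, sub-component labels stripped
second = sub_component_subset(pairs)  # 3 pairs that carry a sub-component label
balanced = balance_by_label(first, lambda p: p.component, max_gap=1)
w = accuracy_weight(0.8, 0.6)         # illustrative validation accuracies
final = integrate({"rendering": 0.7, "auth": 0.3},
                  {"pdf": 0.5, "fonts": 0.2, "session": 0.3},
                  {"pdf": "rendering", "fonts": "rendering", "session": "auth"},
                  w)
print(final)  # ('rendering', 'pdf')
```

Scoring each sub-component together with its parent component keeps the result consistent with the hierarchy, because a sub-component is only ever reported alongside the component it belongs to.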

Assignees

Adobe Inc


Classifications (CPC titles)

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

  • Classification techniques

  • Statistical methods, e.g. probability models

  • Machine learning

  • Software maintenance or management


Frequently asked questions


What does patent US12061874B2 cover?
It covers a machine-learning approach to updating software programs. A topic model converts a textual description of a software defect into a feature vector. Machine-learning models then predict classifications and sub-classifications from that vector. These predictions are integrated into a final classification that identifies the software component responsible for the defect, so that the defect can be corrected.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).