US12061874B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12061874-B2 |
| Application number | US-202217929267-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2022 |
| Priority date | Jan 28, 2019 |
| Publication date | Aug 13, 2024 |
| Grant date | Aug 13, 2024 |
Official abstract text for this publication.
Systems and methods for facilitating updates to software programs via machine-learning techniques are disclosed. In an example, an application generates a feature vector from a textual description of a software defect by applying a topic model to the textual description. The application uses the feature vector and one or more machine-learning models configured to predict classifications and sub-classifications of the textual description. The application integrates the classifications and the sub-classifications into a final classification of the textual description that indicates a software component responsible for causing the software defect. The final classification is usable for correcting the software defect.
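As a rough illustration of the integration step the abstract describes, the Python sketch below blends per-component and per-sub-component probabilities into one final classification. The abstract does not say how the integration is computed. The weighted linear blend, the `integrate` function, the `parent_of` mapping, the `weight` parameter, and the example labels are all assumptions made for illustration, not the patented method.

```python
def integrate(component_probs, subcomponent_probs, parent_of, weight=0.5):
    """Blend component and sub-component probabilities into one ranking.

    `weight` (hypothetical) is the share of trust given to the component
    model; the remainder goes to the sub-component model.
    """
    scores = {}
    for sub, p_sub in subcomponent_probs.items():
        comp = parent_of[sub]                      # component that owns this sub-component
        p_comp = component_probs.get(comp, 0.0)    # 0.0 if the component model gave no score
        scores[(comp, sub)] = weight * p_comp + (1.0 - weight) * p_sub
    best = max(scores, key=scores.get)             # highest blended score wins
    return best, scores


# Made-up outputs of the two models for one defect description.
component_probs = {"ui": 0.7, "storage": 0.3}
subcomponent_probs = {"ui/forms": 0.5, "ui/layout": 0.2, "storage/cache": 0.3}
parent_of = {"ui/forms": "ui", "ui/layout": "ui", "storage/cache": "storage"}

final, scores = integrate(component_probs, subcomponent_probs, parent_of, weight=0.6)
print(final)
```

In this sketch a higher `weight` favors the component model. Claim 6 describes calculating a weight from validation data that reflects the relative accuracy of the two models, which is one plausible source for such a value.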
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method comprising: accessing training data comprising training pairs, each training pair comprising a software defect description and a corresponding classification of a set of classifications, wherein each classification of the set of classifications comprises one or more of: (i) a component label that indicates a type of software component that caused a defect; or (ii) a sub-component label that indicates a sub-type of the type of software component that caused the defect; generating a first subset of training data comprising a first set of training pairs, wherein each training pair of the first set of training pairs includes a classification for which a component label is available and sub-component labels are removed; training a first classification model with a first set of feature vectors generated from the first subset of training data to obtain a first set of learned parameters for the first classification model; generating a second subset of training data comprising a second set of training pairs, wherein each training pair of the second set of training pairs includes a classification for which a sub-component label is available; training a second classification model with a second set of feature vectors generated from the second subset of training data to obtain a second set of learned parameters for the second classification model, wherein the second set of learned parameters are distinct from the first set of learned parameters; generating a set of classification by applying the first classification model to a feature vector of a textual description of a software defect and a set of sub-classification by applying the second classification mode to the feature vector; integrating the set of classification and the set of sub-classifications into a final classification; identifying one or more software components that caused the software defect based on the final classification; and modifying the one or more software components that caused the software defect to correct the software defect.

2. The method of claim 1, wherein: training the first classification model comprises, iteratively, for each training pair in the first subset of training data: generating a first feature vector for the training pair; obtaining a first classification by applying the first classification model to the first feature vector; and adjusting internal parameters of the first classification model to minimize a first loss generated by a first loss function associated with the first classification model, and wherein training the second classification model comprises, iteratively, for each training pair in the second subset of training data: generating a second feature vector for the training pair; obtaining a second classification by applying the second classification model to the second feature vector; and adjusting internal parameters of the second classification model to minimize a second loss generated by a second loss function associated with the second classification model.

3. The method of claim 1, further comprising: normalizing the training data over time or classification to generate normalized training data, wherein the first subset of training data and the second subset of training data are generated from the normalized training data.

4. The method of claim 3, wherein each training pair of the training data includes a timestamp, and normalizing the training data over time comprises: determining, for a particular classification of set of classifications, a non-uniformity of a distribution of the particular classification over time; and removing one or more training pairs of the training data to increase uniformity of the distribution of the particular classification.

5. The method of claim 3, wherein normalizing the training data by classification comprises: determining that a difference between a first number of training pairs of a first subset of training data that correspond to a first classification and a second number of training pairs of a second subset of training data pairs that correspond to a second classification is greater than a first threshold; and adjusting the training pairs of the first subset of training data such that the difference is below a second threshold.

6. The method of claim 1, further comprising: accessing validation data comprising validation pairs, each validation pair comprising a textual description of a software defect and a corresponding classification; generating a feature vector from a textual description of a validation pair of the validation data; determining a set of classifications by providing the feature vector to the first classification model, each classification of the set of classifications comprising a probability that the textual description is represented by the respective classification; determining a set of sub-classifications by providing the feature vector to a second classification model, each sub-classification comprising a probability that the textual description is represented by the respective sub-classification, wherein the set of sub-classifications is associated with a particular classification of the classifications; and calculating a weight that reflects a relative accuracy of the set of classifications and the set of sub-classifications.

7. The method of claim 1, further comprising: constructing a language model by identifying a set of words and associated frequencies of occurrence in software defect descriptions of the training pairs; and constructing, from the language model, a topic model comprising a set of topics derived from the language model, wherein the first set of feature vectors are generated by applying the topic model to the software defect descriptions of the first subset of training data, and wherein the second set of feature vectors are generated by applying the topic model to the software defect descriptions of the second subset of training data.

8. The method of claim 7, wherein determining the set of topics comprises applying Latent Dirichlet Allocation to the training data.

9. A system comprising: one or more processors; and a memory having stored thereon instructions that, upon execution by the one or more processors, cause the one or more processors to perform operations comprising: accessing training data comprising training pairs, each training pair comprising a software defection description and a corresponding classification of a set of classifications, wherein each classification of the set of classifications comprises one or more of: (i) a component label that indicates a type of software component that caused the a defect; or (ii) a sub-component label that indicates a sub-type of the type of software component that caused the defect; generating a first subset of training data comprising a first set of training pairs, wherein each training pair of the first set of training pairs includes a classification for which a component label is available and sub-component labels are removed; training a first classification model with a first set of feature vectors generated from the first subset of training data to obtain a first set of learned parameters for the first classification model; generating a second subset of training data comprising a second set of training pairs, wherein each training pair of the second set of training pairs includes a classification for which a sub-component label is available; training a second classification model with a second set of feature vectors generated from the second subset of training data to obtain a second set of learned parameters for the second classification model, wherein the second set of learned parameters ar
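The normalization by classification described in claim 5 can be sketched in Python as follows. The claim states only that the training pairs are adjusted until the difference in counts falls below a second threshold. Random downsampling of the larger class, the `balance` function, its parameter names, and the sample data are assumptions chosen for illustration, not details taken from the patent.

```python
import random


def balance(pairs_a, pairs_b, first_threshold, second_threshold, seed=0):
    """Downsample pairs_a when it outnumbers pairs_b by more than first_threshold.

    Assumes second_threshold >= 1 so that a difference strictly below it is
    reachable by removing pairs. Only the case where pairs_a is the larger
    class is handled in this sketch.
    """
    diff = len(pairs_a) - len(pairs_b)
    if diff <= first_threshold:
        return list(pairs_a)                       # already balanced enough
    # Keep just enough pairs that the new difference is second_threshold - 1.
    keep = min(len(pairs_b) + max(second_threshold - 1, 0), len(pairs_a))
    rng = random.Random(seed)                      # seeded for repeatable sampling
    return rng.sample(list(pairs_a), keep)


# Made-up training pairs: (defect description, classification).
class_a = [(f"crash in dialog {i}", "ui") for i in range(10)]
class_b = [(f"cache miss {i}", "storage") for i in range(4)]

balanced_a = balance(class_a, class_b, first_threshold=3, second_threshold=2)
print(len(balanced_a), len(class_b))
```

Oversampling the smaller class would be another way to satisfy the same condition; the claim language does not say which adjustment is used.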
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Classification techniques · CPC title
Statistical methods, e.g. probability models · CPC title
Machine learning · CPC title
Software maintenance or management · CPC title