Software development using re-usable software components
US-2017177310-A1 · Jun 22, 2017 · US
US10664696B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10664696-B2 |
| Application number | US-201815934855-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 23, 2018 |
| Priority date | Apr 19, 2017 |
| Publication date | May 26, 2020 |
| Grant date | May 26, 2020 |
Official abstract text for this publication.
Existing software defect text categorization approaches are based on use of supervised/semi-supervised machine learning techniques, which may require significant amount of labeled training data for each class in order to train the classifier model leading to significant amount of human effort, resulting in an expensive process. Embodiments of the present disclosure provide systems and methods for circumventing the problem of dependency on labeled training data and features derived from source code by performing concept based classification of software defect reports. In the present disclosure, semantic similarity between the defect category/type labels and the software defect report(s) is computed and represented in a concept space spanned by corpus of documents obtained from one or more knowledge bases, and distribution of similarity values are obtained. These similarity values are compared with a dynamically generated threshold, and based on the comparison, the software defect reports are classified into software defect categories.
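The approach described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: each corpus document is treated as one concept dimension, text is projected onto it by TF-IDF-weighted term overlap, similarity is cosine similarity, and the dynamically generated threshold is the mean of a report's similarity values plus half their standard deviation. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def build_concept_space(corpus):
    """Assumed concept space: one dimension per corpus document.

    Returns one dict per document mapping term -> TF-IDF weight.
    """
    doc_counts = [Counter(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for counts in doc_counts for term in counts)
    return [
        {term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
         for term, tf in counts.items()}
        for counts in doc_counts
    ]


def project(text, space):
    """Project text into the concept space: one score per concept."""
    counts = Counter(tokenize(text))
    return [sum(tf * weights.get(term, 0.0) for term, tf in counts.items())
            for weights in space]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(report, categories, corpus):
    """Return (labels above threshold, similarity per label, threshold)."""
    space = build_concept_space(corpus)
    report_vec = project(report, space)
    sims = {label: cosine(report_vec, project(description, space))
            for label, description in categories.items()}
    values = list(sims.values())
    # Assumed threshold rule, derived from the similarity distribution itself.
    threshold = mean(values) + 0.5 * pstdev(values)
    labels = [label for label, value in sims.items() if value > threshold]
    return labels, sims, threshold


corpus = [
    "memory leak heap allocation pointer buffer overflow",
    "user interface button screen layout display font",
    "network socket timeout connection packet latency",
]
categories = {
    "memory": "memory leak or buffer overflow in heap allocation",
    "ui": "button layout or screen display problem in the user interface",
    "network": "socket connection timeout or packet latency",
}
report = "application crashes with heap buffer overflow after memory allocation"
labels, sims, threshold = classify(report, categories, corpus)
print(labels, threshold)
```

No labeled training reports are needed in this sketch: the only inputs are the category descriptions, the report text, and the corpus that spans the concept space.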
Opening claim text (preview).
What is claimed is:

1. A processor implemented method, comprising:
obtaining, by one or more hardware processors, input data comprising (a) one or more software defect reports; (b) one or more software defect categories, each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents (302);
segmenting, by the one or more hardware processors, input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content (304);
performing analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents (306);
generating a concept space based on the identified subset of relevant documents (308);
projecting into the concept-space, by the one or more hardware processors, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories (310);
computing, by the one or more hardware processors, one or more similarities between the concept-space representation of each of the one or more software defect reports and each of the one or more software defect categories to obtain distribution of one or more similarity values specific to the one or more software defect reports to be classified (312);
performing, by the one or more hardware processors, a comparison of distribution of the one or more similarity values with a dynamically generated threshold (314); and
classifying by the one or more hardware processors, the one or more software defect reports into the one or more software defect categories based on the comparison (316).

2. The processor implemented method as claimed in claim 1, wherein the step of performing analysis comprises applying one or more document identification techniques on the corpus of documents obtained from the one or more knowledge bases to identify the subset of relevant documents.

3. The processor implemented method as claimed in claim 2, wherein the one or more document identification techniques comprises at least one of one or more graph-theoretic analysis, one or more keyword identification techniques and one or more text clustering techniques.

4. The processor implemented method as claimed in claim 1, wherein when the one or more similarity values are higher than the dynamically generated threshold, the one or more software defect reports are classified into the one or more software defect categories.

5. The processor implemented method as claimed in claim 1, wherein the dynamically generated threshold is based on the distribution of the one or more similarity values.

6. A system (100) comprising:
a memory (102) storing instructions and one or more modules (108);
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to execute the one or more modules (108) comprising:
an input reader module (202) that is configured to: obtain input data comprising (a) one or more software defect reports; (b) each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents;
a software defect report text segmentation module (204) that is configured segment input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content;
a concept-space creation module (206) that is configured to: perform analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents, and generate a concept space based on the identified subset of relevant documents;
a projection module (208) that is configured to project into the concept-space, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories;
a concept-space similarity computation module (210) that is configured to compute one or more similarities between the concept-space representation of each of the one or more software defect reports and each of the one or more software defect categories to obtain distribution of one or more similarity values specific to the one or more software defect reports to be classified; and
a software defect classification module (212) that is configured to: perform a comparison of distribution of the one or more similarity values with a dynamically generated threshold, and classify the one or more software defect reports into the one or more software defect categories based on the comparison.

7. The system as claimed in claim 6, wherein the concept-space creation module (206) performs the analysis by applying one or more document identification techniques on the corpus of documents obtained from the one or more knowledge bases to identify the subset of relevant documents.

8. The system as claimed in claim 7, wherein the one or more document identification techniques comprises at least one of one or more graph-theoretic analysis, one or more keyword identification techniques and one or more text clustering techniques.

9. The system as claimed in claim 6, wherein when the one or more similarity values are higher than the dynamically generated threshold, the one or more software defect reports are classified into the one or more software defect categories.

10. The system as claimed in claim 6, wherein the dynamically generated threshold is based on the distribution of the one or more similarity values.

11. One or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes:
obtaining, by the one or more hardware processors, input data comprising (a) one or more software defect reports; (b) one or more software defect categories, each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents;
segmenting, by the one or more hardware processors, input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content;
performing analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents;
generating a concept space based on the identified subset of relevant documents;
projecting into the concept-space, by the one or more hardware processors, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories;
computing, by the one or more hardware proce
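
The segmenting step and the keyword-based document identification recited in the claims can be illustrated with a minimal sketch. The claims do not specify how segments are delimited or how keywords are matched, so the choices here are assumptions: segments are sentence-like spans, and a corpus document counts as relevant when it shares at least `min_overlap` keywords with any category description. The names `segment_report`, `keywords`, `select_relevant_documents`, `STOPWORDS`, and `min_overlap` are hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not from the claims).
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


def segment_report(report_text):
    """Split a defect report into sentence-like segments.

    Splits on whitespace that follows ., ! or ?, and on line breaks.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", report_text.strip())
    return [part.strip() for part in parts if part.strip()]


def keywords(text):
    """Lowercase words longer than two letters that are not stopwords."""
    return {word for word in re.findall(r"[a-z]+", text.lower())
            if len(word) > 2 and word not in STOPWORDS}


def select_relevant_documents(corpus, category_descriptions, min_overlap=1):
    """Keep corpus documents that share keywords with any category description."""
    category_terms = set()
    for description in category_descriptions:
        category_terms |= keywords(description)
    return [doc for doc in corpus
            if len(keywords(doc) & category_terms) >= min_overlap]


segments = segment_report(
    "App crashes on start. Stack trace shows null pointer.\nSteps: open file"
)
relevant = select_relevant_documents(
    ["Heap memory allocation and leaks",
     "Recipes for baking bread",
     "Socket timeout handling"],
    ["memory leak in heap", "network socket timeout"],
)
print(segments)
print(relevant)
```

Graph-theoretic analysis or text clustering, which the claims also name, could replace the keyword filter without changing the surrounding steps.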
Creation or modification of classes or clusters · CPC title
Document management systems · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Physics · mapped topic
Semantic analysis · CPC title