Systems and methods for classification of software defect reports

US10664696B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10664696-B2
Application number: US-201815934855-A
Country: US
Kind code: B2
Filing date: Mar 23, 2018
Priority date: Apr 19, 2017
Publication date: May 26, 2020
Grant date: May 26, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Existing software defect text categorization approaches are based on use of supervised/semi-supervised machine learning techniques, which may require significant amount of labeled training data for each class in order to train the classifier model leading to significant amount of human effort, resulting in an expensive process. Embodiments of the present disclosure provide systems and methods for circumventing the problem of dependency on labeled training data and features derived from source code by performing concept based classification of software defect reports. In the present disclosure, semantic similarity between the defect category/type labels and the software defect report(s) is computed and represented in a concept space spanned by corpus of documents obtained from one or more knowledge bases, and distribution of similarity values are obtained. These similarity values are compared with a dynamically generated threshold, and based on the comparison, the software defect reports are classified into software defect categories.
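The concept-space idea in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes an explicit-semantic-analysis style projection in which every knowledge-base document is one concept, and it assumes TF-IDF weighting and cosine similarity. The function names, the toy corpus, and the category descriptions are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_concept_space(corpus):
    """corpus maps a concept name to its document text (hypothetical input).

    Returns the IDF table and one TF-IDF vector per concept.
    """
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed IDF; the weighting scheme is an assumption of this sketch.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    concepts = {
        name: {t: c * idf[t] for t, c in term_counts.items()}
        for name, term_counts in counts.items()
    }
    return idf, concepts


def project(text, idf, concepts):
    """Represent text as a vector with one coordinate per concept."""
    tf = Counter(tokenize(text))
    query = {t: c * idf[t] for t, c in tf.items() if t in idf}
    return [
        sum(w * vec.get(t, 0.0) for t, w in query.items())
        for vec in concepts.values()
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy knowledge base: each entry plays the role of one concept.
corpus = {
    "Memory management": "memory heap allocation leak garbage collection pointer",
    "User interface": "button window screen layout display widget click",
    "Computer network": "socket packet connection timeout protocol network",
}
categories = {
    "memory": "heap memory leak or allocation failure",
    "ui": "button layout or screen display problem",
}
report = "App crashes with memory leak during heap allocation"

idf, concepts = build_concept_space(corpus)
report_vec = project(report, idf, concepts)
sims = {
    label: cosine(report_vec, project(desc, idf, concepts))
    for label, desc in categories.items()
}
print(sims)
```

No labeled training reports are needed here: the only inputs are the report text, the category descriptions, and the knowledge-base documents, which is the dependency the abstract says the approach avoids.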

First claim

Opening claim text (preview).

What is claimed is:

1. A processor implemented method, comprising: obtaining, by one or more hardware processors, input data comprising (a) one or more software defect reports; (b) one or more software defect categories, each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents (302); segmenting, by the one or more hardware processors, input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content (304); performing analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents (306); generating a concept space based on the identified subset of relevant documents (308); projecting into the concept-space, by the one or more hardware processors, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories (310); computing, by the one or more hardware processors, one or more similarities between the concept-space representation of each of the one or more software defect reports and each of the one or more software defect categories to obtain distribution of one or more similarity values specific to the one or more software defect reports to be classified (312); performing, by the one or more hardware processors, a comparison of distribution of the one or more similarity values with a dynamically generated threshold (314); and classifying by the one or more hardware processors, the one or more software defect reports into the one or more software defect categories based on the comparison (316).

2. The processor implemented method as claimed in claim 1, wherein the step of performing analysis comprises applying one or more document identification techniques on the corpus of documents obtained from the one or more knowledge bases to identify the subset of relevant documents.

3. The processor implemented method as claimed in claim 2, wherein the one or more document identification techniques comprises at least one of one or more graph-theoretic analysis, one or more keyword identification techniques and one or more text clustering techniques.

4. The processor implemented method as claimed in claim 1, wherein when the one or more similarity values are higher than the dynamically generated threshold, the one or more software defect reports are classified into the one or more software defect categories.

5. The processor implemented method as claimed in claim 1, wherein the dynamically generated threshold is based on the distribution of the one or more similarity values.

6. A system (100) comprising: a memory (102) storing instructions and one or more modules (108); one or more communication interfaces (106); and one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to execute the one or more modules (108) comprising: an input reader module (202) that is configured to: obtain input data comprising (a) one or more software defect reports; (b) each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents; a software defect report text segmentation module (204) that is configured segment input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content; a concept-space creation module (206) that is configured to: perform analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents, and generate a concept space based on the identified subset of relevant documents; a projection module (208) that is configured to project into the concept-space, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories; a concept-space similarity computation module (210) that is configured to compute one or more similarities between the concept-space representation of each of the one or more software defect reports and each of the one or more software defect categories to obtain distribution of one or more similarity values specific to the one or more software defect reports to be classified; and a software defect classification module (212) that is configured to: perform a comparison of distribution of the one or more similarity values with a dynamically generated threshold, and classify the one or more software defect reports into the one or more software defect categories based on the comparison.

7. The system as claimed in claim 6, wherein the concept-space creation module (206) performs the analysis by applying one or more document identification techniques on the corpus of documents obtained from the one or more knowledge bases to identify the subset of relevant documents.

8. The system as claimed in claim 7, wherein the one or more document identification techniques comprises at least one of one or more graph-theoretic analysis, one or more keyword identification techniques and one or more text clustering techniques.

9. The system as claimed in claim 6, wherein when the one or more similarity values are higher than the dynamically generated threshold, the one or more software defect reports are classified into the one or more software defect categories.

10. The system as claimed in claim 6, wherein the dynamically generated threshold is based on the distribution of the one or more similarity values.

11. One or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes: obtaining, by the one or more hardware processors, input data comprising (a) one or more software defect reports; (b) one or more software defect categories, each software defect category from the one or more software defect categories comprising a class label and associated textual description thereof, and (c) a corpus of documents; segmenting, by the one or more hardware processors, input text of the one or more software defect reports into one or more segments based on the input data, each of the one or more segments comprises text content; performing analysis on the corpus of documents obtained from one or more knowledge bases to identify a subset of relevant documents from the corpus of documents; generating a concept space based on the identified subset of relevant documents; projecting into the concept-space, by the one or more hardware processors, text content of the one or more segments pertaining to at least one of the one or more software defect reports and the textual description of the one or more software defect categories to generate a concept-space representation for each of the one or more software defect reports and the one or more software defect categories; computing, by the one or more hardware proce
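The segmentation, comparison, and classification steps of the claims (304, 314, 316, and claims 4 and 5) can be sketched as follows. The claims only state that the threshold is generated dynamically from the distribution of similarity values; the mean-plus-k-standard-deviations rule below is an assumption, as are the sentence-based segmentation, the best-segment aggregation, and every function name. The `token_overlap` function is a deliberately simple placeholder for the concept-space similarity, not the claimed similarity computation.

```python
import re
import statistics


def segment(report_text):
    """Split a defect report into sentence-like segments (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", report_text)
    return [p.strip() for p in parts if p.strip()]


def token_overlap(a, b):
    """Placeholder similarity: Jaccard overlap of lowercase word sets.

    A real system would compare concept-space representations instead.
    """
    set_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    set_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_distribution(report_text, categories, similarity=token_overlap):
    """Score each category by its best-matching report segment (assumption)."""
    segments = segment(report_text)
    return {
        label: max((similarity(seg, desc) for seg in segments), default=0.0)
        for label, desc in categories.items()
    }


def dynamic_threshold(values, k=0.5):
    """Threshold derived from the similarity distribution itself.

    mean + k * population standard deviation is an assumed rule; k is a
    hypothetical tuning parameter.
    """
    values = list(values)
    if not values:
        return 0.0
    return statistics.mean(values) + k * statistics.pstdev(values)


def classify(report_text, categories, k=0.5):
    """Return every category whose similarity exceeds the dynamic threshold."""
    sims = similarity_distribution(report_text, categories)
    threshold = dynamic_threshold(sims.values(), k)
    return [label for label, score in sims.items() if score > threshold]


categories = {
    "memory": "memory leak heap allocation failure",
    "ui": "button layout screen display problem",
    "network": "socket connection timeout packet loss",
}
report = (
    "The service crashes overnight. "
    "Heap usage grows until a memory allocation failure occurs."
)
print(classify(report, categories))
```

Because the threshold is recomputed for each report, a report whose similarities are all equal (for example, all zero) is assigned to no category under this rule, rather than being forced into the nearest one.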

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

  • Creation or modification of classes or clusters · CPC title

  • Document management systems · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Physics · mapped topic

  • G06F40/30 · Primary

    Semantic analysis · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10664696B2 cover?
Existing software defect text categorization approaches are based on use of supervised/semi-supervised machine learning techniques, which may require significant amount of labeled training data for each class in order to train the classifier model leading to significant amount of human effort, resulting in an expensive process. Embodiments of the present disclosure provide systems and methods f…
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date May 26, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).