Machine-learned model for duplicate crash dump detection

US12333448B2 · US · B2

Patent metadata
Publication number: US-12333448-B2
Application number: US-202117188256-A
Country: US
Kind code: B2
Filing date: Mar 1, 2021
Priority date: Oct 1, 2020
Publication date: Jun 17, 2025
Grant date: Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example embodiment, a machine-learned model is utilized for identifying duplicate crash dumps. After a developer submits code, corresponding test cases are used to ensure the quality of the software delivery. Test failures can occur during this period, such as crashes, errors, and timeouts. Since it takes time for developers to resolve them, many duplicate failures can occur during this time period. In some embodiments, crash triaging is the most time-consuming task of development, and thus if duplicate crash failures can be automatically identified, the degree of automation will be significantly enhanced. To locate such duplicates, a training-based machine-learned model uses component information of an in-memory database system to achieve better crash similarity comparison.
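The preprocessing described in the claims (removing parameters from each frame, filtering out the most frequently occurring function names, and tagging each function with its database component) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the frame format `#<n> <function>(<parameters>)`, the `max_share` cutoff used to choose the frequent names dynamically, and the `component_of` lookup table are all hypothetical. In the claims, the function-to-component mapping comes from a breadth-first search over layered CMakeLists combined with an abstract syntax tree of the database; here it is simply passed in as a dictionary.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical frame format: "#<n> <function>(<parameters>)", e.g. "#0 TableStore::insert(int, Row const&)".
FRAME_RE = re.compile(r"^#\d+\s+(?P<func>[^(]+)")


def strip_parameters(dump_text: str) -> List[str]:
    """Return each frame's function name with its parameter list removed."""
    names = []
    for line in dump_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group("func").strip())
    return names


def frequent_functions(stacks: List[List[str]], max_share: float = 0.8) -> Set[str]:
    """Names present in more than max_share of all dumps; how many that is depends on the data."""
    if not stacks:
        return set()
    doc_freq = Counter(name for stack in stacks for name in set(stack))
    return {name for name, count in doc_freq.items() if count / len(stacks) > max_share}


def preprocess(raw_dumps: List[str], component_of: Dict[str, str],
               max_share: float = 0.8) -> List[List[Tuple[str, str]]]:
    """Strip parameters, drop overly common names, and tag each frame with its component."""
    stacks = [strip_parameters(text) for text in raw_dumps]
    noisy = frequent_functions(stacks, max_share)
    # component_of is a hypothetical function -> component lookup; unmapped names get "unknown".
    return [
        [(func, component_of.get(func, "unknown")) for func in stack if func not in noisy]
        for stack in stacks
    ]
```

Filtering by document frequency lets the number of removed names vary with the data set, which is one way to read "determined dynamically"; the patent text shown here does not say which rule is used.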

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one hardware processor; and a non-transitory computer-readable medium storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the coefficient indicating the level of component distance, the level of component distance being a measurement of how similar two components are based on similarity of function names of the two components, each component being a software module upon which one or more functions are executed.

2. The system of claim 1, wherein the coefficient indicating the number of components to consider from the top of a crash dump and the coefficient indicating the level of component distance are learned using a second machine learning algorithm.

3. The system of claim 1, wherein the database is an in-memory database.

4. The system of claim 1, wherein the operations further comprise feeding a first and a second crash dump into the first machine-learned model and eliminating either the first or the second crash dump if a similarity score for the first and second crash dumps exceeds a predetermined threshold.

5. The system of claim 1, wherein the preprocessing further comprises: filtering out a first number of most frequently occurring function names, the first number determined dynamically.

6. The system of claim 1, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names includes using a breadth-first search to extract component-file mappings in layered CMakeLists from the database.

7. The system of claim 6, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names further comprises combining the component-file mappings with file-function mappings obtained from an abstract syntax tree of the database.

8. A method comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the coefficient indicating the level of component distance, the level of component distance being a measurement of how similar two components are based on similarity of function names of the two components, each component being a software module upon which one or more functions are executed.

9. The method of claim 8, wherein the coefficient indicating the number of components to consider from the top of a crash dump and the coefficient indicating the level of component distance are learned using a second machine learning algorithm.

10. The method of claim 8, wherein the database is an in-memory database.

11. The method of claim 8, further comprising feeding a first and a second crash dump into the first machine-learned model and eliminating either the first or the second crash dump if a similarity score for the first and second crash dumps exceeds a predetermined threshold.

12. The method of claim 8, wherein the preprocessing further comprises: filtering out a first number of most frequently occurring function names, the first number determined dynamically.

13. The method of claim 8, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names includes using a breadth-first search to extract component-file mappings in layered CMakeLists from the database.

14. The method of claim 13, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names further comprises combining the component-file mappings with file-function mappings obtained from an abstract syntax tree of the database.

15. A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the …
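The claims build the similarity score from a longest common subsequence, a coefficient for how many components to consider from the top of the dump, component position, and a component-distance coefficient, but they do not give a formula. One hypothetical way to combine those ingredients is a position-weighted LCS over preprocessed `(function, component)` frames: frames nearer the top of the stack weigh more, an exact function match scores 1, and different functions in the same component earn partial credit (so "component distance" is reduced here to a same-component check). The names `top_n`, `position_decay`, and `component_coef` are assumptions, and the coefficients are fixed rather than learned by a second machine learning algorithm as in claims 2 and 9. The `deduplicate` helper mirrors claims 4 and 11 by dropping a dump when its score against an already-kept dump exceeds a threshold.

```python
import math
from typing import List, Tuple

# Hypothetical frame representation produced by preprocessing: (function_name, component).
Frame = Tuple[str, str]


def top_components(frames: List[Frame], top_n: int) -> List[Frame]:
    """Keep frames from the top of the stack until top_n distinct components have been seen."""
    seen = set()
    kept = []
    for func, comp in frames:
        if comp not in seen:
            if len(seen) == top_n:
                break
            seen.add(comp)
        kept.append((func, comp))
    return kept


def frame_match(a: Frame, b: Frame, component_coef: float) -> float:
    """Assumed match score: 1.0 for the same function, component_coef for the same component."""
    if a[0] == b[0]:
        return 1.0
    if a[1] == b[1]:
        return component_coef
    return 0.0


def similarity(d1: List[Frame], d2: List[Frame], top_n: int = 3,
               position_decay: float = 0.5, component_coef: float = 0.3) -> float:
    """Hypothetical position-weighted LCS score in [0, 1] between two preprocessed dumps."""
    s1, s2 = top_components(d1, top_n), top_components(d2, top_n)
    if not s1 or not s2:
        return 0.0
    # dp[i][j]: best weighted common-subsequence score of s1[:i] and s2[:j].
    dp = [[0.0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            # Matches deeper in either stack are discounted exponentially.
            weight = math.exp(-position_decay * max(i - 1, j - 1))
            match = frame_match(s1[i - 1], s2[j - 1], component_coef) * weight
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + match)
    # Normalize by the score of a perfect match over the longer stack.
    best = sum(math.exp(-position_decay * k) for k in range(max(len(s1), len(s2))))
    return dp[len(s1)][len(s2)] / best


def deduplicate(dumps: List[List[Frame]], threshold: float = 0.8, **params) -> List[List[Frame]]:
    """Keep a dump only if no already-kept dump scores above threshold against it."""
    kept: List[List[Frame]] = []
    for dump in dumps:
        if all(similarity(dump, k, **params) <= threshold for k in kept):
            kept.append(dump)
    return kept
```

Truncating to the top `top_n` components means two dumps that differ only deep in the stack still score as identical, which reflects the intuition that the frames nearest the crash matter most.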

Assignees

SAP SE

Classifications

  • Machine learning (CPC title)

  • G06F16/215 (primary): Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

  • G06N5/04 (primary): Inference or reasoning models

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333448B2 cover?
A machine-learned model for identifying duplicate crash dumps. Crash dumps from a database are preprocessed (parameters removed, function names mapped to database components), and a model trained on them outputs a longest-common-subsequence-based similarity score between two dumps, so that duplicate crash failures can be identified automatically.
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 17, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).