What technology area does this patent fall under?

Primary CPC classification G06F16/215. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 17, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Machine-learned model for duplicate crash dump detection

US12333448B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12333448-B2
Application number	US-202117188256-A
Country	US
Kind code	B2
Filing date	Mar 1, 2021
Priority date	Oct 1, 2020
Publication date	Jun 17, 2025
Grant date	Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example embodiment, a machine learned model is utilized for identifying duplicate crash dumps. After a developer submits code, corresponding test cases are used to ensure the quality of the software delivery. Test failures can occur during this period, such as crashes, errors, and timeouts. Since it takes time for developers to resolve them, many duplicate failures can occur during this time period. In some embodiments, crash triaging is the most time-consuming task of development, and thus if duplicate crash failures can be automatically identified, the degree of automation will be significantly enhanced. To locate such duplicates, a training-based machine learned model uses component information of an in-memory database system to achieve better crash similarity comparison.
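
As an illustration of the kind of comparison the abstract describes, the following is a minimal Python sketch of a position-weighted longest-common-subsequence score between two component-tagged stack traces. It is not the patented model: the names `lcs_pairs` and `similarity`, the `top_n` and `decay` parameters, the exponential weighting, and the "Component::function" frame format are all assumptions made for this example, and the learned coefficients and component-distance term are not modeled.

```python
from math import exp


def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top of both traces to recover matched positions.
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def similarity(trace_a, trace_b, top_n=10, decay=0.5):
    """Score in [0, 1]; matches near the top of the trace weigh more.

    top_n and decay are hypothetical stand-ins for tunable coefficients.
    """
    a, b = trace_a[:top_n], trace_b[:top_n]
    if not a or not b:
        return 0.0
    # Each matched frame contributes less the deeper it sits in either trace.
    matched = sum(exp(-decay * max(i, j)) for i, j in lcs_pairs(a, b))
    # Best achievable weight: every frame of the longer trace matched in place.
    best = sum(exp(-decay * k) for k in range(max(len(a), len(b))))
    return matched / best
```

Two identical traces score 1.0 and traces with no frame in common score 0.0; a shared crash site at the top of the stack raises the score more than a shared frame near the bottom.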

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: at least one hardware processor; and a non-transitory computer-readable medium storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the coefficient indicating the level of component distance, the level of component distance being a measurement of how similar two components are based on similarity of function names of the two components, each component being a software module upon which one or more functions are executed.
2. The system of claim 1, wherein the coefficient indicating the number of components to consider from the top of a crash dump and the coefficient indicating the level of component distance are learned using a second machine learning algorithm.
3. The system of claim 1, wherein the database is an in-memory database.
4. The system of claim 1, wherein the operations further comprise feeding a first and a second crash dump into the first machine-learned model and eliminating either the first or the second crash dump if a similarity score for the first and second crash dumps exceeds a predetermined threshold.
5. The system of claim 1, wherein the preprocessing further comprises: filtering out a first number of most frequently occurring function names, the first number determined dynamically.
6. The system of claim 1, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names includes using a breadth-first search to extract component-file mappings in layered CMakeLists from the database.
7. The system of claim 6, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names further comprises combining the component-file mappings with file-function mappings obtained from an abstract syntax tree of the database.
8. A method comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the coefficient indicating the level of component distance, the level of component distance being a measurement of how similar two components are based on similarity of function names of the two components, each component being a software module upon which one or more functions are executed.
9. The method of claim 8, wherein the coefficient indicating the number of components to consider from the top of a crash dump and the coefficient indicating the level of component distance are learned using a second machine learning algorithm.
10. The method of claim 8, wherein the database is an in-memory database.
11. The method of claim 8, further comprising feeding a first and a second crash dump into the first machine-learned model and eliminating either the first or the second crash dump if a similarity score for the first and second crash dumps exceeds a predetermined threshold.
12. The method of claim 8, wherein the preprocessing further comprises: filtering out a first number of most frequently occurring function names, the first number determined dynamically.
13. The method of claim 8, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names includes using a breadth-first search to extract component-file mappings in layered CMakeLists from the database.
14. The method of claim 13, wherein the using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names further comprises combining the component-file mappings with file-function mappings obtained from an abstract syntax tree of the database.
15. A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters; preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps; using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names; adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learning model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the
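
The preprocessing and de-duplication steps recited in the claims (removing parameters, tagging each function with its component, and dropping one of two dumps whose similarity exceeds a threshold) can be sketched roughly as follows in Python. This is a hedged illustration only: the `FUNCTION_TO_COMPONENT` table, the frame format `name(args)`, the helper names, and the `0.8` threshold are hypothetical and do not come from the patent text shown here.

```python
import re

# Hypothetical lookup table. The claims describe deriving such a mapping from
# build files and an abstract syntax tree; here it is simply hard-coded.
FUNCTION_TO_COMPONENT = {
    "Executor::run": "QueryEngine",
    "Allocator::alloc": "MemoryManager",
}


def strip_parameters(frame):
    """Drop a trailing parenthesised argument list: 'f(int x=3)' -> 'f'."""
    return re.sub(r"\(.*\)\s*$", "", frame.strip())


def preprocess(dump_lines):
    """Turn raw frame lines into (component, function) pairs."""
    tagged = []
    for line in dump_lines:
        name = strip_parameters(line)
        if not name:
            continue  # skip blank lines
        component = FUNCTION_TO_COMPONENT.get(name, "Unknown")
        tagged.append((component, name))
    return tagged


def deduplicate(dumps, score, threshold=0.8):
    """Keep a dump only if it is not too similar to any dump already kept.

    score is any callable returning a similarity in [0, 1] for two dumps.
    """
    kept = []
    for dump in dumps:
        if all(score(dump, other) <= threshold for other in kept):
            kept.append(dump)
    return kept
```

Passing the scoring function in as an argument keeps the sketch independent of how similarity is computed; any pairwise score in the range 0 to 1 can be plugged in.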

Assignees

SAP SE

Inventors

Classifications

G06N20/00
Machine learning · CPC title
G06F16/215 · Primary
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06N5/04 · Primary
Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

View patent family 80931480

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333448B2 cover?: In an example embodiment, a machine learned model is utilized for identifying duplicate crash dumps. After a developer submits code, corresponding test cases are used to ensure the quality of the software delivery. Test failures can occur during this period, such as crashes, errors, and timeouts. Since it takes time for developers to resolve them, many duplicate failures can occur during this t…
Who is the assignee on this patent?: SAP SE