Identifying the root cause of failure observed in connection to a workflow

US11789804B1 · US · B1

Patent metadata
Publication number: US-11789804-B1
Application number: US-202217589556-A
Country: US
Kind code: B1
Filing date: Jan 31, 2022
Priority date: Oct 18, 2021
Publication date: Oct 17, 2023
Grant date: Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of identifying a root cause of a failure for a trace within a microservices-based application includes determining if a root span of the trace is an error span resulting in an error experienced by a user at a front end of the microservices-based application. If the root span of the trace is an error span, the method analyzes a plurality of spans comprising the trace to determine if the trace comprises at least one leaf error span. If the trace comprises a single leaf error span, the method attributes the root cause of the failure in the trace to a service associated with the single leaf error span. If the trace comprises multiple leaf error spans the method attributes the root cause of the failure in the trace to a service associated with a leaf error span of the multiple leaf error spans comprising a latest starting timestamp.
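The attribution rule in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not Splunk's implementation. The `Span` fields (`span_id`, `parent_id`, `service`, `start_ts`, `is_error`) and the helpers `find_leaf_error_spans` and `root_cause_service` are hypothetical. A "leaf error span" is read as in claim 1: the last error span of an unbroken chain of error spans starting at the root. The sketch also assumes that a trace has exactly one span without a parent. When an erroring root has no erroring children, it treats the root as its own leaf; the text on this page does not settle that edge case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span (assumed: one per trace)
    service: str
    start_ts: float           # hypothetical start timestamp, e.g. epoch microseconds
    is_error: bool


def find_leaf_error_spans(spans: List[Span]) -> List[Span]:
    """Return error spans that end an unbroken chain of error spans from the root."""
    children: Dict[str, List[Span]] = {}
    root: Optional[Span] = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children.setdefault(s.parent_id, []).append(s)

    # Only traces whose root span failed (an error seen at the front end) qualify.
    if root is None or not root.is_error:
        return []

    leaves: List[Span] = []
    stack = [root]
    while stack:
        span = stack.pop()
        # Follow only erroring children, so the chain of error spans stays unbroken.
        error_children = [c for c in children.get(span.span_id, []) if c.is_error]
        if error_children:
            stack.extend(error_children)
        else:
            # No erroring child: this span is the last error span of its chain.
            leaves.append(span)
    return leaves


def root_cause_service(spans: List[Span]) -> Optional[str]:
    """Attribute the failure to the service of the leaf error span.

    With several leaf error spans, the one with the latest start timestamp wins.
    """
    leaves = find_leaf_error_spans(spans)
    if not leaves:
        return None
    return max(leaves, key=lambda s: s.start_ts).service
```

For example, suppose `frontend` and `checkout` both fail, and `checkout` has two failing children: `payment` (started at 20) and `inventory` (started at 30). Both children are leaf error spans, and the sketch reports `inventory` because it started later. An error span beneath a successful span is never reached, because the chain of error spans is broken at the successful parent.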

First claim

Opening claim text (preview).

What is claimed is:

1. A method of identifying a root cause of a failure for a trace within a microservices-based application, the method comprising: determining if a root span of the trace is an error span resulting in an error experienced by a user at a front end of the microservices-based application; responsive to a determination that the root span of the trace is the error span, analyzing a plurality of spans comprising the trace to determine if the trace comprises at least one leaf error span that is a last error span of a chain of unbroken error spans starting at the root span; responsive to a determination that the trace comprises the at least one leaf error span, attributing the root cause of the failure in the trace to a service associated with the at least one leaf error span; and responsive to a determination that the trace comprises multiple leaf error spans, attributing the root cause of the failure in the trace to a service associated with a leaf error span of the multiple leaf error spans that comprises a latest starting timestamp.

2. The method of claim 1, wherein the trace is associated with a workflow, wherein the workflow is operable to group together a plurality of spans in the trace generated in response to a client process implemented by a group of services comprised within the microservices-based application.

3. The method of claim 1, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure.

4. The method of claim 1, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and further comprising: computing metrics for the service associated with the root cause of the failure using the global tag.

5. The method of claim 1, further comprising: displaying the trace as a graphical element in a graphical user interface, wherein the graphical element visually indicates which service in the trace is associated with the root cause of the failure.

6. The method of claim 1, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and further comprising: computing metrics for the service associated with the root cause of the failure using the global tag and a data set associated with a metric time series modality.

7. The method of claim 1, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and further comprising: computing metrics for the service associated with the root cause of the failure using the global tag and a data set associated with a metric events modality.

8. The method of claim 1, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and further comprising: computing metrics for the service associated with the root cause of the failure using the global tag, wherein the metrics comprise: request; error; and latency related metrics.

9. The method of claim 1, further comprising: displaying the trace as a graphical element in a graphical user interface, wherein the graphical element visually indicates which service in the trace is associated with the root cause of the failure; and providing a client with information regarding a service team connected with the service associated with the root cause of the failure through the graphical user interface.

10. A non-transitory computer-readable medium having computer-readable program code embodied therein for causing a computer system to perform a method of identifying a root cause of a failure for a trace within a microservices-based application, the method comprising: determining if a root span of the trace is an error span resulting in an error experienced by a user at a front end of the microservices-based application; responsive to a determination that the root span of the trace is an error span, analyzing a plurality of spans comprising the trace to determine if the trace comprises at least one leaf error span that is a last error span of a chain of unbroken error spans starting at the root span; responsive to a determination that the trace comprises at least one leaf error span, attributing the root cause of the failure in the trace to a service associated with the at least one leaf error span; and responsive to a determination that the trace comprises multiple leaf error spans, attributing the root cause of the failure in the trace to a service associated with a leaf error span of the multiple leaf error spans that comprises a latest starting timestamp.

11. The non-transitory computer-readable medium of claim 10, wherein the trace is associated with a workflow, wherein the workflow groups together a plurality of spans in the trace generated in response to a client process implemented by a group of services comprised within the microservices-based application.

12. The non-transitory computer-readable medium of claim 10, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure.

13. The non-transitory computer-readable medium of claim 10, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and wherein the method further comprises: computing metrics for the service associated with the root cause of the failure using the global tag.

14. The non-transitory computer-readable medium of claim 10, wherein the method further comprises: displaying the trace as a graphical element in a graphical user interface, wherein the graphical element visually indicates which service in the trace is associated with the root cause of the failure.

15. The non-transitory computer-readable medium of claim 10, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and wherein the method further comprises: computing metrics for the service associated with the root cause of the failure using the global tag and a data set associated with a metric time series modality.

16. The non-transitory computer-readable medium of claim 10, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and wherein the method further comprises: computing metrics for the service associated with the root cause of the failure using the global tag and a data set associated with a metric events modality.

17. The non-transitory computer-readable medium of claim 10, wherein the trace is tagged with a global tag comprising a name of the service associated with the root cause of the failure, and wherein the method further comprises: computing metrics for the service associated with the root cause of the failure using the global tag, wherein the metrics comprise: request; error; and latency related metrics.

18. The non-transitory computer-readable medium of claim 10, wherein the method further comprises: displaying the trace as a service graph in a graphical user interface, wherein the service graph visually indicates which service in the trace is associated with the root cause of the failure; and providing a client information regarding a service team connected with the service associated with the root cause of the failure through the graphical user interface.

19. A system for performing a method of identifying a root cause of a failure for a trace within a
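Claims 3, 4 and 8 describe two steps: tagging a trace with a global tag that names the root-cause service, and then computing request, error and latency metrics from that tag. The Python sketch below shows one possible reading of those steps. The tag key `root_cause_service`, the `Trace` fields, the helper names and the three metric definitions are all hypothetical. The sketch also does not model the metric time series or metric events modalities of claims 6 and 7.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional

# Hypothetical tag key; the claims only call for "a global tag comprising a name of the service".
ROOT_CAUSE_TAG = "root_cause_service"


@dataclass
class Trace:
    trace_id: str
    duration_ms: float  # end-to-end latency seen at the front end
    is_error: bool      # True if the root span is an error span
    tags: Dict[str, str] = field(default_factory=dict)


def tag_root_cause(trace: Trace, service: Optional[str]) -> None:
    """Attach a trace-wide tag naming the root-cause service, if one was identified."""
    if service is not None:
        trace.tags[ROOT_CAUSE_TAG] = service


def metrics_by_root_cause(traces: List[Trace]) -> Dict[str, Dict[str, float]]:
    """Group traces by the root-cause tag and compute request, error and latency figures."""
    total_errors = sum(1 for t in traces if t.is_error)
    groups: Dict[str, List[Trace]] = defaultdict(list)
    for t in traces:
        service = t.tags.get(ROOT_CAUSE_TAG)
        if service is not None:
            groups[service].append(t)
    return {
        service: {
            # Number of user requests whose failure was attributed to this service.
            "failed_requests": len(group),
            # Fraction of all failed requests attributed to this service.
            "error_share": len(group) / total_errors if total_errors else 0.0,
            # Median end-to-end latency of those failed requests.
            "median_latency_ms": median(t.duration_ms for t in group),
        }
        for service, group in groups.items()
    }
```

Consider three failed traces and one successful trace. Two failed traces are tagged `payment` (120 ms and 80 ms), one is tagged `inventory` (200 ms), and the successful trace is left untagged. The sketch then reports 2 failed requests for `payment`, an error share of 2/3 and a median latency of 100 ms. Because the metrics are keyed by the tag alone, this reading does not need to revisit the spans once attribution has been done.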

Assignees

  • Splunk Inc

Classifications

  • Environments for analysis, debugging or testing of software · CPC title

  • by tracing the execution of the program · CPC title

  • by runtime analysis (performance monitoring G06F11/3466) · CPC title

  • G06F11/079 (primary)

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11789804B1 cover?
A method of identifying a root cause of a failure for a trace within a microservices-based application includes determining if a root span of the trace is an error span resulting in an error experienced by a user at a front end of the microservices-based application. If the root span of the trace is an error span, the method analyzes a plurality of spans comprising the trace to determine if the…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/079. Mapped technology areas include Physics.
When was this patent published?
Published Oct 17, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).