Simulating service disruptions for an operational production system

US9946619B1 · US · B1

Patent metadata
Publication number: US-9946619-B1
Application number: US-201514977306-A
Country: US
Kind code: B1
Filing date: Dec 21, 2015
Priority date: Dec 21, 2015
Publication date: Apr 17, 2018
Grant date: Apr 17, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The techniques described herein provide evaluations of a production system's ability to recover from a service disruption without actually disrupting service to the production system. In some examples, a live production system is at least partly duplicated to create a shadow production system that is a quarantined copy of the production system. Traffic between the production system, client devices, and possibly dependency services may be replicated onto the shadow production system while a recovery simulation service induces a specified type of service disruption onto the shadow production system. Behavior of the shadow production system during service disruption is used to identify performance differences and to evaluate expected recovery characteristics of the live production system.
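To make the flow in the abstract concrete, the following is a minimal sketch, not the patented implementation. It assumes a toy in-memory `Service` class and a hypothetical `run_recovery_simulation` helper (neither name comes from the patent). The sketch duplicates a "production" object into an isolated shadow copy, replays each request to both, induces an outage only on the shadow, and reports where the two diverge.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    """Toy stand-in for a production system (hypothetical, for illustration)."""
    name: str
    down: bool = False

    def handle(self, request: str) -> Optional[str]:
        # A disrupted service leaves the request unserved.
        if self.down:
            return None
        return f"ok:{request}"


def run_recovery_simulation(production: Service, requests: List[str],
                            disrupt_from: int, disrupt_to: int) -> dict:
    """Replay requests on a shadow copy that is disrupted for a window of indices."""
    # Duplicate the production system; the deep copy shares no state with it.
    shadow = copy.deepcopy(production)
    shadow.name = production.name + "-shadow"

    prod_responses, shadow_responses = [], []
    for i, req in enumerate(requests):
        # The disruption is induced on the shadow only; production stays live.
        shadow.down = disrupt_from <= i < disrupt_to
        prod_responses.append(production.handle(req))
        shadow_responses.append(shadow.handle(req))  # replicated request

    # Performance difference here is simply "which responses differ".
    mismatched = [i for i, (p, s) in enumerate(zip(prod_responses, shadow_responses))
                  if p != s]
    return {
        "production_served": sum(r is not None for r in prod_responses),
        "shadow_served": sum(r is not None for r in shadow_responses),
        "mismatched_indices": mismatched,
    }


production = Service("orders")
report = run_recovery_simulation(production, [f"r{i}" for i in range(10)], 3, 6)
print(report)
```

A real system would replicate network traffic rather than call methods in a loop, and would compare latency and throughput as well as response content; the sketch only shows that the disruption never touches the live object.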

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method performed by a recovery simulation service, the method comprising steps of: under control of one or more processors configured with executable instructions, providing, to a subscriber of a virtualized computing resource, a dashboard service configured to receive a selection of a production system that is hosted by the virtualized computing resource and at least one type of service disruption, the production system being in live operation; receiving, through the dashboard service, an instruction associated with evaluating one or more recovery characteristics of the production system corresponding to the at least one type of service disruption; duplicating, at least partially, the production system to create a shadow production system; replicating, onto the recovery simulation service, client traffic between one or more clients and the production system, the client traffic including at least a plurality of production requests; simulating, while the production system remains in live operation, the at least one type of service disruption on the shadow production system, wherein the simulating causes at least a portion of the shadow production system to undergo a reboot procedure; sending, from the recovery simulation service, a plurality of shadow production requests to the shadow production system for processing, the plurality of shadow production requests being based on the plurality of production requests; receiving a plurality of shadow production responses from the shadow production system, the plurality of shadow production responses being responsive to the plurality of shadow production requests; determining performance differences between the production system generating production responses and the shadow production system generating shadow production responses, the performance differences being based on live operation of the production system and disrupted operation of the shadow production system; and providing an evaluation of the one or more recovery characteristics of the production system based on the performance differences between the production system and the shadow production system.

2. The method as recited in claim 1, further comprising: replicating, onto the recovery simulation service, dependency traffic between one or more dependency services and the production system, the dependency traffic including a plurality of dependency requests and a plurality of dependency responses; receiving a plurality of shadow dependency requests from the shadow production system, the plurality of shadow dependency requests being generated by the shadow production system for processing the plurality of shadow production requests; and in response to the plurality of shadow dependency requests, sending a plurality of shadow dependency responses to the shadow production system.

3. The method as recited in claim 1, wherein the dashboard service is further configured to enable the subscriber of the virtualized computing resource to: input disruption parameters associated with the at least one type of service disruption; and initiate the step of simulating the at least one type of service disruption on the shadow production system, wherein the simulating is at least partially based on the disruption parameters.

4. The method as recited in claim 3, wherein the disruption parameters cause the simulating to include at least one of: a soft reboot of the shadow production system; a hard reboot of the shadow production system; a disruption of one or more shadow dependency services; or a power outage.

5. The method as recited in claim 1, wherein the evaluation includes a service deficiency estimation, corresponding to the production system, of the at least one type of service disruption, the service deficiency estimation based on at least one of a duration of the at least one type of service disruption on the shadow production system, a number of unserved requests by the shadow production system, or a delay in processing one or more of the plurality of shadow production requests.

6. The method as recited in claim 1, wherein the sending the plurality of shadow production requests to the shadow production system continues from a first time that is before the simulating the at least one type of service disruption to a second time that is after the simulating the at least one type of service disruption.

7. A computer-implemented method, comprising: under control of one or more processors configured with executable instructions, creating, on a recovery simulation service, a shadow production system that is at least a partial duplicate of a production system; replicating, into a shadow queue of the recovery simulation service, a plurality of production requests being transmitted between the production system and one or more clients; simulating a service disruption on the shadow production system to cause disrupted operation of the shadow production system, the simulating occurring while the production system remains in live operation; sending, to the shadow production system from the shadow queue, a plurality of shadow production requests that are at least partial duplicates of the plurality of production requests; receiving, from the shadow production system, a plurality of shadow production responses that are responsive to the plurality of shadow production requests; and determining performance differences between live operation of the production system and disrupted operation of the shadow production system, the performance differences being caused by the simulating the service disruption.

8. The method as recited in claim 7, further comprising providing a recovery characteristic evaluation associated with the production system based on the performance differences between live operation of the production system and disrupted operation of the shadow production system.

9. The method as recited in claim 7, further comprising: storing the plurality of shadow production requests in the shadow queue; and increasing a rate of the sending from a first rate, at which the shadow production system operates substantively identical to the production system, to a second rate at which disrupted operation of the shadow production system occurs, the shadow production system having a first degree of elasticity corresponding to the production system, the first degree of elasticity corresponding to an ability of the shadow production system to autonomously adapt capacity based on workload.

10. The method as recited in claim 9, further comprising: creating, on the recovery simulation service, a second shadow production system that is an at least partial duplicate of the production system, the second shadow production system having a second degree of elasticity that is higher than the first degree of elasticity; re-sending, to the second shadow production system from the shadow queue, the plurality of shadow production requests, wherein the re-sending includes increasing a rate of the re-sending from the first rate to the second rate; receiving, from the second shadow production system, a second plurality of shadow production responses that are responsive to the plurality of shadow production requests; and determining performance differences between the shadow production system and the second shadow production system, the performance differences being caused at least partly by the second degree of elasticity.

11. The method as recited in claim 7, wherein the creating includes duplicating, at least partially, a plurality of networked computing resources of the production system to create a plurality of networked shadow computing resources of the
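Claim 5 names three inputs to a service deficiency estimation: the duration of the disruption on the shadow system, the number of unserved requests, and the delay in processing shadow requests. The sketch below shows one way those three quantities could be computed from replay records. The `ShadowResult` record, the `estimate_service_deficiency` function, and the `baseline_latency` parameter are all hypothetical names chosen for illustration; the patent does not specify a data format or formula.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ShadowResult:
    """One replayed shadow request (hypothetical record layout)."""
    sent_at: float                  # seconds, when the shadow request was sent
    responded_at: Optional[float]   # seconds, or None if it was never served


def estimate_service_deficiency(results: Sequence[ShadowResult],
                                disruption_start: float,
                                disruption_end: float,
                                baseline_latency: float) -> dict:
    """Summarize the three quantities listed in claim 5."""
    # Requests the shadow system never answered.
    unserved = sum(1 for r in results if r.responded_at is None)

    # Latency beyond the assumed live-operation baseline, for served requests.
    added_delays = [
        (r.responded_at - r.sent_at) - baseline_latency
        for r in results
        if r.responded_at is not None
    ]
    excess = [d for d in added_delays if d > 0]

    return {
        "disruption_duration_s": disruption_end - disruption_start,
        "unserved_requests": unserved,
        "max_added_delay_s": max(excess, default=0.0),
    }


results = [
    ShadowResult(sent_at=0.0, responded_at=0.5),   # served at baseline latency
    ShadowResult(sent_at=1.0, responded_at=None),  # dropped during the outage
    ShadowResult(sent_at=2.0, responded_at=4.0),   # served late, after recovery
]
summary = estimate_service_deficiency(results, disruption_start=1.0,
                                      disruption_end=3.0, baseline_latency=0.5)
print(summary)
```

In practice the baseline would come from the live production system's own responses to the same requests, which is the comparison the claims describe; a fixed number is used here only to keep the example short.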

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Methods or tools to render software testable

  • Virtual

  • Using snapshots, i.e. a logical point-in-time copy of the data

  • for networked environments

  • using arrangements specific to the hardware being tested

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9946619B1 cover?
The techniques described herein provide evaluations of a production system's ability to recover from a service disruption without actually disrupting service to the production system. In some examples, a live production system is at least partly duplicated to create a shadow production system that is a quarantined copy of the production system. Traffic between the production system, client devi…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/3696. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 17, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).