N-gram analysis of inputs to a software application

US9880915B2 · US · B2

Patent metadata
Publication number: US-9880915-B2
Application number: US-201414198239-A
Country: US
Kind code: B2
Filing date: Mar 5, 2014
Priority date: Mar 5, 2014
Publication date: Jan 30, 2018
Grant date: Jan 30, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Input sequence information may be analyzed and quantified using n-gram analysis of inputs received by an application. The sequences of inputs may be represented by n-grams, and the frequency of the various n-grams may indicate the ‘real world’ uses of the application in production, which may be compared to a test suite whose coverage may be quantified using a similar n-gram analysis. A coverage factor may compare the observed inputs to the application in production to the test suite for the application. The n-grams may be further quantified or prioritized by resource utilization and several visualizations may be generated from the data.
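To make the abstract's idea concrete, the following is a minimal sketch of counting n-grams over a sequence of observed inputs. It is an illustration only, not the patent's implementation: the function names `ngrams` and `ngram_frequencies` and the sample input names are hypothetical, and inputs are assumed to be simple string labels.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(inputs: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n inputs, in the order observed."""
    # When the sequence is shorter than n, the range is empty and no n-grams result.
    return [tuple(inputs[i:i + n]) for i in range(len(inputs) - n + 1)]


def ngram_frequencies(inputs: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs in the observed input sequence."""
    return Counter(ngrams(inputs, n))


# Hypothetical trace of inputs observed in production (assumed data).
production_inputs = ["open", "edit", "save", "open", "edit", "save", "close"]

# Tri-grams (n = 3), one of the n-gram sizes named in the dependent claims.
freq = ngram_frequencies(production_inputs, 3)
print(freq.most_common(1))
```

In this sketch the most frequent tri-gram stands in for a common "real world" usage pattern; a real tracer would presumably record richer input parameters than plain labels.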

First claim

Opening claim text (preview).

What is claimed is:

1. A method, implemented at a distributed computer system that includes at least one computer processor, said method for analyzing an application based on n-gram sequences associated with inputs of said application, said method comprising: executing an application in a production environment that comprises a first computer system of said distributed computer system; receiving first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment; identifying, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences; identifying, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment; based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application; based on said one or more characteristics, configuring a test environment that comprises a second computer system of said distributed computer system; executing said application in said test environment that includes said one or more identified characteristics; receiving second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment; identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences; identifying a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and comparing said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.

2. The method of claim 1, wherein at least one of the plurality of n-grams comprise a tri-gram of input parameters.

3. The method of claim 1, wherein at least one of the plurality of n-grams comprise a 4-gram of said input parameters.

4. The method of claim 1, further comprising: presenting a graph including at least a subset of the intercepted data.

5. The method of claim 4, wherein the graph comprises a histogram.

6. The method of claim 5, wherein the histogram includes one or more sequences of input parameters.

7. The method of claim 6, wherein the one or more sequences of input parameters are associated with an entirety of the application during execution of the application in the production environment.

8. The method of claim 6, wherein the one or more sequences of input parameters are associated with one or more functions called during execution of the application in the production environment.

9. The method of claim 6, wherein the histogram also includes sequences of one or more function calls associated with the one or more sequences of input parameters.

10. The method of claim 9, wherein the histogram also includes at least one of memory usage, central processing unit usage, and network usage.

11. The method of claim 10, wherein the intercepted data includes at least one of input parameters, functions called, memory usage, central processing unit usage, and network usage.

12. A distributed computer system comprising: at least one processor; and one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the at least one processor to cause the distributed computer system to analyze an application based on n-gram sequences associated with inputs of the application, the computer-executable instructions including instructions that are executable to cause the distributed computer system to perform at least the following: execute an application in a production environment that comprises a first computer system of said distributed computer system; receive first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment; identify, within said first tracer data, a first a plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences; identify, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment; based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application; based on said one or more characteristics, configure a test environment that comprises a second computer system of said distributed computer system; execute said application in said test environment that includes said one or more identified characteristics; receive second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment; identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences; identify a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and compare said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.

13. The computer system of claim 12, wherein at least one of the plurality of n-grams comprise a tri-gram of inputs.

14. The computer system of claim 12, wherein at least one of the plurality of n-grams comprise a 4-gram of said inputs.

15. The computer system of claim 12, wherein a graph comprising at least a subset of the intercepted data is presented.

16. The computer system of claim 15, wherein the
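
The comparison step in claims 1 and 12 maps the production n-grams that also appear in the test trace onto the usage frequency data to arrive at a test coverage factor. The claims do not state a formula, so the sketch below assumes one plausible reading: the share of production usage (by n-gram frequency) that the test trace also exercises. The names `ngram_counts` and `coverage_factor`, and both sample traces, are hypothetical.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(inputs: Sequence[str], n: int) -> Counter:
    """Count each contiguous n-gram of inputs in a trace."""
    return Counter(tuple(inputs[i:i + n]) for i in range(len(inputs) - n + 1))


def coverage_factor(production_counts: Counter, test_counts: Counter) -> float:
    """Frequency-weighted share of production n-grams also seen in the test trace.

    Assumption: coverage is weighted by production usage frequency; the
    claims only say the covered subset is mapped to the usage data.
    """
    total = sum(production_counts.values())
    if total == 0:
        return 0.0  # no production usage observed, so nothing to cover
    covered = sum(
        count for gram, count in production_counts.items() if gram in test_counts
    )
    return covered / total


# Hypothetical traces (assumed data): inputs seen in production and in testing.
production_trace = ["open", "edit", "save", "open", "edit", "save", "close"]
test_trace = ["open", "edit", "save", "close"]

prod_counts = ngram_counts(production_trace, 3)
test_counts = ngram_counts(test_trace, 3)
factor = coverage_factor(prod_counts, test_counts)
print(f"coverage factor: {factor:.2f}")
```

Weighting by frequency means an untested but rarely used sequence lowers the factor less than an untested common one, which fits the abstract's emphasis on "real world" use; an unweighted ratio of distinct n-grams would be an equally defensible alternative.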

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Monitoring of software · CPC title

  • Monitoring arrangements specially adapted to the computing system or computing system component being monitored · CPC title

  • the data filtering being achieved by aggregating or compressing the monitored data · CPC title

  • Assessing vulnerabilities and evaluating computer system security · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9880915B2 cover?
Input sequence information may be analyzed and quantified using n-gram analysis of inputs received by an application. The sequences of inputs may be represented by n-grams, and the frequency of the various n-grams may indicate the ‘real world’ uses of the application in production, which may be compared to a test suite whose coverage may be quantified using a similar n-gram analysis. A coverage…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/3003. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 30, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).