Shellcode detection

US10482248B2 · US · B2

Patent metadata
Field                Value
Publication number   US-10482248-B2
Application number   US-201715806229-A
Country              US
Kind code            B2
Filing date          Nov 7, 2017
Priority date        Nov 9, 2016
Publication date     Nov 19, 2019
Grant date           Nov 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Identifying shellcode in a sequence of instructions by identifying a first instruction, the first instruction identifying a first bound of a sequence of instructions, identifying a second instruction, the second instruction identifying a second bound of the sequence of instructions, and generating a distribution for the sequence of instructions, bounded by the first instruction and the second instructions, the distribution indicative of whether the sequence of instructions is likely to include shellcode.
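
The abstract's first two steps (find two bounding instructions, then build a distribution over what lies between them) can be illustrated with a minimal Python sketch. This is not the patent's implementation. The function names, the toy mnemonic listing, and the choice of `call` and `jmp` as bound markers are all assumptions made for illustration, and the distribution here is a plain relative-frequency table.

```python
from collections import Counter


def bounded_sequence(mnemonics, first_bound, second_bound, inclusive=True):
    """Return the mnemonics between the first occurrence of first_bound and
    the next occurrence of second_bound after it (hypothetical helper).
    Returns an empty list if either bound is missing."""
    try:
        start = mnemonics.index(first_bound)
        end = mnemonics.index(second_bound, start + 1)
    except ValueError:
        return []
    if inclusive:
        return mnemonics[start:end + 1]
    return mnemonics[start + 1:end]


def mnemonic_distribution(sequence):
    """Relative frequency of each mnemonic in the sequence."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


# Toy disassembly listing; the bound mnemonics are illustrative assumptions.
listing = ["push", "mov", "call", "pop", "xor", "xor", "loop", "jmp", "ret"]
window = bounded_sequence(listing, "call", "jmp")
dist = mnemonic_distribution(window)
```

The `inclusive` flag mirrors the distinction drawn in claims 14 and 15, where the sequence may include or exclude the bounding instructions themselves.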

First claim

Opening claim text (preview).

What is claimed:

1. A system comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor causes operations comprising: identifying a first instruction, the first instruction identifying a first bound of a sequence of instructions; identifying a second instruction, the second instruction identifying a second bound of the sequence of instructions; generating a distribution for the sequence of instructions, bounded between the first bound and the second bound, the distribution indicative of whether the sequence of instructions is likely to include shellcode; determining, based on the distribution and by a machine-learning model, a likelihood of whether the sequence of instructions is likely to include shellcode; and preventing the sequence of instructions from being executed if it determined that the sequence of instructions is likely to include shellcode; wherein the machine-learning model is assumptionless as to a form of and as to a frequency distribution of one or more mnemonics within the sequence of instructions, based on observed distributions of the one or more mnemonics in a first section of the sequence of instructions.

2. The system according to claim 1, wherein the first instruction includes an identification of a first location in memory and the second instruction includes an identification of a second location in memory.

3. The system according to claim 2, wherein the shellcode includes a position-independent instruction referencing the identification of the first location in memory.

4. The system according to claim 1, wherein the first instruction and the second instruction are in a file and the identifying of the first instruction and the identifying of the second instruction further comprises: disassembling a binary form of the file into code, the code having a higher-level representation when compared to the binary form.

5. The system according to claim 4, wherein the identifying of the first instruction and the identifying of the second instruction further comprises: assigning a mnemonic to individual elements of the code of the file, the individual elements of the code including at least the first instruction and the second instruction, wherein the identifying of the first instruction includes identifying a first mnemonic associated with the first instruction, and wherein the identifying of the second instruction includes identifying a second mnemonic associated with the second instruction.

6. The system according to claim 1, wherein the distribution is a conditional probability distribution.

7. The system according to claim 1, wherein the determining of the likelihood includes assigning a score to the sequence of instructions.

8. The system according to claim 7, wherein the score is a perplexity score indicative of a level of difficulty to generate a prediction of the distribution.

9. The system according to claim 1, wherein the machine learning model determines, based on the observed distributions of the one or more mnemonics in the first section of the sequence of instructions, a prediction of the frequency distribution of the one or more mnemonics in a second section of the sequence of instructions.

10. The system according to claim 9, wherein the machine-learning model is a non-parametric and non-Markovian machine-learning model.

11. The system according to claim 10, wherein the machine-learning model comprises a sequence memoizer.

12. The system according to claim 1, wherein the machine-learning model comprises an online inference model.

13. The system according to claim 1, wherein the sequence of instructions includes instructions and data.

14. The system according to claim 1, wherein the sequence of instructions is inclusive of one or more of the first instruction and the second instruction.

15. The system according to claim 1, wherein the sequence of instructions is exclusive of one or more of the first instruction and the second instruction.

16. A computer-implemented method comprising: identifying a first instruction, the first instruction identifying a first bound of a sequence of instructions; identifying a second instruction, the second instruction identifying a second bound of the sequence of instructions; and generating a distribution for the sequence of instructions, bounded between the first bound and the second bound, the distribution indicative of whether the sequence of instructions is likely to include shellcode; determining, based on the distribution and by a machine-learning model, a likelihood of whether the sequence of instructions is likely to include shellcode; and preventing the sequence of instructions from being executed if it determined that the sequence of instructions is likely to include shellcode; wherein the machine-learning model is assumptionless as to a form of and as to a frequency distribution of one or more mnemonics within the sequence of instructions, based on observed distributions of the one or more mnemonics in a first section of the sequence of instructions.

17. The method according to claim 16, wherein: the first instruction includes an identification of a first location in memory and the second instruction includes an identification of a second location in memory; and the shellcode includes a position-independent instruction referencing the identification of the first location in memory.

18. The method of claim 16, wherein the first instruction and the second instruction are in a file and the identifying of the first instruction and the identifying of the second instruction further comprises: disassembling a binary form of the file into code, the code having a higher-level representation when compared to the binary form.

19. The method according to claim 18, wherein the identifying of the first instruction and the identifying of the second instruction further comprises: assigning a mnemonic to individual elements of the code of the file, the individual elements of the code including at least the first instruction and the second instruction, wherein the identifying of the first instruction includes identifying a first mnemonic associated with the first instruction, and wherein the identifying of the second instruction includes identifying a second mnemonic associated with the second instruction.

20. A non-transient computer readable medium containing program instructions which, when executed by at least one processor, cause the at least one processor to perform one or more operations, the operations comprising: identifying a first instruction, the first instruction identifying a first bound of a sequence of instructions; identifying a second instruction, the second instruction identifying a second bound of the sequence of instructions; generating a distribution for the sequence of instructions, bounded between the first bound and the second bound, the distribution indicative of whether the sequence of instructions is likely to include shellcode; and determining, based on the distribution and by a machine-learning model, a likelihood of whether the sequence of instructions is likely to include shellcode; and preventing the sequence of instructions from being executed if it determined that the sequence of instructions is likely to include shellcode; wherein the machine-learning model is assumptionless as to a form of and as to a frequency distribution of one or more mnemonics within the sequence of instructions, based on observed distr
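
Claims 6 to 8 and 12 refer to a conditional probability distribution, a perplexity score, and an online inference model. The Python sketch below illustrates those ideas with a deliberately simpler stand-in: an add-one-smoothed bigram model that updates as it scores. It is not the sequence memoizer named in claim 11, and, unlike the non-Markovian model of claim 10, a bigram model is Markovian. The class name, vocabulary, and example sequences are assumptions. The source does not say which direction of perplexity indicates shellcode, so no decision threshold is shown.

```python
import math
from collections import Counter, defaultdict


class BigramModel:
    """Add-one-smoothed bigram model over mnemonics (illustrative stand-in,
    not the sequence memoizer of claim 11)."""

    def __init__(self, vocab):
        self.vocab = sorted(set(vocab))
        self.counts = defaultdict(Counter)

    def update(self, prev, cur):
        """Record one observed transition prev -> cur."""
        self.counts[prev][cur] += 1

    def prob(self, prev, cur):
        """Smoothed conditional probability P(cur | prev)."""
        row = self.counts[prev]
        return (row[cur] + 1) / (sum(row.values()) + len(self.vocab))


def perplexity(model, sequence):
    """Score each mnemonic given its predecessor, then update the model with
    it (online inference). Returns exp of the mean negative log-probability."""
    log_sum = 0.0
    n = 0
    for prev, cur in zip(sequence, sequence[1:]):
        log_sum += math.log(model.prob(prev, cur))
        model.update(prev, cur)
        n += 1
    return math.exp(-log_sum / n) if n else float("nan")


vocab = ["xor", "mov", "push", "pop"]
repetitive = ["xor"] * 6
varied = ["xor", "mov", "push", "pop", "xor"]
score_repetitive = perplexity(BigramModel(vocab), repetitive)
score_varied = perplexity(BigramModel(vocab), varied)
```

In this toy setup the repetitive sequence becomes easier to predict as the model observes it, so its perplexity falls below that of the varied sequence, which matches claim 8's description of perplexity as a measure of prediction difficulty.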

Assignees

Cylance Inc

Inventors

Classifications

  • Test or assess software · CPC title

  • G06F21/566 · Primary

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title

  • involving long-term monitoring or reporting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10482248B2 cover?
Identifying shellcode in a sequence of instructions by identifying a first instruction, the first instruction identifying a first bound of a sequence of instructions, identifying a second instruction, the second instruction identifying a second bound of the sequence of instructions, and generating a distribution for the sequence of instructions, bounded by the first instruction and the second i…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/566. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 19, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).