Training a machine learning model for analysis of instruction sequences

US10922604B2 · US · B2

Patent metadata
Publication number: US-10922604-B2
Application number: US-201615345436-A
Country: US
Kind code: B2
Filing date: Nov 7, 2016
Priority date: Sep 9, 2016
Publication date: Feb 16, 2021
Grant date: Feb 16, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one respect, there is provided a system for training a neural network adapted for classifying one or more instruction sequences. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one processor provides operations including: training, based at least on training data, a machine learning model to detect one or more predetermined interdependencies amongst a plurality of tokens in the training data; and providing the trained machine learning model to enable classification of one or more instruction sequences. Related methods and articles of manufacture, including computer program products, are also provided.
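
As a rough illustration of what "tokens in the training data" could look like, the sketch below splits a script line into tokens and assigns each distinct token an integer id. The abstract does not specify a tokenizer, so the regular expression and the names `tokenize` and `build_vocab` are hypothetical and chosen only for this example.

```python
import re

# Assumption: identifiers, integer literals, and single non-space symbols
# are treated as tokens. The patent does not prescribe this pattern.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(script):
    """Split one instruction sequence (a string) into a list of tokens."""
    return TOKEN_PATTERN.findall(script)


def build_vocab(sequences):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


tokens = tokenize("eval(x + 1)")
vocab = build_vocab([tokens])
print(tokens)
print(vocab)
```

The resulting id sequences are the kind of input a model could be trained on; how the patented system actually represents tokens is described only in the full specification.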

First claim

Opening claim text (preview).

What is claimed is: 1. A system for detecting malicious instruction sequences in a script which, when executed causes undesirable or harmful behavior to a computing device, the system comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor provides operations comprising:

tokenizing a plurality of historical instruction sequences each forming part of a different script to generate training data, wherein the instruction sequences are configured to be executed without compilation;

training, based at least on the training data, at least one machine learning model to detect one or more predetermined interdependencies amongst a plurality of tokens in the training data, wherein at least one of the predetermined interdependencies indicates that the corresponding instructions sequence is malicious, the trained at least one machine learning model using encoding to vectorize instruction sequences so as to preserve similarities between tokens; and

providing the trained at least one machine learning model to enable classification of one or more instruction sequences as either being malicious or benign based on the detected one or more predetermined interdependencies, the trained at least one machine learning model, when deployed, being used to prevent instruction sequences classified as malicious from being executed and causing undesirable or harmful behavior to the computing device;

wherein: the trained at least one machine learning model comprises a recursive neural tensor network that assigns weights and tensors to nodes and connections of an abstract syntax tree representation of the instruction sequence such that a weight of a parent node p in the abstract syntax tree representation is based on:

$p = f\left( \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} V \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right)$,

wherein $c_1$ and $c_2$ correspond to scores assigned to children nodes in the abstract syntax tree representation, wherein tensor V and weight W connect the children nodes to the parent node, wherein a tensor V is defined as $V \in \mathbb{R}^{2d \times 2d \times d}$, and wherein d is a dimension of a vector representing a token;

the abstract syntax tree representation of the instruction sequence preserves a structure of the instruction sequence including one or more rules for combining the tokens in the instruction sequence;

the encoding maximizes an objective function J(θ) in order to generate v vector representations that preserve similarities between tokens:

$J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$,
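
The two formulas in the claim can be made concrete with a small pure-Python sketch. It is illustrative only and rests on several assumptions not stated in the claim preview: the first bracketed factor is read as a transposed vector (the usual recursive neural tensor network form), `f` defaults to `tanh`, `p(w_{t+j} | w_t)` is taken to be a softmax over dot products of separate "input" and "output" vectors, and context positions that fall outside the sequence are skipped. The function names `rntn_parent` and `skipgram_objective` are hypothetical.

```python
import math


def rntn_parent(c1, c2, V, W, f=math.tanh):
    """Compose two child vectors into a parent vector.

    c1, c2: lists of length d.
    V: tensor indexed V[i][j][k] with shape 2d x 2d x d.
    W: matrix indexed W[k][i] with shape d x 2d.
    f: elementwise nonlinearity (tanh is an assumption).
    """
    c = list(c1) + list(c2)  # concatenated children, length 2d
    n, d = len(c), len(c1)
    parent = []
    for k in range(d):
        # Bilinear term: c^T V[:, :, k] c for the k-th output coordinate.
        bilinear = sum(c[i] * V[i][j][k] * c[j]
                       for i in range(n) for j in range(n))
        # Linear term: k-th row of W times c.
        linear = sum(W[k][i] * c[i] for i in range(n))
        parent.append(f(bilinear + linear))
    return parent


def skipgram_objective(tokens, in_vecs, out_vecs, c):
    """Average log-probability J(theta) for a token sequence and window size c.

    in_vecs / out_vecs: dicts mapping each token to a vector (list of floats).
    """
    def log_p(context, center):
        # Assumed softmax: exp(u_context . v_center) / sum_w exp(u_w . v_center)
        scores = {w: sum(a * b for a, b in zip(u, in_vecs[center]))
                  for w, u in out_vecs.items()}
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        return scores[context] - log_z

    T = len(tokens)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            # Skip j = 0 and positions outside the sequence (assumption).
            if j != 0 and 0 <= t + j < T:
                total += log_p(tokens[t + j], tokens[t])
    return total / T
```

For example, with d = 1, `V = [[[0.0], [1.0]], [[0.0], [0.0]]]`, `W = [[0.5, 0.25]]`, children `[1.0]` and `[2.0]`, and an identity `f`, the bilinear term is 2.0 and the linear term is 1.0, so `rntn_parent` returns `[3.0]`. In a full system the composition would be applied bottom-up over the abstract syntax tree, and the objective would be maximized by gradient ascent over the vectors; neither step is shown here.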

Assignees

Cylance Inc

Inventors

Classifications

  • G06F21/563 (Primary)

    by source code analysis · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/0442 (Primary)

    characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10922604B2 cover?
In one respect, there is provided a system for training a neural network adapted for classifying one or more instruction sequences. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one processor provides operations including: training, based at least on training data, a machine learning model to detect…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/563. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 16, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).