Signature detection

US10635662B2 · US · B2

Patent metadata
Publication number: US-10635662-B2
Application number: US-201615153504-A
Country: US
Kind code: B2
Filing date: May 12, 2016
Priority date: May 12, 2016
Publication date: Apr 28, 2020
Grant date: Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for multicategory classification of streaming data records includes receiving a list of signature rules, each comprising a Boolean expression of a set of atomic recognizers (ARs) of one or more input fields of an input record and an assignment of a signature if the Boolean expression evaluates to TRUE, generating a list of all signature contexts from the list of signature rules, generating a context lookup table for each context, and processing a stream of input records on which signature detection is performed by using said ARs, said list of signature contexts, and said context lookup table for each context, wherein each input record in the stream of input records is classified into one of a plurality of categories based on the signature detection result, wherein an amount of processing grows sublinearly with a number of signature rules being processed.
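As a rough illustration of the abstract, the sketch below builds atomic recognizers and a context lookup table from a small rule list, then classifies records with one lookup per context instead of one evaluation per rule. The rule set, field names (`proto`, `port`), predicate names (`equals`, `range`), and every function name are hypothetical, and the lookup table is stored as a sparse dictionary keyed by AR outputs rather than the dense multidimensional table the claims describe. It is a minimal sketch under those assumptions, not the patented implementation.

```python
# Hypothetical rules: each rule ANDs <field, predicate-type> conditions and
# assigns a signature. Earlier rules are assumed to have higher precedence.
RULES = [
    ({("proto", "equals"): "tcp", ("port", "range"): (0, 1023)}, "SYSTEM_TCP"),
    ({("proto", "equals"): "udp", ("port", "range"): (0, 1023)}, "SYSTEM_UDP"),
    ({("proto", "equals"): "tcp", ("port", "range"): (8000, 8999)}, "DEV_HTTP"),
]


def build_ars(rules):
    """Collect the distinct rule values for each <field, predicate-type> tuple."""
    ars = {}
    for conditions, _signature in rules:
        for key, value in conditions.items():
            values = ars.setdefault(key, [])
            if value not in values:
                values.append(value)
    return ars


def run_ar(key, values, record):
    """Return 0 for no match, else the 1-based index of the matching rule value."""
    field, kind = key
    x = record[field]
    for i, v in enumerate(values, start=1):
        if kind == "equals" and x == v:
            return i
        if kind == "range" and v[0] <= x <= v[1]:
            return i
    return 0


def build_context_tables(rules, ars):
    """Build one lookup table per context (the set of ARs a rule ANDs together)."""
    tables = {}
    for conditions, signature in rules:
        context = tuple(sorted(conditions))
        cell = tuple(ars[key].index(conditions[key]) + 1 for key in context)
        # setdefault keeps the first, i.e. highest-precedence, rule for a cell.
        tables.setdefault(context, {}).setdefault(cell, signature)
    return tables


def classify(record, ars, tables, default="UNCLASSIFIED"):
    """Run every AR once, then do a single table lookup per context."""
    results = {key: run_ar(key, values, record) for key, values in ars.items()}
    for context, table in tables.items():
        cell = tuple(results[key] for key in context)
        if cell in table:
            return table[cell]
    return default


ARS = build_ars(RULES)
TABLES = build_context_tables(RULES, ARS)
print(classify({"proto": "tcp", "port": 8080}, ARS, TABLES))
```

In this toy rule set all three rules share one context, so a record costs two AR evaluations and one lookup regardless of how many rules populate the table. That is the intuition behind the sublinear growth the abstract mentions.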

First claim

Opening claim text (preview).

What is claimed is:

1. A method for multicategory classification of streaming data records, comprising the steps of: receiving a list of signature rules, wherein a signature rule comprises a Boolean expression of a set of atomic recognizers (ARs) of one or more input fields of an input record and an assignment of a signature if the Boolean expression evaluates to TRUE, and an atomic recognizer is a logic function which takes an input record field value as input and determines which of different rule values for a <field, predicate-type> tuple corresponds to this input record field; generating a list of distinct signatures from the list of signature rules; generating a list of ARs from the list of signatures, and for each AR, a table of possible results for that AR; generating a list of all signature contexts from the list of signature rules, wherein the context of a signature rule is a subsequence of ARs ANDed together as conditions in the signature rule; generating a context lookup table for each context, wherein a dimensionality of each context table equals the number of AR's, a number of elements in each dimension is a number of possible output values of the corresponding AR, and for each context and for each rule resolved by that context, a context lookup table value corresponding to an AR output value is set to the signature result of that AR output value; determining a quit level for each context; and processing a stream of input records on which signature detection is performed by using said list of ARs, said list of signature contexts, and said context lookup table for each context, wherein each input record in the stream of input records is classified into one of a plurality of categories based on the signature detection result, wherein an amount of processing grows sublinearly with a number of signature rules being processed.

2. The method of claim 1, further comprising outputting a stream of output records that include the classification of each input record based on the signature detection result.

3. The method of claim 1, wherein each signature rule is in conjunctive normal form.

4. The method of claim 1, wherein the list of ARs is generated from the list of signature rules by listing, for each input record field, a list of different AR predicates that take values of said input record fields as input, wherein AR predicates include a regular expression match, a string match of a starting substring or an ending substring of an input record field with a specified character string or regular expression, a test of whether a string input record field value includes a keyword, a match of the input record field value to one of a set of matching strings, a test of whether an input record field value lies within a specific range of values, and a test of whether a most recent value of a state associated with input records having a particular key field value is either not initialized or has a value set by a previous signature operation.

5. The method of claim 1, further comprising: generating deterministic finite state automata (DFAs) from specifications of string AR's dealing with regular expression matching, keyword search, and beginning and ending substrings, wherein said DFAs are also used to detect signatures in the stream of input records; generating memory representations of those AR's that deal with value sets and value ranges for integer, floating point and address fields; and converting those AR's which do not require regular expression or keyword compilation into memory tables.

6. The method of claim 1, wherein determining a quit level for each context comprises determining, for each context in order, a number wherein all higher precedence signature rules in a signature precedence order have already been resolved in this or a higher precedence context.

7. The method of claim 1, wherein processing a stream of input records on which signature detection is to be performed comprises: reading an input record from an input data stream; initializing state variables that track progress of signature detection processing in the input record, wherein that state variables include an AR result vector and a BEST_MATCH_SO_FAR variable, wherein the AR result vector has one entry for each AR defined for a current set of signature rules for storing a result of running that AR on an appropriate field of a current input record, and BEST_MATCH_SO_FAR is updated as signature matches are detected during processing of an input record to reflect that signature match with a rule of higher precedence has been found; applying the ARs for each signature context of the list of signature rules to the input record and saving an AR result value to the AR result vector; and determining from the context lookup table whether the AR result values in the AR result vector correspond to a signature match.

8. The method of claim 7, further comprising: comparing, if a signature match has been found, a precedence of the signature match for this context with a precedence of a previous BEST_MATCH_SO_FAR, and updating BEST_MATCH_SO_FAR precedence of the signature match for this context has a higher precedence than the previous BEST_MATCH_SO_FAR; and comparing the precedence of BEST_MATCH_SO_FAR with a precedence of the quit level, wherein if the precedence of BEST_MATCH_SO_FAR is higher than the precedence of the quit level, processing of the input record ceases.

9. The method of claim 1, wherein each field in the input record is identified by name and type.

10. The method of claim 7, wherein the list of signature rules include one or more stateful signature rules that comprise a Boolean expression of a set of ARs of one or more input fields of the input record and a specification of a state transition, if the Boolean expression evaluates to TRUE, wherein the method further comprises, searching for a state-key value for the input record in a state store table using a state-key specification, wherein a state for the input record is set to the found state-key value if a state-key value is found, and the state for the input record is set to INIT if no state-key value is found; saving a destination state value for a signature rule, if no signature has been matched for the input record; and deleting the destination state value for the signature rule, if a signature match has been found.

11. A method for multicategory classification of streaming data records, comprising the steps of: receiving a list of signature rules, wherein a signature rule comprises a Boolean expression of a set of atomic recognizers (ARs) of one or more input fields of an input record and an assignment of a signature if the Boolean expression evaluates to TRUE, and an atomic recognizer is a logic function which takes an input record field value as input and determines which of different rule values for a <field, predicate-type> tuple corresponds to this input record field; reading an input record from an input data stream and identifying each field in the input record by name and type; receiving a list of all signature contexts for the list of signature rules and a context lookup table for each context, wherein the context of a signature rule is a subsequence of ARs ANDed together as conditions in the signature rule, and a context lookup table value for an AR for each context and for each rule resolved by that context is a signature result of that AR output value; initializing state variables that track progress of signature detection processing in the input record, wherein that state variables include an AR result vector and a BEST_MATCH_SO_FAR variable, wherein the AR result vector has one entry for each AR defined for a current set of
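Claims 6 to 8 describe a per-record loop that fills an AR result vector, tracks BEST_MATCH_SO_FAR, and stops early once a quit level guarantees that no remaining context can produce a higher-precedence match. The sketch below is one possible reading of that loop. It assumes that precedence is a rank with 0 as the highest, and that a context's quit level is the first rank not yet resolved by that context or any earlier one. The recognizers, signatures, contexts, and all names other than BEST_MATCH_SO_FAR are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]

# Hypothetical ARs: each maps a record to a small integer (0 = no rule value matched).
AR_FUNCS: List[Callable[[Record], int]] = [
    lambda r: {"tcp": 1, "udp": 2}.get(r["proto"], 0),                          # AR 0
    lambda r: 1 if r["port"] < 1024 else 2 if 8000 <= r["port"] < 9000 else 0,  # AR 1
    lambda r: 1 if "admin" in r["path"] else 0,                                 # AR 2
]

# Signature precedence is the position in this list (rank 0 is the highest).
SIGNATURES = ["ADMIN_ACCESS", "SYSTEM_TCP", "DEV_HTTP"]

# Contexts in processing order: (AR indices, table from AR outputs to signature rank).
CONTEXTS: List[Tuple[Tuple[int, ...], Dict[Tuple[int, ...], int]]] = [
    ((2,), {(1,): 0}),
    ((0, 1), {(1, 1): 1, (1, 2): 2}),
]


def quit_levels(contexts) -> List[int]:
    """For each context, the first rank NOT yet resolved by it or earlier contexts."""
    resolved = set()
    levels = []
    for _ar_ids, table in contexts:
        resolved.update(table.values())
        level = 0
        while level in resolved:
            level += 1
        levels.append(level)
    return levels


QUIT_LEVELS = quit_levels(CONTEXTS)


def classify(record: Record) -> Tuple[Optional[str], int]:
    """Return (signature or None, number of contexts visited)."""
    ar_results: List[Optional[int]] = [None] * len(AR_FUNCS)  # AR result vector
    best: Optional[int] = None                                # BEST_MATCH_SO_FAR
    visited = 0
    for (ar_ids, table), level in zip(CONTEXTS, QUIT_LEVELS):
        visited += 1
        for i in ar_ids:
            if ar_results[i] is None:  # run each AR at most once per record
                ar_results[i] = AR_FUNCS[i](record)
        rank = table.get(tuple(ar_results[i] for i in ar_ids))
        if rank is not None and (best is None or rank < best):
            best = rank
        # Every rule ranked above the quit level is already resolved, so a best
        # match above that level cannot be beaten by a later context.
        if best is not None and best < level:
            break
    return (SIGNATURES[best] if best is not None else None), visited


print(classify({"proto": "tcp", "port": 80, "path": "/admin"}))
```

With these assumptions, a record whose path contains "admin" is resolved after the first context, while other records fall through to the second.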

Assignees

Inventors

Classifications

  • Tablespace storage structures; Management thereof · CPC title

  • G06F16/245 · Primary

    Query processing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10635662B2 cover?
A method for multicategory classification of streaming data records includes receiving a list of signature rules, each comprising a Boolean expression of a set of atomic recognizers (ARs) of one or more input fields of an input record and an assignment of a signature if the Boolean expression evaluates to TRUE, generating a list of all signature contexts from the list of signature rules, genera…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/245. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 28, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).