Method and system for high performance pattern indexing

US9323794B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9323794-B2
Application numberUS-201213686338-A
CountryUS
Kind codeB2
Filing dateNov 27, 2012
Priority dateNov 13, 2006
Publication dateApr 26, 2016
Grant dateApr 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein is a method and system for accelerating the generation of pattern indexes. In exemplary embodiments, regular expression pattern matching can be performed at high speeds on data to determine whether a pattern is present in the data. Pattern indexes can then be built based on the results of such regular expression pattern matching. Reconfigurable logic such a field programmable gate arrays (FPGAs) can be used to hardware accelerate these operations.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus comprising: a field programmable gate array (FPGA); and a bus through which streaming data is delivered to the FPGA at a bus bandwidth rate; the FPGA having firmware logic deployed thereon, the firmware logic configured to (1) receive the streaming data, (2) perform regular expression pattern matching on the streaming data with respect to a plurality of patterns to detect whether any pattern matches exist within the streaming data, and (3) concurrently build, at the bus bandwidth rate, a plurality of pattern indexes for the streaming data based on detected pattern matches, wherein each pattern index corresponds to a pattern against which regular expression pattern matching was performed and comprises location data for detected pattern matches. 2. The apparatus of claim 1 wherein the firmware logic is further configured to perform the regular expression pattern matching on the streaming data at hardware speeds. 3. The apparatus of claim 1 wherein the bus is a PCI-Express bus. 4. The apparatus of claim 1 further comprising a network interface, the network interface configured to receive the streaming data from a network and deliver the streaming data to the FPGA through the bus at the bus bandwidth rate. 5. The apparatus of claim 4 further comprising a processor, the processor configured to manage a flow of the streaming data to the FPGA via the network interface and the bus. 6. The apparatus of claim 1 further comprising a disk controller, the disk configured to receive the streaming data from a disk and deliver the streaming data to the FPGA through the bus at the bus bandwidth rate. 7. The apparatus of claim 6 further comprising a processor, the processor configured to manage a flow of the streaming data to the FPGA via the network interface and the bus. 8. The apparatus of claim 1 wherein the firmware logic comprises a plurality of regular expression pattern matching modules in parallel, each regular expression pattern matching module being keyed with a pattern; each regular expression pattern matching module being configured to (1) receive the streaming data, (2) perform a regular expression pattern matching operation on the streaming data to detect any pattern matches that exist within the streaming data with respect to its keyed pattern, and (3) operate simultaneously in parallel with the other regular expression pattern matching modules. 9. The apparatus of claim 8 wherein each regular expression pattern matching module is keyed with a different pattern. 10. The apparatus of claim 8 wherein the streaming data comprises a plurality of unstructured data objects; wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the data stream for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system; wherein the firmware logic is further configured to (1) pre-process the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data, and (2) build the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches. 11. The apparatus of claim 10 wherein the firmware logic is further configured to build a general index from the pre-processed streaming data using the generated data object identifiers and the generated position identifiers such that (1) the general index's terms comprise the words within the pre-processed streaming data and (2) the terms' associated pointers comprise the data object identifiers and position identifiers corresponding to those words. 12. The apparatus of claim 8 wherein the firmware logic further comprises an exact matching module in parallel with the parallel regular expression pattern matching modules, the exact matching module being keyed with a dictionary, the dictionary comprising a plurality of words; wherein the exact matching module is configured to perform an exact matching operation on the streaming data to thereby detect any exact matches that exist within the streaming data with respect to any of the dictionary words; and wherein the firmware logic is further configured to build a dictionary index for the streaming data based on the detected exact matches. 13. The apparatus of claim 12 wherein the exact matching module and the parallel regular expression pattern matching modules are configured to operate on the streaming data at hardware speeds. 14. The apparatus of claim 8 wherein the firmware logic further comprises an approximate matching module in parallel with the parallel regular expression pattern matching modules, the approximate matching module being keyed with a dictionary, the dictionary comprising a plurality of words; wherein the approximate matching module is configured to perform an approximate matching operation on the streaming data to thereby detect any approximate matches that exist within the streaming data with respect to any of the dictionary words; and wherein the firmware logic is further configured to build a dictionary index for the streaming data based on the detected approximate matches. 15. The apparatus of claim 14 wherein the approximate matching module and the parallel regular expression pattern matching modules are configured to operate on the streaming data at hardware speeds. 16. The apparatus of claim 8 wherein the streaming data comprises a plurality of data objects, and wherein the firmware logic is further configured to, in response to a detection from by a regular expression pattern matching operation that data within a data object is a pattern match with respect to a pattern, tag that data object as belonging to a class. 17. The apparatus of claim 16 wherein the firmware logic is further configured to build a class index based on the tagged data objects, the class index being associated with the class and configured to identify the data objects that belong to the associated class. 18. The apparatus of claim 1 wherein at least one of the patterns comprises a credit card number pattern. 19. The apparatus of claim 1 wherein at least one of the patterns comprises a social security number pattern. 20. The apparatus of claim 1 wherein at least one of the patterns comprises an email address pattern. 21. The apparatus of claim 1 wherein at least one of the patterns comprises a telephone number pattern. 22. The apparatus of claim 1 wherein at least one of the patterns comprises an Internet uniform resource locator (URL) pattern. 23. The apparatus of claim 1 wherein the streaming data comprises a plurality of unstructured data objects; wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the streaming data for which a pattern match was detected, wherein each pointer identifies where

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9323794B2 cover?
Disclosed herein is a method and system for accelerating the generation of pattern indexes. In exemplary embodiments, regular expression pattern matching can be performed at high speeds on data to determine whether a pattern is present in the data. Pattern indexes can then be built based on the results of such regular expression pattern matching. Reconfigurable logic such a field programmable g…
Who is the assignee on this patent?
Exegy Inc, Ip Reservoir Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/30312. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 26 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).