Template based data reduction for security related information flow data

US10733149B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10733149-B2
Application number: US-201815979512-A
Country: US
Kind code: B2
Filing date: May 15, 2018
Priority date: May 18, 2017
Publication date: Aug 4, 2020
Grant date: Aug 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for data reduction including organizing data of an event stream into a file access table concurrently with receiving the event stream, the data including independent features and dependent features. A frequent pattern tree (FP-Tree) is built including nodes corresponding to the dependent features according to a frequency of occurrence of the dependent features relative to the independent features. Each single path in the FP-Tree is merged into a special node corresponding to segments of dependent features to produce a reduced FP-Tree. All path combinations in the reduced FP-Tree are identified. A compressible file access template (CFAT) is generated corresponding to each of the path combinations. The data of the event stream is compressed with the CFATs to reduce the dependent features to special events representing the dependent features.
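
As a rough illustration of the first two steps in the abstract, the sketch below groups a stream of (process, file) events into a file access table and then inserts each process's files into a prefix tree in descending order of global frequency, as in the standard FP-growth construction. The event tuple shape, the names `Node`, `build_file_access_table`, and `build_fp_tree`, and the alphabetical tie-breaking rule are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-Tree node: a dependent feature (file) plus an occurrence counter."""

    def __init__(self, item):
        self.item = item
        self.counter = 0
        self.children = {}


def build_file_access_table(events):
    """Group (process, file) events by process, keeping each file once per process."""
    table = defaultdict(list)
    for process, file in events:
        if file not in table[process]:
            table[process].append(file)
    return dict(table)


def build_fp_tree(table):
    """Insert each process's files into a prefix tree, most frequent file first."""
    # Frequency of a file = number of processes that accessed it (assumed definition).
    freq = Counter(f for files in table.values() for f in files)
    root = Node(None)
    for files in table.values():
        # Ties are broken alphabetically so the tree shape is deterministic.
        ordered = sorted(files, key=lambda f: (-freq[f], f))
        node = root
        for f in ordered:
            node = node.children.setdefault(f, Node(f))
            node.counter += 1
    return root


# Hypothetical event stream: three processes touching four files.
events = [
    ("p1", "a"), ("p1", "b"), ("p1", "c"),
    ("p2", "a"), ("p2", "b"),
    ("p3", "a"), ("p3", "d"),
]
table = build_file_access_table(events)
root = build_fp_tree(table)
print(root.children["a"].counter)  # 3
```

In this toy input, file `a` is shared by all three processes, so it becomes the single child of the root; `b` and `d` branch below it. Chains such as `a -> b -> c` are the kind of single path the abstract describes merging into a special node.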

First claim

Opening claim text (preview).

What is claimed is:

1. A method for reducing data for storage in a data storage system, the method comprising: organizing data of an event stream into a file access table concurrently with receiving the event stream, the data including independent features and dependent features; building a frequent pattern tree (FP-Tree) including nodes corresponding to the dependent features according to a frequency of occurrence of the dependent features relative to the independent features; merging each single path in the FP-Tree into a special node corresponding to segments of dependent features to produce a reduced FP-Tree, the merging comprising weakly dominated path (WDP) merging, wherein the merging is determined to be WDP merging by (1−σ)<(p1·counter)/(p2·counter)<(1+σ), where p1 represents a first node, p2 represents a second node, σ is a deviation, and counter represents a counter of a node; identifying all path combinations in the reduced FP-Tree; generating a compressible file access template (CFAT) corresponding to each of the path combinations; and compressing the data of the event stream with the CFATs to reduce the dependent features to special events representing the dependent features; and selecting segments of the dependent features according to a data reduction score of each of the segments based on a size of the segment and a frequency of occurrence of the segment, the data reduction score being determined by score=t·size×t·freq−t·size−t·freq, where score represents the data reduction score, t represents the segment, size represents the size of the segment, and freq represents the frequency of occurrence of the segment.

2. The method of claim 1, wherein the single paths include sets of nodes of the FP-Tree that have only one parent node and only one child node.

3. The method of claim 1, wherein the frequency of occurrence of the segment is determined according to a counter of a node corresponding to a least frequently occurring dependent feature of the corresponding single path.

4. The method of claim 1, further including: comparing each segment to every other segment of a corresponding single path to determine an intersection with another segment; and removing an intersecting segment having a lower score than a segment with which the intersecting segment intersects.

5. The method of claim 1, further including selecting path combinations of the reduced FP-Tree according to a data reduction score of each of the path combinations based on a size of the path combination and a frequency of occurrence of the path combination.

6. The method of claim 5, wherein the frequency of occurrence of the path combination is determined according to a counter of a node or special node corresponding to a least frequently occurring node or special node in the path combination.

7. The method of claim 5, further including: comparing each path combination to every other path combination to determine an overlap with another path combination; and removing an overlapping path combination having a lower score than a path combination with which the overlapping path combination overlaps.

8. The method of claim 1, further including: generating a finite state automaton by ordering the dependent features in each CFAT; converting the ordered dependent features of each CFAT into a string; and using the finite state automaton strings to match CFATs to dependent feature sequences corresponding to an independent feature in the event stream.

9. The method of claim 1, wherein the independent features include processes performed by computers in a network, and the dependent features include files accessed in initial stages of the processes.

10. A method for reducing data for storage in a data storage system, the method comprising: collecting data in an event stream from a network of computers, the data including each process run by each computer and files accessed by each process in initial stages of each of processes; organizing the data into a file access table concurrently with receiving the event stream; building a frequent pattern tree (FP-Tree) including nodes corresponding to the files according to a frequency of occurrence of the files relative to the processes; merging each single path in the FP-Tree into a special node corresponding to segments of files to produce a reduced FP-Tree, the merging comprising weakly dominated path (WDP) merging, wherein the merging is determined to be WDP merging by (1−σ)<(p1·counter)/(p2·counter)<(1+σ), where p1 represents a first node, p2 represents a second node, σ is a deviation, and counter represents a counter of a node; identifying all path combinations in the reduced FP-Tree; generating a compressible file access template (CFAT) corresponding to each of the path combinations; compressing the data of the event stream with the CFATs to reduce the files to special events representing the files; selecting segments of the dependent features according to a data reduction score of each of the segments based on a size of the segment and a frequency of occurrence of the segment, the data reduction score being determined by score=t·size×t·freq−t·size−t·freq, where score represents the data reduction score, t represents the segment, size represents the size of the segment, and freq represents the frequency of occurrence of the segment; and analyzing the compressed data with a pattern analysis system.

11. The method of claim 10, wherein the single paths include sets of nodes of the FP-Tree that have only one parent node and only one child node.

12. The method of claim 10, further including selecting segments of the files according to a data reduction score of each of the segments based on a size of the segment and a frequency of occurrence of the segment.

13. The method of claim 12, wherein the frequency of occurrence of the segment is determined according to a counter of a node corresponding to a least frequently occurring file of the corresponding single path.

14. The method of claim 12, further including: comparing each segment to every other segment of a corresponding single path to determine an intersection with another segment; and removing an intersecting segment having a lower score than a segment with which the intersecting segment intersects.

15. The method of claim 10, further including selecting path combinations of the reduced FP-Tree according to a data reduction score of each of the path combinations based on a size of the path combination and a frequency of occurrence of the path combination.

16. The method of claim 15, wherein the frequency of occurrence of the path combination is determined according to a counter of a node or special node corresponding to a least frequently occurring node or special node in the path combination.

17. The method of claim 15, further including: comparing each path combination to every other path combination to determine an overlap with another path combination; and removing an overlapping path combination having a lower score than a path combination with which the overlapping path combination overlaps.

18. The method of claim 10, further including: generating a finite state automaton by ordering the files in each CFAT; converting the ordered files of each CFAT into a string; and using the finite state automaton strings to match CFATs to file sequences correspond to a process in the event stream.

19. The method of claim 10, wherein building the FP-Tree further includes pruning infrequently occurring paths of nodes from the FP-Tree to

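The two formulas recited in the claims can be made concrete with a short sketch. It assumes that `p1·counter` and `t·size` in the claim text denote field access (the counter of node p1, the size of segment t), that counters are positive, and that a segment is a contiguous run of nodes along a single path. The function names and the greedy selection order are illustrative assumptions, not the patent's stated procedure.

```python
def is_weakly_dominated(p1_counter, p2_counter, sigma):
    """WDP test from claim 1: the counter ratio lies strictly within 1 +/- sigma."""
    return (1 - sigma) < p1_counter / p2_counter < (1 + sigma)


def data_reduction_score(size, freq):
    """Score from claim 1: size * freq - size - freq."""
    return size * freq - size - freq


def select_segments(counters):
    """Score every contiguous segment of one single path and keep the
    highest-scoring segments that do not intersect (one reading of claim 4).

    `counters` lists the node counters along the path, root side first.
    Returns (score, start, end) tuples with `end` exclusive.
    """
    candidates = []
    n = len(counters)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # Claim 3: segment frequency is the counter of its least frequent node.
            freq = min(counters[start:end])
            candidates.append((data_reduction_score(end - start, freq), start, end))
    candidates.sort(reverse=True)
    chosen = []
    for score, start, end in candidates:
        disjoint = all(end <= a or start >= b for _, a, b in chosen)
        if score > 0 and disjoint:
            chosen.append((score, start, end))
    return chosen


print(is_weakly_dominated(100, 98, 0.1))  # True
print(data_reduction_score(3, 95))        # 187
print(select_segments([100, 98, 10]))     # [(96, 0, 2)]
```

One way to read the score, offered as an interpretation only: storing a segment of `size` files that occurs `freq` times costs `size * freq` events uncompressed, versus `size` for the template plus `freq` special events once compressed, so the score is the number of events saved. In the last example the rarely accessed third node (counter 10) drags down any segment that includes it, so only the first two nodes are kept as a segment.
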
Assignees

Inventors

Classifications

  • to a system of files or objects, e.g. local or distributed file system or database

  • using compression, e.g. sparse files

  • Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

  • Trees

  • Clearing memory, e.g. to prevent the data from being stolen

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10733149B2 cover?
Systems and methods for data reduction including organizing data of an event stream into a file access table concurrently with receiving the event stream, the data including independent features and dependent features. A frequent pattern tree (FP-Tree) is built including nodes corresponding to the dependent features according to a frequency of occurrence of the dependent features relative to th…
Who is the assignee on this patent?
Nec Lab America Inc, Nec Corp
What technology area does this patent fall under?
Primary CPC classification G06F21/6218. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 4, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).