Mitigation of malware
US-2015379264-A1 · Dec 31, 2015 · US
US2017193230A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017193230-A1 |
| Application number | US-201514702750-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 3, 2015 |
| Priority date | May 3, 2015 |
| Publication date | Jul 6, 2017 |
| Grant date | — |
Official abstract text for this publication.
Disclosed herein is a system and method for determining whether two files are similar or whether an unknown file contains malware or other malicious activity. The system takes a suspect file and generates a hash for the file. The hash represents segments of a file that may be compared with segments of other hashes. This hash is then compared with the hash of another file. The comparison measures the distance between the two hashes, and if the two hashes are close enough to each other, the two files are considered similar to each other.
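As a rough illustration of the hash described in the abstract, the sketch below follows the two-window procedure recited in claims 13 to 16. It is not the patented implementation. Several details are assumptions made only for this example: raw byte values stand in for the preprocessed signal (claim 3 names a Huffman code instead), the window mean is used as the statistical property, the first window grows one byte at a time, and the names `segment`, `make_hash`, `window`, and `threshold` along with their default values are hypothetical. Transition points are stored here as the index of the first byte of the next segment.

```python
from statistics import fmean


def segment(signal, window=4, threshold=50.0):
    """Return transition points found by comparing two adjacent windows.

    Assumption: the statistical property is the mean of the window.
    """
    transitions = []
    start, size = 0, window
    # Stop once the second window no longer fits inside the signal.
    while start + size + window <= len(signal):
        first = signal[start:start + size]
        second = signal[start + size:start + size + window]
        if abs(fmean(first) - fmean(second)) > threshold:
            # Difference exceeds the threshold: note a boundary, move the
            # first window to where the second began, reset its size.
            boundary = start + size
            transitions.append(boundary)
            start, size = boundary, window
        else:
            # Otherwise enlarge the first window; the second window
            # follows its new end on the next pass.
            size += 1
    return transitions


def make_hash(data: bytes, window=4, threshold=50.0):
    """Return (transitions, levels) for the given bytes."""
    if not data:
        return [], []
    signal = list(data)  # assumption: byte values serve as the signal
    transitions = segment(signal, window, threshold)
    bounds = [0] + transitions + [len(signal)]
    # One level per segment: the same statistic, taken over the segment.
    levels = [fmean(signal[a:b]) for a, b in zip(bounds, bounds[1:])]
    return transitions, levels


if __name__ == "__main__":
    sample = bytes([10] * 16 + [200] * 16 + [10] * 16)
    print(make_hash(sample))
```

A file with uniform content yields no transitions and a single level, while a file whose byte statistics shift partway through yields one or more boundaries near the shift. The exact boundaries depend heavily on the assumed window size and threshold.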
Opening claim text (preview).
1. A system for determining similarity between two files comprising: at least one processor and at least one memory device; a representation component configured to receive a file and generate a hash of the file, the hash including a list of transitions and a list of levels; and a distance component configured to determine a distance between the received file and a second file based on a comparison of the hash and a hash for the second file.
2. The system of claim 1 wherein the representation component further comprises: a preprocessing component, the preprocessing component configured to convert the file to a signal representative of the file.
3. The system of claim 2 wherein the preprocessing component applies a Huffman code to the file to generate the signal.
4. The system of claim 1 wherein the representation component further comprises: a segmentation component configured to divide a signal associated with the file into at least two segments and provided the segments as the list of transitions.
5. The system of claim 4 wherein the segmentation component is configured to identify a transition point, the transition point representative of a boundary between two segments.
6. The system of claim 4 wherein the segmentation component is further configured to generate a first window having a first size and a second window having a second size, the segmentation component further configured to place the first window at a first byte in the signal and place the second window at a byte following a last byte of the first window.
7. The system of claim 6 wherein the segmentation component is further configured to calculate a first statistical property for the first window and calculate a second statistical property for the second window and compare the first statistical property with the second statistical property and determine if a difference between the first statistical property and the second statistical property exceeds a threshold value.
8. The system of claim 7 wherein the segmentation component is further configured to enlarge the size of the first window when the difference does not exceed the threshold and move the second window to a location following the last byte of the enlarged first window.
9. The system of claim 1 wherein the representation component further comprises: a represent component configured to identify a statistical property for each transition in the list of transitions.
10. The system of claim 1 wherein the distance component is further configured to calculate the distance based on a calculated area between segments of the hash and segments of the hash of the second file.
11. The system of claim 10 wherein the distance component is further configured to calculate a structural distance between the hash and the hash of the second file.
12. The system of claim 11 wherein the distance component applies a weighting factor to the structural distance.
13. A method of generating a hash for a file comprising: receiving a file; preprocessing the file to convert the file to a signal representative of the bytes in the file; identifying a list of segments in the preprocessed file based on statistical property differences with other portions of the preprocessed file; representing the preprocessed file by generating a level value for each segment in the list of segments as a list of levels; and generating a hash of the file, wherein the hash comprises the list of segments and the list of levels.
14. The method of claim 13 wherein identifying the list of segments further comprises: determining a size of a first window; placing the first window on a first byte of the preprocessed file; placing a second window at a first byte position after an end byte of the first window; calculating a first statistical property for the first window and a second statistical property for the second window; and determining if a difference between the first statistical property and the second statistical property exceeds a threshold value; and noting as a transition point the end byte when the difference exceeds the threshold value.
15. The method of claim 14, when the difference does not exceed the threshold value, further comprising: increasing the size of the first window; moving the second window to the first byte position after a new end byte of the first window; and repeating the steps of calculating, determining and noting.
16. The method of claim 14 when the difference exceeds the threshold value, further comprising: moving the first window to the first byte position of the second window; resetting the size of the first window to an original size; and repeating the steps of placing, calculating, determining and noting for the first window and the second window for the new location.
17. The method of claim 13 wherein the level value is generated by calculating a statistical property for each segment in the list of segments.
18. A computer readable storage device having computer executable instructions that when executed by at least one computer cause the at least one computer to: receive a hash of a file to analyze; obtain a second hash, the second hash representative of a second file to compare with the file; determine an area between the hash and the second hash; determine a structural distance between the hash and the second hash; calculate a distance between the hash and the second hash based on the area and the structural distance; determine if the two hashes are similar or dissimilar based on a comparison of the calculated distance to a threshold value.
19. The computer readable storage device of claim 18 wherein calculate the distance between the hash and the second hash further comprises instructions to applying a weighting factor to the structural distance.
20. The computer readable storage device of claim 18 wherein receive a hash of a file further comprises instructions to: receive the file; provide the file to a representation component; and receive from the representation component a hash of the file.
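The comparison recited in claims 10 to 12 and 18 to 19 can be sketched as follows. The claims name an area term, a structural distance, and a weighting factor, but this excerpt gives no formulas, so everything concrete below is an assumption made for illustration: each hash is treated as a step function over positions normalized to [0, 1], the area is the integral of the absolute level difference, the structural distance is the difference in segment count, and the two are combined as `area + weight * structural`. The `FileHash` type, its `length` field, and the default `weight` and `threshold` values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class FileHash:
    transitions: list  # offsets where a new segment starts
    levels: list       # one level per segment (len(transitions) + 1)
    length: int        # total signal length (assumed to be stored)

    def cuts(self):
        """Transition points normalized to the interval [0, 1]."""
        return [t / self.length for t in self.transitions]

    def level_at(self, x):
        """Level of the segment containing normalized position x."""
        return self.levels[bisect_right(self.cuts(), x)]


def area_between(a, b):
    """Area between the two step functions (assumed area definition)."""
    points = sorted({0.0, 1.0, *a.cuts(), *b.cuts()})
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2  # both functions are constant on (lo, hi)
        total += abs(a.level_at(mid) - b.level_at(mid)) * (hi - lo)
    return total


def structural_distance(a, b):
    """Assumed structural term: difference in number of segments."""
    return abs(len(a.levels) - len(b.levels))


def distance(a, b, weight=10.0):
    """Area plus weighted structural distance (assumed combination)."""
    return area_between(a, b) + weight * structural_distance(a, b)


def is_similar(a, b, threshold=50.0, weight=10.0):
    """Compare the calculated distance to a threshold value."""
    return distance(a, b, weight) <= threshold


if __name__ == "__main__":
    one = FileHash([16], [10.0, 200.0], 32)
    two = FileHash([], [10.0], 32)
    print(distance(one, one), distance(one, two), is_similar(one, two))
```

Normalizing positions lets files of different sizes be compared on the same axis. That is a design choice of this sketch, not something the claims state.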
Test or assess a computer or a system · CPC title
by checking file integrity · CPC title
by virus signature recognition · CPC title