Combined structure and import behavior signatures based malware learning and detection

US12367280B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12367280-B2
Application number  US-202218050508-A
Country             US
Kind code           B2
Filing date         Oct 28, 2022
Priority date       Oct 28, 2022
Publication date    Jul 22, 2025
Grant date          Jul 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system has been created that represents a binary file with a combination of signatures that account for both structure as expressed by control flow and an abstraction of functionality as expressed by import behavior. The system analyses intra-subroutine control flow and calls to import code units. The system generates structure signatures for the subroutines based on the intra-subroutine control flows. The system also generates an import behavior signature based on calls to import code units and caller-callee relationships between the subroutines and the import code units. The system uses the structure signatures to identify the caller subroutines in generating the import behavior signature. The combination of structure signatures and import behavior signature allows for accurate determination of code similarity without the noise of superficial variations in code organization and other mutations or alterations that facilitate avoiding malware detection.
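The way the abstract ties import calls back to caller subroutines can be illustrated with a minimal sketch. It assumes the structure signatures have already been computed (short placeholder strings stand in for them here), and it counts call sites per caller in the spirit of claim 12 and orders the signature vector deterministically in the spirit of claim 8. The function names, the text serialization, and the choice of SHA-256 are all illustrative assumptions; the patent text on this page does not specify them.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple

# One entry per call site: (caller subroutine name, import code unit identifier).
CallSite = Tuple[str, str]


def import_behavior_signature(calls: List[CallSite],
                              structure_sigs: Dict[str, str]) -> str:
    """Hash which imports are called, by which callers, and how often.

    Callers are identified by their structure signature rather than by
    name or address, so renaming or relocating a subroutine does not
    change the result. (Serialization and SHA-256 are assumptions.)
    """
    counts = Counter((imp, structure_sigs[caller]) for caller, imp in calls)
    lines = [f"{imp}<-{sig}x{n}" for (imp, sig), n in sorted(counts.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def combined_representation(structure_sigs: Dict[str, str],
                            calls: List[CallSite]) -> dict:
    """Associate an ordered vector of structure signatures with the import signature."""
    return {
        "structure": sorted(structure_sigs.values()),  # deterministic order
        "import_behavior": import_behavior_signature(calls, structure_sigs),
    }


# Hypothetical sample A.
sigs_a = {"sub_401000": "aa11", "sub_401200": "bb22"}
calls_a = [
    ("sub_401000", "kernel32.dll!CreateFileW"),
    ("sub_401000", "kernel32.dll!WriteFile"),
    ("sub_401200", "ws2_32.dll!connect"),
    ("sub_401000", "kernel32.dll!WriteFile"),
]

# Hypothetical sample B: same code, but subroutines renamed and reordered.
sigs_b = {"sub_500300": "bb22", "sub_500000": "aa11"}
calls_b = [
    ("sub_500300", "ws2_32.dll!connect"),
    ("sub_500000", "kernel32.dll!WriteFile"),
    ("sub_500000", "kernel32.dll!WriteFile"),
    ("sub_500000", "kernel32.dll!CreateFileW"),
]

rep_a = combined_representation(sigs_a, calls_a)
rep_b = combined_representation(sigs_b, calls_b)
print(rep_a == rep_b)
```

In this sketch the two samples produce the same combined representation even though subroutine names and ordering differ, which is the kind of superficial variation the abstract says the combination is meant to ignore.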

First claim

Opening claim text (preview).

The invention claimed is:
1. A method comprising: determining, based on program code from a disassembled first binary file, a plurality of intra-subroutine control flows, a set of one or more import code unit identifiers, and call relationships between subroutines of the program code and the set of import code unit identifiers; generating a plurality of signatures for the subroutines based, at least in part, on the plurality of intra-subroutine control flows; generating a second signature based, at least in part, on the set of import code unit identifiers and the call relationships; and combining the plurality of signatures with the second signature as a representation of structure and import behavior of the first binary file.
2. The method of claim 1 further comprising determining a malicious or benign verdict for the first binary file based, at least in part, on the combination of the plurality of signatures with the second signature.
3. The method of claim 1, wherein generating the plurality of signatures for the subroutines comprises, for each subroutine: creating a representation of the subroutine that identifies basic blocks of the subroutine, that indicates size of each basic block, and that indicates control flow among the basic blocks based on the one of the plurality of intra-subroutine control flows corresponding to the subroutine; and hashing the subroutine representation to generate the one of the plurality of signatures corresponding to the subroutine.
4. The method of claim 3, wherein creating the representation that indicates control flow comprises creating the representation to indicate at least one of types of jumps in the basic blocks and successor blocks of the basic blocks.
5. The method of claim 3, wherein creating the representations comprises normalizing identifiers of the basic blocks.
6. The method of claim 5, wherein normalizing identifiers of the basic blocks comprises, for each basic block of a subroutine, determining an offset of the basic block relative to a beginning of the corresponding subroutine.
7. The method of claim 1, further comprising: forming a signature vector with the plurality of signatures, wherein combining the plurality of signatures with the second signature comprises associating the signature vector with the second signature.
8. The method of claim 7, wherein forming the signature vector comprises: deterministically ordering the plurality of signatures in the signature vector.
9. The method of claim 1 further comprising generating clusters based on combined signatures of sample binary files and corresponding verdicts, wherein the combined signatures include the combined plurality of signatures and the second signature and the sample binary files include the first binary file.
10. The method of claim 9, wherein generating the clusters comprises: generating fuzzy representations of each of the combined signatures; clustering the fuzzy representations; and for each cluster, indicating a malicious or benign verdict based, at least in part, on verdicts of cluster members.
11. The method of claim 1 further comprising: for each import code unit, determining which of the subroutines calls the import code unit; and associating the signature of each caller subroutine with the import code unit, wherein generating the second signature comprises generating the second signature based, at least in part, on associations of the set of import code units with the signatures of caller subroutines.
12. The method of claim 11, wherein generating the second signature further comprises, for each import code unit, determining a quantity of call sites in each caller subroutine, wherein the second signature is also generated based on the quantity of call sites.
13. The method of claim 1 further comprising: generating a first file signature of the first binary file; and determining that the first file signature does not have a match in a cache of binary file signatures, wherein generating the plurality of signatures, generating the second signature, and combining the plurality of signatures with the second signature is based, at least in part, on determining that the first file signature does not have a match in the cache.
14. A non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to: based on program code generated from disassembly of a binary program code, create a control flow based representation of each subroutine; generate a subroutine signature for each control flow based representation; generate an import behavior signature based, at least in part, on an import table of the program code; and indicate an association of the subroutine signatures and the import behavior signature as a representation of structure and import behavior of the binary program code.
15. The machine-readable medium of claim 14, wherein the instructions to create the control flow based representations of the subroutines comprise instructions to: for each subroutine, determine descriptors for each basic block including a normalized identifier of the basic block; aggregate the basic block descriptors; and hash the aggregated basic block descriptors.
16. The machine-readable medium of claim 15, wherein the instructions to determine descriptors including a normalized identifier of each basic block of each subroutine comprise instructions to determine an offset of each basic block relative to a beginning of the corresponding subroutine and use the relative offset as the normalized basic block identifier.
17. The machine-readable medium of claim 15, wherein the descriptors for a basic block also comprise a basic block size, a jump type of the basic block, and indication of one or more successor basic blocks.
18. The machine-readable medium of claim 14, wherein the program code further comprises instructions to deterministically order the subroutine signatures in a data structure and to associate the data structure of deterministically ordered subroutine signatures with the import behavior signature.
19. The machine-readable medium of claim 14, wherein the instructions to generate the import behavior signature comprise instructions to: determine caller-callee relationships between the subroutines and import code units indicated in the import table; and generate the import behavior signature based, at least in part, on indications of the caller-callee relationships, wherein the indications use corresponding ones of the subroutine signatures.
20. An apparatus comprising: a processor; and a machine-readable medium having instructions stored thereon that are executable by the processor to cause the apparatus to, based on program code generated from disassembly of a binary program code, create a control flow based representation of each subroutine; generate a subroutine signature for each control flow based representation; generate an import behavior signature based, at least in part, on an import table of the program code; and indicate an association of the subroutine signatures and the import behavior signature as a functional representation of the binary program code.
21. The apparatus of claim 20, wherein the instructions to create the control flow based representations of the subroutines comprise instructions executable by the processor to cause the apparatus to: for each subroutine, determine descriptors for each basic block including a normalized identifier of th…
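The per-subroutine structure signature described in claims 3 to 6 and 15 to 17 (basic block descriptors with a normalized identifier, size, jump type, and successors, aggregated and then hashed) can be sketched as follows. This is only an illustration under stated assumptions: the `BasicBlock` type, the jump-type labels, the descriptor text format, and the use of SHA-256 are hypothetical choices, not details taken from the patent.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BasicBlock:
    address: int                 # absolute address reported by a disassembler
    size: int                    # block size in bytes
    jump_type: str               # illustrative labels: "jcc", "jmp", "ret", ...
    successors: Tuple[int, ...]  # absolute addresses of successor blocks


def subroutine_signature(entry: int, blocks: List[BasicBlock]) -> str:
    """Hash a control-flow-based representation of one subroutine."""
    descriptors = []
    for block in sorted(blocks, key=lambda b: b.address):
        # Normalize identifiers: use offsets relative to the subroutine start
        # so the signature does not depend on where the code was placed.
        offset = block.address - entry
        succ = ",".join(str(s - entry) for s in sorted(block.successors))
        descriptors.append(f"{offset}:{block.size}:{block.jump_type}:[{succ}]")
    aggregated = "|".join(descriptors)  # aggregate the block descriptors
    return hashlib.sha256(aggregated.encode("utf-8")).hexdigest()


def relocate(blocks: List[BasicBlock], delta: int) -> List[BasicBlock]:
    """Shift every address by delta, imitating the same code at a new location."""
    return [
        BasicBlock(b.address + delta, b.size, b.jump_type,
                   tuple(s + delta for s in b.successors))
        for b in blocks
    ]


# A hypothetical four-block subroutine starting at 0x1000.
original = [
    BasicBlock(0x1000, 12, "jcc", (0x100C, 0x1020)),
    BasicBlock(0x100C, 20, "jmp", (0x1030,)),
    BasicBlock(0x1020, 16, "fallthrough", (0x1030,)),
    BasicBlock(0x1030, 4, "ret", ()),
]
moved = relocate(original, 0x4000)

sig_a = subroutine_signature(0x1000, original)
sig_b = subroutine_signature(0x5000, moved)
print(sig_a == sig_b)
```

Because every identifier is an offset from the subroutine entry, the relocated copy hashes to the same value as the original in this sketch, while a change to a block size, jump type, or successor set would produce a different signature.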

Assignees

Palo Alto Networks Inc

Inventors

Classifications

  • to a system of files or objects, e.g. local or distributed file system or database · CPC title

  • Test or assess a computer or a system · CPC title

  • G06F21/56 · Primary

    Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12367280B2 cover?
A system has been created that represents a binary file with a combination of signatures that account for both structure as expressed by control flow and an abstraction of functionality as expressed by import behavior. The system analyses intra-subroutine control flow and calls to import code units. The system generates structure signatures for the subroutines based on the intra-subroutine cont…
Who is the assignee on this patent?
Palo Alto Networks Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6218. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).