High throughput embedding generation system for executable code and applications

US11080236B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11080236-B1
Application number: US-202117198312-A
Country: US
Kind code: B1
Filing date: Mar 11, 2021
Priority date: Jul 18, 2019
Publication date: Aug 3, 2021
Grant date: Aug 3, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A novel high-throughput embedding generation and comparison system for executable code is presented in this invention. More specifically, the invention relates to a deep-neural-network based graph embedding generation and comparison system. A novel bi-directional code graph embedding generation has been proposed to enrich the information extracted from code graph. Furthermore, by deploying matrix manipulation, the throughput of the system has significantly increased for embedding generation. Potential applications such as executable file similarity calculation, vulnerability search are also presented in this invention.
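
The abstract's two ideas, bi-directional graph information and throughput through matrix manipulation, can be illustrated with a minimal NumPy sketch. It is not the patented network: the update rule, the weight matrices `w_x`, `w_f`, `w_b`, the zero-padding scheme, and the helper names `stack_graphs` and `embed_batch` are all assumptions made for illustration. The sketch pads several function graphs into one batch tensor, then propagates along forward edges and reversed edges with a single batched matrix product per round.

```python
import numpy as np

def stack_graphs(graphs):
    """Pad (node_features, edge_list) graphs into dense batch tensors."""
    batch = len(graphs)
    max_nodes = max(feats.shape[0] for feats, _ in graphs)
    feat_dim = graphs[0][0].shape[1]
    x = np.zeros((batch, max_nodes, feat_dim))
    fwd = np.zeros((batch, max_nodes, max_nodes))
    for g, (feats, edges) in enumerate(graphs):
        x[g, :feats.shape[0]] = feats
        for src, dst in edges:
            fwd[g, dst, src] = 1.0  # node dst aggregates from predecessor src
    bwd = fwd.transpose(0, 2, 1)    # reversed edges: aggregate from successors
    return x, fwd, bwd

def embed_batch(x, fwd, bwd, w_x, w_f, w_b, rounds=2):
    """Hypothetical update rule: one batched matmul per direction per round."""
    hidden = np.zeros(x.shape[:2] + (w_f.shape[0],))
    for _ in range(rounds):
        hidden = np.tanh(x @ w_x + fwd @ hidden @ w_f + bwd @ hidden @ w_b)
    # Padded nodes stay at zero (no features, no edges, no bias term),
    # so summing over the node axis yields one embedding per graph.
    return hidden.sum(axis=1)

rng = np.random.default_rng(0)
feat_dim, embed_dim = 3, 4
w_x = rng.normal(size=(feat_dim, embed_dim))   # untrained placeholder weights
w_f = rng.normal(size=(embed_dim, embed_dim))
w_b = rng.normal(size=(embed_dim, embed_dim))

# Two toy control flow graphs with 3 and 5 basic blocks.
graphs = [
    (rng.normal(size=(3, feat_dim)), [(0, 1), (1, 2)]),
    (rng.normal(size=(5, feat_dim)), [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]),
]
embeddings = embed_batch(*stack_graphs(graphs), w_x, w_f, w_b)
print(embeddings.shape)  # (2, 4)
```

Because padded rows contribute nothing, the batched result for each graph matches what the same function would return for that graph processed alone, which is what makes stacking a pure throughput optimization in this sketch.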

First claim

Opening claim text (preview).

What is claimed is:

1. A system for high-throughput embedding generation and comparison, comprising: circuitry configured to take executable code and extract Bi-directional Attributable Control Flow Graphs (ACFGs) of functions from the executable code; conduct high-throughput embedding generation to generate embeddings for the Bi-directional ACFGs; conduct high-throughput similarity comparison of the functions using the embeddings; and conduct the high-throughput similarity comparison to compare a similarity of a plurality of executable files by applying Principal Component Analysis on the embeddings.

2. The system of claim 1, wherein each of the Bi-directional ACFGs is a directed graph with two edges.

3. The system of claim 1, wherein the high-throughput embedding generation deploys stacked Bi-directional ACFGs to maximize a throughput of an embedding network.

4. The system of claim 1, wherein the circuitry is further configured to implement a graph embedding network to which the bi-directional ACFGs are input.

5. The system of claim 1, wherein the circuitry is configured to conduct the high-throughput similarity comparison using matrix manipulation.

6. The system of claim 5, wherein the circuitry is configured to implement the matrix manipulation by stacking function embedding vectors into matrix format, and processing the function embedding vectors in batches through one calculation to provide high speed cosine similarity calculation.

7. The system of claim 1, wherein the circuitry is configured to implement an executable file similarity comparison system using the high-throughput embedding generation and comparison.

8. The system of claim 7, wherein the circuitry is configured to conduct the principal component analysis on the embeddings of the functions extracted from the plurality of executable files to generate the embeddings of the executable files.

9. The system of claim 7, wherein the circuitry is configured to use cosine similarity of the embeddings of the executable files to calculate the similarity of the executable files.

10. The system of claim 1, wherein the circuitry is configured to implement a vulnerability search system using the high-throughput embedding generation and comparison.

11. The system of claim 10, wherein the circuitry is configured to use the high-throughput embedding generation and comparison to identify candidates list of vulnerable functions.

12. The system of claim 10, wherein the circuitry is configured to use condition formula comparison to identify true positive vulnerable functions in the candidates list.

13. A method for high-throughput embedding generation and comparison, comprising: taking executable code and extracting Bi-directional Attributable Control Flow Graphs (ACFGs) of functions from the executable code; conducting high-throughput embedding generation to generate embeddings for the Bi-directional ACFGs; conducting high-throughput similarity comparison of the functions using the embeddings; and conducting the high-throughput similarity comparison to compare a similarity of a plurality of executable files by applying Principal Component Analysis on the embeddings.

14. A non-transitory, computer-readable storage medium storing instructions that, when executed on a computer, control the computer to perform a method for high-throughput embedding generation and comparison, comprising: taking executable code and extracting Bi-directional Attributable Control Flow Graphs (ACFGs) of functions from the executable code; conducting high-throughput embedding generation to generate embeddings for the Bi-directional ACFGs; conducting high-throughput similarity comparison of the functions using the embeddings; and conducting the high-throughput similarity comparison to compare a similarity of a plurality of executable files by applying Principal Component Analysis on the embeddings.
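
The comparison steps in the claims (stacking function embeddings into a matrix for one-shot cosine similarity, deriving file embeddings with Principal Component Analysis, and comparing files by cosine similarity) can be sketched in NumPy as follows. The claim text shown here does not say how the PCA output becomes a file embedding, so the choice below (flattening the top `k` principal directions, with a sign convention to remove the inherent sign ambiguity) is an assumption, as are the helper names `cosine_matrix` and `file_embedding` and the random stand-in embeddings.

```python
import numpy as np

def cosine_matrix(a, b):
    """All-pairs cosine similarity between rows of a and rows of b.

    Rows are normalized once, then a single matrix product yields every
    pairwise score. Assumes no row is the zero vector.
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def file_embedding(function_embeddings, k=2):
    """Assumed file embedding: top-k principal directions, flattened."""
    centered = function_embeddings - function_embeddings.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Principal directions are defined only up to sign; flip each one so
    # that its largest-magnitude entry is positive.
    rows = np.arange(components.shape[0])
    largest = np.abs(components).argmax(axis=1)
    signs = np.sign(components[rows, largest])
    return (components * signs[:, None]).ravel()

rng = np.random.default_rng(0)
file_a = rng.normal(size=(50, 8))        # 50 stand-in function embeddings
file_b = file_a[rng.permutation(50)]     # same functions, different order
file_c = rng.normal(size=(40, 8))        # an unrelated file

# Function-level comparison: one matmul scores all 50 x 40 pairs.
function_scores = cosine_matrix(file_a, file_c)
print(function_scores.shape)  # (50, 40)

# File-level comparison on the PCA-derived embeddings.
emb_a, emb_b, emb_c = (file_embedding(f) for f in (file_a, file_b, file_c))
sim_ab = cosine_matrix(emb_a[None, :], emb_b[None, :])[0, 0]
sim_ac = cosine_matrix(emb_a[None, :], emb_c[None, :])[0, 0]
```

In this sketch the file embedding does not depend on the order in which functions appear, so `sim_ab` comes out at essentially 1.0 for the reordered copy, while `sim_ac` reflects how closely the dominant directions of two unrelated embedding sets happen to align.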

Assignees

Deepbits Tech Inc, Univ California

Inventors

Classifications

  • G06F17/10 (Primary)

    Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title

  • G06F16/148 (Primary)

    File search processing · CPC title

  • based on approximation criteria, e.g. principal component analysis · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Graphical models, e.g. Bayesian networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080236B1 cover?
A novel high-throughput embedding generation and comparison system for executable code is presented in this invention. More specifically, the invention relates to a deep-neural-network based graph embedding generation and comparison system. A novel bi-directional code graph embedding generation has been proposed to enrich the information extracted from code graph. Furthermore, by deploying matr…
Who is the assignee on this patent?
Deepbits Tech Inc, Univ California
What technology area does this patent fall under?
Primary CPC classification G06F17/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 3, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).