What technology area does this patent fall under?

Primary CPC classification: G06F21/562 (Static detection). Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 9, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Machine learning-based determination of program code characteristics

US10917415B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10917415-B2
Application number	US-201815867251-A
Country	US
Kind code	B2
Filing date	Jan 10, 2018
Priority date	Jan 10, 2018
Publication date	Feb 9, 2021
Grant date	Feb 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique includes processing a plurality of sets of program code to extract call graphs; determining similarities between the call graphs; applying unsupervised machine learning to an input formed from the determined similarities to determine latent features of the input; clustering the determined latent features; and determining a characteristic of a given program code set of the plurality of program code sets based on a result of the clustering.
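The first two steps of the abstract (extracting call graphs and determining similarities between them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy call graphs, the sample names, and in particular the Jaccard distance over call edges, which is a simple stand-in and not the seeded graph matching named in the claims. Unlike graph matching, it only works when function names line up across samples.

```python
from itertools import combinations

# Hypothetical toy call graphs: each maps a caller to the set of functions it calls.
CALL_GRAPHS = {
    "sample_a": {"main": {"read", "decrypt"}, "decrypt": {"xor"}},
    "sample_b": {"main": {"read", "decrypt"}, "decrypt": {"xor", "log"}},
    "sample_c": {"main": {"render"}, "render": {"draw"}},
}

def edge_set(graph):
    """Flatten a caller -> callees mapping into a set of (caller, callee) edges."""
    return {(caller, callee) for caller, callees in graph.items() for callee in callees}

def jaccard_distance(g1, g2):
    """Distance in [0, 1]: 0 for identical edge sets, 1 for disjoint ones.

    Stand-in for the graph-matching distance described in the claims.
    """
    e1, e2 = edge_set(g1), edge_set(g2)
    union = e1 | e2
    if not union:
        return 0.0
    return 1.0 - len(e1 & e2) / len(union)

def distance_matrix(graphs):
    """Rows and columns are program code sets; element [i][j] is their pairwise distance."""
    names = sorted(graphs)
    matrix = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d = jaccard_distance(graphs[names[i]], graphs[names[j]])
        matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return names, matrix

names, matrix = distance_matrix(CALL_GRAPHS)
for name, row in zip(names, matrix):
    print(name, row)
```

In this toy data, sample_a and sample_b share three of four call edges and end up close together, while sample_c shares none and sits at the maximum distance from both.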

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: processing a plurality of program code sets to extract call graphs; determining similarities between the call graphs; applying unsupervised machine learning to an input formed from the determined similarities to determine latent features of the input; clustering the determined latent features; and determining a characteristic of a given program code set of the plurality of program code sets based on a result of the clustering.
2. The method of claim 1, wherein determining similarities between the call graphs comprises applying seeded graph matching to the plurality of program code sets to determine distances between pairs of the plurality of program code sets.
3. The method of claim 2, wherein determining distances between the program code sets comprises generating a matrix.
4. The method of claim 3, wherein generating the matrix comprises generating a similarity matrix.
5. The method of claim 3, wherein generating the matrix comprises generating a matrix in which each row of the matrix is associated with a program code set of the plurality of program code sets, each columns of the matrix is associated with a program code set of the plurality of program code sets, a given element of the matrix is associated a pair of the program code sets of the plurality of program code sets and represents a distance between the pair.
6. The method of claim 2, wherein applying seeded graph matching comprises applying a Fast Approximate Quadratic (FAQ) assignment algorithm.
7. The method of claim 1, wherein determining the similarities comprises determining distances between the call graphs, and the method further comprises normalizing the distances to generate the input for the unsupervised machine learning.
8. The method of claim 1, wherein applying the unsupervised machine learning comprises applying deep neural network learning.
9. The method of claim 1, wherein clustering the determined latent features comprises applying k-means clustering.
10. The method of claim 1, wherein determining the characteristic comprises identifying a characteristic associated with malicious software.
11. The method of claim 10, further comprising taking corrective action against the given program code set in response to identifying the characteristic.
12. The method of claim 11, wherein taking corrective action comprises quarantining the given program code set.
13. A non-transitory storage medium storing instructions that, when executed by a processor-based machine, cause a processor to: access data representing control flow graphs, wherein each control flow graph represents a set of machine executable instructions of a plurality of sets of machine executable instructions; determine a similarity matrix based on the control flow graphs; apply neural network-based machine learning to, based on the similarity matrix, determine features of the plurality of sets of machine executable instructions shared in common; cluster the features; and determine a characteristic of a given set of machine executable instructions of the plurality of sets of machine executable instructions based on a result of the clustering.
14. The storage medium of claim 13, wherein the instructions, when executed by the processor, cause the processor to identify the given set of machine executable instructions of the plurality of sets of machine executable instructions as associated with malicious activity based on the determined features.
15. The storage medium of claim 13, wherein the instructions, when executed by the processor, cause the processor to determine the similarity matrix based on seeded graph matching.
16. The storage medium of claim 13, wherein the instructions, when executed by the processor, cause the processor to: train a sparse autoencoder to determine the features; and cluster the sets of machine executable instructions based on the determined features.
17. An apparatus comprising: a processor; and a storage medium to store instructions that, when executed by the processor, cause the processor to: apply seeded graph matching to call graphs associated with a plurality of program code sets to determine distances among the call graphs; apply unsupervised machine learning to the distances to determine latent features of the call graphs; cluster the determined latent features to form a plurality of clusters, wherein each cluster is associated with at least one of the plurality of program code sets, a first program code set is associated with a given cluster of the plurality of clusters, and the given cluster is associated with at least one other program code set of the plurality of program code sets; and characterize the first program code set based on the least one other program code set of the plurality of program code sets.
18. The apparatus of claim 17, wherein the instructions, when executed by the processor, cause the processor to selectively take corrective action based on the characterization of the first program code set.
19. The apparatus of claim 17, wherein the instructions, when executed by the processor, cause the processor to: build a sparse autoencoder; and use back propagation to train the sparse autoencoder to determine the latent features of the call graphs.
20. The apparatus of claim 19, wherein the instructions, when executed by the processor, cause the processor to: determine hidden layers of the sparse autoencoder to reconstruct state of inputs to the hidden layers.
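
The later steps in the claims (normalizing distances as in claim 7, k-means clustering as in claim 9, and characterizing one program code set from others in its cluster as in claim 17) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the distance values and labels are made up, min-max scaling is just one possible normalization, and the unsupervised learning step (the sparse autoencoder of claims 16 and 19) is deliberately omitted, so clustering runs directly on normalized distance rows rather than on learned latent features.

```python
import math
from collections import Counter

def normalize(matrix):
    """Min-max scale every distance into [0, 1] (one possible normalization)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [[(v - lo) / span for v in row] for row in matrix]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def characterize(assignment, known_labels):
    """Give each unlabeled item the majority label of labeled items in its cluster."""
    result = {}
    for idx, cluster in enumerate(assignment):
        if idx in known_labels:
            result[idx] = known_labels[idx]
            continue
        votes = Counter(known_labels[j] for j, c in enumerate(assignment)
                        if c == cluster and j in known_labels)
        result[idx] = votes.most_common(1)[0][0] if votes else "unknown"
    return result

# Hypothetical pairwise distances among four program code sets.
distances = [
    [0.0, 2.0, 8.0, 9.0],
    [2.0, 0.0, 9.0, 8.0],
    [8.0, 9.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
features = normalize(distances)
clusters = kmeans(features, k=2)
# Hypothetical prior knowledge: item 0 is known malicious, item 2 known benign.
labels = characterize(clusters, {0: "malicious", 2: "benign"})
print(clusters, labels)
```

Here the unlabeled items 1 and 3 inherit the label of the known item that shares their cluster. A system following claims 11 and 12 could then take corrective action, such as quarantine, on anything characterized as malicious.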

Assignees

Intel Corp

Inventors

Chen Li

Classifications

G06N3/04
Architecture, e.g. interconnection topology · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N20/00
Machine learning · CPC title
G06F21/562 · Primary
Static detection · CPC title

Patent family

Related publications grouped by family.

View patent family 65231977

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10917415B2 cover?: A technique includes processing a plurality of sets of program code to extract call graphs; determining similarities between the call graphs; applying unsupervised machine learning to an input formed from the determined similarities to determine latent features of the input; clustering the determined latent features; and determining a characteristic of a given program code set of the plurality …
Who is the assignee on this patent?: Intel Corp