Semantically sensitive code region fingerprint calculation for programming languages

US9851959B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9851959-B2
Application numberUS-201615243516-A
CountryUS
Kind codeB2
Filing dateAug 22, 2016
Priority dateFeb 17, 2016
Publication dateDec 26, 2017
Grant dateDec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Herein disclosed is an optimization for a compiler, the optimization configured to assign numeric values, or semantic fingerprints, to portions of code, and to combine these fingerprints to arrive at fingerprints for larger and larger portions of code. The fingerprints can be provided to various consumers such as code redundancy optimization modules and copyright violation and malware/virus identification modules. The fingerprints can also be used to cluster similar code, and then code within each cluster can be merged. Merger can include creating a single merged portion of code including the same portions of code from the original portions of code plus control flow and new arguments to account for differences between the original portions of code. The original portions of code can be replaced with wrappers that use new arguments to call to the merged portion of code.

First claim

Opening claim text (preview).

What is claimed is: 1. A non-transitory, tangible computer readable storage medium, encoded with processor readable instructions to perform a method for reducing duplicated code during compiling of source code, the method comprising: receiving a plurality of fingerprints each associated with one of a plurality of segments of an intermediate representation of the source code; grouping the plurality of segments of the intermediate representation into clusters based on a similarity of the fingerprints such that each cluster comprises segments of the intermediate representation that are substantially the same; for each cluster, recording a new merged function to a memory, the merged function including code that is the same among all segments of the intermediate representation for a given cluster; inserting control flow and new arguments into the new merged function to handle differences between the segments of the intermediate representation in the given cluster; and replacing the segments of the intermediate representation for the given cluster with wrappers that call the new merged function using the new arguments. 2. The non-transitory, tangible computer readable storage medium of claim 1 , further comprising re-assigning fingerprints to basic blocks that have been replaced by wrappers and re-assigning region fingerprints to regions whose basic blocks have been replaced by the wrappers. 3. The non-transitory, tangible computer readable storage medium of claim 1 , further comprising iteratively re-assigning fingerprints to the segments of the intermediate representation to form re-assigned fingerprints, and then performing further merging of the segments of the intermediate representation based on the re-assigned fingerprints. 4. The non-transitory, tangible computer readable storage medium of claim 3 , further comprising providing the re-assigned fingerprints to one of the following: one or more code optimizing modules configured to merge redundant and similar portions of the intermediate representation; a back end configured to transform the intermediate representation to machine code or assembly code; a copyright violation identification module that compares the re-assigned fingerprints to fingerprints of copyrighted code; and a virus or malware identification module that compares the re-assigned fingerprints to fingerprints of intermediate representations of known virus or malware code. 5. The non-transitory, tangible computer readable storage medium of claim 1 , wherein the segment of the intermediate representation is a basic block, a region, or a function. 6. The non-transitory, tangible computer readable storage medium of claim 1 , wherein the wrappers include one or more arguments from the segments of the intermediate representation. 7. A system comprising: a processing portion with one or more processing components therein; a memory coupled to the processing portion and configured to store source code and a corresponding executable code; a compiler stored on the memory and executable on the processing portion to: receive a plurality of fingerprints each associated with one of a plurality of segments of an intermediate representation of the source code; group the plurality of segments of the intermediate representation into clusters based on a similarity of the fingerprints such that each cluster comprises segments of the intermediate representation that are substantially the same; for each cluster, record a new merged function to the memory, the merged function including code that is the same among all segments of the intermediate representation for a given cluster; insert control flow and new arguments into the new merged function to handle differences between the segments of the intermediate representation in the given cluster; and replace the segments of the intermediate representation for the given cluster with wrappers that call the new merged function using the new arguments. 8. The system of claim 7 , further comprising re-assigning fingerprints to basic blocks that have been replaced by wrappers and re-assigning region fingerprints to regions whose basic blocks have been replaced by wrappers. 9. The system of claim 7 , further comprising iteratively re-assigning fingerprints to the segments of the intermediate representation to form re-assigned fingerprints and then performing further merging of the segments of the intermediate representation based on the re-assigned fingerprints. 10. The system of claim 9 , further comprising providing the re-assigned fingerprints to one of the following: one or more code optimizing modules configured to merge redundant and similar portions of the intermediate representation; a back end configured to transform the intermediate representation to machine code or assembly code; a copyright violation identification module that compares the re-assigned fingerprints to fingerprints of copyrighted code; and a virus or malware identification module that compares the re-assigned fingerprints to fingerprints of intermediate representations of known virus or malware code. 11. The system of claim 7 , wherein the segment of the intermediate representation is a basic block, a region, or a function. 12. The system of claim 7 , wherein the wrappers include one or more arguments from the segments of the intermediate representation. 13. A method comprising: receiving a plurality of fingerprints each associated with one of a plurality of segments of an intermediate representation of the source code; grouping the plurality of segments of the intermediate representation into clusters based on a similarity of the fingerprints such that each cluster comprises segments of the intermediate representation that are substantially the same; for each cluster, recording a new merged function to a memory, the merged function including code that is the same among all segments of the intermediate representation for a given cluster; inserting control flow and new arguments into the new merged function to handle differences between the segments of the intermediate representation in the given cluster; and replacing the segments of the intermediate representation for the given cluster with wrappers that call the new merged function using the new arguments. 14. The method of claim 13 , further comprising for re-assigning fingerprints to basic blocks that have been replaced by wrappers and re-assigning region fingerprints to regions whose basic blocks have been replaced by wrappers. 15. The method of claim 13 , further comprising for iteratively re-assigning fingerprints to the segments of the intermediate representation to form re-assigned fingerprints and then performing further merging of the segments of the intermediate representation based on the re-assigned fingerprints. 16. The method of claim 15 , further comprising for providing the re-assigned fingerprints to one of the following: one or more code optimizing modules configured to merge redundant and similar portions of the intermediate representation; a back end configured to transform the intermediate representation to machine code or assembly code; a copyright violation identification module that compares the re-assigned fingerprints to fingerprints of copyrighted code; and a virus or malware identification module that compares the re-assigned fingerprints to fingerprints of intermediate representations of known virus or malware code. 17. The method of claim 13 , wherein the segment of the intermediate representation is a basic block, a region, or a function.

Assignees

Inventors

Classifications

  • G06F8/4435Primary

    Detection or removal of dead or redundant code · CPC title

  • G06F8/443Primary

    Optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9851959B2 cover?
Herein disclosed is an optimization for a compiler, the optimization configured to assign numeric values, or semantic fingerprints, to portions of code, and to combine these fingerprints to arrive at fingerprints for larger and larger portions of code. The fingerprints can be provided to various consumers such as code redundancy optimization modules and copyright violation and malware/virus ide…
Who is the assignee on this patent?
Qualcomm Innovation Ct Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/4435. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 26 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).