Software version fingerprint generation and identification

US10338916B2 · US · B2

Patent metadata
Publication number: US-10338916-B2
Application number: US-201615371678-A
Country: US
Kind code: B2
Filing date: Dec 7, 2016
Priority date: Dec 7, 2016
Publication date: Jul 2, 2019
Grant date: Jul 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for accessing a source code repository comprising a plurality of versions of code, analyzing the plurality of versions of code of the component to compute metrics to identify each version of code, analyzing the metrics to determine a subset of the metrics to use to as a fingerprint definition to identify each version of the code, generating a fingerprint for each version of code using the fingerprint definition, generating a fingerprint matrix with the fingerprint for each version of code for the software component and storing the fingerprint definition and the fingerprint matrix.
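
The data structures named in the abstract can be pictured with a minimal sketch. It assumes a metrics matrix is a mapping from version label to named integer metric values, a fingerprint definition is an ordered list of metric names, and a fingerprint is the tuple of those values. The metric names, version labels, and function names below are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

Fingerprint = Tuple[int, ...]

# Hypothetical metrics matrix: one row of computed metric values per version.
METRICS_MATRIX: Dict[str, Dict[str, int]] = {
    "1.0": {"num_classes": 10, "num_methods": 42, "max_cyclomatic": 7},
    "1.1": {"num_classes": 10, "num_methods": 45, "max_cyclomatic": 7},
    "2.0": {"num_classes": 12, "num_methods": 45, "max_cyclomatic": 9},
}


def make_fingerprint(metric_values: Dict[str, int], definition: List[str]) -> Fingerprint:
    """Project one row of metric values onto the metrics named in the definition."""
    return tuple(metric_values[name] for name in definition)


def build_fingerprint_matrix(
    metrics_matrix: Dict[str, Dict[str, int]], definition: List[str]
) -> Dict[str, Fingerprint]:
    """Compute one fingerprint per version using the same definition."""
    return {
        version: make_fingerprint(row, definition)
        for version, row in metrics_matrix.items()
    }


def all_unique(fingerprint_matrix: Dict[str, Fingerprint]) -> bool:
    """True when no two versions share a fingerprint."""
    return len(set(fingerprint_matrix.values())) == len(fingerprint_matrix)


def identify_version(
    package_metrics: Dict[str, int],
    definition: List[str],
    fingerprint_matrix: Dict[str, Fingerprint],
) -> Optional[str]:
    """Return the version whose stored fingerprint equals the package's, if any."""
    target = make_fingerprint(package_metrics, definition)
    for version, fingerprint in fingerprint_matrix.items():
        if fingerprint == target:
            return version
    return None


# Assumed definition: two of the three metrics are enough for this toy data.
DEFINITION = ["num_classes", "num_methods"]
FINGERPRINT_MATRIX = build_fingerprint_matrix(METRICS_MATRIX, DEFINITION)
```

In this toy data, `num_classes` alone cannot separate versions 1.0 and 1.1, while the pair `num_classes` and `num_methods` gives every version a distinct fingerprint. That is the property a fingerprint definition needs.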

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: accessing, at a server computer, a source code repository comprising a plurality of versions of code for a software component; analyzing, by the server computer, the plurality of versions of code of the component to compute values of metrics to identify each version of code for the software component; wherein the metrics are stored in a metrics matrix along with the plurality of versions of code; analyzing, by the server computer, the metrics to determine a subset of the metrics to use to as a fingerprint definition to identify each version of the code for the software component by: (1) determining a best candidate metric among the metrics in the metrics matrix to identify the most versions of the plurality of versions of code; (2) adding the best candidate metric to a set of optimal metrics; (3) determining which versions of code of the plurality of versions of code can be identified with the best candidate metric; (4) removing the versions of code of the plurality of versions of code that can be identified with the best candidate metric and the optimal metrics from the metrics matrix to generate a reduced metrics matrix; and (5) repeating (1)-(4) of the process on the reduced metrics matrix until all versions of the plurality of versions of code are uniquely identified by a combination of selected metrics or until there is one or more versions of code that cannot be uniquely identified using the metrics; and based on determining that all versions of the plurality of version of code are uniquely identified by the combination of selected metrics, and wherein the set of optimal metrics is the subset of the metrics to use as the fingerprint definition, generating, by the server computer, a fingerprint for each version of code for the software component, using the fingerprint definition; generating, by the server computer, a fingerprint matrix with the fingerprint for each version of code for the software component; and storing, by the server computer, the fingerprint definition and the fingerprint matrix.

2. The method of claim 1, wherein analyzing the plurality of versions of code of the component to compute values of metrics comprises: retrieving the metrics; retrieving operational instructions for computing the values of the metrics; computing values corresponding to the metrics on the source code repository; and generating a vector with the computed values of the metrics.

3. The method of claim 1, wherein the metrics include at least one from a group comprising: name and size of classes, name and size of methods, number of methods, name and type of method parameters, name and type of local variables, conditional instruction branching conditions, cyclomatic complexity by method, (WMC) weighted methods per class, (DIT) depth of inheritance tree, (NOC) number of children, (CBO) coupling between object classes, (RFC) response for a class, (LCOM) lack of cohesion in methods, (Ca) afferent couplings, (NPM) number of public methods, and Chidamber and Kemerer metrics.

4. The method of claim 1, further comprising: receiving a new version of code for the software component; generating a fingerprint for the new version of code for the software component using the fingerprint definition; determining that the fingerprint for the new version of code is unique from other fingerprints in the fingerprint matrix; and storing the new fingerprint for the new version of code in the fingerprint matrix.

5. The method of claim 1, further comprising: receiving a new version of code for the software component, generating a fingerprint for the new version of code for the software component using the fingerprint definition; determining that the fingerprint for the new version of code is not unique from other fingerprints in the fingerprint matrix; analyzing the plurality of versions of code of the component, including the new version of code, to compute metrics to identify a version of the code for the software component; analyzing, by the server computer, the metrics to determine a subset of the metrics to use as a new fingerprint definition to identify each version of the code for the software component; generating, by the server computer, a fingerprint for each version of code for the software component, using the new fingerprint definition; generating, by the server computer, an updated fingerprint matrix with the fingerprint for each version of code for the software component; storing, by the server computer, the new fingerprint definition and the updated fingerprint matrix.

6. The method of claim 1, wherein determining the best candidate metric among the metrics to identify the most versions of the plurality of versions of code comprises choosing a metric with the largest Shannon entropy contribution as the best candidate metric.

7. The method of claim 1, further comprising: receiving, at the server computer, a request for version analysis, the request comprising a package associated with the software component; generating, by the server computer, a fingerprint for the package using the fingerprint definition; accessing, by the server computer, the fingerprint matrix to determine the version of the package using the fingerprint for the package; and providing, by the server computer, the version of the package for the component.

8. The method of claim 1, wherein providing the version of the package for the component comprises providing a list with each version of the plurality of versions of the source code repository and a level of matching with the package.

9. The method of claim 8, wherein the level of matching is evaluated by computing the Euclidean distance between the package fingerprint and the fingerprints of each version of the plurality of versions.

10. A server computer comprising: a processor; and a computer-readable medium coupled with the processor, the computer-readable medium comprising instructions stored thereon that are executable by the processor to cause the server computer to perform operations comprising: accessing a source code repository comprising a plurality of versions of code for a software component; analyzing the plurality of versions of code of the component to compute values of metrics to identify each version of code for the software component; wherein the metrics are stored in a metrics matrix along with the plurality of versions of code; analyzing the metrics to determine a subset of the metrics to use to as a fingerprint definition to identify each version of the code for the software component by: (1) determining a best candidate metric among the metrics in the metrics matrix to identify the most versions of the plurality of versions of code; (2) adding the best candidate metric to a set of optimal metrics; (3) determining which versions of code of the plurality of versions of code can be identified with the best candidate metric; (4) removing the versions of code of the plurality of versions of code that can be identified with the best candidate metric and the optimal metrics from the metrics matrix to generate a reduced metrics matrix; and (5) repeating (1)-(4) of the process on the reduced metrics matrix until all versions of the plurality of versions of code are uniquely identified by a combination of selected metrics or until there is one or more versions of code that cannot be uniquely identified using the metrics; and based on determining that all versions of the plurality of version of code are uniquely identified by the combination of selected metrics, and wherein the set of optimal metrics is the subset of the metrics to use as the fingerprint definition, gene
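
Steps (1) to (5) of claim 1, combined with the entropy criterion of claim 6, can be sketched as a greedy loop. This is one possible reading, not the patented implementation. It assumes that "largest Shannon entropy contribution" means the candidate metric that most increases the joint entropy of the fingerprints over the versions not yet uniquely identified. All function names, metric names, and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Set

MetricsMatrix = Dict[str, Dict[str, int]]


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def unidentified_versions(matrix: MetricsMatrix, selected: List[str]) -> Set[str]:
    """Versions whose fingerprint under `selected` collides with another version."""
    fps = {v: tuple(row[m] for m in selected) for v, row in matrix.items()}
    counts = Counter(fps.values())
    return {v for v, fp in fps.items() if counts[fp] > 1}


def joint_entropy(matrix: MetricsMatrix, versions: List[str], metrics: List[str]) -> float:
    """Entropy of the combined metric values over the given versions."""
    return shannon_entropy([tuple(matrix[v][m] for m in metrics) for v in versions])


def select_fingerprint_definition(matrix: MetricsMatrix) -> Optional[List[str]]:
    """Greedily pick metrics until every version is unique; None if impossible."""
    if not matrix:
        return []
    metric_names = sorted(next(iter(matrix.values())))
    selected: List[str] = []
    # The "reduced metrics matrix" is represented by the versions still colliding.
    remaining = sorted(unidentified_versions(matrix, selected))
    while remaining:
        candidates = [m for m in metric_names if m not in selected]
        if not candidates:
            return None
        current = joint_entropy(matrix, remaining, selected)
        # (1) best candidate: largest joint entropy over the reduced matrix
        best = max(
            candidates,
            key=lambda m: joint_entropy(matrix, remaining, selected + [m]),
        )
        if joint_entropy(matrix, remaining, selected + [best]) <= current + 1e-12:
            return None  # no metric separates the remaining versions
        selected.append(best)  # (2) add to the set of optimal metrics
        # (3) and (4): drop versions that are now uniquely identified
        remaining = sorted(unidentified_versions(matrix, selected))
    return selected


# Hypothetical sample data; "packages" is constant and carries no information.
EXAMPLE_MATRIX: MetricsMatrix = {
    "1.0": {"loc": 100, "methods": 10, "classes": 3, "packages": 1},
    "1.1": {"loc": 100, "methods": 12, "classes": 3, "packages": 1},
    "1.2": {"loc": 120, "methods": 12, "classes": 3, "packages": 1},
    "2.0": {"loc": 120, "methods": 12, "classes": 4, "packages": 1},
}
DEFINITION = select_fingerprint_definition(EXAMPLE_MATRIX)
```

On the sample data the loop selects `loc`, `classes`, and `methods` and leaves out the constant `packages` metric. When two versions agree on every metric, the function returns `None`, which corresponds to the second stopping condition in step (5).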

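The matching list described in claims 8 and 9 can be illustrated with a short sketch. It assumes fingerprints are equal-length tuples of numeric metric values and treats a smaller Euclidean distance as a closer match. The function names and sample fingerprints are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two fingerprints of equal length."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_versions(
    package_fingerprint: Sequence[float],
    fingerprint_matrix: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """List every known version with its distance to the package, closest first."""
    distances = [
        (version, euclidean_distance(package_fingerprint, fingerprint))
        for version, fingerprint in fingerprint_matrix.items()
    ]
    return sorted(distances, key=lambda item: item[1])


# Hypothetical fingerprint matrix and a package that matches no version exactly.
FINGERPRINT_MATRIX = {"1.0": (100, 10), "1.1": (100, 12), "2.0": (120, 15)}
RANKING = rank_versions((103, 14), FINGERPRINT_MATRIX)
```

Here the package is closest to version 1.1, then 1.0, then 2.0. A distance of zero would indicate an exact fingerprint match. Because metrics with large numeric ranges dominate an unscaled distance, a real system might normalize the values first; the claims as quoted do not say whether that is done.
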
Assignees

SAP SE

Classifications

  • G06F8/71 (Primary): Version control (security arrangements therefor G06F21/57); Configuration management

  • Software reuse

  • Software metrics

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F8/71. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 2, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.