Software version fingerprint generation and identification
US-10338916-B2 · Jul 2, 2019 · US
US10474456B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10474456-B2 |
| Application number | US-201916415192-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 17, 2019 |
| Priority date | Dec 7, 2016 |
| Publication date | Nov 12, 2019 |
| Grant date | Nov 12, 2019 |
Official abstract text for this publication.
Systems and methods are provided for accessing a source code repository comprising a plurality of versions of code, analyzing the plurality of versions of code of the component to compute metrics to identify each version of code, analyzing the metrics to determine a subset of the metrics to use as a fingerprint definition to identify each version of the code, generating a fingerprint for each version of code using the fingerprint definition, generating a fingerprint matrix with the fingerprint for each version of code for the software component, and storing the fingerprint definition and the fingerprint matrix.
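
As a rough illustration of the last steps in the abstract, the sketch below builds a fingerprint for each version from a fingerprint definition and collects the results into a fingerprint matrix. It assumes the per-version metrics have already been computed; the metric names (`num_methods`, `num_classes`), the dictionary layout, and all function names are hypothetical and do not come from the patent.

```python
def make_fingerprint(metrics, definition):
    """Project one version's metrics onto the metrics named in the definition."""
    return tuple(metrics[name] for name in definition)


def build_fingerprint_matrix(version_metrics, definition):
    """Map every version to its fingerprint under the given definition."""
    return {
        version: make_fingerprint(metrics, definition)
        for version, metrics in version_metrics.items()
    }


def is_unambiguous(fingerprint_matrix):
    """True when no two versions share the same fingerprint."""
    fingerprints = list(fingerprint_matrix.values())
    return len(set(fingerprints)) == len(fingerprints)


def identify(package_metrics, definition, fingerprint_matrix):
    """Return the versions whose stored fingerprint equals the package's."""
    package_fp = make_fingerprint(package_metrics, definition)
    return [v for v, fp in fingerprint_matrix.items() if fp == package_fp]


# Hypothetical, already-computed metrics for two versions of one component.
version_metrics = {
    "1.0": {"num_methods": 10, "num_classes": 3},
    "1.1": {"num_methods": 12, "num_classes": 3},
}
definition = ["num_methods"]  # assumed subset of metrics chosen as the definition
fingerprint_matrix = build_fingerprint_matrix(version_metrics, definition)
match = identify({"num_methods": 12, "num_classes": 3}, definition, fingerprint_matrix)
```

Storing `definition` next to `fingerprint_matrix` matters because a fingerprint can only be compared against the matrix if it was generated with the same definition.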
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: accessing, by a computing device, a metrics matrix comprising candidate metrics to identify each version of code of a plurality of versions of code for a software component; generating, by the computing device, a fingerprint definition to identify each version of the code for the software component by performing operations comprising: (1) determining a best candidate metric among the candidate metrics to identify the most versions of the plurality of versions of code; (2) adding the best candidate metric to a set of optimal metrics; (3) determining which versions of code of the plurality of versions of code can be identified with the best candidate metric; (4) removing the versions of code of the plurality of versions of code that can be identified with the best candidate metric and the best candidate metric from the metrics matrix to generate a reduced metrics matrix; and (5) repeating (1)-(4) on the reduced metrics matrix until all versions of the plurality of versions of code are uniquely identified by the set of optimal metrics or until there is one or more versions of code that cannot be uniquely identified using the candidate metrics; and based on determining that all versions of the plurality of version of code are uniquely identified by the set of optimal metrics, generating a fingerprint for each version of code for the software component, using the fingerprint definition comprising the set of optimal metrics.

2. The method of claim 1, further comprising: generating a fingerprint matrix comprising the fingerprint for each version of code for the software component; receiving a request for version analysis, the request comprising a package associated with the software component; generating a fingerprint for the package using the fingerprint definition; accessing the fingerprint matrix to determine the version of the package using the fingerprint for the package; and providing the version of the package for the component.

3. The method of claim 2, wherein providing the version of the package for the component comprises providing a list with each version of the plurality of versions of the source code repository and a level of matching with the package.

4. The method of claim 3, wherein the level of matching is evaluated by computing the Euclidean distance between the package fingerprint and the fingerprints of each version of the plurality of versions.

5. The method of claim 1, further comprising: receiving a new version of code for the software component; generating a fingerprint for the new version of code for the software component using the fingerprint definition; determining that the fingerprint for the new version of code is not unique from other fingerprints in a fingerprint matrix comprising the fingerprint for each version of code for the software component; generating an updated fingerprint definition to identify each version of the code for the software component by performing operations (1)-(5); and based on determining that all versions of the plurality of version of code are uniquely identified by the set of optimal metrics, generating an updated fingerprint for each version of code for the software component, using the updated fingerprint definition comprising the set of optimal metrics.

6. The method of claim 5, further comprising: generating an updated fingerprint matrix comprising the updated fingerprint for each version of code for the software component.

7. The method of claim 1, further comprising: receiving a new version of code for the software component; generating a fingerprint for the new version of code for the software component using the fingerprint definition; determining that the fingerprint for the new version of code is unique from other fingerprints in a fingerprint matrix comprising the fingerprint for each version of code for the software component; and storing the new fingerprint for the new version of code in the fingerprint matrix.

8. The method of claim 1, further comprising: accessing a second metrics matrix comprising candidate metrics to identify each version of code of a plurality of versions of code for a second software component; generating, by the computing device a second fingerprint definition to identify each version of the code for the second software component by performing operations (1)-(5); and based on determining that there is one or more version of code that cannot be uniquely identified using the candidate metrics, analyzing the plurality of versions of code of the second software component to compute additional candidate metrics to identify each version for the second software component.

9. The method of claim 1, wherein the candidate metrics include at least one from a group comprising: name and size of classes, name and size of methods, number of methods, name and type of method parameters, name and type of local variables, conditional instruction branching conditions, cyclomatic complexity by method, (WMC) weighted methods per class, (DIT) depth of inheritance tree, (NOC) number of children, (CBO) coupling between object classes, (RFC) response for a class, (LCOM) lack of cohesion in methods, (Ca) afferent couplings, (NPM) number of public methods, and Chidamber and Kemerer metrics.

10. The method of claim 1, wherein determining a best candidate metric among the candidate metrics to identify the most versions of the plurality of versions of code comprises choosing a candidate metric with the largest Shannon entropy contribution as the best candidate metric.

11. A computing device comprising: a memory that stores instructions; and one or more processors configured to perform operations comprising: accessing a metrics matrix comprising candidate metrics to identify each version of code of a plurality of versions of code for a software component; generating a fingerprint definition to identify each version of the code for the software component by performing operations comprising: (1) determining a best candidate metric among the candidate metrics to identify the most versions of the plurality of versions of code; (2) adding the best candidate metric to a set of optimal metrics; (3) determining which versions of code of the plurality of versions of code can be identified with the best candidate metric; (4) removing the versions of code of the plurality of versions of code that can be identified with the best candidate metric and the best candidate metric from the metrics matrix to generate a reduced metrics matrix; and (5) repeating (1)-(4) on the reduced metrics matrix until all versions of the plurality of versions of code are uniquely identified by the set of optimal metrics or until there is one or more versions of code that cannot be uniquely identified using the candidate metrics; and based on determining that all versions of the plurality of version of code are uniquely identified by the set of optimal metrics, generating a fingerprint for each version of code for the software component, using the fingerprint definition comprising the set of optimal metrics.

12. The computing device of claim 11, the operations further comprising: generating a fingerprint matrix comprising the fingerprint for each version of code for the software component; receiving a request for version analysis, the request comprising a package associated with the software component; generating a fingerprint for the package using the fingerprint definition; accessing the fingerprint matrix to determine the version of the package using the fingerprint for the package; and providing the version of the packa
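
The greedy loop of operations (1)-(5) in claim 1 and the distance-based matching of claims 3 and 4 can be sketched as follows. This is one possible reading, not the patented implementation: it treats a version as "identified" by a metric when its value for that metric is unique among the versions still in the matrix, scores candidates by how many versions they identify, and uses Shannon entropy only to break ties (claim 10 instead names entropy as the selection criterion itself). The metric names, data layout, and function names are hypothetical, and the Euclidean ranking assumes every metric in the fingerprint is numeric.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the distribution of `values`."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def uniquely_identified(matrix, versions, metric):
    """Versions whose value for `metric` occurs exactly once among `versions`."""
    counts = Counter(matrix[v][metric] for v in versions)
    return {v for v in versions if counts[matrix[v][metric]] == 1}


def select_metrics(matrix):
    """Greedy selection over {version: {metric: value}}.

    Returns (selected_metrics, unresolved_versions).
    """
    remaining = sorted(matrix)  # versions not yet identified
    candidates = sorted({m for row in matrix.values() for m in row})
    selected = []
    # A single leftover version already differs from every removed version.
    while len(remaining) > 1 and candidates:
        def score(metric):
            column = [matrix[v][metric] for v in remaining]
            return (len(uniquely_identified(matrix, remaining, metric)),
                    shannon_entropy(column))

        best = max(candidates, key=score)                           # operation (1)
        identified = uniquely_identified(matrix, remaining, best)   # operation (3)
        if not identified:
            break  # no candidate separates any remaining version
        selected.append(best)                                       # operation (2)
        remaining = [v for v in remaining if v not in identified]   # operation (4)
        candidates.remove(best)
    unresolved = set(remaining) if len(remaining) > 1 else set()
    return selected, unresolved


def rank_versions(package_fp, fingerprint_matrix):
    """List (version, Euclidean distance) pairs, closest match first."""
    pairs = [(v, math.dist(package_fp, fp)) for v, fp in fingerprint_matrix.items()]
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical metrics matrix for four versions of one component.
metrics_matrix = {
    "1.0": {"num_methods": 10, "num_classes": 3, "wmc": 5},
    "1.1": {"num_methods": 12, "num_classes": 3, "wmc": 5},
    "1.2": {"num_methods": 12, "num_classes": 4, "wmc": 5},
    "2.0": {"num_methods": 12, "num_classes": 4, "wmc": 7},
}
selected, unresolved = select_metrics(metrics_matrix)
fingerprints = {v: tuple(row[m] for m in selected) for v, row in metrics_matrix.items()}
ranking = rank_versions((12, 4, 5), fingerprints)
```

A non-empty `unresolved` set corresponds to the second stopping condition in operation (5); claim 8 describes computing additional candidate metrics in that case. Returning the full ranking rather than a single version mirrors the list with a level of matching described in claim 3.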