Systems, devices, and methods for software coding
US-11308269-B1 · Apr 19, 2022 · US
US11972256B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11972256-B2 |
| Application number | US-202217651270-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2022 |
| Priority date | Feb 16, 2022 |
| Publication date | Apr 30, 2024 |
| Grant date | Apr 30, 2024 |
Official abstract text for this publication.
A system for determining code ancestry. The system includes: a memory; and a processor communicatively coupled to the memory. The processor is configured to perform a method comprising: receiving a source code file; parsing a plurality of functions out of the source code file; generating fuzzy fingerprints from the plurality of functions; and storing the fuzzy fingerprints in a graph database.
Opening claim text (preview).
What is claimed is:
1. A system for determining code ancestry, comprising: a memory; and a processor communicatively coupled to the memory, wherein the processor is configured to perform a method comprising: receiving a source code file; parsing a plurality of functions out of the source code file; sanitizing each of the plurality of functions by applying specific filtering to each of the plurality of functions, stripping unnecessary components from the source code file, and producing structure data and sanitized data; generating a plurality of fuzzy fingerprints from the plurality of functions, including generating separate fuzzy fingerprint signatures for both the structure data and the sanitized data of each of the plurality of functions that represent the plurality of functions; storing the plurality of fuzzy fingerprints in a graph database; comparing each of the plurality of fuzzy fingerprints to another one of the plurality of fuzzy fingerprints by applying fuzzy matching to determine a similarity; determining whether the plurality of fuzzy fingerprints include portions of the source code file that are a direct match, a variation, a derivative, or not a match by comparing the similarity to a threshold; and establishing an ancestry of the portions of the source code file based on the determining.
2. The system of claim 1, wherein the sanitizing step includes stripping unnecessary components from the functions.
3. The system of claim 1, wherein the processor is further configured to perform the method further comprising: using a graph-traversing algorithm to identify temporal and spatial relationships between the functions based on their corresponding fuzzy fingerprints in the graph database.
4. The system of claim 3, wherein the processor is further configured to perform the method further comprising: storing the temporal and spatial relationships between the functions identified by the graph database in a secondary database.
5. The system of claim 2, wherein the processor is further configured to perform the method further comprising: comparing the temporal and spatial relationships between the functions to the threshold to determine the similarity.
6. The system of claim 1, wherein the parsing step includes recognizing a filetype and a programming language of the source code file.
7. The system of claim 1, wherein the graph database is configured to generate a graph from temporal and spatial relationships between the fuzzy fingerprints using a graph-traversing algorithm.
8. A computer program product for software analysis using fuzzy fingerprinting to determine code ancestry, the computer program product comprising one or more computer readable storage media having program instructions embodied therewith, the program instructions executable by a device to cause the device to: receive a source code file of a computer program; parse a plurality of functions out of the source code file; sanitize each of the plurality of functions by applying specific filtering to each of the plurality of functions, stripping unnecessary components from the source code file, and producing structure data and sanitized data; generate a plurality of fuzzy fingerprints from the plurality of functions, including generating separate fuzzy fingerprint signatures for both the structure data and the sanitized data of each of the plurality of functions that represent the plurality of functions; store the plurality of fuzzy fingerprints in a graph database; compare each of the plurality of fuzzy fingerprints to another one of the plurality of fuzzy fingerprints by applying fuzzy matching to determine a similarity; determine whether the plurality of fuzzy fingerprints include portions of the source code file that are a direct match, a variation, a derivative, or not a match by comparing the similarity to a threshold; and establish an ancestry of the portions of the source code file based on the determining.
9. The computer program product of claim 8, wherein the program instructions cause the device to use a graph-traversing algorithm to identify temporal and spatial relationships between the functions based on their corresponding fuzzy fingerprints in the graph database.
10. The computer program product of claim 9, wherein the program instructions cause the device to store the temporal and spatial relationships between the functions identified by the graph database in a secondary database.
11. The computer program product of claim 9, wherein the program instructions cause the device to compare the temporal and spatial relationships between the functions to the threshold to determine the similarity.
12. A method for determining code ancestry, comprising: receiving a source code file; parsing a plurality of functions out of the source code file; sanitizing each of the plurality of functions by applying specific filtering to each of the plurality of functions, stripping unnecessary components from the source code file, and producing structure data and sanitized data; generating a plurality of fuzzy fingerprints from the plurality of functions, including generating separate fuzzy fingerprint signatures for both the structure data and the sanitized data of each of the plurality of functions that represent the plurality of functions; storing the plurality of fuzzy fingerprints in a graph database; comparing each of the plurality of fuzzy fingerprints to another one of the plurality of fuzzy fingerprints by applying fuzzy matching to determine a similarity; determining whether the plurality of fuzzy fingerprints include portions of the source code file that are a direct match, a variation, a derivative, or not a match by comparing the similarity to a threshold; and establishing an ancestry of the portions of the source code file based on the determining.
13. The method of claim 12, wherein the sanitizing step includes stripping unnecessary components from the functions.
14. The method of claim 12, further comprising: using a graph-traversing algorithm to identify temporal and spatial relationships between the functions based on their corresponding fuzzy fingerprints in the graph database.
15. The method of claim 14, further comprising: storing the temporal and spatial relationships between the functions identified by the graph database in a secondary database.
16. The method of claim 14, further comprising: comparing the temporal and spatial relationships between the functions to the threshold to determine the similarity.
17. The method of claim 12, wherein the parsing step includes recognizing a filetype and a programming language of the source code file.
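The comparing, determining, and establishing steps recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: similarity is taken to be Jaccard overlap between fingerprint sets, the three cut-off values are invented (the claims only refer to "a threshold"), a `first_seen` timestamp is assumed to supply the temporal ordering, and a plain dictionary stands in for the graph database. All names (`Fingerprint`, `classify`, `build_ancestry`, `oldest_ancestor`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the claims do not give numeric values.
DIRECT, VARIATION, DERIVATIVE = 1.0, 0.8, 0.5


@dataclass(frozen=True)
class Fingerprint:
    name: str
    first_seen: int       # assumed metadata, e.g. a commit timestamp
    shingles: frozenset   # the fuzzy fingerprint, modelled as a set of hashes


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard similarity between two fingerprints, in [0, 1]."""
    union = a.shingles | b.shingles
    return len(a.shingles & b.shingles) / len(union) if union else 1.0


def classify(score: float) -> str:
    """Map a similarity score to one of the four outcomes named in the claims."""
    if score >= DIRECT:
        return "direct match"
    if score >= VARIATION:
        return "variation"
    if score >= DERIVATIVE:
        return "derivative"
    return "not a match"


def build_ancestry(fps: list[Fingerprint]) -> dict[str, list[tuple[str, str, float]]]:
    """Return {descendant: [(ancestor, label, score), ...]}.

    The older fingerprint of each matching pair is treated as the ancestor.
    """
    graph = {fp.name: [] for fp in fps}
    ordered = sorted(fps, key=lambda fp: fp.first_seen)
    for i, older in enumerate(ordered):
        for newer in ordered[i + 1:]:
            score = similarity(older, newer)
            label = classify(score)
            if label != "not a match":
                graph[newer.name].append((older.name, label, score))
    return graph


def oldest_ancestor(graph: dict[str, list[tuple[str, str, float]]], name: str) -> str:
    """Walk ancestor edges back to the earliest known origin of a function."""
    while graph[name]:
        name = graph[name][0][0]  # edges were appended oldest-first
    return name
```

Comparing every pair is quadratic in the number of fingerprints, which is acceptable for a sketch; a real graph database would presumably index fingerprints so that only likely candidates are compared.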
Structural analysis for program understanding · CPC title
Dependency analysis; Data or control flow analysis · CPC title
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Matching criteria, e.g. proximity measures · CPC title