Code recommendation
US-2015378692-A1 · Dec 31, 2015 · US
US10628577B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10628577-B2 |
| Application number | US-201615296024-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 17, 2016 |
| Priority date | Jul 9, 2015 |
| Publication date | Apr 21, 2020 |
| Grant date | Apr 21, 2020 |
Official abstract text for this publication.
Systems, methods, and computer program embodiments are disclosed for detecting software components in a software codebase. In an embodiment, a source file containing source code may be received, and a code signature may be generated for the source file based on a determined structure of the source code. The generated code signature may then be compared to signatures stored in a reference database to identify matching software files. In an embodiment, the reference database may store a plurality of code signatures corresponding to software files. A list of the identified software files may be created and presented to a user.
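As a rough illustration of the flow the abstract describes (generate a signature, compare it with a reference database, list the matching files), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the in-memory dictionary standing in for the reference database, and the whitespace-stripping SHA-256 "signature" are hypothetical and much cruder than a signature derived from the structure of the code.

```python
import hashlib


def structural_signature(source: str) -> str:
    """Hypothetical stand-in for a structure-based signature.

    Hashes the source with all whitespace removed, so that reformatting
    alone does not change the result. This is a simplification, not the
    patented technique.
    """
    normalized = "".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_reference_db(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each signature to the names of the known files that produce it."""
    db: dict[str, list[str]] = {}
    for name, text in files.items():
        db.setdefault(structural_signature(text), []).append(name)
    return db


def identify(source: str, db: dict[str, list[str]]) -> list[str]:
    """Return the reference files whose signature matches the scanned source."""
    return list(db.get(structural_signature(source), []))


# Illustrative data only; the file names are made up.
reference_db = build_reference_db({
    "libfoo/util.c": "int add(int a, int b) { return a + b; }",
    "libbar/io.c": "void noop(void) {}",
})

# Same code as libfoo/util.c, but laid out differently.
scanned = "int add(int a,int b)\n{\n    return a + b;\n}\n"
matches = identify(scanned, reference_db)
print(matches)
```

An exact-hash lookup like this only finds files that normalize to identical text; the claims go further by encoding language keywords and by signing individual code modules.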
Opening claim text (preview).
We claim:

1. A system comprising:
   one or more computing devices;
   a reference database storing reference information comprising a code signature corresponding to a software file;
   a signature generator implemented on the one or more computing devices and configured to:
     receive an original source file comprising source code;
     determine a programming language of the source code in the original source file;
     create an updated source file from a first portion of the original source file corresponding to a predetermined character sequence specific to the determined programming language, wherein a second portion of the original source file is omitted from the updated source file;
     replace one or more language keywords or key phrases with a representation in the updated source file to produce an encoded sequence, wherein the representation is a compact byte representation smaller in size than the corresponding language reserved keywords or key phrases; and
     generate an updated code signature based at least in part on content of the updated source file; and
   an analyzer implemented on the one or more computing devices and configured to:
     perform a comparison of the generated updated code signature with the code signature from the reference information stored in the reference database;
     identify, based on the comparison, a matching code signature from within the software file corresponding to the source file; and
     add an entry, corresponding to the matching code signature, to a list of identified software files.

2. The system of claim 1, wherein the code signature comprises a hash of the encoded sequence.

3. The system of claim 1, wherein the signature generator is further configured to:
   divide the source code of the original source file into a first code module and a second code module; and
   generate a first code signature corresponding to the first code module and a separate second code signature corresponding to the second code module.

4. The system of claim 3, wherein the signature generator is further configured to: aggregate the first and second code signatures into the code signature for the updated source file.

5. The system of claim 3, wherein the analyzer is further configured to: compare the first and second code signatures to the code signature stored in the reference database to identify a further match in the software file to the original source file.

6. The system of claim 1, wherein the signature generator is further configured to: parse the original source file to determine an identifying attribute.

7. The system of claim 6, wherein the analyzer is further configured to: utilize the identifying attributes to identify a further match.

8. The system of claim 6, wherein the identifying attributes of the original source file comprise at least one of authorship information, copyright ownership information, license information, license obligations, encryption scheme, code quality attributes, or file origin.

9. The system of claim 6, wherein the identifying attribute of the original source file comprises at least one of authorship information, copyright ownership information, license information, license obligations, encryption scheme, code quality attributes, or file origin.

10. The system of claim 1, wherein the reference database is accessible via an application programming interface (API).

11. A method, comprising:
    receiving, by at least one computer processor, an original source file comprising source code;
    determining, by the at least one computer processor, a programming language of the source code in the original source file;
    creating, by the at least one computer processor, an updated source file from a first portion of the original source file corresponding to a predetermined character sequence specific to the determined programming language, wherein a second portion of the original source file is omitted from the updated source file;
    replacing, by the at least one computer processor, one or more language keywords or key phrases with a representation in the updated source file to produce an encoded sequence, wherein the representation is a compact byte representation smaller in size than the corresponding language reserved keywords or key phrases;
    generating, by the at least one computer processor, an updated code signature based at least in part on content of the updated source file;
    performing, by the at least one computer processor, a comparison of the generated updated code signature with the code signature from reference information stored in a reference database comprising a code signature corresponding to a software file;
    identifying, by the at least one computer processor, based on the comparison, a matching code signature from within the software file corresponding to the source file; and
    adding an entry, by the at least one computer processor, corresponding to the matching code signature, to a list of identified software files.

12. The method of claim 11, wherein the generating the code signature comprises: hashing, by the at least one computer processor, the encoded sequence to produce at least part of the code signature.

13. The method of claim 11, further comprising:
    dividing, by the at least one computer processor, the source code of the original source file into a first code module and a second code module; and
    generating, by the at least one computer processor, a first code signature corresponding to the first code module and a separate second code signature corresponding to the second code module.

14. The method of claim 13, further comprising: aggregating, by the at least one computer processor, the first and second code signatures into the code signature for the updated source file.

15. The method of claim 13, further comprising: comparing, by the at least one computer processor, the first and second code signatures to the code signature stored in the reference database to identify a further match in the software files to the original source file.

16. The method of claim 11, further comprising: parsing, by the at least one computer processor, the original source file to determine an identifying attribute.

17. The method of claim 16, further comprising: utilizing the identifying attribute to identify a further match.

18. The method of claim 16, wherein the identifying attribute of the original source file comprises at least one of authorship information, copyright ownership information, license information, license obligations, encryption scheme, code quality attributes, or file origin.
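The signature-generation steps in claims 1 to 4 and 11 to 14 can be made concrete with a small Python sketch. It is only one possible reading of the claim language, not the patentee's implementation. The assumptions are: the language is guessed from the file extension, the omitted "second portion" is taken to be comments found via language-specific delimiters, keywords are mapped to single bytes in the range 0x80 and up (sources are assumed to be ASCII), modules are split on blank lines, and SHA-256 is the hash. All names (`LANGUAGES`, `detect_language`, `encode`, `signature`, `module_signatures`, `aggregate`) are hypothetical.

```python
import hashlib
import re

# Hypothetical per-language settings; the claims do not enumerate these.
LANGUAGES = {
    ".c": {
        "comment": re.compile(r"//[^\n]*|/\*.*?\*/", re.S),
        "keywords": ["int", "return", "if", "else", "for", "while", "void"],
    },
    ".py": {
        "comment": re.compile(r"#[^\n]*"),
        "keywords": ["def", "return", "if", "else", "for", "while", "import"],
    },
}


def detect_language(filename: str) -> str:
    """Assumed heuristic: pick the language from the file extension."""
    for ext in LANGUAGES:
        if filename.endswith(ext):
            return ext
    raise ValueError(f"unsupported file type: {filename}")


def encode(source: str, lang: str) -> bytes:
    """Drop comments, then replace each keyword with a single byte."""
    spec = LANGUAGES[lang]
    kept = spec["comment"].sub("", source)  # the omitted "second portion"
    # One byte per keyword, always shorter than the keyword itself.
    codes = {kw: bytes([0x80 + i]) for i, kw in enumerate(spec["keywords"])}
    # Identifiers, numbers, and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", kept)
    return b" ".join(codes.get(t, t.encode("utf-8")) for t in tokens)


def signature(source: str, lang: str) -> str:
    """Hash of the encoded sequence (compare claims 2 and 12)."""
    return hashlib.sha256(encode(source, lang)).hexdigest()


def module_signatures(source: str, lang: str) -> list[str]:
    """One signature per module; blank lines are an assumed boundary."""
    modules = [m for m in re.split(r"\n\s*\n", source) if m.strip()]
    return [signature(m, lang) for m in modules]


def aggregate(signatures: list[str]) -> str:
    """Combine module signatures into one file-level signature."""
    return hashlib.sha256("".join(signatures).encode("ascii")).hexdigest()


lang = detect_language("math.c")
original = "int add(int a, int b) { return a + b; } // sum"
reformatted = "/* adds */ int add(int a,int b)\n{ return a+b; }"
same = signature(original, lang) == signature(reformatted, lang)

two_modules = "int f(void) { return 1; }\n\nint g(void) { return 2; }"
per_module = module_signatures(two_modules, lang)
file_level = aggregate(per_module)
```

In this sketch the two `add` variants produce the same signature because comments and layout are discarded before hashing. The per-module signatures let a comparison succeed on one function even when the rest of the file differs, which appears to be the motivation behind claims 5 and 15.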
at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability · CPC title
Interprogram communication · CPC title
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Code clone detection · CPC title
Structural analysis for program understanding · CPC title