US9262157B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9262157-B2 |
| Application number | US-201514696185-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 24, 2015 |
| Priority date | Apr 24, 2014 |
| Publication date | Feb 16, 2016 |
| Grant date | Feb 16, 2016 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for matching and attributing code violations. One of the methods includes receiving a snapshot S of a code base of source code and a different snapshot T of the code base. Data representing first violations in the snapshot S and second violations in the snapshot T is received. Pairs of matching violations are determined by performing two or more matching processes, including performing a first matching process, the first matching process determining first pairs of matching violations according to a first matching algorithm, and performing a second matching process, the second matching process determining second pairs of matching violations according to a second matching algorithm from violations not matched by the first matching process. The first pairs of matching violations and the second pairs of matching violations are included in the determined pairs of matching violations.
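The cascade described in the abstract, where a second matching process only sees violations the first one left unmatched, can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Violation` fields, the `greedy_match` helper, and the two matchers `match_exact` and `match_moved` are hypothetical names chosen for the example, and the matching rules they apply are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Violation:
    path: str
    line: int
    vtype: str
    snippet: str


Pair = Tuple[Violation, Violation]
Matcher = Callable[[List[Violation], List[Violation]], List[Pair]]


def greedy_match(s_viols, t_viols, same) -> List[Pair]:
    """Pair each S violation with the first unused T violation where same(s, t) holds."""
    pairs, unused = [], list(t_viols)
    for s in s_viols:
        for t in unused:
            if same(s, t):
                pairs.append((s, t))
                unused.remove(t)
                break
    return pairs


def match_exact(s_viols, t_viols):
    # Assumed first pass: identical path, line, type, and snippet.
    return greedy_match(s_viols, t_viols, lambda s, t: s == t)


def match_moved(s_viols, t_viols):
    # Assumed second pass: same file, type, and snippet, on any line.
    return greedy_match(
        s_viols, t_viols,
        lambda s, t: (s.path, s.vtype, s.snippet) == (t.path, t.vtype, t.snippet),
    )


def cascade(s_viols, t_viols, matchers: List[Matcher]):
    """Run matchers in order; each one sees only the violations still unmatched."""
    matched, s_left, t_left = [], list(s_viols), list(t_viols)
    for matcher in matchers:
        pairs = matcher(s_left, t_left)
        matched.extend(pairs)
        for s, t in pairs:
            s_left.remove(s)
            t_left.remove(t)
    return matched, s_left, t_left


snapshot_s = [
    Violation("a.py", 10, "unused-var", "x = 1"),
    Violation("a.py", 20, "null-deref", "p.f()"),
]
snapshot_t = [
    Violation("a.py", 10, "unused-var", "x = 1"),  # unchanged
    Violation("a.py", 23, "null-deref", "p.f()"),  # shifted by three lines
    Violation("b.py", 5, "unused-var", "y = 2"),   # no counterpart in S
]
matched, s_left, t_left = cascade(snapshot_s, snapshot_t, [match_exact, match_moved])
```

In this example the first pass pairs the unchanged violation, the second pass pairs the shifted one, and the violation in `b.py` remains in `t_left` as unmatched. Ordering the matchers from strictest to loosest is a design choice of the sketch; the abstract only requires that later processes work on violations not matched earlier.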
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method to distinguish new violations from old violations in snapshots of a code base, the method comprising: receiving data representing one or more parent snapshots S of a code base and a child snapshot T of the code base, wherein each of the one or more parent snapshots S is a parent of the child snapshot T in a revision graph of the code base, wherein each snapshot includes the source code of files of the code base as the files existed at a particular point in time; receiving data representing parent violations in the one or more parent snapshots S and child violations in the child snapshot T, each violation having a respective location in its snapshot, a respective violation snippet, and a respective violation type, the violation snippet being source code that violates a respective coding standard; identifying, by one or more computers, as unmatched child violations those child violations that do not have a matching parent violation in any of the parent snapshots S; and attributing to the child snapshot T a violation introduction for each unmatched child violation.
2. The method of claim 1, wherein the one or more parent snapshots S comprise two or more parent snapshots S.
3. The method of claim 1, further comprising: determining pairs of matching violations, each pair of matching violations including a first violation in a snapshot S of the one or more snapshots S and a corresponding second violation in the snapshot T, wherein the first and second violations have the same type; wherein determining the pairs of matching violations comprises performing two or more matching processes, including: performing a first matching process, the first matching process determining first pairs of matching violations according to a first matching algorithm; performing a different second matching process, the second matching process determining second pairs of matching violations according to a second matching algorithm from violations not matched by the first matching process; and including the first pairs of matching violations and the second pairs of matching violations in the determined pairs of matching violations.
4. The method of claim 3, wherein: the first and second matching processes comprise two of a line matching process, a snippet matching process, or a hash matching process; the line matching process comprises: identifying one or more pairs of matching source code files, each pair including a first file of a snapshot S and a second file of the snapshot T; for each first file and second file of each pair of matching source code files, performing a diffing method to partition the first file and the second file into corresponding line range pairs, each line range pair being a pair of a first line range from the first file and a second line range from the second file; and designating as a pair of matching violations, each pair of violations made up of a first violation in the snapshot S and a second violation in the snapshot T, wherein the first and second violations satisfy matching conditions, the matching conditions including that: the first and second violations have the same type; and a position of the first violation within a first line range differs from a position of the second violation within a corresponding second line range by no more than a threshold amount; the snippet matching process comprises: determining pairs of matching violations, each pair of matching violations including one violation in the snapshot S and one violation in the snapshot T, including: determining a first violation in the snapshot S that has a first violation snippet that matches a second violation snippet of a second violation in the snapshot T; determining that the first violation in the snapshot S has a type that matches a type of the second violation in the snapshot T; determining that a path and a location of the first violation matches a path and a location of the second violation; and designating the first violation and the second violation as a pair of matching violations; and the hash matching process comprises: determining pairs of matching violations, each pair of matching violations including one violation in the snapshot S and one violation in the snapshot T having a same type, including: determining a first set of one or more hash values for a first violation in the snapshot S and a second set of one or more hash values for a second violation in the snapshot T; determining that at least one of the hash values of a first violation in the first set matches a corresponding hash value of a second violation in the second set and that the first violation has a type that matches a type of the second violation; and designating the first violation and the second violation as a pair of matching violations.
5. The method of claim 4, wherein the one or more hash values for the first or the second violation include one or more hash values computed from a token occurring before the respective violation or after the respective violation or both.
6. The method of claim 3, wherein: the first matching process is a line matching process and the second matching process is a snippet matching process.
7. The method of claim 6, wherein determining the pairs of matching violations further comprises: performing a different third matching process, the third matching process determining third pairs of matching violations according to a third matching algorithm from violations not matched by the first matching process or the second matching process; and including the third pairs of matching violations in the determined pairs of matching violations.
8. The method of claim 7, wherein: the third matching process comprises one of the line matching process, the snippet matching process, or the hash matching process.
9. The method of claim 8, further comprising: attributing the violation introduction to an entity responsible for the child snapshot T.
10. The method of claim 1, further comprising: identifying as unmatched parent violations those violations that occur in all the one or more parent snapshots S and that do not occur in the child snapshot T; and attributing to the snapshot T a violation correction for each unmatched parent violation.
11. The method of claim 10, further comprising: attributing the violation correction to an entity responsible for the child snapshot T.
12. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving data representing one or more parent snapshots S of a code base and a child snapshot T of the code base, wherein each of the one or more parent snapshots S is a parent of the child snapshot T in a revision graph of the code base, wherein each snapshot includes the source code of files of the code base as the files existed at a particular point in time; receiving data representing parent violations in the one or more parent snapshots S and child violations in the child snapshot T, each violation having a respective location in its snapshot, a respective violation snippet, and a respective violation type, the violation snippet being source code that violates a respective coding standard; identifying as unmatched child violations those child violations that do not have a matching parent violation in any of the parent snapshots S; and attributing to the child snapshot T a violation introduction for each unmatched child violation.
13. The system of claim 12, where
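The identification and attribution steps recited in claims 1 and 10 to 12 can be illustrated with plain set operations. The sketch below assumes, purely for illustration, that two violations match when a hypothetical key of (path, violation type, snippet) is equal; the claims leave the matching method open. The `Attribution` tuple, the `attribute` function, and the `author` parameter are names invented for this example.

```python
from typing import List, NamedTuple, Set, Tuple

# Hypothetical match key: violations are treated as matching when these agree.
Key = Tuple[str, str, str]  # (path, violation type, snippet)


class Attribution(NamedTuple):
    author: str            # entity responsible for the child snapshot T
    introduced: Set[Key]   # child violations with no match in any parent
    corrected: Set[Key]    # violations in every parent but absent from the child


def attribute(parents: List[Set[Key]], child: Set[Key], author: str) -> Attribution:
    if not parents:
        raise ValueError("at least one parent snapshot is required")
    in_any_parent = set().union(*parents)
    in_all_parents = set.intersection(*parents)
    return Attribution(author, child - in_any_parent, in_all_parents - child)


A = ("a.py", "unused-var", "x = 1")
B = ("a.py", "null-deref", "p.f()")
C = ("c.py", "unused-import", "import os")
D = ("b.py", "unused-var", "y = 2")

# A merge commit with two parents: A is in both, B and C in one each.
result = attribute(parents=[{A, B}, {A, C}], child={B, D}, author="alice")
```

Here `D` counts as an introduction because no parent contains it, and `A` counts as a correction because every parent contains it and the child does not. `C` is neither: it is missing from the child but occurred in only one parent, so it does not satisfy the "occur in all" condition of claim 10.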
File search processing · CPC title
Software metrics · CPC title
Performance of employee with respect to a job function · CPC title
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Structural analysis for program understanding · CPC title