Source code violation matching and attribution

US9262157B2 · US · B2

Patent metadata
Publication number: US-9262157-B2
Application number: US-201514696185-A
Country: US
Kind code: B2
Filing date: Apr 24, 2015
Priority date: Apr 24, 2014
Publication date: Feb 16, 2016
Grant date: Feb 16, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for matching and attributing code violations. One of the methods includes receiving a snapshot S of a code base of source code and a different snapshot T of the code base. Data representing first violations in the snapshot S and second violations in the snapshot T is received. Pairs of matching violations are determined by performing two or more matching processes, including performing a first matching process, the first matching process determining first pairs of matching violations according to a first matching algorithm, and performing a second matching process, the second matching process determining second pairs of matching violations according to a second matching algorithm from violations not matched by the first matching process. The first pairs of matching violations and the second pairs of matching violations are included in the determined pairs of matching violations.
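The staged matching described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Violation` record, the `key_matcher` helper, and both stages are hypothetical. `exact_stage` requires the same path, line, type, and snippet; `moved_stage` drops the line number so a violation that moved within its file can still match. What the sketch does show is the structure the abstract describes: each later matcher only sees violations that earlier matchers left unmatched, and the result is the union of every stage's pairs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class Violation:
    path: str     # file that contains the violation
    line: int     # 1-based line number in that file
    vtype: str    # violation type, i.e. the coding rule that is broken
    snippet: str  # source text that violates the rule


def key_matcher(key: Callable[[Violation], Hashable]):
    """Build a matcher that pairs violations with equal keys, using each T violation at most once."""
    def match(s_vs, t_vs):
        candidates = defaultdict(list)
        for w in t_vs:
            candidates[key(w)].append(w)
        pairs = []
        for v in s_vs:
            bucket = candidates.get(key(v))
            if bucket:
                pairs.append((v, bucket.pop(0)))
        return pairs
    return match


# Hypothetical stages: exact location first, then the same snippet anywhere in the same file.
exact_stage = key_matcher(lambda v: (v.path, v.line, v.vtype, v.snippet))
moved_stage = key_matcher(lambda v: (v.path, v.vtype, v.snippet))


def match_violations(s_vs, t_vs, matchers):
    """Run matchers in order; each later matcher sees only violations still unmatched."""
    unmatched_s, unmatched_t = list(s_vs), list(t_vs)
    pairs = []
    for matcher in matchers:
        new_pairs = matcher(unmatched_s, unmatched_t)
        pairs.extend(new_pairs)
        used_s = {id(a) for a, _ in new_pairs}
        used_t = {id(b) for _, b in new_pairs}
        unmatched_s = [v for v in unmatched_s if id(v) not in used_s]
        unmatched_t = [w for w in unmatched_t if id(w) not in used_t]
    return pairs, unmatched_s, unmatched_t


s = [Violation("a.py", 10, "unused-import", "import os"),
     Violation("a.py", 20, "bare-except", "except:")]
t = [Violation("a.py", 10, "unused-import", "import os"),
     Violation("a.py", 25, "bare-except", "except:"),  # same snippet, moved down 5 lines
     Violation("b.py", 3, "bare-except", "except:")]   # only present in T
pairs, left_s, left_t = match_violations(s, t, [exact_stage, moved_stage])
```

In this example the bare `except:` that moved from line 20 to line 25 is paired by the second stage, while the one in `b.py` stays unmatched in T; unmatched violations in T are the ones the claims below treat as newly introduced. Running the stricter matcher first is one plausible design choice: it lets precise matches claim violations before a looser rule can pair them differently.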

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method to distinguish new violations from old violations in snapshots of a code base, the method comprising: receiving data representing one or more parent snapshots S of a code base and a child snapshot T of the code base, wherein each of the one or more parent snapshots S is a parent of the child snapshot T in a revision graph of the code base, wherein each snapshot includes the source code of files of the code base as the files existed at a particular point in time; receiving data representing parent violations in the one or more parent snapshots S and child violations in the child snapshot T, each violation having a respective location in its snapshot, a respective violation snippet, and a respective violation type, the violation snippet being source code that violates a respective coding standard; identifying, by one or more computers, as unmatched child violations those child violations that do not have a matching parent violation in any of the parent snapshots S; and attributing to the child snapshot T a violation introduction for each unmatched child violation.

2. The method of claim 1, wherein the one or more parent snapshots S comprise two or more parent snapshots S.

3. The method of claim 1, further comprising: determining pairs of matching violations, each pair of matching violations including a first violation in a snapshot S of the one or more snapshots S and a corresponding second violation in the snapshot T, wherein the first and second violations have the same type; wherein determining the pairs of matching violations comprises performing two or more matching processes, including: performing a first matching process, the first matching process determining first pairs of matching violations according to a first matching algorithm; performing a different second matching process, the second matching process determining second pairs of matching violations according to a second matching algorithm from violations not matched by the first matching process; and including the first pairs of matching violations and the second pairs of matching violations in the determined pairs of matching violations.

4. The method of claim 3, wherein: the first and second matching processes comprise two of a line matching process, a snippet matching process, or a hash matching process; the line matching process comprises: identifying one or more pairs of matching source code files, each pair including a first file of a snapshot S and a second file of the snapshot T; for each first file and second file of each pair of matching source code files, performing a diffing method to partition the first file and the second file into corresponding line range pairs, each line range pair being a pair of a first line range from the first file and a second line range from the second file; and designating as a pair of matching violations, each pair of violations made up of a first violation in the snapshot S and a second violations in the snapshot T, wherein the first and second violations satisfy matching conditions, the matching conditions including that: the first and second violations have the same type; and a position of the first violation within a first line range differs from a position of the second violation within a corresponding second line range by no more than a threshold amount; the snippet matching process comprises: determining pairs of matching violations, each pair of matching violations including one violation in the snapshot S and one violation in the snapshot T, including: determining a first violation in the snapshot S that has a first violation snippet that matches a second violation snippet of a second violation in the snapshot T; determining that the first violation in the snapshot S has a type that matches a type of the second violation in the snapshot T; determining that a path and a location of the first violation matches a path and a location of the second violation; and designating the first violation and the second violation as a pair of matching violations; and the hash matching process comprises: determining pairs of matching violations, each pair of matching violations including one violation in the snapshot S and one violation in the snapshot T having a same type, including: determining a first set of one or more hash values for a first violation in the snapshot S and a second set of one or more hash values for a second violation in the snapshot T; determining that at least one of the hash values of a first violation in the first set matches a corresponding hash value of a second violation in the second set and that the first violation has a type that matches a type of the second violation; and designating the first violation and the second violation as a pair of matching violations.

5. The method of claim 4, wherein the one or more hash values for the first or the second violation include one or more hash values computed from a token occurring before the respective violation or after the respective violation or both.

6. The method of claim 3, wherein: the first matching process is a line matching process and the second matching process is a snippet matching process.

7. The method of claim 6, wherein determining the pairs of matching violations further comprises: performing a different third matching process, the third matching process determining third pairs of matching violations according to a third matching algorithm from violations not matched by the first matching process or the second matching process; and including the third pairs of matching violations in the determined pairs of matching violations.

8. The method of claim 7, wherein: the third matching process comprises one of the line matching process, the snippet matching process, or the hash matching process.

9. The method of claim 8, further comprising: attributing the violation introduction to an entity responsible for the child snapshot T.

10. The method of claim 1, further comprising: identifying as unmatched parent violations those violations that occur in all the one or more parent snapshots S and that do not occur in the child snapshot T; and attributing to the snapshot T a violation correction for each unmatched parent violation.

11. The method of claim 10, further comprising: attributing the violation correction to an entity responsible for the child snapshot T.

12. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving data representing one or more parent snapshots S of a code base and a child snapshot T of the code base, wherein each of the one or more parent snapshots S is a parent of the child snapshot T in a revision graph of the code base, wherein each snapshot includes the source code of files of the code base as the files existed at a particular point in time; receiving data representing parent violations in the one or more parent snapshots S and child violations in the child snapshot T, each violation having a respective location in its snapshot, a respective violation snippet, and a respective violation type, the violation snippet being source code that violates a respective coding standard; identifying as unmatched child violations those child violations that do not have a matching parent violation in any of the parent snapshots S; and attributing to the child snapshot T a violation introduction for each unmatched child violation.

13. The system of claim 12, where
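The line matching process of claim 4 and the attribution steps of claims 1 and 10 can be illustrated with a small Python sketch. It rests on several assumptions that are not in the claims: a single parent snapshot and a single pair of matching files, Python's `difflib.SequenceMatcher` standing in for the unspecified "diffing method", a default `threshold` of 0, and hypothetical names (`Violation`, `line_match`, `attribute`). With only one parent, claim 10's "violations that occur in all the one or more parent snapshots S" reduces to the parent violations left unmatched. Each opcode from `get_opcodes()` pairs a parent line range with a child line range, and two violations of the same type match when their offsets within those ranges differ by no more than the threshold.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    line: int    # 1-based line number within the file
    vtype: str   # violation type, i.e. the coding rule that is broken


def line_match(parent_lines, child_lines, parent_vs, child_vs, threshold=0):
    """Pair same-type violations whose offsets in corresponding diff ranges differ by <= threshold."""
    opcodes = difflib.SequenceMatcher(None, parent_lines, child_lines, autojunk=False).get_opcodes()
    pairs, used_child = [], set()
    for _tag, i1, i2, j1, j2 in opcodes:
        # Each opcode pairs the parent line range [i1, i2) with the child range [j1, j2).
        for pv in parent_vs:
            p_off = pv.line - 1 - i1
            if not 0 <= p_off < i2 - i1:
                continue  # pv lies outside this parent range
            for idx, cv in enumerate(child_vs):
                c_off = cv.line - 1 - j1
                if (idx not in used_child and 0 <= c_off < j2 - j1
                        and cv.vtype == pv.vtype and abs(p_off - c_off) <= threshold):
                    pairs.append((pv, cv))
                    used_child.add(idx)
                    break
    return pairs


def attribute(parent_vs, child_vs, pairs):
    """Unmatched child violations are introductions; unmatched parent ones are corrections."""
    matched_p = {id(p) for p, _ in pairs}
    matched_c = {id(c) for _, c in pairs}
    introductions = [c for c in child_vs if id(c) not in matched_c]
    corrections = [p for p in parent_vs if id(p) not in matched_p]
    return introductions, corrections


parent_lines = ["import os", "x = 1", "try:", "    f()", "except:", "    pass"]
child_lines = ["import os", "import sys", "x = 1", "try:", "    f()", "except Exception:", "    pass"]
parent_vs = [Violation(1, "unused-import"), Violation(5, "bare-except")]
child_vs = [Violation(1, "unused-import"), Violation(2, "unused-import")]
pairs = line_match(parent_lines, child_lines, parent_vs, child_vs)
introductions, corrections = attribute(parent_vs, child_vs, pairs)
```

In this example the unused `import os` sits in an unchanged range and is matched, the new `import sys` is reported as a violation introduction for the child snapshot, and the bare `except:` that was rewritten is reported as a violation correction.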

Assignees

Semmle Ltd

Inventors

Classifications

  • File search processing · CPC title

  • Software metrics · CPC title

  • Performance of employee with respect to a job function · CPC title

  • G06F8/71 (primary) · Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

  • Structural analysis for program understanding · CPC title

Frequently asked questions

What does patent US9262157B2 cover?
It covers methods, systems, and apparatus for matching and attributing code violations. Violations found in a snapshot S of a code base are paired with violations in a different snapshot T by running two or more matching processes in sequence, where each later process works only on violations that the earlier processes left unmatched.
Who is the assignee on this patent?
Semmle Ltd
What technology area does this patent fall under?
Primary CPC classification G06F8/71. Mapped technology areas include Physics.
When was this patent published?
It was published on Feb 16, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).