Transitive source code violation matching and attribution

US9507590B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9507590-B2
Application numberUS-201414565314-A
CountryUS
Kind codeB2
Filing dateDec 9, 2014
Priority dateDec 8, 2014
Publication dateNov 29, 2016
Grant dateNov 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for matching and attributing code violations. One of the methods includes receiving a plurality of snapshots of a code base, including data representing a revision graph of the snapshots of the code base and data representing respective violations in each of the plurality of snapshots. A plurality of transitively matched violations in the code base are generated, wherein each transitively matched violation represents a respective sequence of matching violations from a first violation of a first snapshot to a second violation of a second snapshot, wherein each transitively matched violation identifies a respective first violation representing an initial occurrence of a coding defect in the code base and a respective second violation representing a last occurrence of the coding defect in the code base.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: receiving a plurality of snapshots of a code base, including data representing a revision graph of the snapshots of the code base; receiving data representing respective violations occurring in each of the plurality of snapshots; and generating a plurality of transitively matched violations in the code base, wherein each transitively matched violation is data that represents a sequence of matching violations occurring in a sequence of respective snapshots of the revision graph of the code base, wherein each transitively matched violation identifies a respective first violation representing a first-observed occurrence of a coding defect in the code base in an first snapshot and a respective second violation representing a last-observed occurrence of the coding defect in a second snapshot, and wherein the transitively matched violation represents that the coding defect occurs in any intervening snapshots in the revision graph between the first snapshot and the second snapshot of the transitively matched violation. 2. The method of claim 1 , wherein each transitively matched violation includes an identifier that indicates a snapshot in which the coding defect was first observed to be absent from the code base or a null identifier representing that the coding defect never became absent from the code base. 3. The method of claim 1 , further comprising: obtaining, for a particular snapshot, a partial transitively matched violation for a parent snapshot of the particular snapshot, wherein the partial transitively matched violation includes a parent violation occurring in the parent snapshot and an associated ancestor violation representing an initial occurrence of a coding defect in an ancestor snapshot of the revision graph; determining that the parent violation matches a particular violation of the particular snapshot; and generating a second partial transitively matched violation comprising (i) the ancestor violation representing the initial occurrence of the coding defect in the ancestor snapshot of the revision graph and (ii) the particular violation of the particular snapshot. 4. The method of claim 3 , further comprising: determining that the particular violation of the particular snapshot does not match any violations in a child snapshot of the particular snapshot; and generating a full transitively matched violation that includes the ancestor violation, the particular violation of the particular snapshot, and an identifier of the child snapshot, wherein the identifier of the child snapshot represents a first snapshot in which the coding defect was first observed to be absent from the code base. 5. The method of claim 1 , further comprising: generating a topological ordering of the snapshots in the code base, wherein generating the plurality of transitively matched violations in the code base comprises generating a set of partial transitively matched violations for each of the snapshots in an order determined by the topological ordering. 6. The method of claim 1 , further comprising: determining that a particular snapshot of the plurality of snapshots is unanalyzable; and removing the particular snapshot from the revision graph to generate a modified revision graph, including assigning all child snapshots of the particular snapshot to be children of each parent snapshot of the particular snapshot and assigning all parent snapshots of the particular snapshot to be parents of each child snapshot of the particular snapshot. 7. The method of claim 1 , further comprising: identifying an existing violation that occurs in a most recent snapshot of the revision graph, the existing violation representing a coding defect that exists in the most recent snapshot of the code base; obtaining, from the plurality of transitively matched violations, a first transitively matched violation that includes the existing violation, wherein the first transitively matched violation also identifies an ancestor violation in an ancestor snapshot; identifying a responsible entity for the ancestor snapshot; and designating the responsible entity for the ancestor snapshot as an author of the coding defect. 8. The method of claim 1 , further comprising: identifying an existing violation that occurs in a most recent snapshot of the revision graph; obtaining, from the plurality of transitively matched violations, a first transitively matched violation including an ancestor violation in an ancestor snapshot, the existing violation, and a null identifier representing that a coding defect represented by the first violation was never observed to be absent from the code base; and computing a length of time that the coding defect existed in the code base, the length of time being based on an elapsed time since the ancestor snapshot was committed. 9. The method of claim 1 , further comprising: obtaining, from the plurality of transitively matched violations, a plurality of first transitively matched violations that represent coding defects fixed by a first developer; determining how many of the first transitively matched violations represent coding defects introduced by each of a plurality of other developers; and determining a second developer whose violations are most often fixed by the first developer. 10. The method of claim 9 , further comprising: automatically generating a notification that suggests pairing the first developer and the second developer together. 11. The method of claim 1 , further comprising: computing respective representative durations for each type of transitively matched violation in the plurality of transitively matched violations, including comparing respective commit times of a first snapshot and a second snapshot of each type of transitively matched violation; and computing a representative duration for each respective type of coding defect, including comparing, for each transitively matched violation of a particular type, a first commit time of an ancestor violation of the transitively matched violation and a second commit time of a first snapshot in which the coding defect became absent from the code base. 12. The method of claim 11 , further comprising: assigning respective levels of priority to each violation type based on the representative durations of each violation type, including assigning higher priorities to violation types having lower representative durations and assigning lower priorities to violation types having higher representative durations. 13. The method of claim 12 , further comprising: determining that one of more high-priority violations occur in most recent snapshot of the code base; and generating an automatic notification to one or more developers of the code base of the high-priority violations. 14. The method of claim 1 , further comprising: computing respective durations for each transitively matched violation in the plurality of transitively matched violations, including comparing, for each transitively matched violation, a first commit time of an ancestor snapshot of the transitively matched violation and a second commit time of first snapshot in which the coding defect became absent; and computing a measure of central tendency of the durations of the transitively matched violations of the code base. 15. The method of claim 1 , further comprising: computing, for each of one or more responsible entities, respective durations for each transitively matched violation representing a coding defect introduced by the responsible entity and fixed by the responsible entity; and computing a measure

Assignees

Inventors

Classifications

  • using software metrics · CPC title

  • G06F8/71Primary

    Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

  • Dependency analysis; Data or control flow analysis · CPC title

  • Analysis of software for verifying properties of programs (testing of software G06F11/3668) · CPC title

  • Structural analysis for program understanding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9507590B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for matching and attributing code violations. One of the methods includes receiving a plurality of snapshots of a code base, including data representing a revision graph of the snapshots of the code base and data representing respective violations in each of the plurality of snapshots. A plurality of…
Who is the assignee on this patent?
Semmle Ltd
What technology area does this patent fall under?
Primary CPC classification G06F11/3616. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 29 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).