Source code search engine

US9811556B2 · US · B2

Patent metadata
Publication number: US-9811556-B2
Application number: US-201615342183-A
Country: US
Kind code: B2
Filing date: Nov 3, 2016
Priority date: Jun 10, 2015
Publication date: Nov 7, 2017
Grant date: Nov 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A source code search comprises a two-pass search. The first pass comprises a topological measure of similarity. The second pass comprises a semantic measure of similarity. The query source code is a user-selected portion of source code. The results may be ranked and output to an I/O device.
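The topological first pass, which the first claim below spells out as a comparison of vertex and edge counts, can be sketched in Python. This is a minimal illustration under stated assumptions, not IBM's implementation. It assumes Python's standard `ast` module as the parser, treats every function definition in the target file as one candidate subtree, and compares the sum of vertices and edges. `vertices_and_edges` and `first_pass` are hypothetical names. Because an AST is a tree, its edge count is always one less than its vertex count, so this filter is effectively a size check.

```python
import ast


def vertices_and_edges(node: ast.AST) -> tuple[int, int]:
    """Count the vertices and edges of the AST rooted at `node`."""
    vertices = sum(1 for _ in ast.walk(node))
    # In a tree every vertex except the root has exactly one incoming edge.
    return vertices, vertices - 1


def first_pass(query_src: str, target_src: str, threshold: int) -> list[ast.FunctionDef]:
    """Return target functions whose vertex+edge count is within `threshold`
    of the query's (hypothetical helper; not the patented implementation)."""
    # Assumption: the query is exactly one complete top-level function.
    query = ast.parse(query_src).body[0]
    qv, qe = vertices_and_edges(query)
    candidates = []
    # Assumption: each function definition in the target is one candidate subtree.
    for node in ast.walk(ast.parse(target_src)):
        if isinstance(node, ast.FunctionDef):
            tv, te = vertices_and_edges(node)
            if abs((qv + qe) - (tv + te)) <= threshold:
                candidates.append(node)
    return candidates


query = "def f(x):\n    return x + 1\n"
target = (
    "def g(y):\n    return y + 1\n\n"
    "def h(a, b):\n    c = a * b\n    return c - a\n"
)
print([fn.name for fn in first_pass(query, target, threshold=0)])  # ['g']
```

With a threshold of 0 only `g`, which has exactly the same tree shape as the query, survives; `h` is larger and is dropped before any expensive comparison runs.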

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising:
  creating a respective abstract syntax tree (AST) of each of a user-defined query source code data set and at least one target source code data set, wherein the user-defined query source code data set comprises a selected portion of source code in a given programming language comprising a complete function, wherein the target source code data set comprises at least one file within at least one repository containing source code in the given programming language;
  calculating a respective first similarity value for each of one or more portions of each of the at least one target source code data sets, wherein each respective first similarity value comprises a topological measure of similarity between the user-defined query source code data set and each respective portion of the at least one target source code data set, which comprises a respective target source code abstract syntax subtree, wherein calculating the respective first similarity value further comprises:
    calculating, for the query source code abstract syntax tree, a first number of vertices and edges;
    calculating, for each respective target source code abstract syntax subtree, a respective second number of vertices and edges;
    calculating, for each respective target source code abstract syntax subtree a respective absolute value of a difference between the first number and the respective second number; and
    comparing, for each respective target source code abstract syntax subtree, the respective absolute value to a first threshold;
  identifying portions of each of the at least one target source code data sets having the respective first similarity value less than or equal to the first threshold, wherein the first threshold comprises a permissible difference in the number of vertices, edges, or vertices and edges between the user-defined query source code abstract syntax tree and the respective target source code abstract syntax subtree;
  calculating a respective second similarity value for each portion of the target source code data set having the respective first similarity value less than or equal to the first threshold, the respective second similarity value comprising a semantic measure of similarity between the user-defined query source code data set and each respective portion of the target source code data set having the respective first similarity value less than or equal to the first threshold, wherein calculating the respective second similarity value further comprises:
    identifying one or more series of operations to transform the target source code abstract syntax subtree to the query source code abstract syntax tree, wherein said series of operations comprises one or more of insert, delete, and rename operations;
    calculating, for each identified series of operations, a cost of the identified series of operations, wherein the cost of the identified series of operations is associated with one or more of insert, delete, and rename operations;
    wherein the cost of the identified series of operations is the respective second similarity value; and
    selecting the series of operations having a lowest cost;
  outputting, to a user interface, each portion of each target source code data set having the second similarity value less than or equal to a second threshold, wherein each portion is ranked according to the second similarity value.
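The semantic second pass of the claim, the lowest-cost series of insert, delete, and rename operations, is an ordered tree edit distance, which can be sketched as follows. This is a hypothetical illustration: the claim names no specific algorithm, so the code uses the classic recursive forest edit distance with memoization. It assumes unit costs for all three operations and uses node-type names as labels, which means identifier spelling is ignored. `to_tree`, `forest_distance`, `tree_edit_cost`, and `second_pass` are hypothetical names, and the candidates are assumed to have already passed the first-pass size filter.

```python
import ast
from functools import lru_cache

# Assumption: unit cost for every edit operation.
INSERT_COST = DELETE_COST = RENAME_COST = 1


def to_tree(node: ast.AST) -> tuple:
    """Convert an AST into a hashable (label, children) tuple.
    Assumption: the label is the node type only, so identifiers are ignored."""
    return (type(node).__name__, tuple(to_tree(c) for c in ast.iter_child_nodes(node)))


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Minimum cost to turn ordered forest `f` into ordered forest `g`."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain: drop f's rightmost root, keep its children
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + DELETE_COST
    if not f:  # only insertions remain
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + INSERT_COST
    (fl, fk), (gl, gk) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + fk, g) + DELETE_COST,  # delete f's rightmost root
        forest_distance(f, g[:-1] + gk) + INSERT_COST,  # insert g's rightmost root
        # match the two rightmost roots (rename if labels differ) and recurse
        forest_distance(fk, gk) + forest_distance(f[:-1], g[:-1])
        + (0 if fl == gl else RENAME_COST),
    )


def tree_edit_cost(target: tuple, query: tuple) -> int:
    """Cost of the cheapest insert/delete/rename series turning target into query."""
    return forest_distance((target,), (query,))


def second_pass(query_src: str, candidate_srcs: list[str], threshold: int) -> list[tuple[int, str]]:
    """Score candidates against the query, keep those within `threshold`, rank by cost."""
    query = to_tree(ast.parse(query_src).body[0])
    scored = []
    for src in candidate_srcs:
        cost = tree_edit_cost(to_tree(ast.parse(src).body[0]), query)
        if cost <= threshold:
            scored.append((cost, src))
    return sorted(scored, key=lambda item: item[0])  # lowest cost ranks first


query = "def f(x):\n    return x + 1\n"
candidates = [
    "def k(a, b):\n    c = a * b\n    return c\n",
    "def h(y):\n    return y - 1\n",
    "def g(y):\n    return y + 1\n",
]
for cost, src in second_pass(query, candidates, threshold=1):
    print(cost, src.splitlines()[0])
# 0 def g(y):
# 1 def h(y):
```

Here `g` matches the query at cost 0, `h` costs one rename (`Add` to `Sub`), and `k` exceeds the second threshold. Memoized recursion keeps the sketch short, but its cost grows quickly with tree size; a production search engine would more plausibly use a dedicated tree edit distance algorithm such as Zhang-Shasha.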

Assignees

IBM

Classifications

  • Source to source (CPC title)

  • Code clone detection (CPC title)

  • Parsing (CPC title)

  • Code refactoring (CPC title)

  • Programming languages or programming paradigms (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9811556B2 cover?
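A two-pass search for source code similar to a user-selected complete function. The first pass keeps target AST subtrees whose number of vertices and edges differs from the query AST's by no more than a first threshold (a topological measure). The second pass scores each surviving subtree by the lowest-cost series of insert, delete, and rename operations that transforms it into the query AST (a semantic measure) and outputs the portions at or below a second threshold, ranked by that cost.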
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
The primary CPC classification is G06F16/245. Mapped technology areas include Physics (CPC section G).
When was this patent published?
It was published on Nov 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).