Systems and methods for finding project-related information by clustering applications into related concept categories

US9804838B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9804838-B2
Application number: US-201514985060-A
Country: US
Kind code: B2
Filing date: Dec 30, 2015
Priority date: Sep 29, 2011
Publication date: Oct 31, 2017
Grant date: Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method, and computer-readable medium, is described that finds similarities among programming applications based on semantic anchors found within the source code of such applications. The semantic anchors may be API calls, such as Java's package and class calls of the JDK. Latent Semantic Indexing may be used to process the application and semantic anchor data and automatically develop a similarity matrix that contains numbers representing the similarity of one program to another.
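
The pipeline in the abstract can be illustrated with a minimal sketch: count API calls per application, arrange the counts in a term-document matrix, reduce it to a few latent dimensions, and compare applications in that reduced space. Everything concrete here is an assumption for illustration: the application names, the call lists, the choice of two latent dimensions, and the use of power iteration on the Gram matrix as a dependency-free stand-in for a library SVD routine. It is not the patent's implementation.

```python
import math

# Hypothetical data: package-level API calls (semantic anchors) per application.
API_CALLS = {
    "zip_tool": ["java.util.zip", "java.util.zip", "java.io", "java.io"],
    "archiver": ["java.util.zip", "java.io", "java.io", "java.io"],
    "chat_app": ["java.net", "java.net", "javax.swing", "java.io"],
    "web_ping": ["java.net", "java.net", "java.net", "java.io"],
}


def term_document_matrix(api_calls):
    """Rows are API packages (terms); columns are applications (documents)."""
    apps = sorted(api_calls)
    terms = sorted({t for calls in api_calls.values() for t in calls})
    matrix = [[api_calls[a].count(t) for a in apps] for t in terms]
    return apps, terms, matrix


def top_singular_pairs(matrix, k, iterations=500):
    """Return up to k (singular value, right singular vector) pairs.

    The right singular vectors of A are the eigenvectors of A^T A, so this
    runs power iteration with deflation on that Gram matrix.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    pairs = []
    for _component in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _step in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # matrix rank exhausted
                return pairs
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * gram[i][j] * v[j]
                         for i in range(n) for j in range(n))
        pairs.append((math.sqrt(max(eigenvalue, 0.0)), v))
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return pairs


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def similarity_matrix(matrix, k=2):
    """Application-by-application cosine similarity in the k-dim latent space."""
    pairs = top_singular_pairs(matrix, k)
    n = len(matrix[0])
    # Each application's latent coordinates are its entries in sigma * v.
    coords = [[sigma * v[j] for sigma, v in pairs] for j in range(n)]
    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]


apps, terms, tdm = term_document_matrix(API_CALLS)
sim = similarity_matrix(tdm, k=2)
selected = apps.index("zip_tool")
ranked = sorted(((apps[j], sim[selected][j]) for j in range(len(apps))
                 if j != selected), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

With real data, a numerical library's SVD would normally replace `top_singular_pairs`; the hand-rolled version only keeps the sketch free of third-party dependencies.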

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a processor, at least partially implemented in hardware, to: generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and at least one of the API calls corresponding to one of the categories, the similarity being based on weights for the API calls contained in the plurality of computer applications, a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call; receive a selection of a first computer application of the plurality of computer applications; and provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.

2. The device of claim 1, where the similarity matrix defines the similarity between the plurality of computer applications as numerical values based on the API calls in source code of the plurality of computer applications.

3. The device of claim 1, where the processor, when generating the similarity matrix, is to: generate the similarity matrix from a plurality of vectors corresponding to the plurality of computer applications using a vector space model, the plurality of vectors including elements corresponding to the categories of the categorization, the elements including values based on a number of the API calls in source code and documentation for a computer application corresponding to a vector and in the category corresponding to one of the elements, at least one of the API calls corresponding to one of the categories.

4. The device of claim 1, where the processor is further to: receive the plurality of computer applications from a computer application archive via a network.

5. The device of claim 1, where the processor, when generating the similarity matrix, is to: utilize an inverse document frequency calculation to find common API calls; and filter out the common API calls from the API calls prior to the categorization of the API calls.

6. The device of claim 1, where the first dimension of first entries are columns and the second dimension of second entries are rows.

7. The device of claim 1, where different API calls have different weights.

8. A non-transitory computer-readable medium for storing instructions, the instructions comprising: a plurality of instructions which, when executed by one or more processors, cause the one or more processors to: generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and at least one of the API calls corresponding to one of the categories, the similarity being based on weights for the API calls contained in the plurality of computer applications, a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call; receive a selection of a first computer application of the plurality of computer applications; and provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.

9. The non-transitory computer-readable medium of claim 8, where the similarity matrix defines the similarity between the plurality of computer applications as numerical values based on the API calls in source code of the plurality of computer applications.

10. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, cause the one or more processors to: generate the similarity matrix from a plurality of vectors corresponding to the plurality of computer applications using a vector space model, the plurality of vectors including elements corresponding to the categories of the categorization, the elements including values based on a number of the API calls in source code and documentation for a computer application corresponding to a vector and in the category corresponding to one of the elements, at least one of the API calls corresponding to one of the categories.

11. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, further cause the one or more processors to: extract the API calls from source code of the plurality of computer applications.

12. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, cause the one or more processors to: utilize an inverse document frequency calculation to find common API calls; and filter out the common API calls from the API calls prior to the categorization of the API calls.

13. The non-transitory computer-readable medium of claim 8, where the first dimension of first entries are columns and the second dimension of second entries are rows.

14. The non-transitory computer-readable medium of claim 8, where different API calls have different weights.

15. A method comprising: generating, by a device, a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimens
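
Several claim elements can be sketched together: the inverse-document-frequency filter for common API calls, the per-call weight that depends on both the call count within an application and the number of applications containing the call, the roll-up of calls into categories, and the lookup of similar applications for a selected one. The sketch below is an illustration under stated assumptions, not the claimed implementation: the application names and call lists are invented, the weight uses one common TF-IDF form (relative count times log of N over document frequency), the category of a call is assumed to be its package, and the singular value decomposition step is omitted so that similarity is a plain vector-space cosine.

```python
import math
from collections import Counter

# Hypothetical class-level API calls per application (illustrative only).
APPS = {
    "zip_tool": ["java.lang.String", "java.util.zip.ZipFile",
                 "java.util.zip.ZipEntry", "java.io.File"],
    "archiver": ["java.lang.String", "java.util.zip.ZipFile",
                 "java.io.File", "java.io.FileReader"],
    "chat_app": ["java.lang.String", "java.net.Socket",
                 "java.net.URL", "javax.swing.JFrame"],
    "web_ping": ["java.lang.String", "java.net.URL",
                 "java.net.Socket", "java.net.Socket"],
}


def category_of(api_call):
    """Assumed categorization: a class-level call belongs to its package."""
    return api_call.rsplit(".", 1)[0]


def idf(apps):
    """log(N / number of applications containing the call) for every call."""
    n = len(apps)
    containing = Counter(c for calls in apps.values() for c in set(calls))
    return {call: math.log(n / df) for call, df in containing.items()}


def filter_common(apps, idf_scores, threshold=0.0):
    """Drop calls whose IDF is at or below the threshold (near-universal calls)."""
    return {app: [c for c in calls if idf_scores[c] > threshold]
            for app, calls in apps.items()}


def weighted_vectors(apps, idf_scores):
    """One vector per application; each element sums TF-IDF weights in a category."""
    categories = sorted({category_of(c) for calls in apps.values() for c in calls})
    vectors = {}
    for app, calls in apps.items():
        total = len(calls) or 1  # avoid division by zero for empty call lists
        by_category = dict.fromkeys(categories, 0.0)
        for call, count in Counter(calls).items():
            by_category[category_of(call)] += (count / total) * idf_scores[call]
        vectors[app] = [by_category[c] for c in categories]
    return categories, vectors


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(selected, vectors):
    """Rank every other application by similarity to the selected one."""
    scores = [(app, cosine(vectors[selected], vec))
              for app, vec in vectors.items() if app != selected]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


idf_scores = idf(APPS)
filtered = filter_common(APPS, idf_scores)
categories, vectors = weighted_vectors(filtered, idf_scores)
print(most_similar("zip_tool", vectors))
```

In this toy data `java.lang.String` appears in every application, so its IDF is zero and the filter removes it before categorization, which mirrors the ordering recited in claims 5 and 12.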

Assignees

Accenture Global Services Ltd

Inventors

Classifications

  • using natural language analysis · CPC title

  • Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

  • G06F8/70 · Primary

    Software maintenance or management · CPC title

  • Calculation of difference between files · CPC title

  • Clustering or classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9804838B2 cover?
A system, method, and computer-readable medium, is described that finds similarities among programming applications based on semantic anchors found within the source code of such applications. The semantic anchors may be API calls, such as Java's package and class calls of the JDK. Latent Semantic Indexing may be used to process the application and semantic anchor data and automatically develop…
Who is the assignee on this patent?
Accenture Global Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F8/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).