Systems and methods for finding project-related information by clustering applications into related concept categories

US9804838B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9804838-B2
Application number	US-201514985060-A
Country	US
Kind code	B2
Filing date	Dec 30, 2015
Priority date	Sep 29, 2011
Publication date	Oct 31, 2017
Grant date	Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method, and computer-readable medium, is described that finds similarities among programming applications based on semantic anchors found within the source code of such applications. The semantic anchors may be API calls, such as Java's package and class calls of the JDK. Latent Semantic Indexing may be used to process the application and semantic anchor data and automatically develop a similarity matrix that contains numbers representing the similarity of one program to another.
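As a rough illustration of what collecting "semantic anchors" from Java source could look like, the sketch below counts the JDK packages named in a file's import statements. It is a minimal sketch under stated assumptions, not the method described in the patent: the function name `jdk_package_anchors`, the use of import statements as a stand-in for API calls, and the choice of `java.` and `javax.` as the JDK prefixes are all hypothetical.

```python
import re
from collections import Counter

# Matches non-static Java imports such as "import java.util.List;" or
# "import java.io.*;". Static imports are deliberately skipped.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+(?:\.\*)?)\s*;", re.MULTILINE)

# Assumption: treat anything under these prefixes as a JDK package.
JDK_PREFIXES = ("java.", "javax.")


def jdk_package_anchors(java_source: str) -> Counter:
    """Count JDK packages referenced by the import statements of one file."""
    anchors = Counter()
    for name in IMPORT_RE.findall(java_source):
        if not name.startswith(JDK_PREFIXES):
            continue  # ignore third-party and project-local imports
        if name.endswith(".*"):
            package = name[:-2]  # wildcard import: strip the ".*"
        else:
            package = name.rsplit(".", 1)[0]  # class import: drop the class name
        anchors[package] += 1
    return anchors


sample = """
import java.util.List;
import java.util.Map;
import java.io.*;
import org.example.Helper;
public class Demo {}
"""
print(jdk_package_anchors(sample))
```

Counting at the package level keeps the anchor vocabulary small; a fuller extractor would parse the source and record the individual class and method calls instead.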

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a processor, at least partially implemented in hardware, to: generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and at least one of the API calls corresponding to one of the categories, the similarity being based on weights for the API calls contained in the plurality of computer applications, a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call; receive a selection of a first computer application of the plurality of computer applications; and provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.

2. The device of claim 1, where the similarity matrix defines the similarity between the plurality of computer applications as numerical values based on the API calls in source code of the plurality of computer applications.

3. The device of claim 1, where the processor, when generating the similarity matrix, is to: generate the similarity matrix from a plurality of vectors corresponding to the plurality of computer applications using a vector space model, the plurality of vectors including elements corresponding to the categories of the categorization, the elements including values based on a number of the API calls in source code and documentation for a computer application corresponding to a vector and in the category corresponding to one of the elements, at least one of the API calls corresponding to one of the categories.

4. The device of claim 1, where the processor is further to: receive the plurality of computer applications from a computer application archive via a network.

5. The device of claim 1, where the processor, when generating the similarity matrix, is to: utilize an inverse document frequency calculation to find common API calls; and filter out the common API calls from the API calls prior to the categorization of the API calls.

6. The device of claim 1, where the first dimension of first entries are columns and the second dimension of second entries are rows.

7. The device of claim 1, where different API calls have different weights.

8. A non-transitory computer-readable medium for storing instructions, the instructions comprising: a plurality of instructions which, when executed by one or more processors, cause the one or more processors to: generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and at least one of the API calls corresponding to one of the categories, the similarity being based on weights for the API calls contained in the plurality of computer applications, a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call; receive a selection of a first computer application of the plurality of computer applications; and provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.

9. The non-transitory computer-readable medium of claim 8, where the similarity matrix defines the similarity between the plurality of computer applications as numerical values based on the API calls in source code of the plurality of computer applications.

10. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, cause the one or more processors to: generate the similarity matrix from a plurality of vectors corresponding to the plurality of computer applications using a vector space model, the plurality of vectors including elements corresponding to the categories of the categorization, the elements including values based on a number of the API calls in source code and documentation for a computer application corresponding to a vector and in the category corresponding to one of the elements, at least one of the API calls corresponding to one of the categories.

11. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, further cause the one or more processors to: extract the API calls from source code of the plurality of computer applications.

12. The non-transitory computer-readable medium of claim 8, where the plurality of instructions, when executed by the one or more processors to generate the similarity matrix, cause the one or more processors to: utilize an inverse document frequency calculation to find common API calls; and filter out the common API calls from the API calls prior to the categorization of the API calls.

13. The non-transitory computer-readable medium of claim 8, where the first dimension of first entries are columns and the second dimension of second entries are rows.

14. The non-transitory computer-readable medium of claim 8, where different API calls have different weights.

15. A method comprising: generating, by a device, a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls, the similarity matrix being generated from a term document matrix using singular value decomposition, the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization, elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimens…
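The steps recited in the independent claims (weighted term-document matrix, singular value decomposition, similarity matrix, lookup for a selected application) can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation. The claims say only that a weight depends on the quantity of API calls in an application and on the quantity of applications containing the call; the specific tf-idf formula, the cosine comparison, the rank of 2, the application names, the counts, and every function name below are assumptions. NumPy is assumed to be installed.

```python
import math
import numpy as np

# Hypothetical counts of API calls per application, already grouped by category.
api_counts = {
    "ftp_client":   {"java.io": 10, "java.net": 8},
    "http_fetcher": {"java.io": 6, "java.net": 12, "java.util": 3},
    "paint_tool":   {"java.io": 1, "javax.swing": 15, "java.util": 2},
}


def tfidf_matrix(counts):
    """Build a term-document matrix: rows are categories, columns are applications."""
    apps = sorted(counts)
    cats = sorted({c for per_app in counts.values() for c in per_app})
    n_apps = len(apps)
    # Document frequency: how many applications contain each category.
    df = {c: sum(1 for a in apps if counts[a].get(c, 0) > 0) for c in cats}
    m = np.zeros((len(cats), n_apps))
    for j, app in enumerate(apps):
        total = sum(counts[app].values())  # all API calls in this application
        for i, cat in enumerate(cats):
            tf = counts[app].get(cat, 0) / total
            idf = math.log(n_apps / df[cat])  # 0 for categories found in every app
            m[i, j] = tf * idf
    return apps, cats, m


def similarity_matrix(m, rank):
    """Cosine similarity between applications in a rank-reduced SVD space."""
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    coords = (np.diag(s[:rank]) @ vt[:rank, :]).T  # one row per application
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty applications
    unit = coords / norms
    return unit @ unit.T


def most_similar(selected, apps, sim):
    """Return the other applications, most similar to `selected` first."""
    i = apps.index(selected)
    others = [j for j in range(len(apps)) if j != i]
    others.sort(key=lambda j: sim[i, j], reverse=True)
    return [apps[j] for j in others]


apps, cats, m = tfidf_matrix(api_counts)
sim = similarity_matrix(m, rank=2)
print(most_similar("ftp_client", apps, sim))
```

Because `java.io` occurs in every sample application, its inverse document frequency is zero and it contributes nothing to the comparison, which loosely mirrors the filtering of common API calls in claims 5 and 12.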

Assignees

Accenture Global Services Ltd

Inventors

Grechanik Mark

Classifications

G06F16/3344: using natural language analysis
G06F8/71: Version control (security arrangements therefor G06F21/57); Configuration management
G06F8/70 (Primary): Software maintenance or management
G06F40/194: Calculation of difference between files
G06F16/285: Clustering or classification

Patent family

Related publications grouped by family.

Patent family ID: 47993904

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9804838B2 cover?: A system, method, and computer-readable medium, is described that finds similarities among programming applications based on semantic anchors found within the source code of such applications. The semantic anchors may be API calls, such as Java's package and class calls of the JDK. Latent Semantic Indexing may be used to process the application and semantic anchor data and automatically develop…
Who is the assignee on this patent?: Accenture Global Services Ltd
What technology area does this patent fall under?: Primary CPC classification G06F8/70. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).