Component discovery from source code

US9323520B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9323520-B2
Application number	US-201414504194-A
Country	US
Kind code	B2
Filing date	Oct 1, 2014
Priority date	Apr 9, 2012
Publication date	Apr 26, 2016
Grant date	Apr 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for component discovery from source code may include receiving source code, and determining business classes by excluding packages and classes in the source code identified as belonging to a presentation layer, as belonging to a data access layer, as models and/or as utilities. The method may further include extracting multi-dimensional features from the business classes, estimating similarity for business class pairs based on the extracted multi-dimensional features, clustering the business classes based on the similarity and mapping functional concepts to the clusters. The clusters generated by the clustering may represent components of the source code. The method may also include determining interfaces for the components based on the clustering.
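
The first step in the abstract, keeping only business classes, can be illustrated with a minimal sketch. It assumes that layers are recognised from package-name segments; the marker lists, function names, and sample class names below are hypothetical and are not taken from the patent.

```python
# Hypothetical layer markers; the patent does not list concrete package names.
EXCLUDED_MARKERS = {
    "presentation": ("ui", "web", "view", "controller"),
    "data_access": ("dao", "repository", "persistence"),
    "model": ("model", "dto", "entity"),
    "utility": ("util", "utils", "common"),
}


def excluded_layer(qualified_name):
    """Return the layer a class is excluded for, or None if it is kept."""
    # Inspect package segments only, not the class name itself.
    segments = qualified_name.lower().split(".")[:-1]
    for layer, markers in EXCLUDED_MARKERS.items():
        if any(segment in markers for segment in segments):
            return layer
    return None


def business_classes(qualified_names):
    """Keep only the classes that no exclusion rule matches."""
    return [name for name in qualified_names if excluded_layer(name) is None]


# Illustrative input: fully qualified class names from an imaginary code base.
classes = [
    "com.shop.web.CartController",
    "com.shop.dao.OrderDao",
    "com.shop.model.Order",
    "com.shop.util.DateUtils",
    "com.shop.billing.InvoiceService",
    "com.shop.orders.OrderService",
]
print(business_classes(classes))
```

In this sketch only the two service classes survive the filter. A real implementation would likely need richer signals than package names, which is why the markers are kept in one editable table.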

First claim

Opening claim text (preview).

What is claimed is:
1. A method for component discovery from source code, the method performed by a processor and comprising: receiving source code; determining business classes by excluding packages and classes in the source code; extracting features from the business classes; estimating similarity for business class pairs based on the extracted features by determining textual similarity by using a co-occurrence matrix that is defined as a sequence of the business classes in the source code and a sequence of unique intermediate representation (IR) tokens occurring across the business classes, and evaluating, for the co-occurrence matrix, a frequency of occurrence of an IR token from the IR tokens occurring in a particular business class of the business classes; clustering the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determining interfaces for the components based on the clustering.
2. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: using k-means clustering to generate initial clusters that are used to cluster the business classes.
3. The method of claim 1, further comprising: clustering a plurality of application portfolios that each includes a plurality of applications that use different types of source code including the source code.
4. The method of claim 1, further comprising: determining similarity between different pairs of the clusters based on a normalized summation of similarity scores between the business class pairs across the clusters.
5. The method of claim 1, wherein estimating similarity for business class pairs based on the extracted features further comprises: including a class name in an inheritance and interface realization list for a current business class; including names of other business classes in the inheritance and interface realization list that have the class name of the current business class in inheritance and interface realization lists of the other business classes; and determining inheritance and interface realization based similarity for the business class pairs based on evaluation of the inheritance and interface realization list for the current business class and an inheritance and interface realization list for the other business classes.
6. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating a set of seed clusters by using top weighted edges between business class pairs, wherein the edges represent the similarity for the business class pairs.
7. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating a set of seed clusters by using edges between business class pairs with non-zero inheritance and interface realization similarity, wherein the edges represent the similarity for the business class pairs.
8. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating a set of seed clusters based on a clique strength of nodes of edges between business class pairs, wherein the edges represent the similarity for the business class pairs and the nodes representing the business classes.
9. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating a set of seed clusters based on a characteristic of edges or nodes of the business class pairs, wherein the edges represent the similarity for the business class pairs and the nodes representing the business classes; and evaluating a modularisation quality (MQ) of the set of seed clusters.
10. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: maximizing modularisation quality (MQ) of clusters based on movement of nodes between the clusters, wherein the nodes represent the business classes.
11. The method of claim 1, wherein determining interfaces for the components further comprises: identifying public methods of the business classes in a cluster that are called by the business classes of other clusters.
12. The method of claim 1, further comprising: determining component interactions based on public methods of a cluster that are called by the business classes of another cluster.
13. The method of claim 1, further comprising: identifying borderline classes by identifying the business classes in a first cluster having a high similarity to the business classes in another cluster.
14. A method for component discovery from source code, the method performed by a processor and comprising: receiving source code; determining business classes by excluding packages and classes in the source code; extracting features from the business classes; estimating similarity for business class pairs based on the extracted features by determining structural similarity by collapsing edges with a same method name for a dependency graph that includes nodes representing the business classes, wherein an edge of the edges represents a function call in the source code for a business class of the business classes where a function of another business class of the business classes is called; clustering the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determining interfaces for the components based on the clustering.
15. A method for component discovery from source code, the method performed by a processor and comprising: receiving source code; determining business classes by excluding packages and classes in the source code; extracting features from the business classes; estimating similarity for business class pairs based on the extracted features by determining a combined similarity for the business class pairs based on evaluation of similarity measures that include textual, class name, method name, packaging, inheritance and interface realization, and structural based similarities, and using a relative significance factor for each of the similarity measures such that the sum of the similarity measures is equal to a predetermined value; clustering the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determining interfaces for the components based on the clustering.
16. A component discovery system comprising: a processor; and a memory storing machine readable instructions that when executed by the processor cause the processor to: determine business classes by excluding packages and classes in source code; extract at least one of textual, code, and structural dependency based features from the business classes; estimate similarity for business class pairs based on the extracted features by populating a class name matrix that accounts for a frequency of occurrence of word concepts in a business class name, applying term frequency-inverse document frequency (tf-idf) based weighting to the class name matrix, and determining class name similarity for the business class pairs by evaluating class name matrices corresponding to the business class pairs; cluster the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determine interfaces for the components based on the clustering.
17. The component discovery system according to claim 16, wherein the machine readabl
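
Two of the claimed steps lend themselves to a small illustration: the class-by-token co-occurrence matrix of claim 1 and the interface rule of claim 11. The sketch below is only a reading of the claim language under stated assumptions. Cosine similarity is an assumed way to compare matrix rows (the claim text shown here does not name a measure), the token lists stand in for IR tokens, and every function and class name is hypothetical.

```python
import math
from collections import Counter


def cooccurrence_matrix(class_tokens):
    """Rows follow the class sequence, columns the unique-token sequence.

    Each cell holds how often a token occurs in a class.
    """
    class_names = list(class_tokens)
    vocabulary = sorted({t for tokens in class_tokens.values() for t in tokens})
    matrix = []
    for name in class_names:
        counts = Counter(class_tokens[name])
        matrix.append([counts[token] for token in vocabulary])
    return class_names, vocabulary, matrix


def cosine(u, v):
    """Cosine similarity of two frequency rows (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def component_interfaces(cluster_of, public_methods, calls):
    """Collect public methods called from a class in a different cluster."""
    interfaces = {}
    for caller, callee, method in calls:
        crosses = cluster_of[caller] != cluster_of[callee]
        if crosses and method in public_methods.get(callee, ()):
            interfaces.setdefault(cluster_of[callee], set()).add((callee, method))
    return interfaces


# Illustrative token lists; real input would come from parsed source code.
tokens = {
    "OrderService": ["order", "total", "order", "invoice"],
    "InvoiceService": ["invoice", "total", "tax"],
    "StockService": ["stock", "warehouse"],
}
names, vocab, matrix = cooccurrence_matrix(tokens)
sim_order_invoice = cosine(matrix[0], matrix[1])
sim_order_stock = cosine(matrix[0], matrix[2])

# Illustrative clustering result and call edges (caller, callee, method).
clusters = {"OrderService": 0, "InvoiceService": 0, "StockService": 1}
public = {"StockService": {"reserve"}, "InvoiceService": {"create"}}
call_edges = [
    ("OrderService", "StockService", "reserve"),
    ("OrderService", "InvoiceService", "create"),
]
print(component_interfaces(clusters, public, call_edges))
```

Here the two classes that share tokens score higher than the pair with no tokens in common, and only the call that crosses a cluster boundary contributes to a component interface. The clustering step itself is taken as given in this sketch.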

Assignees

Accenture Global Services Ltd

Inventors

Classifications

G06Q10/06
Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling · CPC title
G06F8/75
Structural analysis for program understanding · CPC title
G06F16/285
Clustering or classification · CPC title
G06F8/70 (Primary)
Software maintenance or management · CPC title
G06F8/315
Object-oriented languages · CPC title

Patent family

Related publications grouped by family.

View patent family 48044507

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9323520B2 cover?: A method for component discovery from source code may include receiving source code, and determining business classes by excluding packages and classes in the source code identified as belonging to a presentation layer, as belonging to a data access layer, as models and/or as utilities. The method may further include extracting multi-dimensional features from the business classes, estimating si…
Who is the assignee on this patent?: Accenture Global Services Ltd
What technology area does this patent fall under?: Primary CPC classification G06F8/70. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 26, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).