Component discovery from source code

US9836301B2 · US · B2

Patent metadata
Publication number: US-9836301-B2
Application number: US-201615076207-A
Country: US
Kind code: B2
Filing date: Mar 21, 2016
Priority date: Apr 9, 2012
Publication date: Dec 5, 2017
Grant date: Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for component discovery from source code may include receiving source code, and determining business classes by excluding packages and classes in the source code identified as belonging to a presentation layer, as belonging to a data access layer, as models and/or as utilities. The method may further include extracting multi-dimensional features from the business classes, estimating similarity for business class pairs based on the extracted multi-dimensional features, clustering the business classes based on the similarity and mapping functional concepts to the clusters. The clusters generated by the clustering may represent components of the source code. The method may also include determining interfaces for the components based on the clustering.
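The exclusion step in the abstract can be illustrated with a minimal sketch. The abstract names the layers to exclude (presentation, data access, models, utilities) but does not say how they are recognized, so the marker list `EXCLUDED_PACKAGE_MARKERS`, the function names, and the example package names below are all hypothetical.

```python
from typing import Iterable, List

# Assumed package-segment markers for the layers the abstract excludes.
# The patent page does not list these; they are illustrative only.
EXCLUDED_PACKAGE_MARKERS = ("ui", "web", "view", "dao", "persistence", "model", "util")


def is_business_class(qualified_name: str,
                      markers: Iterable[str] = EXCLUDED_PACKAGE_MARKERS) -> bool:
    """Return True when no package segment matches an excluded marker."""
    # Drop the final segment (the class name) and compare package segments only.
    package_segments = qualified_name.lower().split(".")[:-1]
    excluded = set(markers)
    return not any(segment in excluded for segment in package_segments)


def determine_business_classes(qualified_names: Iterable[str]) -> List[str]:
    """Keep only the classes that survive the exclusion filter, in input order."""
    return [name for name in qualified_names if is_business_class(name)]
```

Given fully qualified names such as `com.shop.web.CartController` and `com.shop.order.OrderService`, this filter would keep only the second. A real tool would likely need richer signals than package names, such as framework annotations or base classes.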

First claim

Opening claim text (preview).

What is claimed is:

1. A method for component discovery from source code, the method performed by a processor and comprising: receiving source code; determining business classes by determining a component identification boundary in the source code; extracting features from the business classes by extracting packaging information for each of the business classes, wherein extracting packaging information for each of the business classes includes extracting concept words embedded in business class names, extracting a packaging hierarchy as a string, and extracting a substring that describes the packaging hierarchy; estimating similarity for business class pairs based on the extracted features; clustering the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determining interfaces for the components based on the clustering.

2. The method of claim 1, wherein extracting features from the business classes further comprises: extracting inheritance and interface realization relationships for each of the business classes.

3. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating partitions for the clusters by determining, for each node in a cluster, whether the node belongs to the cluster or to a different cluster, wherein the node represents a business class; and moving, based on the determination that the node belongs to the different cluster, the node to the different cluster.

4. The method of claim 1, wherein estimating similarity for business class pairs based on the extracted features further comprises: determining, based on the extracted packaging information, packaging based similarity for the business class pairs.

5. The method of claim 1, wherein clustering the business classes based on the similarity further comprises: generating seed populations by sorting a list of edges between business class pairs; and generating, based on the seed populations, a set of seed clusters.

6. The method of claim 1, further comprising: mapping functional entities to the components by separating each of the functional entities into distinct words; and determining, based on the separation of each of the functional entities into distinct words, a similarity between each of the functional entities and the components.

7. A component discovery system comprising: a processor; and a memory storing machine readable instructions that when executed by the processor cause the processor to: determine business classes by excluding packages and classes in source code; extract textual features from the business classes by extracting packaging information for each of the business classes, wherein extracting packaging information for each of the business classes includes extracting concept words embedded in business class names, extracting a packaging hierarchy as a string, and extracting a substring that describes the packaging hierarchy; estimate similarity for business class pairs based on the extracted features; cluster the business classes based on the similarity by generating seed populations by sorting a list of edges between business class pairs, and generating, based on the seed populations, a set of seed clusters, wherein clusters generated by the clustering represent components of the source code; and determine interfaces for the components based on the clustering.

8. The component discovery system according to claim 7, wherein the machine readable instructions to extract the textual features from the business classes further comprise machine readable instructions that when executed by the processor further cause the processor to: extract inheritance and interface realization relationships for each of the business classes.

9. The component discovery system according to claim 7, wherein the machine readable instructions to cluster the business classes based on the similarity further comprise machine readable instructions that when executed by the processor further cause the processor to: generate partitions for the clusters by determining, for each node in a cluster, whether the node belongs to the cluster or to a different cluster, wherein the node represents a business class; and move, based on the determination that the node belongs to the different cluster, the node to the different cluster.

10. The component discovery system according to claim 7, wherein the machine readable instructions to estimate similarity for business class pairs based on the extracted features further comprise machine readable instructions that when executed by the processor further cause the processor to: determine, based on the extracted packaging information, packaging based similarity for the business class pairs.

11. The component discovery system according to claim 7, further comprising machine readable instructions that when executed by the processor further cause the processor to: map functional entities to the components by separating each of the functional entities into distinct words; and determine, based on the separation of each of the functional entities into distinct words, a similarity between each of the functional entities and the components.

12. A non-transitory computer readable medium having stored thereon machine readable instructions for component discovery, the machine readable instructions, when executed, cause a processor to: determine business classes by excluding packages and classes in source code; extract code features from the business classes by extracting packaging information for each of the business classes, wherein extracting packaging information for each of the business classes includes extracting concept words embedded in business class names, extracting a packaging hierarchy as a string, and extracting a substring that describes the packaging hierarchy; estimate similarity for business class pairs based on the extracted features; cluster the business classes based on the similarity, wherein clusters generated by the clustering represent components of the source code; and determine interfaces for the components based on the clustering by identifying public methods of the business classes in a cluster of the generated clusters that are called by the business classes of other clusters from the generated clusters.

13. The non-transitory computer readable medium according to claim 12, wherein the machine readable instructions to cluster the business classes based on the similarity further comprise machine readable instructions that when executed by the processor further cause the processor to: generate partitions for the clusters by determining, for each node in a cluster, whether the node belongs to the cluster or to a different cluster, wherein the node represents a business class; and move, based on the determination that the node belongs to the different cluster, the node to the different cluster.

14. The non-transitory computer readable medium according to claim 12, wherein the machine readable instructions to estimate similarity for business class pairs based on the extracted features further comprise machine readable instructions that when executed by the processor further cause the processor to: determine, based on the extracted packaging information, packaging based similarity for the business class pairs.

15. The non-transitory computer readable medium according to claim 12, wherein the machine readable instructions to cluster the business classes based on the similarity further comprise machi
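The packaging features of claim 1, the packaging based similarity of claim 4, and the edge-sorted seed clusters of claim 5 can be sketched together as follows. This is an illustration under stated assumptions, not the patented procedure: the claims do not give a scoring formula or a seeding rule, so the averaged Jaccard/prefix score, the `threshold` parameter, the greedy growth rule, and all function names here are hypothetical.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def concept_words(class_name: str) -> Set[str]:
    """Split a camel-case class name into lower-case concept words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", class_name)
    return {p.lower() for p in parts}


def packaging_features(qualified_name: str) -> Tuple[Set[str], str]:
    """Return (concept words, packaging hierarchy as a string) for one class."""
    package, _, class_name = qualified_name.rpartition(".")
    return concept_words(class_name), package


def packaging_similarity(a: str, b: str) -> float:
    """Assumed score: mean of word Jaccard overlap and shared package-prefix ratio."""
    words_a, pkg_a = packaging_features(a)
    words_b, pkg_b = packaging_features(b)
    union = words_a | words_b
    jaccard = len(words_a & words_b) / len(union) if union else 0.0
    seg_a, seg_b = pkg_a.split("."), pkg_b.split(".")
    shared = 0
    for x, y in zip(seg_a, seg_b):
        if x != y:
            break
        shared += 1  # count leading package segments the two classes share
    prefix = shared / max(len(seg_a), len(seg_b))
    return (jaccard + prefix) / 2


def seed_clusters(classes: List[str], threshold: float = 0.6) -> List[Set[str]]:
    """Sort edges by similarity, then grow clusters greedily from the strongest edges."""
    edges = sorted(
        ((packaging_similarity(a, b), a, b) for a, b in combinations(classes, 2)),
        key=lambda edge: edge[0],
        reverse=True,
    )
    cluster_of: Dict[str, Set[str]] = {}
    for score, a, b in edges:
        if score < threshold:
            break  # remaining edges are weaker still
        if a in cluster_of and b in cluster_of:
            continue  # both already placed; this sketch never merges clusters
        target = cluster_of.get(a) or cluster_of.get(b) or set()
        target.update((a, b))
        cluster_of[a] = cluster_of[b] = target
    # Deduplicate shared set objects, then add singletons for unplaced classes.
    result = list({id(c): c for c in cluster_of.values()}.values())
    result.extend({c} for c in classes if c not in cluster_of)
    return result
```

The seed clusters are only a starting point. The node-moving step of claims 3, 9, and 13 would then reassign individual classes between clusters, which this sketch does not attempt.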

Assignees

Accenture Global Services Ltd

Inventors

Classifications

  • G06F8/74 (Primary)

    Reverse engineering; Extracting design information from source code

  • Program documentation

  • Structural analysis for program understanding

  • Object-oriented languages

  • G06F8/70 (Primary)

    Software maintenance or management

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9836301B2 cover?
A method for component discovery from source code may include receiving source code, and determining business classes by excluding packages and classes in the source code identified as belonging to a presentation layer, as belonging to a data access layer, as models and/or as utilities. The method may further include extracting multi-dimensional features from the business classes, estimating si…
Who is the assignee on this patent?
Accenture Global Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F8/74. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).