Discovering high-level language data structures from assembler code

US10223085B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10223085-B2
Application number: US-201715581292-A
Country: US
Kind code: B2
Filing date: Apr 28, 2017
Priority date: Apr 28, 2017
Publication date: Mar 5, 2019
Grant date: Mar 5, 2019

Abstract

Official abstract text for this publication.

A computer-implemented method for transforming implicit data structures expressed by assembler code into high-level language structures includes analyzing a section of assembler code to identify a plurality of data items. The computer-implemented method further includes storing the plurality of data items in a plurality of groups. The computer-implemented method further includes modifying one or more groups in the plurality of groups based, at least in part, on a pair of adjacent groups having a non-identical overlap. The computer-implemented method further includes creating an overlap list for each group. The computer-implemented method further includes generating data modeling language for the section based, at least in part, on each overlap list. A corresponding computer system and computer program product are also disclosed.
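The grouping and overlap-list steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `DataItem`, `Group`, `build_groups`, and `overlap_lists` are hypothetical, as are the sample field names. Two assumptions are made here that the abstract does not state: a new group starts whenever an item's offset moves backwards (as it would when earlier storage is redefined), and two groups overlap when their byte ranges intersect.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataItem:
    name: str
    offset: int   # byte offset from the start of the section
    length: int   # size in bytes


@dataclass
class Group:
    start: int                                  # offset where the group begins
    items: list = field(default_factory=list)

    @property
    def end(self) -> int:
        # One past the last byte covered by any item in the group.
        return max((i.offset + i.length for i in self.items), default=self.start)


def build_groups(items):
    """Store items in groups, keeping declaration order (assumed heuristic:
    an offset that moves backwards opens a new group)."""
    groups, cursor = [], None
    for item in items:
        if cursor is None or item.offset < cursor:
            groups.append(Group(start=item.offset))
        groups[-1].items.append(item)
        cursor = item.offset + item.length
    return groups


def overlap_lists(groups):
    """For each group index, list the indices of the other groups it overlaps."""
    def overlaps(a, b):
        return a.start < b.end and b.start < a.end

    return {
        i: [j for j, other in enumerate(groups) if j != i and overlaps(g, other)]
        for i, g in enumerate(groups)
    }


# Hypothetical section: an 8-byte DATE and a 20-byte NAME, after which
# the first 8 bytes are redefined as YEAR, MONTH, and DAY.
items = [
    DataItem("DATE", 0, 8),
    DataItem("NAME", 8, 20),
    DataItem("YEAR", 0, 4),
    DataItem("MONTH", 4, 2),
    DataItem("DAY", 6, 2),
]
groups = build_groups(items)
overlap = overlap_lists(groups)
```

In this example the two groups overlap but cover different extents (bytes 0 to 28 versus bytes 0 to 8), which is the kind of non-identical overlap the abstract says triggers modification of the groups.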

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for transforming implicit data structures expressed by assembler code into high-level language structures, the computer program product comprising one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising instructions to: analyze a section of assembler code to identify a plurality of data items, wherein the section of assembler code has a beginning and an end; store the plurality of data items in a plurality of groups, wherein: the plurality of groups corresponds to the section of assembler code; and the plurality of data items is stored based, at least in part, on: their offset from a start of a group in the section; and an order in which the data items are added to the section; modify one or more groups in the plurality of groups based, at least in part, on a pair of adjacent groups having a non-identical overlap, wherein the instructions to modify the one or more groups in the plurality of groups further include instructions to perform at least one of splitting or padding the pair of adjacent groups based, at least in part, on: (i) a number of data items included in the pair of adjacent groups, and (ii) an offset position of each data item in the pair of adjacent groups; create an overlap list for each group, wherein the overlap list identifies those other groups in the plurality of groups that overlap with the group; and generate data modeling language for the section based, at least in part, on each overlap list.

2. The computer program product of claim 1, wherein the instructions to perform at least one of splitting or padding the pair of adjacent groups include instructions to: split two data items stored in a first group of the pair of adjacent groups into two groups, such that: a first sub-group contains data items of the first group up to a split point, and a second sub-group contains data items of the first group after the split point.

3. The computer program product of claim 1, wherein the instructions to perform at least one of splitting or padding the pair of adjacent groups include instructions to: pad a second group in the pair of adjacent groups to match a size of a first group in the pair of adjacent groups.

4. The computer program product of claim 1, wherein two groups are determined to overlap if a data item in each group shares a common offset position in the section.

5. The computer program product of claim 1, wherein the instructions to generate the data modeling language for the section is further based, at least in part, on instructions to determine a degree of overlap between two groups.

6. The computer program product of claim 1, wherein the data modeling language defines hierarchical data model elements in the assembler code.

7. A computer system for transforming implicit data structures expressed by assembler code into high-level language structures, the computer system comprising: one or more computer processors; one or more computer readable storage media; and computer program instructions; the computer program instructions being stored on the one or more computer readable storage media for execution by the one or more computer processors; and the computer program instructions comprising instructions to: analyze a section of assembler code to identify a plurality of data items, wherein the section of assembler code has a beginning and an end; store the plurality of data items in a plurality of groups, wherein: the plurality of groups corresponds to the section of assembler code; and the plurality of data items is stored based, at least in part, on: their offset from a start of a group in the section; and an order in which the data items are added to the section; modify one or more groups in the plurality of groups based, at least in part, on a pair of adjacent groups having a non-identical overlap, wherein the instructions to modify the one or more groups in the plurality of groups further include instructions to perform at least one of splitting or padding the pair of adjacent groups based, at least in part, on: (i) a number of data items included in the pair of adjacent groups, and (ii) an offset position of each data item in the pair of adjacent groups; create an overlap list for each group, wherein the overlap list identifies those other groups in the plurality of groups that overlap with the group; and generate data modeling language for the section based, at least in part, on each overlap list.

8. The computer system of claim 7, wherein the instructions to perform at least one of splitting or padding the pair of adjacent groups include instructions to: split two data items stored in a first group of the pair of adjacent groups into two groups, such that: a first sub-group contains data items of the first group up to a split point, and a second sub-group contains data items of the first group after the split point.

9. The computer system of claim 7, wherein the instructions to perform at least one of splitting or padding the pair of adjacent groups include instructions to: pad a second group in the pair of adjacent groups to match a size of a first group in the pair of adjacent groups.

10. The computer system of claim 7, wherein two groups are determined to overlap if a data item in each group shares a common offset position in the section.

11. The computer system of claim 7, wherein the instructions to generate the data modeling language for the section are further based, at least in part, on instructions to determine a degree of overlap between two groups.
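The split and pad operations recited in the dependent claims can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: `Item`, `extent`, `split_group`, and `pad_group` are hypothetical names, the `FILLER` padding item is an invented placeholder, and the sketch assumes a split point must fall on an item boundary.

```python
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    offset: int   # byte offset from the start of the section
    length: int   # size in bytes


def extent(group):
    """Return the (start, end) byte range covered by a non-empty group."""
    return (min(i.offset for i in group),
            max(i.offset + i.length for i in group))


def split_group(group, split_point):
    """Split a group into items ending at or before the split point
    and items starting at or after it."""
    first = [i for i in group if i.offset + i.length <= split_point]
    second = [i for i in group if i.offset >= split_point]
    if len(first) + len(second) != len(group):
        # Assumption: an item straddling the split point is an error.
        raise ValueError("split point falls inside a data item")
    return first, second


def pad_group(group, target_size):
    """Append a filler item so the group spans target_size bytes."""
    start, end = extent(group)
    shortfall = target_size - (end - start)
    if shortfall <= 0:
        return list(group)
    return list(group) + [Item("FILLER", end, shortfall)]  # hypothetical name


# Two adjacent groups with a non-identical overlap (hypothetical fields).
first_group = [Item("DATE", 0, 8), Item("NAME", 8, 20)]
second_group = [Item("YEAR", 0, 4), Item("MONTH", 4, 2), Item("DAY", 6, 2)]

# Option A: split the larger group where the smaller one ends.
head, tail = split_group(first_group, 8)

# Option B: pad the smaller group to the size of the larger one.
padded = pad_group(second_group, 28)
```

After either operation the two groups being compared cover the same byte range, so the overlap becomes identical. The claims state that the choice between splitting and padding depends on the number of data items and their offset positions; the sketch leaves that decision to the caller.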

Assignees

Inventors

Classifications

  • Structural analysis for program understanding · CPC title

  • Trees · CPC title

  • G06F8/41 (Primary)

    Compilation · CPC title

  • Extension of operand address space · CPC title

  • G06F8/53 (Primary)

    Decompilation; Disassembly · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10223085B2 cover?
A computer-implemented method for transforming implicit data structures expressed by assembler code into high-level language structures includes analyzing a section of assembler code to identify a plurality of data items. The computer-implemented method further includes storing the plurality of data items in a plurality of groups. The computer-implemented method further includes modifying one o…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F8/41. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).