Method for extracting the structure of an input for a binary program

US12572332B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12572332-B2
Application numberUS-202318110835-A
CountryUS
Kind codeB2
Filing dateFeb 16, 2023
Priority dateFeb 18, 2022
Publication dateMar 10, 2026
Grant dateMar 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Herein disclosed is a method for automatically automatically infer a recursive state machine (RSM) describing the space of acceptable input of an arbitrary binary program. This method automatically identifies atomic fields of fixed and variable lengths and syntactic elements, such as separators and terminators, and generalizes them into regular expression tokens. It constructs an RSM of tokens to represent structures such as arrays and records. Further, it constructs nested states in RSM to represent complex, nested structures. The RSM may serve as an independent parser for the program's acceptable input.

First claim

Opening claim text (preview).

The invention claimed is: 1 . A method for extracting a structure of a binary program, comprising: (a) providing a state machine; (b) providing one or more inputs comprising a serialized representation of a data structure, wherein said data structure comprises one or more fields and wherein such fields are atomic or composite; wherein each atomic field of the data structure comprises one or more values, comprising a sequence of bytes; wherein a token comprises an abstract representation of the values of the atomic field; (c) producing a taint trace comprising a sequence of tuples, wherein the sequence of tuples comprises: an instruction address; an abstracted calling context; and a set of taints, comprising the union of all taints of all operands of an instruction during an invocation of the instruction; (d) from the taint-trace, identifying the one or more field values and each field value's corresponding source index; (e) constructing a taint interval tree; (f) deriving one or more field tokens; (g) classifying each of a set of instructions that accesses the one or more values of the field; (h) inferring a field type for each set of instructions using the classifications of each set; (i) constructing a structure transition graph; and (j) applying each of these steps recursively to construct a recursive state machine. 2 . A method for extracting a structure of a binary program, comprising: (a) providing a state machine, further comprising constructing a state machine from a frontier, comprising: (i) ordering one or more taint interval graph nodes in the frontier according to their start offsets; (ii) for each source index in the frontier, creating a node in the state machine; (iii) marking the source index of the last node in the frontier as the final node of the state machine; (iv) adding a new node; (v) marking the new node as the start node; (vi) annotating each node with the token, type, and data encoding of its respective source index; (vii) adding edges to the taint interval graph; and (viii) adding a start edge; (b) providing one or more inputs comprising a serialized representation of a data structure, wherein said data structure comprises one or more fields and wherein such fields are atomic or composite, and wherein a field may be a numeric type or a string type; wherein each atomic field of the data structure comprises one or more values, comprising a sequence of bytes; wherein a token comprises an abstract representation of the values of the atomic field; (c) producing a taint trace comprising a sequence of tuples; (d) from the taint-trace, identifying the one or more field values and each field value's corresponding source index; (e) constructing a taint interval tree; (f) deriving one or more field tokens; (g) classifying each of a set of instructions that accesses the one or more values of the field; (h) inferring a field type for each set of instructions using the classifications of each set; (i) constructing a structure transition graph; and (i) applying each of these steps recursively to construct a recursive state machine.

Assignees

Inventors

Classifications

  • Reverse engineering; Extracting design information from source code · CPC title

  • G06F8/30Primary

    Creation or generation of source code · CPC title

  • G06F9/4498Primary

    Finite state machines · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12572332B2 cover?
Herein disclosed is a method for automatically automatically infer a recursive state machine (RSM) describing the space of acceptable input of an arbitrary binary program. This method automatically identifies atomic fields of fixed and variable lengths and syntactic elements, such as separators and terminators, and generalizes them into regular expression tokens. It constructs an RSM of tokens …
Who is the assignee on this patent?
Univ Of Louisiana Lafayette, Univ Louisiana At Lafayette
What technology area does this patent fall under?
Primary CPC classification G06F8/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 10 2026 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).