Dynamic taint tracking in abstract syntax tree interpreters
US-2022067172-A1 · Mar 3, 2022 · US
US12572332B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12572332-B2 |
| Application number | US-202318110835-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2023 |
| Priority date | Feb 18, 2022 |
| Publication date | Mar 10, 2026 |
| Grant date | Mar 10, 2026 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Herein disclosed is a method for automatically automatically infer a recursive state machine (RSM) describing the space of acceptable input of an arbitrary binary program. This method automatically identifies atomic fields of fixed and variable lengths and syntactic elements, such as separators and terminators, and generalizes them into regular expression tokens. It constructs an RSM of tokens to represent structures such as arrays and records. Further, it constructs nested states in RSM to represent complex, nested structures. The RSM may serve as an independent parser for the program's acceptable input.
Opening claim text (preview).
The invention claimed is: 1 . A method for extracting a structure of a binary program, comprising: (a) providing a state machine; (b) providing one or more inputs comprising a serialized representation of a data structure, wherein said data structure comprises one or more fields and wherein such fields are atomic or composite; wherein each atomic field of the data structure comprises one or more values, comprising a sequence of bytes; wherein a token comprises an abstract representation of the values of the atomic field; (c) producing a taint trace comprising a sequence of tuples, wherein the sequence of tuples comprises: an instruction address; an abstracted calling context; and a set of taints, comprising the union of all taints of all operands of an instruction during an invocation of the instruction; (d) from the taint-trace, identifying the one or more field values and each field value's corresponding source index; (e) constructing a taint interval tree; (f) deriving one or more field tokens; (g) classifying each of a set of instructions that accesses the one or more values of the field; (h) inferring a field type for each set of instructions using the classifications of each set; (i) constructing a structure transition graph; and (j) applying each of these steps recursively to construct a recursive state machine. 2 . A method for extracting a structure of a binary program, comprising: (a) providing a state machine, further comprising constructing a state machine from a frontier, comprising: (i) ordering one or more taint interval graph nodes in the frontier according to their start offsets; (ii) for each source index in the frontier, creating a node in the state machine; (iii) marking the source index of the last node in the frontier as the final node of the state machine; (iv) adding a new node; (v) marking the new node as the start node; (vi) annotating each node with the token, type, and data encoding of its respective source index; (vii) adding edges to the taint interval graph; and (viii) adding a start edge; (b) providing one or more inputs comprising a serialized representation of a data structure, wherein said data structure comprises one or more fields and wherein such fields are atomic or composite, and wherein a field may be a numeric type or a string type; wherein each atomic field of the data structure comprises one or more values, comprising a sequence of bytes; wherein a token comprises an abstract representation of the values of the atomic field; (c) producing a taint trace comprising a sequence of tuples; (d) from the taint-trace, identifying the one or more field values and each field value's corresponding source index; (e) constructing a taint interval tree; (f) deriving one or more field tokens; (g) classifying each of a set of instructions that accesses the one or more values of the field; (h) inferring a field type for each set of instructions using the classifications of each set; (i) constructing a structure transition graph; and (i) applying each of these steps recursively to construct a recursive state machine.
Reverse engineering; Extracting design information from source code · CPC title
Creation or generation of source code · CPC title
Finite state machines · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.