enables ARM Thumb support #1122
Neither the heap nor the stack exists at the instruction/lifter level of abstraction, so there is no need to model them.
The lifter has to hide the PC register, so that
In BAP 2.0 each program label (aka address) has its own architecture; therefore, we need an analysis that identifies branches that switch the architecture. There are two caveats:
|
This part can be solved with a superset disassembler approach (#944) |
It is not needed, as the disassembler in BAP 2.x is already speculative and superset. It is driven by the knowledge base, so it may disassemble all possible substrings in all supported architectures at the same time. The main question is performance: in general, we don't want the full superset, even with invalid chains pruned (which our disassembler does automatically). So the question is how to find the right balance between precision and performance. We don't really want to double the CFG of each ARM binary. |
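To make the size concern concrete, here is a minimal sketch in plain OCaml that enumerates the decoding candidates a full superset would consider in a small region. It is not how the BAP disassembler is implemented; the `mode` type and the `candidates` function are hypothetical names, and the sketch assumes an even `base`, ARM instructions at 4-byte aligned addresses, and Thumb instructions at 2-byte aligned addresses.

```ocaml
(* Hypothetical type, for illustration only; not a BAP definition. *)
type mode = Arm | Thumb

(* Enumerate every (mode, address) pair at which a decoder could start
   inside [base, base + size). Assumes [base] is even. *)
let candidates ~base ~size =
  let limit = base + size in
  let rec go addr acc =
    if addr >= limit then List.rev acc
    else
      (* an ARM instruction needs 4 bytes at a 4-byte aligned address *)
      let acc =
        if addr mod 4 = 0 && addr + 4 <= limit then (Arm, addr) :: acc
        else acc in
      (* a Thumb instruction needs at least 2 bytes at an even address *)
      let acc =
        if addr + 2 <= limit then (Thumb, addr) :: acc else acc in
      go (addr + 2) acc
  in
  go base []

let () =
  let cs = candidates ~base:0x8000 ~size:8 in
  Printf.printf "%d candidates in 8 bytes\n" (List.length cs)
```

Under these assumptions, every ARM candidate is accompanied by two Thumb candidates, which is why an unpruned superset grows so quickly.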
But we need a linear memory representation for instructions like
Is the lifter responsible for linking the program labels to instructions? Like in the
let block seq data ctrl =
  Theory.Label.for_addr (Word.int seq) >>= fun label ->
  blk label data ctrl
which was called after each single instruction with current |
Some of the information is statically determinable, though; in the ARM ELF ABI docs we have
Which could be defined as ARM-only knowledge provided by the binary file (ELF, etc.) loader. Still, a malicious program could switch the |
Yes, machine instructions are fully self-contained (unlike bytecode instructions, which sometimes need extra modeling because they are evaluated by a VM, not a CPU). Whenever you see a load or push instruction, its operands will be fully defined.
No, it will be linked by the IR lifter. |
It is only relevant to the linker and to the way symbols are encoded in the symbol table (in this particular ABI). The mode can be switched on any jump (that doesn't involve a symbol table), and both ARM and Thumb instructions can have even addresses (in fact, they must have even addresses due to alignment requirements). |
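The symbol-table convention mentioned above can be sketched as follows. In the ARM ELF ABI, a function symbol whose value has bit 0 set denotes Thumb code, and the instruction itself lives at the value with that bit cleared. This is plain OCaml with no BAP dependency; the `mode` type and `decode_symbol` are hypothetical names used only for illustration, and the result is merely an initial hint, since the mode can still change on any jump.

```ocaml
(* Hypothetical type, for illustration only; not a BAP definition. *)
type mode = Arm | Thumb

(* Split a function symbol value into the instruction set it selects
   and the actual (even) address of its first instruction. *)
let decode_symbol (value : int) : mode * int =
  let mode = if value land 1 = 1 then Thumb else Arm in
  (mode, value land (lnot 1))

let string_of_mode = function Arm -> "ARM" | Thumb -> "Thumb"

let () =
  List.iter
    (fun v ->
       let mode, addr = decode_symbol v in
       Printf.printf "symbol 0x%x -> %s code at 0x%x\n"
         v (string_of_mode mode) addr)
    [0x8000; 0x8101]
```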
plugins/arm_thumb/thumb_flags.ml
Outdated
end

(*
You can safely remove this.
plugins/arm_thumb/thumb_mem.ml
Outdated
| _ -> raise (Lift_Error "`src` must be a register")
)
| _ -> raise (Lift_Error "`dest` must be a register")
(* the `R` bit is automatically resolved *)
Please separate with a new line here and in the following code.
But it at least provides a way of initially determining the instruction set of a symbol (under a certain ABI). Btw, which way would you suggest to represent |
Your guess is absolutely correct. Yes, the address of the lifted instruction is a static constant (for the target language) and is a parameter (of type When you define the semantics for an instruction you build a value of type
And the lifter itself is the function of type
Our task is to provide a value for the
let lifter label =
  KB.collect Theory.Label.addr label >>= fun addr -> (* this is the address of the current instruction *)
  KB.collect Disasm_expert.Basic.Insn label >>= fun insn -> (* the LLVM provided decoding *)
  KB.collect Memory.slot label >>= fun mem -> (* the memory chunk, probably not needed *)
  build_the_semantics_object addr insn
Basically, you have full access to the knowledge base in the lifter.
Besides, as a side note, in some architectures the value of the PC register is not equal to the address of the current instruction; sometimes it is shifted by some number of bytes (so that it points ahead of the instruction). In ARM it is 4 or 8 bytes, I don't remember. Also, by PC LLVM may mean either the actual value of the PC register or the current instruction address. So keep this in mind.
You can also follow our discussion in the AArch64 lifter PR (#1141); I think everything that we discuss there is applicable to this lifter as well. We may even end up with some code sharing. And if you have any questions, please don't hesitate to ask. |
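On the PC offset: architecturally, an instruction that reads the PC sees its own address plus 8 in ARM state and plus 4 in Thumb state. A minimal sketch in plain OCaml, with no BAP dependency; the `mode` type, `pc_value`, and `aligned_pc` are hypothetical names for illustration. Whether the operands reported by LLVM already account for this offset is not settled here and has to be checked per instruction.

```ocaml
(* Hypothetical type, for illustration only; not a BAP definition. *)
type mode = Arm | Thumb

(* The value an instruction located at [addr] observes when it reads
   the PC register. *)
let pc_value mode addr =
  match mode with
  | Arm -> addr + 8
  | Thumb -> addr + 4

(* PC-relative literal loads use the PC value aligned down to 4 bytes
   as their base; this matters in Thumb state, where instructions may
   sit at addresses that are only 2-byte aligned. *)
let aligned_pc mode addr = pc_value mode addr land (lnot 3)

let () =
  Printf.printf "ARM   at 0x8000 reads PC = 0x%x\n" (pc_value Arm 0x8000);
  Printf.printf "Thumb at 0x8002 reads PC = 0x%x (aligned base 0x%x)\n"
    (pc_value Thumb 0x8002) (aligned_pc Thumb 0x8002)
```

A lifter that hides the PC register can substitute such a constant for every PC operand, since the instruction address is statically known at lifting time.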
This brand new Thumb lifter has been updated to fit the structure of #1174, and should be fully functional on its own once it is properly tested. |
okay, let's close it, but keep in mind the discussions that have happened here. |
This PR presents a draft of a Core Theory/KB-based lifter for ARM Thumb instructions, which is mostly an incomplete skeleton showing what the final lifter will look like.
Some key features are still missing, including:
Moreover, as issue #951 states, the ARM lifter and the Thumb lifter should eventually share the same state (or, more precisely, switch between them). How to integrate this lifter with the old ARM lifter remains an open problem. @ivg any idea how we can fix this?