Friday, January 4, 2013

Notes from 29th Chaos Communication Congress – day 4

See also day 1, day 2 and day 3.

The care and feeding of weird machines found in executable metadata

A "weird machine" = an unexpected source of computation e.g. return-oriented programming, heap crafting. This talk presents a way to perform computations through ELF relocation entries.
Why bother?
  • (Checksum of the complete executable file detects this.)
  • Philosophically, "composition kills".
  • Antivirii seem to focus on code
  • Some people apparently don't sign metadata...
  • It's not clear how to distinguish "good" from "evil" well-formed data [which is same problem antivirii have with code, nothing fundamentally different]
Interesting ELF relocations:
  • R_X86_64_COPY = memcpy
  • X86_64_64 = *(base + reloc.offset) = symbol + base + reloc.addend
  • X86_64_64_RELATIVE = *(base + reloc.offset) = base + reloc.addend
  • STT_IFUNC symbol type: instead of symbol value, contains a function pointer that is called, and return value used as the symbol metadata
=> we can use relocations as instructions, symbol table entries as variables.For jumps: modify our dynamic entry that points to the desired relocation[="instruction"], then modify dynamic linker's internal data so that the relocation table is processed "again".
Conditional branch: STT_IFUNC is only interpreted if "section header index" !=0 => can conditionally write based on that, and use it to conditionally terminate processing the relocation table (=> continue running a "different table" = target location).
Finished this exercise in spirit, with a brainfuck => ELF compiler...

OS-X MACH-O: Similar in concept. Relocation is called "binding"; relocation entries "compressed": bytecode operations to set various parameters, then "do bind" opcode. Probably can do similar things, or at very least, can "disable" functions (e.g. seteuid() to drop privileges) by changing the symbol name being looked up.
Windows PE: similar work already exists ("locreate": uses relocation entries as an unpacker, pe101.corkami.com contains a PE+PDF+ZIP combo file)

Suggested mitigations: [but there's no real reason why bother, just sign the whole file and be done with it]


  • Suggesting page-level permissions ("elfpack"?)
  • Look at limiting the expressiveness of the format
  • More loader sanity checking (e.g. prevent the relocation entries from overwriting linker's private variables or stack) might help

Page Fault Liberation Army or Gained in Translation

First summarized various creative uses of the page fault mechanism:
OpenWall: decrease code segment limit to exclude stack (need to recognize and handle gcc trampolines)
SEGMEXEC: use a 1.5G segment for data, and a 1.5G segment for code starting at 1.5G => separate page tables for data and code X restricted data space, using non-zero-base segments is very slow on modern CPUs
PaX: pages with "user"=0 => always cause a trap; then check whether it is because of EIP; if not, temporarily set user=1, read the page to fill TLB (data TLB only), set user=0 again.
OllyBone: Want to trap first execution after write (=> after malware unpacking), using same technique as PaX.
ShadowWalker rootkit: similarly, use different page frame number for code and data => rootkit detection won't see the code.
In general, "those TLB bits are memory, you can program with them".
"Labels & flow": PaX looks for implicit flows from userland to kernel. Spreads labels over the various bits in the page tables. "h2hc 2012 presentation absolutely recommended".
Contribution of this talk: a full programming environment [actually a Turing
machine with no I/O] in which "no instruction ever finishes executing":

  • Hardware task switching (task gate) used for memory storage. A task gate can be used as an interrupt handler (IDT->GDT->TSS); it reloads almost all CPU state from memory (incl. page tables); supposedly atomically, but actually dword at a time.
  • SP decrement is used for arithmetics (SP -= 4)
  • Double fault (when SP decremented from 0) is used as a branch mechanisms.

=> our single instruction: { x = y; if x < 4: goto b else {x -= 4; goto a} }
  • This single instruction is enough for a Turing-complete machine.
  • Needs one TSS descriptor per instruction.
  • Uses a different IDT per instruction.
  • "X" is stored as address of current task state.
  • "Y" are the addresses of TSS.
  • TSS "busy" bit problematic, we can overlay it over the GDT so that it is always cleared. This mechanism limits us to 16 instruction virtual addresses (but we can have more physical addresses => "16-color instruction CFG".
No publicly available emulator implements this correctly!
  • bochs can do the Turing machine, but not all.
  • Intel's Simics reboots the VM (triple-fault?)
  • KVM, Simics can be made to crash with a few changes (incl. the host in case of KVM?)
Limitations:
  • 32-bit only? "Working on 64-bit", which was cleaned up and the used legacy mechanisms were removed.
  • Not tested on AMD

1 comment:

  1. Wow that was odd. I just wrote an very long comment but
    after I cliccked submt my comment didn't show up.
    Grrrr... well I'm not writing all that over again. Anyway, just wanfed to say great blog!

    ReplyDelete