Tuesday, December 28, 2010

Notes from 27th Chaos Communication Congress - day 1

Here are some notes from the first day of the 27th Chaos Communication Congress. See also
day 2, day 3, day 4.

Keynote

5 years ago speaker said at CCC "we have lost the war": a perfect storm:
post-9/11 paranoia, EU data retention, climate change. As of today, war not
actually lost: German constitutional court started protecting privacy
- OTOH Netherlands has a constitution, but not a court, so the majority can
ignore constitution => fearmongering and other aspects of usual politics;
Netherlands has now become a cautionary tale WRT privacy.

Political recommendations: watch party funding, "literally by all means
defend the constitution".

Speaker "mildly bipolar", was recommended anti-depressants - "being unhappy
has become socially unacceptable". If depression is a force that pushes us
to make painful but necessary changes, antidepressants prevent necessary
change - perhaps there might be a political-pharmaceutical complex one day?

Success on voting machines: Illegal in both Germany and Netherlands (Germany
"safe" - constitutional court, in Netherlands will need to fight this war
over and over again with each single mayor) E-voting in Brazil: black box,
gets an ID card from each voter - future versions will even collect a
fingerprint.

Wikileaks: Speaker did not participate in the latest release, "possible
ramifications scared the bejezus out of me", "I can't live from a backpack".
Important, but outcome is uncertain: "Not sure what has been unleashed" -
attacks on Internet freedom will certainly increase. US proposes to be able
to get plaintext from any service - "Crypto war 2.0 starting". "Anonymous is
getting on my nerves" - "real hackers would not release real names in PDF
metadata", lacking a "level of maturity" - "we" [at CCC] might attack, but
nothing good comes out of it.

Politicians don't know what's going on, can't control it, can only pretend
they are in control for the voters. "Hackers don't have the answers", but
understand the dangers of complexity - "lack of slack"; "CCC does not cause
chaos - we have prevented some aspects, and we understand chaos a little".

Living in a world of separate viewpoints/narratives, from "Apple, Google,
Facebook and the geographically-challenged traditional governments"

Future: Basic story remains - we lost the war. "It's going to be a mess":
"difficult times, not end times" - build trust relationships, diverse skill
sets, be flexible.

CCC logistics: too many of us, need to move out to be able to attract new
people. DEFCON given as a counterexample for expansion, probably does not
apply here - CCC never had their problems.

Code deobfuscation by optimization


http://code.google.com/p/optimice/

An IDA plugin to decode/simplify the semantics of obfuscated code, optionally
"assembling" the result into a new code segment for further processing in IDA.

Handling obfuscation: Small basic blocks, "push+return" to break IDA graphing
are simplified/converted. Fake paths (conditional jumps to nonsensical
bytes) are simplified. Overlapping instructions are duplicated.

Implementation: Build a CFG. For instruction semantics, use "MazeGen's XML
at ref.x86asm.net" to track all inputs/outputs. Optimizations performed: JMP
threading, conditional jump simplification, dead code removal, various
heuristics, e.g. push+ret->jmp. Constant propagation/folding unimplemented.
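
Two of the listed optimizations can be illustrated on a toy instruction list. This is only a sketch: the (address, mnemonic, operand) tuple format and the function names are assumptions made for the example, not how the plugin represents code, and the push+ret rewrite is only valid when the pushed operand is an immediate address.

```python
# Toy instruction format (an assumption for this sketch):
# (address, mnemonic, operand)

def push_ret_to_jmp(insns):
    """Rewrite each 'push X; ret' pair into a single 'jmp X'."""
    out = []
    i = 0
    while i < len(insns):
        addr, op, arg = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if op == "push" and nxt is not None and nxt[1] == "ret":
            out.append((addr, "jmp", arg))
            i += 2
        else:
            out.append(insns[i])
            i += 1
    return out


def thread_jumps(insns):
    """Retarget jumps whose destination is itself an unconditional jmp."""
    by_addr = {addr: (op, arg) for addr, op, arg in insns}
    out = []
    for addr, op, arg in insns:
        if op in ("jmp", "jz", "jnz"):
            seen = set()  # guards against jmp cycles
            while arg in by_addr and by_addr[arg][0] == "jmp" and arg not in seen:
                seen.add(arg)
                arg = by_addr[arg][1]
        out.append((addr, op, arg))
    return out


code = [
    (0x10, "push", 0x30),
    (0x15, "ret", None),
    (0x20, "mov", "eax, 1"),
    (0x30, "jmp", 0x40),
    (0x40, "jmp", 0x50),
    (0x50, "xor", "eax, eax"),
]
simplified = thread_jumps(push_ret_to_jmp(code))
# The first instruction is now a direct 'jmp 0x50'.
```

After both passes the intermediate jumps at 0x30 and 0x40 are no longer reachable from 0x10, so a dead code removal pass could drop them.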

Contemporary Profiling of Web Users


Research on defeating web proxies / anonymizers / Tor, etc. "In private
communication research, dummy traffic was researched in the last 20 years and
has never been a solution"

Proxies that remove JavaScript:


We want to limit JS: it can e.g. get screen size, local date (=>clock offset
and drift). Existing proxy projects are dead.

The proxies 1) remove <script> 2) move content out of <noscript>, both using
regular expressions. PHProxy attack: <noscript><scr</noscript>ipt> .
Glype: same attack on <object> . Also, Java can load JS as well:
...showDocument(..."javascript:..."...) .
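
The <noscript> trick works because the two regular-expression passes run one after the other. A minimal sketch follows; the two patterns are written for illustration and are not PHProxy's actual expressions.

```python
import re


def naive_filter(html):
    # Pass 1: strip complete <script>...</script> elements.
    html = re.sub(r"(?is)<script\b.*?</script>", "", html)
    # Pass 2: move content out of <noscript> by dropping only the tags.
    html = re.sub(r"(?i)</?noscript>", "", html)
    return html


# Pass 1 sees no "<script" in the payload; pass 2 then glues the halves together.
payload = "<noscript><scr</noscript>ipt>alert(1)</script>"
filtered = naive_filter(payload)
# filtered is now "<script>alert(1)</script>"
```

Running the script-removal pass again after unwrapping would catch this particular payload, but the general problem of filtering HTML with regular expressions remains.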

When JavaScript is enabled, but filtered: When DOM modifications are used to
hide objects, it is usually possible to access the originals. NoScript can
forbid 3rd party JS for tracking, but code can still load 3rd party CSS.
Another specific problem is filtering <object archive=...>, but not
<object><param name=archive .../> .

Identifying users by web profile


Assuming an anonymizer, or watching users on a DNS server. Identifying the
user is easy with static IP; dynamic IP "should" protect (changing IP
address on DSL, or Tor changing the routing each 10 minutes).

Using standard machine-learning mechanisms for word frequencies, applied to
host access frequencies, with a "multinomial naive Bayes classifier". In an
experiment, successfully identified 77% of "links" (user on day D => user on
day D+1). Accuracy is good even with 10-minute sessions (i.e. Tor). Longer
time between learning and classification doesn't hurt
much.
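
A minimal sketch of the idea, treating host names like words in a document. The host names, counts, smoothing constant and uniform prior are all assumptions for the example; the talk's actual feature processing is not reproduced here.

```python
import math
from collections import Counter


def train(profiles, alpha=1.0):
    """profiles: {user: Counter(host -> request count)} from the training day."""
    vocab = set()
    for counts in profiles.values():
        vocab.update(counts)
    model = {}
    for user, counts in profiles.items():
        # Laplace smoothing so unseen hosts do not get zero probability.
        total = sum(counts.values()) + alpha * len(vocab)
        model[user] = {h: math.log((counts[h] + alpha) / total) for h in vocab}
    return model, vocab


def classify(model, vocab, session):
    """Return the user whose profile gives the session the highest log-likelihood."""
    def score(user):
        return sum(n * model[user][h] for h, n in session.items() if h in vocab)
    return max(model, key=score)


# Hypothetical day-D profiles and an unlabeled day-D+1 session.
day_d = {
    "alice": Counter({"news.example": 20, "mail.example": 5, "wiki.example": 1}),
    "bob": Counter({"mail.example": 3, "forum.example": 15, "wiki.example": 4}),
}
model, vocab = train(day_d)
session = Counter({"forum.example": 6, "wiki.example": 2, "news.example": 1})
linked = classify(model, vocab, session)
```

Here the session is linked to "bob", because its traffic is dominated by a host that only his profile visits often.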

Recommendations: Change IP address frequently and do not continue previous
activities after the change. Use _separate_ proxies for each activity.
Randomly distributing activity across multiple proxies does not help - each
proxy has similar data. Visiting only popular pages does not help much.

Detection of bots and other strange users



Motivation: heavy load by bots, proprietary databases crawled by
competitors.

  • If load balancing: make it deterministic (e.g. md5 of client's IP), look
    for "incorrectly" connecting users. This is trivial, but actually works -
    many bots are lazy and just connect to host 0.
  • Observe behavior: client that does not access images/JS, connects too often.
  • User-Agent: Fake user agents are often too old (IE 5.5). HTTP header
    characteristics (order/capitalization) allow quite specific user-agent
    detection, and thus detection of faked User-Agents. These techniques are
    both easier to do "after" load balancing proxies because the proxies will
    defragment the input, making evasion more difficult.
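
The deterministic load-balancing check from the first bullet could look roughly like this. The backend names and the way the md5 digest is reduced to an index are assumptions for the sketch; it also assumes the site directs each client to its assigned host in the first place.

```python
import hashlib

# Hypothetical backend host names.
BACKENDS = ["www0.example", "www1.example", "www2.example", "www3.example"]


def expected_backend(client_ip):
    """Map a client IP to a backend index, the same way every time."""
    digest = hashlib.md5(client_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % len(BACKENDS)


def is_suspicious(client_ip, backend_index):
    """A client that reaches a backend other than its assigned one skipped the balancer."""
    return backend_index != expected_backend(client_ip)


ip = "192.0.2.7"
assigned = expected_backend(ip)
wrong = (assigned + 1) % len(BACKENDS)
```

A request from `ip` that arrives at `BACKENDS[wrong]` would be flagged, while requests to `BACKENDS[assigned]` pass.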

    Local attacks on Tor


    "Local" = connection between client and Tor entry node. Attempting web site
    fingerprinting using traffic analysis only (timing, packet sizes), want to
    see if a specified site was visited. Timing information is mostly useless
    due to Tor's load and circuit changes; Tor "should" protect against size
    fingerprinting due to fixed-size cells.

    Machine learning again: Need to train this with each browser separately, and
    extract separate requests (not mixed with unrelated sites). Using
    "Multinomial naive Bayes", "Support Vector Machines", training on packet
    sizes and direction, "ignoring ACKs" = counting the total size of a transfer
    in one direction until the direction changes.
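
One reading of the "ignoring ACKs" feature extraction, as a sketch. The signed-size packet representation is an assumption for the example: positive means client to entry node, negative the reverse, and 0 stands for a packet without payload.

```python
from collections import Counter


def direction_runs(packets):
    """Sum packet sizes per direction until the direction changes.

    packets: iterable of signed payload sizes. Zero-payload packets
    (pure ACKs) are skipped, so they never break a run.
    """
    runs = []
    for size in packets:
        if size == 0:
            continue
        if runs and (runs[-1] > 0) == (size > 0):
            runs[-1] += size
        else:
            runs.append(size)
    return runs


trace = [565, 0, -1500, -1500, 0, -420, 565, 0, -1500]
runs = direction_runs(trace)
# runs == [565, -3420, 565, -1500]
features = Counter(runs)  # run sizes as "words" for a multinomial classifier
```

The resulting counts can be fed to the same kind of classifier that is used for host access frequencies.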

    Detection accuracy when training against "all possible sites" is >95% on
    OpenSSH, OpenVPN, IPsec. Tor started with 3%, can now get 55% => feasible.
    "Jap": interestingly the problem is more
    difficult on the free version than on the premium one.

    Detection when distinguishing between a few "interesting" sites and "rest of
    Internet": training with a representative sample for "rest". With 5
    "interesting" sites need ~2000 samples for <1% false positives, will get
    67% false negatives.

    Recommendations: Do multiple things at the same time (tabs, Internet
    radio...) - decreases success to ~10%.

    Automatic identification of crypto primitives in software


    http://code.google.com/p/kerckhoffs/

    A master's thesis. Limitations: assumes no obfuscation, no JIT/interpretation;
    limited to crypto only (nothing else, and not block cipher modes).

    Existing tools: signature based, or dynamic instrumentation measuring
    percentage of "bitwise" operations (globally / in a basic block / function),
    looking for loops that change entropy.

    Implementation: Intel's PIN for dynamic instrumentation to get insn-level
    execution trace. Then reconstructs CFG ("dynamic" = including indirect
    jumps), jump taken/not taken statistics, detects loops, memory ranges/areas.

    Algorithm identification methods:

    • Excessive use of bitwise instructions.
    • Sequences of (instruction mnemonics, constant operand) pairs, find fingerprints - e.g. combinations unique for an implementation.
    • Loops (X often unrolled): observe (number of executions of the loop as a whole, number of iterations, number of instructions in a loop).
    • Look for a specific relation (e.g. AES (input, key, output)) between blocks of data with a suitable size


    libUSB


    A generic introduction to USB - releases, limitations, transfer types, endpoints, descriptors, usage of libusb on various OSs.

    Desktop on Linux


    From the PoV of a UNIX sysadmin... It seems difficult to keep up with the
    technology changes. Presentation overshadowed with explanations by Lennart
    Poettering.

    Distributions focusing on "Dumbest Assumable User", few sysadmin controls
    available/visible.

    MM frameworks: too many layers (=> loses relevant information on the way with
    Phonon+GStreamer+PulseAudio). GStreamer backend to Phonon unmaintained,
    still used by default in "some distributions"

    GDM: complicated - why do we need a full GNOME session? (Answer: a11y pulls
    audio, which pulls bluetooth, ...; g-p-m necessary for default power
    policy). GDM doesn't handle systems with many users well; Can't show all 3.5k
    names in any case. When that is disabled, still shows recent users and
    "Other" - users mistake login screen for a screen lock.

    ConsoleKit: Sorry state of documentation: "Defining the problem: To be
    written" after all these years. Intended to manage separate seats, but ACL
    changes illusory without revoke(). Not robust - changes persistently-stored
    ACLs but keeps only in-process state => ConsoleKit crash leaves around
    obsolete ACLs.

    D-Bus complaints: Nonsensical name spacing ("you need a domain", "narcissistic
    naming": based on project name, doesn't tell what it actually does). Would
    like implementation-independent interfaces [X where to get the
    implementations?]. TCP transport: "no authentication, no authorization, no
    encryption". (Lennart: ['we' agreed that] "D-Bus won't be used across the
    network full stop")

    IPv6


    Many "old" vulnerabilities were carried forward from IPv4, previously
    presented:

    • Neighbor discovery spoofing <-> ARP spoofing
    • Duplicate address detection DoS [answer "this is a duplicate" to everything]
    • Rogue autoconfiguration server <-> rogue DHCP server


    Routerless networks: Sending a router advertisement with 0 lifetime "kills"
    the router for clients. Per RFCs clients treat any address as _link-local_
    if no router exists.

    Unexpected RA on an IPv4-only network: switches on dual stacks. Thus we can
    bypass IPv4-only firewalls, can MITM on IPv6 because IPv6 transport is
    preferred to IPv4.

    RA flooding: 1m bogus RAs DoSes Cisco, Windows, old Linux (100% cpu).

    Remote ping scans: Originally thought infeasible due to large address space,
    broadcast doesn't exist. But we can still use search engines, DNS, common
    addresses. Randomly chose 17k DNS names. The following "host address" (= host part
    within network) sources exist:

    • Autoconfiguration: either link-local = based on MAC (can guess if you know
      the (company standard) manufacturer), or "Privacy option" = random and
      changing from time to time
    • DHCP: allocated sequentially! => "if you got one, you got all"; common
      ranges based on example documentation
    • Manually configured: ::1, ::2, ..., or ::service_port, IPv4 address, and
      simple variants of these.
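
A sketch of how such a candidate list could be generated with Python's ipaddress module. The prefix, port list and function names are assumptions for the example; reading "::service_port" as the port number written out as hex digits (::80, ::443) is one interpretation of the note. The MAC-based identifier follows the modified EUI-64 format (flip the universal/local bit, insert ff:fe in the middle).

```python
import ipaddress


def eui64_interface_id(mac):
    """Modified EUI-64 interface identifier derived from a 48-bit MAC address."""
    b = bytes.fromhex(mac.replace(":", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return int.from_bytes(eui, "big")


def candidates(prefix, ipv4_hosts=(), ports=(22, 25, 53, 80, 443), low=16):
    """Guess manually configured host addresses inside an IPv6 prefix."""
    base = int(ipaddress.IPv6Network(prefix).network_address)
    ids = list(range(1, low + 1))                # ::1, ::2, ...
    ids += [int(str(p), 16) for p in ports]      # ::80, ::443 (digits read as hex)
    ids += [int(ipaddress.IPv4Address(a)) for a in ipv4_hosts]  # embedded IPv4
    # dict.fromkeys drops duplicates while keeping the order.
    return [ipaddress.IPv6Address(base + i) for i in dict.fromkeys(ids)]


addrs = candidates("2001:db8::/64", ipv4_hosts=["192.0.2.5"])
mac_id = eui64_interface_id("00:11:22:33:44:55")
link_local = ipaddress.IPv6Address(int(ipaddress.IPv6Address("fe80::")) + mac_id)
```

With a known manufacturer prefix, only the last three bytes of the MAC need to be enumerated for the autoconfiguration case.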

    Overall, can easily guess ~70% of host addresses. A scan only needs to try
    ~2100 host addresses (1-20 seconds) to get 70-80% of hosts, similarly try
    ~1500 common host names. A scan may return a router's "not available"
    message for a different network, giving us more targets. We can iterate
    between guessing hosts on a network, and using reverse DNS to get more
    starting points. Altogether we can identify ~90-95% of servers (not counting
    other kinds of hosts).


    Multicast DoS: Multicast background: A "query" router periodically prompts
    for confirmation of existence of multicast receivers. We can spoof
    "unsubscribing" message, but this will cause another prompt and resumption of
    traffic. If we become the "query" router, we can avoid sending the prompt.
    "Query" is voted by local link address => 0000000 wins [nobody configures a
    router on 0]. Then we can unsubscribe the network, except that other routers
    would assume the router is dead if no prompts were seen. To avoid this, send
    the prompts - but only to a "router-only" multicast MAC.

    To see if a Windows/Linux computer is sniffing the network: Send a packet
    (ping) to an _unused_ multicast address, see if the host responds.

    Side channels in IPv6: "IPv6 is a side channel" - too much functionality,
    cannot be reasonably filtered

    Code available at http://www.mh-sec.de/downloads/thc-ipv6-1.4.tar.gz .
    Will start www.ipv6{security,hacking}.info for secure configuration advice.

    Mitigations:

    • ACLs on L3 switch (e.g. don't allow RAs from client ports), if supported
    • IPsec, but a pain
    • SEcure Neighbor Discovery (SEND) - basically happening in the switches,
      not supported yet anywhere, still has problems.
    • More secure client configuration - not always possible
    • Detection of attacks is easy, prevention unknown

      Analyzing Stuxnet


      Presented by Microsoft "to set the facts straight". Analyzed within a few
      weeks after this came in, but not allowed to talk about it at the time.
      2 interesting things: _4_ 0-day vulnerabilities, attack on SCADA.

      Discovered by VirusBlokAda (Belarus, not known to Microsoft), who sent a
      sample in ~July; eventually got the original LNK files. Others are looking at this
      as well - need to "know ahead" about threats, ~1 MB of binary; full knowledge
      sharing with Kaspersky.

      Methodology: "initial triage" - identify surprising code, clues for
      vulnerabilities, then discuss details with developers of relevant code.
      Total time ~30-40 man-hours in 3-4 days to find the vulnerabilities. Later
      completely decompiled 2 components to buildable C.

      Attackers: "Don't ask me who the author was". Components were written by
      different people. Aiming for 100% reliability, high impact. Developed on
      removable media (path embedded in file is B:...). Shell code does not use
      simple "call" insns - always "ret".

      Attack 1: LNK files


      Dumped LNK as text, identified the buggy DLL; all done in ~1 hour. Bug:
      .CPL has icons inside => must do LoadLibrary(), which calls DllMain(). Fix:
      limit icon loading to only registered .CPLs.

      Impact: Arbitrary code execution without privilege escalation - only a
      foothold for further attacks. Looked at attack vectors - in addition to
      USB, could use WebDAV (remote attack) => fixed it "out of band",
      "telemetry" told them users were being affected. 100% reliable attack
      vector. Apparently some people knew about this for years.

      Attack 2: Task scheduler


      Debugging was not really helpful => using process monitor, event logs -
      noticed that task files were accessed. Bug: XML file storing task data
      (including the user to run it as => can escalate to LocalSystem) _writable
      by user_, authenticated using a CRC32 hash (which was protected against user
      access); CRC32 collisions are easy. Fix: use SHA256 (kept files writable
      "for compatibility" [with what??? the authentication would break writing
      anyway])
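
Why CRC32 is useless as an authenticator: for any data, four appended bytes can force the checksum to any chosen value. The sketch below uses the standard CRC-32 as implemented by zlib and is a generic illustration, not the technique Stuxnet used; the byte strings are hypothetical stand-ins, not the real task file format.

```python
import zlib


def _make_table():
    """Standard reflected CRC-32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of every table entry is unique, so it identifies the entry.
REVERSE = {entry >> 24: index for index, entry in enumerate(TABLE)}


def forge_suffix(data, target_crc):
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    state = zlib.crc32(data) ^ 0xFFFFFFFF   # internal register after data
    want = target_crc ^ 0xFFFFFFFF          # internal register we need to reach
    # Undo four steps of the update loop, assuming zero input bytes.
    for _ in range(4):
        index = REVERSE[want >> 24]
        want = ((want ^ TABLE[index]) << 8) | index
    # Feeding bytes P from `state` equals feeding zeros from `state ^ P`.
    return (want ^ state).to_bytes(4, "little")


original = b"run as: guest"
tampered = b"run as: LocalSystem "
suffix = forge_suffix(tampered, zlib.crc32(original))
forged = tampered + suffix
```

`forged` has different content but the same CRC32 as `original`, which is why replacing the checksum with a cryptographic hash was part of the fix.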

      100% reliable - but only works on >= Vista.

      Attack 3: Keyboard Layout


      Eventually found "not immediately obvious" code - searching in win32k.sys,
      NtVirtualAllocateMem, keyboard layout loading, some IDA-unidentified code.
      Tried various things, finally noticed the code looks like a shell code and
      inserted a break point in it to get a back trace. Bug: <=XP allowed loading
      keyboard layouts from any directory, indexed a function array using an
      unvalidated user-controlled integer. Attack looked for a suitable user-land
      address following the original table, copied attack code there.

      100% reliable - only <= XP, so we can assume the attacked environment is
      not monolithic.

      Attack 4: Printer spooler


      Kaspersky reported suspicious spooler RPC. Network trace: guest printing to
      files in %system%. Spooler should have switched to the client account, but
      it doesn't for Guest because it is too limited, so it uses System instead.
      Windows by design automatically runs a .MOF file dropped in there :)

      This all only works if anonymous connections are allowed, which is very
      uncommon in corporations

Sunday, May 2, 2010

Groovy Recipes

Groovy Recipes: Greasing the Wheels of Java, from the Pragmatic Programmers series, is really two books in one: First, it is what it promises: a collection of practical, easy to reuse code for common tasks.

Second, when read sequentially, it is a good tutorial to the language. Sometimes there are two or three "recipes" that really build an interface around some common piece of code, and each of the recipes dutifully repeats the explanation of the common code. This happens only in the latter parts of the book, when the job of explaining the language proper is done, so it is easy to forgive.

As for Groovy the language, the light-weight syntax and automatic hiding of getter/setter methods are a very welcome improvement over Java; for me, the availability of closures alone is a good enough reason to prefer Groovy to Java. When viewed as a "better Java" (as opposed to viewing it as a completely separate language), Groovy does suffer from the ".NET disease", extending the language too far outside of the "philosophy" of the language into areas that should be in libraries or omitted completely, e.g. the way every "delegate" in .NET can automatically call more than one function. (This disease is by no means restricted to .NET - see e.g. <tgmath.h> in C, or enum and other recent-ish additions to Java.) In Groovy, the methodMissing feature and the ability to add methods to a class at run time seem to me far too different from the original Java object model.

If you have used Java before and turned away with sore fingers and disgust, do look at Groovy; it may not be good enough to become your primary language, but it will make cooperation with Java software much more bearable. If only someone replaced the endless XML "configuration" in .war, .ear etc. archives by something usable...