Wednesday, December 21, 2005

Why Java is not the right language for web development

The recent rise in popularity of [Ruby on Rails](http://www.rubyonrails.org/) seems to be mainly a backlash
against "enterprise design" that is overengineered for most applications and designed a priori, "by committee"; people are drawn to Rails because the whole framework is so simple to learn and so convenient,
not because they are persuaded that Ruby is a better language for the application.

While I don't think Ruby is the *only* language suitable for Web programming, I feel quite confident Java
is not suitable for this purpose at all.


I don't want to say Java is a completely unusable language, though: it was quite a good Pascal replacement before its complexity increased in 1.5, and with good compilers it can be a viable replacement for C++
in some problem domains. I don't want to blame the relative appeal of Rails and Java frameworks only
on the language, either. (I certainly don't think the Java web frameworks are perfect, but I don't know enough about them to talk about them with any confidence; therefore this post will focus only on the language.)

There are two quite different types of tasks solved using a computer: algorithm-heavy tasks and data manipulation tasks. (I am simplifying a bit; there is some overlap, but this does not matter much.)

Algorithm-heavy tasks:

- compute more data than they receive
- are solved by programs that run comparatively long ("long" can mean only a fraction of a second with today's hardware)
- are quite complicated (can't be described in detail on a single sheet of paper)
- are very hard to test exhaustively
- are not trivial to debug; reproducing a bug does not guarantee ability to find and fix it
- have a comparatively long edit-compile-run-test-debug cycle simply because the programmer has to think
more while making changes

On the other hand data manipulation tasks:

- mostly move data around and reformat it
- are solved by programs that run very fast on small data sets (of course you can make such a program
run arbitrarily long by giving it a large input)
- are conceptually simple and easy to describe
- can be easily tested because there are few code paths
- are often easy to debug after seeing a single instance of a problem
- can therefore have a very fast edit-(compile)-run-test-(debug) cycle

Writing an operating system, a compiler, a version control system or a word processor is an algorithm-heavy task; writing a "report generator" or a typical web application is a data manipulation task.

The fundamental limit on the speed of developing algorithm-heavy programs is the human brain;
therefore a language unsuitable for those tasks will slow the programmer down, but it will not be a complete deal breaker. It also means a little bondage and discipline is actually helpful: shipping a complicated application written in a dynamic language to a customer and finding that it crashes because
there is a typo in an identifier is stupid. "It compiles, so ship it" is often ridiculed, but sometimes
that's the most testing you can get for an unlikely error path.
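To illustrate the kind of failure I mean, here is a minimal sketch in Python (picked only as a convenient dynamic language; the function and names are hypothetical). The misspelled identifier sits on a rarely taken error path, so nothing complains until that path actually runs:

```python
DEFAULT_PORT = 8080


def parse_port(text):
    """Parse a TCP port number (hypothetical helper, for illustration only)."""
    try:
        return int(text)
    except ValueError:
        # Rarely exercised error path. 'DEFUALT_PORT' is a deliberate typo
        # for 'DEFAULT_PORT'; it is only detected when this line executes.
        return DEFUALT_PORT


# The common path works, so casual testing passes.
print(parse_port("80"))

# The error path fails at run time instead of falling back to the default.
try:
    parse_port("eighty")
except NameError as exc:
    print("crashed on the error path:", exc)
```

A compiler with static name checking would reject the equivalent program before it ever ran, which is exactly the testing "for free" that matters on an unlikely path.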

For a data manipulation task you don't need static name and type checks: you need only good access to
the data structures often used during transformations (which boils down to lists and hashes), and you need the language to get out of your way when all the program does is move data around. The language
*must* have a very short edit-compile-run-test cycle because often the programmer is fine-tuning some
formatting and spends only a few seconds in the "edit" phase, so a slow language implementation can easily
take over 50% of that cycle. Likewise, a `map`-like facility, very simple syntax for hash access,
and not having to specify types explicitly can help a lot.
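As a rough sketch of what such a task looks like when the language stays out of the way, consider the following Python fragment. The data and function names are made up for illustration; the point is only that lists, hashes (dicts), and a `map`-like construct carry the whole program, with no type declarations:

```python
# Hypothetical rows, as they might come back from a database query.
orders = [
    {"customer": "alice", "total": 12.5},
    {"customer": "bob", "total": 7.0},
    {"customer": "alice", "total": 3.25},
]


def totals_by_customer(rows):
    """Sum order totals per customer, using a plain dict as the hash."""
    totals = {}
    for row in rows:
        name = row["customer"]
        totals[name] = totals.get(name, 0) + row["total"]
    return totals


def report_lines(totals):
    """Format each (customer, total) pair; a map-like transformation."""
    return [f"{name}: {amount:.2f}" for name, amount in sorted(totals.items())]


print("\n".join(report_lines(totals_by_customer(orders))))
```

Tweaking the format string and rerunning takes a second or two, which is the short cycle this kind of work depends on.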

Java, like most statically typed languages, is obviously "optimized" for algorithm-heavy tasks,
and therefore not well suited for data manipulation tasks. This is not that surprising when you consider
that algorithm-heavy tasks were an overwhelming majority of software written until the Internet became mainstream; for the few data manipulation tasks people knew they should use `awk`, or, later, Perl.
Then came the Internet and web applications, and Open Source databases became widely available; most
web applications became data manipulation tasks and we became stuck with Java and other languages
designed for a different world.

1 comment:

  1. For data manipulation, libraries and IDE support are essential too. Java IDEs in particular are great.
    And Java isn't the only language for the Java platform. There are ports of several scripting languages, Groovy is designed to be a scripting language just for the Java platform, and there are lots of domain-specific languages, for example JSP for web UI design.
