Friday, December 23, 2005

Some Java programs are just LISP in disguise as well

The easiest way to write an expression parser that creates a syntax tree from a string is of course to use a suitable parser
generator. If that is not desirable for some reason, the second choice is a recursive-descent parser, a few
functions calling each other, walking over the string and keeping a pointer to the current position:

>
Tree parse(String expr) {
> return parseOr(expr, new int[] { 0 });
> }
>
> Tree parseOr(String expr, int pos[]) {
> … parseAnd(expr, pos) … pos[0]++; …
> }


This works, but passing the parameters to each function is redundant and using `pos[0]` is ugly. We can make the code nicer by using an object to hold the state:

>
class ParsingContext {
> private String expr;
> private int pos;
> ParsingContext(String s) { expr = s; }
> parse() { pos = 0; return parseOr(); }
> private parseOr() { … parseAnd() … pos++ … }
> }


Much better. It is stretching the object-oriented programming metaphor a bit, there is no real
"parsing context" object to model---I visualize parsing as a process with state, not as a state
with some ways to update it.

What we are really doing is

>
(let ((expr expr) (pos 0))
> (labels ((parse-or () … (parse-and) … (incf pos) … ))
> (parse-or)))


and only after writing the above I have realized that Pascal nested functions can do this as well.
(Remember Pascal?)

I was quite surprised---although I knew the language "could" do this, I would probably write the code
as a top-level functions passing parameters, like the first Java example. Using nested functions to keep shared data somehow didn't feel natural.

I became curious and started to look at Pascal tutorials and expression parsing tutorials.
This is what I have found:

- Some tutorials omit procedure/function nesting completely.
- Some mention it, usually describe the scoping rules, and provide examples that either don't use nested
variables or are just for the illustration (variable names `a`, `b`, `c`, ...) and show no practical application.
- Parsing tutorials usually show code that reads the input one token at a time, keeping one-token lookahead;
both the token reading function and the lookahead variable are global, if only because the parsers are standalone and not shown as a part of a larger program.

Another underappreciated Pascal feature is that you can pass nested procedures as procedure parameters
and the procedure can still access variables in its (static) parents. IIRC the Pascal course at my university doesn't mention this at all; nested procedures and procedure parameters are explained, leaving students to
discover this programming technique on their own.

1 comment:

  1. The pascal feature is great, but it requires a bit more complex stack management of the compiler. The static father of the procedure has possibly not the nearest stack element (in situations such as recursion...).

    ReplyDelete