Wednesday, March 20, 2013

Language Design

In this post, I'll start to lay out how Toyl syntax will work.

The main purpose served by programming language syntax is to unambiguously describe the structure of the program.  In addition to this main purpose, programming languages have "syntactic salt" that helps to prevent unintentional errors, and "syntactic sugar" to make code easier to read and write.

In my experience, functional programming languages tend to follow different syntactic traditions from imperative languages (which usually fall into the Pascal or C traditions).  Part of this is a cultural matter, but part of it is practical as well.  Take, for example, a function that returns the nth Fibonacci number.  It could easily be written in C-like syntax as follows:
int fibonacci(int n) {
  return
    (
    n <= 1
    ? n
    : fibonacci(n - 2) + fibonacci(n - 1)
    );
}

Fig. 8.1: C-like syntax
In Lisp, it could be written like this:

(defun fibonacci (n)
  (if (< n 2)
    n
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))))

Fig. 8.2: Lisp syntax
In Haskell, one way to write it would be:
fibonacci :: Integer -> Integer
fibonacci n
    | n == 0 = 0
    | n == 1 = 1
    | n > 1 = fibonacci (n-1) + fibonacci (n-2)

Fig. 8.3: Haskell syntax
None of these is functionally superior to the others.  The C-like syntax is more "salty" than the others because it requires parentheses to enclose the function parameters, commas to delimit them, and a semicolon to end the return statement.  This is extra work for the programmer, but the payoff is that it gives the compiler more opportunities to identify programmer mistakes in the code.

The Lisp syntax is a little less salty than the C-like syntax, but it is still clear and unambiguous to the compiler.  However, both the Lisp and C-like syntax allow the indentation to "lie" about the structure of the code, because the compiler ignores it.  In a large program written in either language, the indentation could easily drift out of sync with the actual structure of the program and conceal a flaw in the logic.
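
To make the point concrete, here is a small C sketch of lying indentation.  The function name bump and its parameters are made up for illustration.  The layout suggests that both assignments are guarded by the if, but without braces only the first one is.

```c
/* Hypothetical example: the indentation claims both assignments belong
   to the if, but the compiler only attaches the first one to it. */
int bump(int x, int limit) {
    if (x < limit)
        x = x + 1;
        x = x * 2;  /* runs unconditionally, whatever the indentation says */
    return x;
}
```

A reader skimming the layout would expect bump(20, 10) to return 20 unchanged, but the doubling always runs, so the value that comes back is 40.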

In Haskell, the indentation is syntactically important, and conveys the same information as the block-enclosing tactics of Lisp and C.  When I first encountered a language that treated indentation as meaningful, it made me a little uncomfortable because I had been accustomed to whitespace being ignored by the compiler, and so free for the programmer to use as a form of documentation.  In particular, I had worked a lot in Unix, where some programmers use tabs and others use spaces for indentation, and different editors display a tab at different widths relative to a space, so the same file can appear to be indented differently depending on who opens it.

However, I love indentation as documentation of program structure. I think I will design a syntax for Toyl that will work with or without indentation, but let the compiler warn, by default, if the indentation is lying about the program structure. Here is a preliminary prototype of what the Fibonacci function might look like in Toyl:
fibonacci = int function (int n) {
    n < 0: Exception("n may not be negative");
    n < 2: n;
    otherwise: recurse(n - 1) + recurse(n - 2);
}

Fig. 8.4: Preliminary Toyl prototype
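
The idea of warning when indentation disagrees with structure can be sketched in a few lines of C.  This is only a toy under loud assumptions, not the Toyl design: it assumes exactly four spaces per nesting level, no tabs, no braces hidden inside strings or comments, and no statements continued across several lines.  The function name first_lying_line is invented for illustration.

```c
/* Hypothetical helper: returns the 1-based number of the first line whose
   leading spaces disagree with its brace depth, or 0 if no line does.
   Assumes 4 spaces per level, spaces only, and no braces inside strings
   or comments. */
int first_lying_line(const char *src) {
    int depth = 0;   /* number of currently open braces */
    int line = 1;    /* 1-based line counter */
    const char *p = src;

    while (*p != '\0') {
        /* Count the leading spaces on this line. */
        int indent = 0;
        while (*p == ' ') {
            indent++;
            p++;
        }

        /* Blank lines carry no structure, so only check non-empty ones. */
        if (*p != '\n' && *p != '\0') {
            /* A line that starts with '}' belongs to the enclosing level. */
            int expected = (*p == '}') ? depth - 1 : depth;
            if (expected < 0) expected = 0;
            if (indent != expected * 4) {
                return line;
            }
        }

        /* Scan the rest of the line, updating the brace depth. */
        while (*p != '\0' && *p != '\n') {
            if (*p == '{') depth++;
            if (*p == '}') depth--;
            p++;
        }
        if (*p == '\n') {
            p++;
            line++;
        }
    }
    return 0;
}
```

Given source text in which a statement inside a nested block is indented no deeper than the line that opened the block, the function reports that statement's line number.  A real compiler warning would work from the parse tree rather than from raw braces, which is why multi-line expressions like the one in Fig. 8.1 would trip this toy version.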
In the next post I'll outline the basics of Toyl syntax using Backus-Naur form.
