Friday, August 30, 2013

New York Spring Morning

A gilt and rusted nest of steel,
stone and brick and concrete,
haunted by its own electric soul,
awaits the tangled light of early spring.

Throughout the restless night,
heavy boats have cut across the bay
leaving scars upon its cold
and gray and gleaming skin.

But every sin committed
on the landscape by this city
suddenly appears to be forgiven
with the rising of the sun.

“Live again, and work.”
A million clocks have sprung.
The ragged threads of countless dreams
that clung to countless minds
are brushed away.

In unison they rise,
they wash,
they work.

The moon alone is left to take its subtle time,
to wander narrow strips of sky,
to climb the steel lattices and gaze away
across the moving clamor of the day.

Monday, August 19, 2013

Recursive call optimization

In a previous post, I wrote the map function in F#, compiled it, disassembled it, and expressed my disappointment that it was compiled as a recursive method.  What I had hoped for was something like tail call optimization.

Often, in my imperative programming career, I have had to choose between implementing a process recursively or as a loop.  In one language that I work with, the number of nested blocks allowed is limited, so I must be very careful about how I build recursive solutions.  I usually view the loop implementation as inferior because it lacks the beauty of the recursive one, but it is usually more efficient.  Since functional programming relies extensively on recursion, it is often valuable to turn recursive function calls into loops at compile time.

Functions require memory to process calculations.  For convenience, I'll refer to the units of memory used as variables, even if (strictly speaking) they are not variable.  The effective scope of a variable begins when it is assigned a value, and it ends at the last point that the value is referenced.  Since some control branches of a function may not reference a variable, the scope may cover parts of the expression tree, rather than a linear range of lines of code.
If no variables are in scope at the time of a recursive call, then the recursive call can be implemented as a pair of loops.  The first loop of the pair represents the descent into the recursive call, while the second loop represents the return.
Since most variables are out of scope by the end of a function, a recursive call at the end of the function can usually be implemented as a single loop, since nothing remains to be done on return; this is the familiar tail call optimization.  The general rule, however, allows a recursive call anywhere in a function to be implemented as a pair of loops, provided the call falls outside the scope of all other variables.

Here is a contrived example of a recursive C# method that can be optimized in this way:

public int TestFunction(int n) {
  // Variable r is defined here, but not in scope yet,
  // because its initial value will not be used
  int r;

  // Start of scope for t
  int t = n * n + n;

  // Start of scope for j
  int j = 2 * t - 13;

  // Recursion should stop somewhere
  if (t < 100000 && j < 100000) {
    // Scope of t and j ends here on this execution branch.
    // Scope of r starts here
    r = TestFunction(t + j) - 7;
  } else {
    // Scope of t and j ends here on this execution branch.
    // Scope of r starts here
    r = t - j;
  }

  // Scope of r ends here
  // Scope of k starts here
  int k = 3 * r;

  // Scope of k ends here
  return k;
}

The optimized implementation of this recursive function is as follows:

public int TestFunction(int ntop) {
  int n;
  int t;
  int j;
  int r;
  int k;
  int loopCount = 0;

  // Descent loop
  n = ntop;
  while(true) {
    t = n * n + n;
    j = 2 * t - 13;
    if (t < 100000 && j < 100000) {
      n = t + j;
      ++loopCount;
    } else {
      break;
    }
  }

  // Bottom of call processing
  r = t - j;
  k = 3 * r;

  // Return loop
  while (loopCount > 0) {
    r = k;
    r -= 7;
    k = 3 * r;
    --loopCount;
  }

  return k;
}

Monday, August 12, 2013

Obfuscation in Chinese

I recently saw the following status posted on a social networking site:

铋顼褀轵廯朙哒返怼働薍

If you run this through Google Translate, you get gibberish, and rightly so, because it is gibberish. The key is that each character stands for another that sounds the same (or similar).  The plaintext message is:

bìxū qízhì xiānmíng de fǎnduì dòngluàn
必须旗帜鲜明地反对动乱
"We must take a clear-cut stand against the turmoil"

This kind of obfuscation seems trivial at first, because any speaker of Chinese can probably see right through it--especially since the poster used characters that both looked similar and sounded similar. But it has an advantage:  Humans can understand it, yet simple search algorithms cannot.

If you needed to send a text message to another human being across a communication network that is monitored by a (potentially) hostile authoritarian government, without the opportunity to exchange keys or encryption algorithms ahead of time, this approach would let you slip past monitoring software that searches for keywords.

But it has a huge disadvantage if it is over-used.  Once the authorities figure out what you are doing, all they have to do is flag messages that use obfuscated versions of sensitive words (like 返怼 for "oppose"), and they will be able to select out the subset of messages that both contain sensitive content and are trying to hide it.
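This cat-and-mouse dynamic can be sketched with a toy filter (entirely my own illustration; the SENSITIVE and OBFUSCATED tables are hypothetical, not any real monitoring system):

```python
# Toy illustration of the keyword-filter arms race described above
# (hypothetical tables; not any real monitoring system).

SENSITIVE = ["反对", "动乱"]                    # plaintext keywords
OBFUSCATED = {"返怼": "反对", "働薍": "动乱"}   # known homophone spellings

def naive_filter(message):
    """First-generation filter: scans for plaintext keywords only."""
    return any(word in message for word in SENSITIVE)

def updated_filter(message):
    """Second-generation filter: also flags known obfuscated spellings."""
    return naive_filter(message) or any(o in message for o in OBFUSCATED)

post = "铋顼褀轵廯朙哒返怼働薍"
naive_filter(post)    # False: the homophone text slips past
updated_filter(post)  # True: flagged once the spellings are known
```

The second filter is arguably more damning than the first: a match proves not just sensitive content but intent to hide it.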

Thursday, August 8, 2013

Infinite lists and function composition in C#

Last night I was critical of the way that F# handled lists.  This morning, I decided to back up my critique by implementing the functionality I was talking about in C#.  I have put a simplified version of it in a gist.  This was easily one of the most rewarding mornings of coding I have had in a long time.

First, I created FunctionalList<T>, which can be initialized with a lambda expression.  So, where Haskell might have the following:

    let squares = [x | a <- [3,6..27], let x = a * a]
    show squares
    "[9,36,81,144,225,324,441,576,729]"

I can now do the following in C#:

    FunctionalList<int> squares = HigherOrderFunction<int, int>.Map
        (
        a => a * a,
        FunctionalList.Series(3, 6, 27)
        );
    Console.WriteLine(squares);
    [9, 36, 81, 144, 225, 324, 441, 576, 729]

So far, I have implemented the higher-order functions Map, FoldLeft and FoldRight, as well as basic list operations such as Head, Tail, Take, Skip and Construct.  I can now do what I said F# should be able to do.  For the following Haskell expression:

    head $ map' (^2) [9,19..99999999]

I can write in C#:

    HigherOrderFunction<int, int>.Map
        (
        x => x * x,
        FunctionalList.Series(9, 19, 99999999)
        ).Head

Granted, my C# version is syntactically clunky, but it returns immediately with the correct result.  It does not generate a huge in-memory list, and it works by composing the square function (x => x * x) over the function that returns the elements of the underlying series (x => x * 10 + 9).

So there.

What did I expect?

In my last post, I expressed some disappointment that my version of the map function in F# processed every element of the list.  But why should I be disappointed?  Isn't that exactly what I asked it to do?

Here's why:  In Haskell, I can use my homegrown map function against a huge list, and get the head element of that list, as follows:

head $ map' (^2) [9,19..99999999]

This returns immediately with the answer "81".  If I try something similar in F#, I get a long pause and an out-of-memory exception.

So why doesn't Haskell do exactly what I told it to, and try to process every element of the list?  I've been playing with reproducing the behavior in C#, and here is what I think is going on:

As an imperative programmer, I expect my arrays (or lists, or whatever) to be data structures in memory.  When I try to get the nth element of a list, I expect the machine to start at the head, walk n elements, and then return whatever it finds.  However, in the case of something like [9,19..99999999], that model is pointless.  The reason is that the value I find at the nth element is dependent on n, so it is really more like a function, λ n → 10n + 9.

If we treat this huge list-like thing as a function instead of a data structure, then mapping another function onto it is just a matter of composition.  That's the magic I was hoping to find hidden in the FSharpList class, but I did not find it.
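The list-as-a-function model can be sketched in a few lines (Python here for brevity; the names series, square and map_fn are mine, not anything hidden in FSharpList):

```python
# The "list" [9,19..99999999] treated as a function of the index n,
# per the lambda n -> 10n + 9 above (an editor's sketch).

def series(n):
    return 10 * n + 9          # nth element of 9, 19, 29, ...

def square(x):
    return x * x

def map_fn(f, element_at):
    """Mapping f over the list is just function composition."""
    return lambda n: f(element_at(n))

mapped = map_fn(square, series)
head = mapped(0)               # 81 -- no list is ever materialized
```

Taking the head of the mapped "list" evaluates exactly one element, no matter how long the underlying series is.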

One last critique of the way my F# version of map was compiled:  It compiled into a recursive method, which will consume stack space on every call, even though it could have been implemented more efficiently as a loop.  It is possible that the JIT compiler resolves this when the method is rendered into native code, but I am not sure about that.

Tuesday, August 6, 2013

map in F# and IL assembly

I played with F# last night, and it made me realize how mature Haskell is compared to F#. It's too bad you can't compile Haskell code into a .NET assembly.

The map function in F# is pretty concise, compared to the wordy C# version.  Here it is:

let rec map (f : ('a -> 'b), ps : 'a list) =
    if ps = list.Empty
        then list.Empty
        else list.Cons(f(ps.Head), map(f, ps.Tail))

This is not so terribly different from the Haskell version of map, which could be written as follows:

map' :: (t -> a) -> [t] -> [a]
map' _ [] = []
map' f (p:ps) = (f p) : (map' f ps)

There are a number of differences (including the F# requirement that we use rec to indicate a recursive function), but the important similarities are that we can write a higher order function, and we have generic types.

So, what does the F# version look like in IL assembly?  To begin with, the method header is reassuring, because it shows that the generic type parameters go all the way down to the IL assembly level:

.method public static class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!b> map<a,b>
(
class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!a,!!b> f,
class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!a> p
) cil managed

We can see that F# provides its own list and function objects, which makes sense given that this functionality isn't standard for the rest of the .NET world.  However, the body of the function reveals something disappointing.  It processes the whole list!  Rather than paste the entire function body here, I'll just write a brief pseudocode paraphrase of it:

IL_0002: push ps.Length onto the stack
IL_0007: if non-zero branch to IL_0013
IL_000d: push an empty list onto the stack
IL_0012: return
IL_0013: push the head of ps onto the stack
IL_001a: invoke function f and push the result onto the stack
IL_0021: push the tail of ps onto the stack
IL_0026: call map and push the result onto the stack
IL_002b: call cons and push the result onto the stack
IL_0030: return

So, from what I have seen here, though F# may look like Haskell on the surface, underneath the covers it is far less advanced.  However, the IL asm natively supports some functionality (like generic type parameters) that suggests it would be possible to build a really powerful functional language that compiled into IL asm.

Monday, August 5, 2013

map in C#

I was recently at a meeting where someone remarked that you could not write a higher-order function like map in C# because of strong typing.  In fact, you can write the map (and similar) functions in C# using generic type parameters, but there are other weaknesses.

For the curious, here is one way to write map in C#:

class HigherOrderFunction<T1, T2> {
  public delegate T2 function(T1 p);

  public static IEnumerable<T2> Map(function f, IEnumerable<T1> ps) {
    List<T2> result = new List<T2>();

    foreach (T1 p in ps) {
      result.Add(f(p));
    }

    return result;
  }
}

The problem with this function is that it is not lazy.  If the input sequence is huge, but you only need a few values from it, you still have to process all of the other values before you can get the ones you want.  If you want lazy evaluation here, the code becomes very convoluted.
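For contrast, here is how small a lazy map can be in a language with lightweight generators (a Python sketch, my own illustration rather than a rewrite of the C# above):

```python
# Lazy map over a potentially huge sequence, sketched with a
# Python generator (an editor's illustration).

def lazy_map(f, ps):
    for p in ps:
        yield f(p)      # each element is computed only when requested

# Taking the head of a mapped ten-million-element range
# touches exactly one element.
squares = lazy_map(lambda x: x * x, range(9, 100000000, 10))
head = next(squares)    # 81 -- returns immediately
```

The generator never builds the full result list; each value is produced on demand as the consumer pulls it.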

In my next post, I'll write the map function in F#, compile it, disassemble it, and see how it works under the covers.

Saturday, August 3, 2013

North Korea: Nomadic Empire in a Bottle

I've been trying to understand North Korea lately.  It occurred to me last night that North Korea is a lot like a nomadic empire, but one that is so constrained geographically and politically that it cannot expand.  I wonder how it will evolve.

If you read the secret histories of the Mongols and the Manchus, you find that Genghis Khan and Nurhaci rose to power in similar ways.  Each one felt betrayed again and again by the outside world, to the point that his only recourse for justice was conquest, and Heaven pitied him and granted him victory.  Both were able to transform the fabric of their societies, building a military order that was integrated into the social order.

The narrative in North Korea seems similar: betrayed and humiliated by Japan and the West, the Glorious Leader formed the Juche Idea and the Songun policy.  The fabric of society has been transformed, and the military order is integrated into the social order.  But where does a nomadic empire go when it can't expand?  How many generations will the army stand, unable to move?

Kim Jong-un seems to be trying to reform North Korea's economy, but the Juche Idea permeates society, and probably shapes the way he thinks.  How can this situation possibly evolve from here?

Thursday, August 1, 2013

A good secret language

Just for fun, I thought I would say something about what could make a good secret language.

A secret language would be "good" to the extent that it met its requirements.  The obvious first requirement of a secret language is that it protect the communication that it encodes, so it should be different from known public languages.  Beyond mere strangeness, however, there are a number of supporting characteristics that a secret language might have.

1. Learnability.  In order to be used, a secret language must be learned.  The easier it is to learn, the more quickly it may be used.  Learnability is improved by:

1.1 Regularity.  Irregular forms make languages harder to learn, so a language with regular rules is easier to learn.

1.2 Phonological and Syntactic Familiarity.  If speakers of a secret language share a common public language, then the secret language will be easier to learn if it shares the phonology and syntax of the public language.  This occurs in Anglo-Romani and Thieves' Cant.

1.3 Redundancy.  A natural language makes relatively efficient use of the available phonological space for words.  For example, in Chinese, there is at least one morpheme for almost every possible Chinese monosyllable.  However, since the secret language is learned as a second language, it will benefit from making morphemes more distinct from each other than you would normally find in a natural language.  This occurs in Chinook Jargon.

2. Extensibility.  The core lexicon for a secret language may be relatively small, with a powerful derivational morphology to extend the lexicon over its intended domain of communication.  This makes the language easier to learn, because there are fewer morphemes to memorize, but also makes it possible to invent new terminology on the spot with a high likelihood that you will be understood.  I don't know of a secret language "in the wild" that does this.

3. Obfuscation. Some phonological features of natural languages could hypothetically be exploited for obfuscation.  Nearly all of these would come at the cost of redundancy.  For example:

3.1 Avoiding Labials.  If you wanted a secret language that could not be lip-read, you would avoid labials (including rounded vowels).  I have read a couple of descriptions of the Chinook language that say Chinook speakers would hardly move their lips at all.

3.2 Avoiding Voiceless Fricatives, Affricates and Aspirates.  If you wanted a secret language whose sounds would not carry far, you might avoid sounds that are accompanied by high frequency formants, which carry farther.