The genome of a narrative

One of my hobbies is political forecasting. It's an interesting pursuit with many challenges, one of which is getting good information from unreliable sources.

The internet can be seen as a vast set of assertions of varying validity, produced and consumed by the "mindspace" of the networked world. Some forecasters try to get good information by aggregating many different assertions, on the principle that independent errors will tend to cancel out. That is a good way to reduce noise, but it doesn't help with widespread misconceptions, because an error that most sources share survives the averaging.
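The difference between noise and a shared misconception can be illustrated with a small simulation. This is only a sketch under invented assumptions: each source reports the true value plus independent Gaussian noise, and a "misconception" is modelled as the same offset added to every report. The function name, parameters, and numbers are all hypothetical.

```python
import random
import statistics


def aggregate_estimates(truth, n_sources, noise_sd, shared_bias, seed=0):
    """Average n_sources reports about `truth`.

    Assumed toy model: report = truth + shared_bias + independent noise.
    """
    rng = random.Random(seed)
    reports = [
        truth + shared_bias + rng.gauss(0, noise_sd)
        for _ in range(n_sources)
    ]
    return statistics.fmean(reports)


truth = 50.0  # invented "true" value, e.g. a vote share in percent

# Independent errors only: averaging pulls the estimate toward the truth.
unbiased = aggregate_estimates(truth, 10_000, noise_sd=10.0, shared_bias=0.0)

# Same noise, plus a misconception shared by every source.
biased = aggregate_estimates(truth, 10_000, noise_sd=10.0, shared_bias=8.0)

print(f"error without shared misconception: {abs(unbiased - truth):.2f}")
print(f"error with shared misconception:    {abs(biased - truth):.2f}")
```

Adding more sources shrinks the first error but leaves the second close to the size of the shared offset, which is the failure described above.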

People don't like to change their minds, so very often the first idea they take up is the one they will stick with in the long run. That means that ideas which travel quickly can occupy territory in the mindspace ahead of ideas that travel more slowly. An idea travels quickly if it is easily passed on, so all it needs is to be simple, easy to explain, and intuitively sensible. Slower, more complex ideas lose the race.

I have also found that ideas carrying a strong emotional payload can effectively defend their territory in the mindspace against competitors. For example, Stars and Stripes recently published an article about a false story alleging that Obama wants to emasculate the US Marines by asking them to wear female covers. In this case, the falsehood triggers stronger emotions than the truth, so it gains and holds ground.

The end result is that the viability of an idea on the internet is not necessarily correlated with its truth, and a false idea may easily replicate enough to influence the results of aggregation.

To address this, I have adopted an approach similar to the narrative analysis used in folkloristics. I try to identify the main narratives relating to a subject and trace the genealogy of each back to its original source (if possible). Then I attempt to explain why the original source released the narrative into the wild.

I am interested in the question of how (or whether) computational linguistics and other tools can be used to trace the genealogy of narratives on the internet. Among other things, I imagine this could lead to identifying large currents of thought: channels by which ideas spread from a small number of sources to a large audience.
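As one possible starting point, and not a claim about how this is done in practice, here is a minimal sketch of tracing a narrative back through earlier texts. It assumes each text carries a trustworthy timestamp, represents a text as a set of three-word shingles, and treats the most similar earlier text (by Jaccard similarity) as its likely parent. The `Post` type, the threshold, and the example texts are all invented for illustration. Surface similarity does not prove descent, and paraphrase would defeat this simple measure.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    timestamp: int  # hypothetical: any sortable publication time
    text: str


def shingles(text, k=3):
    """Return the set of k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def trace_parents(posts, threshold=0.3, k=3):
    """Map each post id to its most similar earlier post id, or None."""
    ordered = sorted(posts, key=lambda p: p.timestamp)
    sets = {p.post_id: shingles(p.text, k) for p in ordered}
    parents = {}
    for i, post in enumerate(ordered):
        best_id, best_score = None, threshold
        for earlier in ordered[:i]:
            score = jaccard(sets[post.post_id], sets[earlier.post_id])
            if score >= best_score:
                best_id, best_score = earlier.post_id, score
        parents[post.post_id] = best_id
    return parents


def find_root(post_id, parents):
    """Follow parent links back to the earliest known source."""
    while parents[post_id] is not None:
        post_id = parents[post_id]
    return post_id


# Invented example texts; not real posts.
posts = [
    Post("a", 1, "the city council plans to ban all bicycles downtown"),
    Post("b", 2, "breaking the city council plans to ban all bicycles "
                 "downtown say sources"),
    Post("c", 3, "breaking the city council plans to ban all bicycles "
                 "downtown say sources share this now"),
    Post("d", 2, "the weather forecast calls for rain all weekend"),
]

parents = trace_parents(posts)
print(parents)
print("root of c:", find_root("c", parents))
```

In this toy data, post "c" links to "b", which links to "a", while the unrelated post "d" has no parent. Counting how many texts descend from each root would be one crude way to look for the "currents" mentioned above.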

