In Search of a Language
Illustrated by COURTNEY COONS
We talk. A lot. And over the millennia, our ability to communicate has been at the heart of the development of society and technology.
Yet most of us simply talk without realizing how complicated it is to produce and understand speech. Indeed, those complications form one of the biggest challenges of modern computer science.
One thing that computers can’t grasp, yet which we find so easy to decode, is context. A computer can’t deal with the difference between the seemingly harmless “Time flies like an arrow” and “Fruit flies like a banana.”
Although computers might seem to have minds of their own (especially when they unexpectedly crash), computers are algorithmic: all they know how to do is execute orders. That algorithmic talent would be great if languages followed perfectly algorithmic patterns. Unfortunately for computers, language is very unalgorithmic. Irregular verbs are a good example. The reason why “swim” is conjugated to “swam,” “bring” to “brought,” and “plan” to “planned” in the past tense has more to do with linguistic evolution––the idiosyncratic history of the people who have spoken English––than with logic.
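To make the irregular-verb problem concrete, here is a minimal Python sketch. The spelling rule and the tiny exception table are my own simplifications for illustration, not a real model of English morphology. A single rule handles "plan" correctly but produces nonsense for "swim" and "bring," so the program has to fall back on a memorized list.

```python
VOWELS = set("aeiou")

# Hand-written exception table (illustrative only): exactly the kind of
# historical accident that no rule can derive.
IRREGULAR = {"swim": "swam", "bring": "brought"}


def regular_past(verb):
    """Apply a simplified 'regular' past-tense spelling rule."""
    if verb.endswith("e"):
        return verb + "d"
    # Double a final consonant that follows consonant + vowel ("plan" -> "planned").
    if (len(verb) >= 3
            and verb[-1] not in VOWELS and verb[-1] not in "wxy"
            and verb[-2] in VOWELS
            and verb[-3] not in VOWELS):
        return verb + verb[-1] + "ed"
    return verb + "ed"


def past(verb):
    """Look the verb up in the exception table before trusting the rule."""
    return IRREGULAR.get(verb, regular_past(verb))


for verb in ("plan", "swim", "bring"):
    print(verb, "| rule alone:", regular_past(verb), "| with exceptions:", past(verb))
# plan | rule alone: planned | with exceptions: planned
# swim | rule alone: swimmed | with exceptions: swam
# bring | rule alone: bringed | with exceptions: brought
```

The rule is a few lines long; the exception table, for real English, would have to list every irregular verb by hand.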
On the other hand, what if one could construct a language whose grammar were completely logical and unambiguous? Could computers then understand it?
Creating a logical language that computers can grasp is one of the main goals of engelangs, or engineered languages, which are a type of constructed language, or conlang. (The image to the left is of the Conlang Flag, which shows the Tower of Babel with a sun rising behind it. Besides engelangs, there are other types of constructed languages as well, which we’ll explore below.)
One subset of engelangs is specifically called loglangs, or logical languages. The first logical language was begun in 1955 and described in a 1960 article in Scientific American by Dr. James Cooke Brown. The article detailed a language he called “Loglan,” which would become a milestone in the development of logical languages. (Others had constructed languages before; the first is said to have been created by a 12th-century German nun. But until Brown’s publication, anyone seriously considering language invention had been judged a crackpot, because of a long history of failed attempts at language creation. Brown, however, was praised for his cool, scientific neutrality in the matter.) Brown did not design the language with computers in mind, although this would later become a goal of Loglan. His purpose was to test the Sapir-Whorf hypothesis, which states, roughly, that our cognitive abilities are limited by the language(s) we speak. Brown believed that if one spoke a very “free” language, where ideas were clear and logical, then one’s thoughts would be clearer, too.
The article itself describes many major aspects of logical languages, defining their cornerstones. For instance, based on what Brown called “first-order predicate” logic, every sentence takes the form of:
P(x1, x2, …)
P stands for a predicate (roughly, a verb), and the x-places stand for “arguments,” which are like nouns. Take for instance the sentence
“John gives Sam the box.”
In Brown’s predicate notation, the above sentence could take the form of:
give(John, Sam, the box).
Here, x1 is the subject, x2 is the indirect object, and x3 is the direct object. I’ll save for another article the explanation of how Brown turns this form into a sentence in Loglan, but for your interest, he would write “John gives Sam the box” as
“lu djan. donsu lu sam. le bakso.”
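For readers who think in code, the predicate notation maps naturally onto a small data structure. The Python sketch below is only an illustration: the class name Predication and the role names giver, recipient, and gift are my own labels, not Brown's terminology.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Predication:
    """One sentence in predicate form: a predicate plus ordered arguments."""
    predicate: str
    arguments: Tuple[str, ...]

    def __str__(self):
        return f"{self.predicate}({', '.join(self.arguments)})"


sentence = Predication("give", ("John", "Sam", "the box"))
print(sentence)  # give(John, Sam, the box)

# Each role is fixed purely by position (x1, x2, x3), so swapping two
# arguments changes who does what to whom.
giver, recipient, gift = sentence.arguments
print(giver, "->", recipient, ":", gift)  # John -> Sam : the box
```

Because meaning depends only on the predicate and the order of its arguments, a program never has to guess which noun is the subject.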
In logical languages like Loglan, a small set of purely logical connectives is also used rather than regular conjunctions such as “and”. Consider the fragment:
“John went to the window and...”
What comes next? In English, a noun phrase, such as, “the door,” or maybe even a whole other sentence, such as, “Mary opened the door.” Loglan, however, uses more than ten different words for “and” in order to express every variety of the idea––logical implication, disjunction, conjunction, biconditionality, etc.
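The four logical relations named above each have an exact definition as a truth function, which is what makes them attractive to a language designer. The Python sketch below spells them out; the English labels are mine, and nothing here reproduces Loglan's actual connective words.

```python
from itertools import product

# Each connective maps two truth values to one truth value.
CONNECTIVES = {
    "conjunction (p and q)": lambda p, q: p and q,
    "disjunction (p or q)": lambda p, q: p or q,
    "implication (if p then q)": lambda p, q: (not p) or q,
    "biconditional (p if and only if q)": lambda p, q: p == q,
}


def truth_table(connective):
    """Evaluate a connective on (T,T), (T,F), (F,T), (F,F), in that order."""
    return [connective(p, q) for p, q in product([True, False], repeat=2)]


for name, connective in CONNECTIVES.items():
    print(name, truth_table(connective))
# conjunction (p and q) [True, False, False, False]
# disjunction (p or q) [True, True, True, False]
# implication (if p then q) [True, False, True, True]
# biconditional (p if and only if q) [True, False, False, True]
```

English "and" and "or" blur several of these tables together; a logical language gives each one its own word.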
Furthermore, in natural languages (at least in ones we understand), the sounds of the language do not unambiguously form words. Take for example the phrase “cargo shipment.” Heard aloud, that phrase can be understood as containing anywhere between two and four words (“car go ship meant,” for instance). Context is the only means by which we can determine the phrase’s real meaning. In Loglan, on the other hand, stress (the natural accentuation of certain syllables) and consonant clusters (series of adjacent consonants) make splitting up sounds into words perfectly algorithmic.
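The ambiguity in “cargo shipment” can be counted by brute force. The Python sketch below treats the phrase as one unbroken stream of sounds and lists every way to cut it into known words. The six-entry lexicon is a toy assumption of mine, and "ment" is a stand-in spelling for the sound of "meant."

```python
# Toy lexicon (an assumption for illustration); "ment" stands in for the
# sound of the word "meant".
LEXICON = {"car", "go", "cargo", "ship", "ment", "shipment"}


def segmentations(sounds, lexicon=LEXICON):
    """Return every way to split `sounds` into a sequence of lexicon words."""
    if not sounds:
        return [[]]  # exactly one way to split nothing: no words at all
    results = []
    for end in range(1, len(sounds) + 1):
        head = sounds[:end]
        if head in lexicon:
            for rest in segmentations(sounds[end:], lexicon):
                results.append([head] + rest)
    return results


for words in segmentations("cargoshipment"):
    print(len(words), words)
# 4 ['car', 'go', 'ship', 'ment']
# 3 ['car', 'go', 'shipment']
# 3 ['cargo', 'ship', 'ment']
# 2 ['cargo', 'shipment']
```

Four readings, ranging from two words to four, and nothing in the sounds themselves says which one the speaker intended.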
Despite the language’s promise, interest in Loglan slowly dwindled, and today, ten years after Dr. Brown’s death, there are presumably no speakers of Loglan left.
It was in the late 1980s, though, when Loglan was still very much alive, that two loglanists, Bob LeChevalier and Nora Tansky, began work on a computer flashcard program with the purpose of having the computer learn the vocabulary of Loglan. When they began to sell the program, however, Dr. Brown sent them a cease-and-desist order, claiming that the lexicon of the language was his intellectual property. (Which brings up the question: who owns a language?) As matters worsened between LeChevalier and Brown, who had until then been good friends, one of LeChevalier's supporters proposed in 1987 to recreate the entire language in order to avoid legal action. Of course, the revolutionaries still wanted to call their budding language “Loglan,” as they believed that they were still working on Loglan proper. Indeed, when the split began, the language was simply named “Loglan-88.” Dr. Brown fervently tried to protect the word “Loglan” in a court of law. But in 1991, the ruling was made in favour of LeChevalier and his new organization, The Logical Language Group, which had been founded some three years earlier. Dr. Brown realized then that he could no longer control the language.
Today, a freely available child language of Loglan, called Lojban, is developing at an accelerated rate, and aims for human use more than the purely scientific purpose originally envisioned by Dr. Brown. (For the Linux and free software enthusiasts out there: there is one lojbanist who enjoys referring to Lojban as GNU/Loglan.) Lojban implements all the features of Loglan, making improvements in places where its predecessor seemed to be lacking. Lojban presumably has more speakers than Loglan ever did, and its online community continues to grow.
Most importantly, Lojban's grammar is completely logical and unambiguous. A formal grammar has been written for it, much as one is written for the C programming language, which is widely used to write operating systems. Lojban's grammar is more complex than C's, but the result is worth it. A computer program can take a piece of Lojban text and produce a parse tree based on its syntax, showing where every grammatical construct begins and ends. While computers have trouble parsing English, they excel at parsing Lojban, thanks to its grammatically rigid structure. In other words: success! Computers can understand language, at least when it's in Lojban.
One might assume that expressing emotion in Lojban would be very difficult, but, au contraire, a whole range of particles (little words like ui, iu, oi, and au) is allocated to expressing various emotional responses, and they may be interjected at any point in a sentence.
Natural language processing remains a major challenge in modern computer science. Computers are “perfect” in some sense, being of a purely logical nature, even though at times it might appear otherwise. This logical perfection renders them incapable of dealing with anything “imperfect” such as a natural language like English, Mandarin, or Swahili. Perhaps someday computers will have evolved enough to process even these imperfect languages, but until that day, only grammatically unambiguous languages such as Lojban will be truly understood by computers.
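To give a feel for what "parsing with a formal grammar" means, here is a minimal Python sketch of a parser for a toy grammar of my own invention. It is emphatically not Lojban's grammar, which is far larger. It reads the predicate notation used earlier in this article, governed by a single rule: an expression is a name, optionally followed by a parenthesized, comma-separated list of expressions.

```python
import re

# A token is a run of letters or one of the punctuation marks ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z]+|[(),])")


def tokenize(text):
    """Split the input into name and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, pos):
    """expr := NAME | NAME "(" expr ("," expr)* ")"   (toy grammar)"""
    if pos >= len(tokens) or not tokens[pos].isalpha():
        raise SyntaxError("expected a name")
    name = tokens[pos]
    pos += 1
    children = []
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        while True:
            child, pos = parse_expr(tokens, pos)
            children.append(child)
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
                continue
            break
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    return (name, children), pos


def parse(text):
    """Return the parse tree as nested (name, children) tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input after a complete expression")
    return tree


print(parse("give(John, Sam, box)"))
# ('give', [('John', []), ('Sam', []), ('box', [])])
```

Every valid input yields exactly one tree, and every invalid input is rejected outright. That is the property a formal grammar buys, and the one English lacks.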
For the sake of completeness: as I mentioned above, there are two other kinds of constructed languages (conlangs) besides engineered languages (engelangs).
The first of these are artlangs, which are invented for artistic and aesthetic purposes, or to fit a fictional universe. Klingon and Sindarin (a type of Elvish language), for instance, are spoken in the worlds of Star Trek and The Lord of the Rings, respectively. Along with Na'vi from Avatar, they are among the best-known artlangs.
The second are auxlangs, to which Esperanto, reportedly the most widely spoken constructed language, belongs. The primary goal of an auxlang is to become an auxiliary, secondary language for all (a universal language, if you will) that would ease international communication. At the moment, English comes closest to serving this universal function.