Commercially available software produces "gist"
translations that are only 70 to 80 percent accurate. Will
statistical-analysis techniques improve that performance?
RYAN REID
The Allies emerged from World War II victorious, only to face
the Cold War immediately afterward. Code breakers in Britain
and the United States, buoyed by their computer-aided triumphs
in the war just ended, sought new breakthroughs by turning the
processing power of machines not on codes but on languages.
The mathematical techniques that had cracked secret Axis
communiqués, the logic went, could prove invaluable in
gleaning intelligence from countless reams of Russian
scientific and news text.
More than 50 years later, no foolproof Star Trek-style
universal translator has yet materialized. The time is
nevertheless ripe for automated translation. The worldwide
translation-services market, worth more than $5 billion, is
already overburdened, and it is expected to grow to $7.6
billion by 2006 as the Internet becomes more pervasive.
In one of the latest efforts to crack the codes of language
with machines, developers of a prototype translation
technology hope to challenge the industry with a radically
different technique. It essentially throws books in a blender
to see how the corresponding phrases in different languages
stick back together. Known as EliMT after its inventor, Eli
Abir of Meaningful Machines in New York City, the statistical
technique may prove key not only to making machine
translation, or MT, more accurate but also to quickly
rendering translations for languages that the corporate world
currently neglects.
"The EliMT method is clearly the most promising and
theoretically important MT development in the past several
years, and probably since the advent of MT itself," claims
machine translation expert Jaime Carbonell of Carnegie Mellon
University.
The Trouble With Translations
The free machine-translation services that industry leader
Systran supplies through AltaVista's Babelfish and Google
offer so-called gist translation, which conveys the basic idea
of a text but has an error rate of 20 to 30 percent. For
commercial applications, the extra time needed to polish out
the inaccuracies in gist translations can prove costly: a
professional human translator is paid some $20 per hour, and
many are so busy that by the time one is free to take on the
job, the translation may arrive too late to be useful in the
cutthroat realm of international finance.
Most commercial MT systems work much the way a person in a
library might go about translating a foreign language. First
the systems analyze the unfamiliar text. Then they consult the
appropriate bilingual dictionaries and grammar guides. In a
way, these "rule-based" schemes resemble how someone would
read a coded text after learning the rules of the code.
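The library-style pipeline of analyzing the text, consulting a dictionary, and applying grammar rules can be sketched in a few lines. This is a minimal toy, assuming a hypothetical five-word English-to-Spanish lexicon and a single reordering rule. Commercial rule-based systems rely on far larger dictionaries and many more rules.

```python
# Hypothetical toy lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}

def analyze(sentence):
    """Step 1: split the text into words and tag each with its part of speech."""
    return [(word, LEXICON[word][1]) for word in sentence.lower().split()]

def apply_grammar(tagged):
    """Step 2: one hypothetical rule. Spanish usually puts an adjective after its noun."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

def translate(sentence):
    """Step 3: look up each reordered word in the bilingual dictionary."""
    return " ".join(LEXICON[word][0] for word, _ in apply_grammar(analyze(sentence)))

print(translate("the red car is fast"))  # el coche rojo es rápido
```

Every new language pair would need its own lexicon and its own rule set, which is the scaling problem the article goes on to describe.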
Working under that assumption in the 1950s, however,
scientists quickly realized that natural languages are far
more complex than artificial codes, in large part because a
word's meaning varies with context. The word "cool" used in
regard to temperature, for instance, means something different
when used by Fonzie. One apocryphal tale from the crude, early
days of machine translation has the idiom "The spirit is
willing but the flesh is weak" translated from English to
Russian and back again, only to yield "The vodka is good but
the meat is rotten."
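Why context matters can be shown with a minimal sketch of context-free lookup. It assumes a hypothetical glossary that stores only one sense per word, so "cool" is rendered the same way whether the subject is lake water or Fonzie.

```python
# Hypothetical glossary holding a single sense per word, as a naive
# word-by-word system might; "cool" only has its temperature meaning.
SINGLE_SENSE = {"cool": "slightly cold"}

def gloss(sentence: str) -> str:
    """Substitute each word's one stored sense without looking at neighboring words."""
    return " ".join(SINGLE_SENSE.get(word, word) for word in sentence.split())

print(gloss("the lake water is cool"))  # the lake water is slightly cold
print(gloss("the Fonz is cool"))        # the Fonz is slightly cold (wrong sense)
```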
While rule-based MT has improved substantially since then,
it is not foolproof. It can take a team years to develop and
debug the algorithms that translate between any two languages,
and every language pair is a whole new endeavor: an
English-to-Chinese system won't necessarily help translate
Chinese to English or English to Swahili. Since roughly 20 to
30 languages are economically key, and each direction counts
as a separate pair, global finance needs roughly 400 to 900
language pairs. So far Babelfish offers only 19 language
pairs, and other rule-based products do not offer many more.
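An English-to-Chinese system does not give you Chinese-to-English. Each direction therefore counts as its own system, so n languages need n(n - 1) of them. The quick calculation below checks the article's range. It uses Python's `math.perm` (available since Python 3.8), and the function name is illustrative.

```python
import math

def directed_pairs(n_languages: int) -> int:
    """Ordered (source, target) pairs among n languages: n * (n - 1)."""
    # math.perm(n, 2) counts ordered selections of 2 distinct items from n.
    return math.perm(n_languages, 2)

for n in (20, 30):
    print(f"{n} languages -> {directed_pairs(n)} translation directions")
# 20 languages -> 380 translation directions
# 30 languages -> 870 translation directions
```

Counting each direction separately makes the total grow roughly with the square of the number of languages. Building one rule set per pair scales badly for that reason.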
Statistics and Words
The EliMT technique takes a different approach. Imagine
a group of people going into a library, looking up the novel
Crime and Punishment in the original Russian and then
borrowing every English translation of Dostoevsky's work. If
they compared how each sentence was translated, they could
find statistically that certain phrases were often interpreted
the same way. They could then stitch together a translation
for a new sentence by recycling pieces of old translations,
taking two halves of a sentence from different books. "Instead
of translating from word to word, you're translating from
sentence fragments to sentence fragments," says Steve Klein,
Fluent Machines' chairman and CEO.
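The library thought experiment can be sketched as a toy phrase table. This is not the actual EliMT algorithm, which the article does not spell out. It is a minimal illustration that assumes each translated sentence has already been split into aligned (Russian fragment, English fragment) pairs; in practice, finding those alignments is the hard part. The corpus, the function names, and the greedy longest-match stitching are all hypothetical choices.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: sentences from several English translations, each
# already split into aligned (Russian fragment, English fragment) pairs.
# Real systems must discover these alignments; here they are simply assumed.
CORPUS = [
    [("я вижу", "I see"), ("красный дом", "a red house")],        # book A
    [("я вижу", "I see"), ("красный дом", "the red house")],      # book B
    [("я вижу", "I can see"), ("красный дом", "a red house")],    # book C
    [("он любит", "he loves"), ("старую собаку", "an old dog")],  # book D
]

def build_phrase_table(corpus):
    """Count every rendering of each source fragment and keep the most frequent one."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for source, target in sentence:
            counts[source][target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}

def stitch_translation(sentence, table):
    """Cover a new sentence with known fragments, longest match first, and join their translations."""
    words = sentence.split()
    pieces, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            fragment = " ".join(words[i:j])
            if fragment in table:
                pieces.append(table[fragment])
                i = j
                break
        else:  # no fragment starting at position i was found
            raise KeyError(f"no known fragment starts at {words[i]!r}")
    return " ".join(pieces)

table = build_phrase_table(CORPUS)
# First half as books A and B render it, second half from book D.
print(stitch_translation("я вижу старую собаку", table))  # I see an old dog
```

The sentence being translated never appears in the corpus. It is assembled from two halves found in different "books." Preferring the longest matching fragment is the simplest way to favor larger, more context-rich pieces. A fuller system would presumably also need to score competing segmentations and smooth the seams between fragments.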