Machine Translation
Ron Graham
with John Grosh, Wolfgang Hees, Fred Klingener, Doug Milliken, and Mark Rogers
The benchmark for machine translation (as far as I know) is the story, probably apocryphal, of an attempt made at Carnegie Mellon back in the 1960s to translate a Biblical phrase from English to Russian and back. The round trip reportedly converted
"the spirit is willing, but the flesh is weak"
to
"the vodka is good, but the steak is lousy."
Here are some of the reasons computers have such difficulty translating between English and other languages:
  1. Irregularities. Russian marks grammatical gender (masculine, feminine, and neuter) on nouns, adjectives, and even past-tense verbs, but it's fairly regular about it. English isn't regular in its construction or pronunciation.
  2. Idioms. Does any other language on Earth have ready equivalents for expressions such as "the whole nine yards" or "kit and kaboodle"? (We generally avoid most idioms in technical writing, but who can get away from them completely?)
  3. Reading between the lines. The ability to share mental imagery is language-dependent. What happens when you translate
    • any Monty Python performance
    • Alice in Wonderland
    • Anne of Green Gables
    • Harry Potter
    • Saturday Night Live
    • any corporate diversity statement?
  4. The sheer size of the vocabulary. Some languages simply borrow English slang instead of translating it ("le Big Mac" in French, etc.), and may leave some technical jargon untranslated as well. I've heard, for instance, that some basic Internet terms are rendered differently in Hong Kong than in various parts of mainland China.
  5. Enumeration. Do you translate "the Three Stooges," "the Four Horsemen," or "the Seven Dwarfs" each as a single unit? Or as more than one of something? (I admit this is a small effect.)
  6. Cultural adaptations. English as spoken in the USA has regional variations. So do the languages of every large nation and many of the small ones.
  7. The wrong paradigm. In the early days of machine translation, developers were after breakthroughs instead of evolution; perfection instead of suitability; a big score instead of comprehension.

Translation really illustrates the classic battle the engineering writer fights: between writing for a large potential audience and writing for a small, known audience; between simplifying the language for readers and minimizing the writer's effort.

BabelFish may represent a turning point in the history of machine translation: submit a Web page, get a translation.

Though literature uses rich figures of speech and shades of meaning, well-written scientific text mostly avoids the problems listed above. But how often do you find such well-written text?

When people translate technical documents, structure and content can come through clearly, apart from the obligatory laughers. The same can be true of computer translation, though multilingual staff members may find that passing technical material through a translator like BabelFish is only a first step.

If the source material is bad, fixing the translation may be harder than writing from scratch. Even human translators make mistakes (that's where the laughers come from); how much more so computers, which can't reliably choose among a word's multiple meanings? That's how a computer can turn "hydraulic ram" into "water buffalo."

The "super-question": do we publish a single document and let users view it through translators, or do we translate it ourselves and then fix and maintain our own translations? You may find you want some control over the result.

References

Cycorp is keeping a low profile about its work in this area. Its focus is not so much on translation per se as on determining context.


What you can do
  1. If your audience is international, write in the plainest language possible.
  2. Gain some familiarity with another language. It'll help you write with more sensitivity to international readers.
  3. Recognize the weaknesses inherent in machine translation. Keep people involved in the process as much as possible.
