
Machine translation


Machine translation (MT) is a form of translation in which a computer program analyses a text in one language (the "source text") and then attempts to produce an equivalent text in another language (the "target text") without human intervention.

In practice, machine translation still involves some human intervention, because it requires a pre-editing and a post-editing phase. Note that in machine translation, the translator supports the machine and not the other way around.

Nowadays most machine translation systems produce what is called a "gisting translation" - a rough translation that gives the "gist" of the source text, but is not otherwise usable.

However, in fields with highly limited ranges of vocabulary and simple sentence structure, for example weather reports, machine translation can deliver useful results.

Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains.
Source: www.eamt.org, European Association for Machine Translation, EAMT, 1997.

1 Machine translation vs. Computer-assisted translation

Although the two concepts are similar, machine translation (MT) should not be confused with computer-assisted translation (CAT), also known as machine-assisted translation (MAT).

In machine translation, the translator supports the machine: the computer program translates the text, which the translator then edits. In computer-assisted translation, the computer program supports the translator, who translates the text and makes all the essential decisions involved.

2 Introduction

The translation process, whether for translation per se or for interpreting, can be stated simply as:

  1. Decoding the meaning of the source text, and
  2. Re-encoding this meaning in the target language.

Behind this simple procedure there lies a complex cognitive operation. For example, to decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process which requires in-depth knowledge of the grammar, semantics, syntax, idioms and the like of the source language, as well as of the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer to "understand" a text as a human being does and also to "create" a new text in the target language that "sounds" as if it has been written by a human.

This problem can be tackled in a number of ways.

3 Linguistic approaches

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first. However, a number of heuristic methods of machine translation are also used, including:

Generally, rule-based methods (the first three) parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. These methods require extensive lexicons with morphologic, syntactic, and semantic information, and large sets of rules.
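As a rough illustration of the parse, represent, and generate steps described above, the following Python sketch translates three-word English noun phrases into French. Everything in it is an assumption made for the example: the tiny lexicon, the single supported pattern (determiner, adjective, noun), and the function names `parse`, `generate` and `translate`. A real rule-based system would need a far larger lexicon and rule set.

```python
# Toy lexicon with part-of-speech and gender information.
# All entries are illustrative assumptions, not a real MT lexicon.
LEXICON = {
    "the":   {"pos": "DET",  "fr": {"m": "le", "f": "la"}},
    "black": {"pos": "ADJ",  "fr": {"m": "noir", "f": "noire"}},
    "white": {"pos": "ADJ",  "fr": {"m": "blanc", "f": "blanche"}},
    "cat":   {"pos": "NOUN", "fr": "chat", "gender": "m"},
    "house": {"pos": "NOUN", "fr": "maison", "gender": "f"},
}


def parse(sentence):
    """Analyse a 'DET ADJ NOUN' phrase into a symbolic representation."""
    words = sentence.lower().split()
    unknown = [w for w in words if w not in LEXICON]
    if unknown:
        raise ValueError(f"words not in lexicon: {unknown}")
    tags = [LEXICON[w]["pos"] for w in words]
    if tags != ["DET", "ADJ", "NOUN"]:
        raise ValueError(f"unsupported structure: {tags}")
    det, adj, noun = words
    return {"det": det, "adj": adj, "noun": noun}


def generate(rep):
    """Generate French text from the intermediate representation."""
    noun = LEXICON[rep["noun"]]
    gender = noun["gender"]
    # Agreement rule: determiner and adjective take the noun's gender.
    det = LEXICON[rep["det"]]["fr"][gender]
    adj = LEXICON[rep["adj"]]["fr"][gender]
    # Reordering rule: these colour adjectives follow the noun in French.
    return f"{det} {noun['fr']} {adj}"


def translate(sentence):
    return generate(parse(sentence))


print(translate("the black cat"))    # le chat noir
print(translate("the white house"))  # la maison blanche
```

Even this small case needs gender information in the lexicon and explicit agreement and reordering rules, which hints at why such methods demand extensive manual work.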

Statistical-based and example-based methods eschew manual lexicon building and rule-writing and instead try to generate translations based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare.
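To give a feel for how word correspondences can be estimated from a bilingual corpus without a hand-built lexicon, the sketch below runs a few rounds of expectation-maximisation in the style of the well-known IBM Model 1 word-alignment model. This is one possible technique chosen for illustration, not necessarily what any particular system uses. The three-sentence corpus and the names `train_model1` and `best_translation` are assumptions made for the example.

```python
from collections import defaultdict

# Tiny English-French parallel corpus (illustrative assumption).
CORPUS = [
    ("the house".split(), "la maison".split()),
    ("the blue house".split(), "la maison bleue".split()),
    ("the flower".split(), "la fleur".split()),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)], the probability of French word f given English word e."""
    french_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all
        for es, fs in pairs:
            for f in fs:
                # E-step: share f among the English words of this sentence
                # in proportion to the current estimates.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, english_word):
    """Return the French word with the highest estimated probability."""
    candidates = {f: p for (f, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


model = train_model1(CORPUS)
for word in ["the", "house", "blue", "flower"]:
    print(word, "->", best_translation(model, word))
```

On this corpus the estimates settle on "la", "maison", "bleue" and "fleur" respectively, even though no dictionary was supplied. With so little data the result is fragile, which mirrors the point above that suitable corpora are the limiting factor.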

Given enough data, most machine translation programs work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker (i.e. producing a "gisting translation"). The difficulty is getting enough data of the right kind to support the particular method. The large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods, for example. But then, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.


