Machine translation

The process of translating text from one language to another using computers instead of human translators.

Machine translation (MT) systems are unlike all of the other tools described in this glossary, because rather than assisting a human translator or other language professional in his or her work, MT ...


n-gram

An n-gram is a contiguous subsequence of n items (n = 1, 2, 3, etc.) in a larger sequence. In a language model (LM), n-grams are sequences of tokens. In phrase tables and reordering tables, n-grams are sequences of pairs ...
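
As a rough illustration of token n-grams, the sketch below slides a window of n tokens over a whitespace-tokenized sentence. The function name `ngrams` and the example sentence are assumptions made for this illustration; they are not taken from any particular toolkit.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of m tokens has m - n + 1 n-grams, and none when m < n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, tokenized on whitespace.
tokens = "the big house is red".split()
print(ngrams(tokens, 2))
```

With n = 1 the function returns the individual tokens (unigrams), and with n = 2 it returns the adjacent token pairs (bigrams).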

parallel data

A linguistic corpus of two or more languages where each element in one language corresponds to an element with the same meaning in the other language(s). The original, authored language is identified ...

phrase table

A “phrase table” is a statistical description of a parallel corpus of source-target language sentence pairs. The frequencies with which n-grams in a source language text co-occur with n-grams in a parallel ...
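
The idea of co-occurrence frequencies can be sketched with a toy example. The code below only counts how often single words (unigrams) appear together in the same sentence pair and turns those counts into relative frequencies. It is a deliberately simplified assumption, not the way a production phrase table is built, and the corpus, `cooccurrence_counts`, and `relative_frequency` are hypothetical names invented for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical toy parallel corpus: (source sentence, target sentence) pairs.
PARALLEL = [
    ("the house", "la casa"),
    ("the big house", "la casa grande"),
    ("the book", "el libro"),
]


def cooccurrence_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count sentence pairs in which a source word and a target word both occur."""
    counts: Counter = Counter()
    for source, target in pairs:
        for s in set(source.split()):
            for t in set(target.split()):
                counts[(s, t)] += 1
    return counts


def relative_frequency(counts: Counter, source_word: str) -> Dict[str, float]:
    """Normalize the counts for one source word so that they sum to 1."""
    total = sum(c for (s, _), c in counts.items() if s == source_word)
    return {t: c / total for (s, t), c in counts.items() if s == source_word}


counts = cooccurrence_counts(PARALLEL)
print(relative_frequency(counts, "house"))
```

Even in this tiny corpus, "house" co-occurs with "casa" more often than with "grande", which is the kind of signal such frequency tables capture.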


pipeline

A “pipeline” is a toolchain of processes connected by standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one.
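
A minimal sketch of such a pipeline follows, using Python's standard `subprocess` module to connect the stdout of one process to the stdin of the next. The two stages (lowercasing, then splitting into one token per line) and the name `run_pipeline` are assumptions chosen for illustration; a real toolchain would chain its own programs.

```python
import subprocess
import sys

# Stage 1 (hypothetical): lowercase every line read from stdin.
LOWERCASE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.lower())\n"
)
# Stage 2 (hypothetical): write one whitespace-separated token per line.
TOKENIZE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    for token in line.split():\n"
    "        print(token)\n"
)


def run_pipeline(text: str) -> str:
    """Feed text through stage 1, whose stdout is stage 2's stdin."""
    first = subprocess.Popen(
        [sys.executable, "-c", LOWERCASE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    second = subprocess.Popen(
        [sys.executable, "-c", TOKENIZE],
        stdin=first.stdout, stdout=subprocess.PIPE, text=True,
    )
    first.stdout.close()     # the parent no longer needs its copy of the pipe
    first.stdin.write(text)  # fine for small inputs; large ones need streaming
    first.stdin.close()      # signal end of input to stage 1
    output, _ = second.communicate()
    first.wait()
    return output


print(run_pipeline("The Big House\n"))
```

Neither stage knows about the other: each one only reads stdin and writes stdout, which is what allows stages to be swapped or added freely.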

recaser model

A recaser model is a special translation model that translates lowercased data to “natural” cased text (mixed upper and lower case).
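
To make the input and output of recasing concrete, here is a toy sketch that restores case by looking up the most frequent cased form of each token seen in cased training text. This per-token lookup is a swapped-in simplification and not the translation model described above. The names `train_recaser` and `recase` and the training sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_recaser(cased_sentences: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased token to its most frequent cased form."""
    forms = defaultdict(Counter)
    for sentence in cased_sentences:
        for token in sentence.split():
            forms[token.lower()][token] += 1
    return {lower: seen.most_common(1)[0][0] for lower, seen in forms.items()}


def recase(model: Dict[str, str], lowercased: str) -> str:
    """Replace each token with its learned cased form; keep unknown tokens as-is."""
    return " ".join(model.get(token, token) for token in lowercased.split())


# Hypothetical cased training sentences.
model = train_recaser([
    "I visited Paris in May",
    "the train to Paris leaves in May",
])
print(recase(model, "i visited paris in may"))
```

A lookup like this cannot use context, for example to capitalize the first word of a sentence, which is one reason to treat recasing as a translation problem instead.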

reordering table

A “reordering table” contains the statistical frequencies that describe the changes in word order between source and target languages, such as “big house” versus “house big”. In practical terms, a ...
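
The kind of word-order statistic involved can be sketched by counting how source positions move when the target sentence is read left to right. The sketch assumes a one-to-one word alignment is already available and uses three orientation labels (`monotone`, `swap`, `other`). Both the labels and the function `orientation_counts` are assumptions for illustration, not the format of any specific reordering table.

```python
from collections import Counter
from typing import List, Tuple


def orientation_counts(alignment: List[Tuple[int, int]]) -> Counter:
    """Count order changes for (source_index, target_index) alignment pairs."""
    counts: Counter = Counter()
    previous_source = -1  # position just before the first source word
    # Visit the aligned words in target order and see where the source jumps.
    for source_index, _ in sorted(alignment, key=lambda pair: pair[1]):
        step = source_index - previous_source
        if step == 1:
            counts["monotone"] += 1  # same order as in the source
        elif step == -1:
            counts["swap"] += 1      # order of two neighbors is exchanged
        else:
            counts["other"] += 1     # a longer jump in either direction
        previous_source = source_index
    return counts


# Hypothetical alignment for "the big house" -> "la casa grande":
# the->la, house->casa, big->grande.
print(orientation_counts([(0, 0), (2, 1), (1, 2)]))
```

In this example "la" follows the source order, "casa" jumps ahead over "big", and "grande" swaps back, mirroring the “big house” versus “house big” pattern.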

