Machine Translation/Statistics
Statistical machine translation
Language models
Language models are used in MT for (a) scoring arbitrary sequences of words (tokens) and (b) predicting which token is likely to follow a given sequence of tokens. Formally, a language model is a probability distribution over sequences of tokens in a given language.
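Both uses can be illustrated with a very small bigram model. The following is a minimal sketch, not a reference implementation: the toy corpus, the function names (`train_bigram_model`, `prob`, `log_score`, `predict_next`), the `<s>`/`</s>` boundary markers, and the choice of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = defaultdict(Counter)
    vocab = set()
    for tokens in sentences:
        # Hypothetical boundary markers for sentence start and end.
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, curr in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][curr] += 1
    return context_counts, bigram_counts, vocab

def prob(model, prev, curr):
    """P(curr | prev) with add-one smoothing (an illustrative choice)."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[prev][curr] + 1) / (context_counts[prev] + len(vocab))

def log_score(model, tokens):
    """Use (a): log-probability of a whole token sequence."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(prob(model, p, c)) for p, c in zip(padded, padded[1:]))

def predict_next(model, prev):
    """Use (b): the token seen most often after `prev` in training."""
    _, bigram_counts, _ = model
    followers = bigram_counts[prev]
    return followers.most_common(1)[0][0] if followers else None

# Toy corpus, invented for this example.
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "cat", "eats"],
    ["the", "dog", "sleeps"],
]
model = train_bigram_model(corpus)

fluent = log_score(model, ["the", "cat", "sleeps"])
scrambled = log_score(model, ["sleeps", "cat", "the"])
print(fluent > scrambled)          # the fluent order scores higher
print(predict_next(model, "the"))  # most frequent follower of "the"
```

Scores are summed in log space because multiplying many small probabilities directly would quickly underflow on longer sequences.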
N-gram models
Character-based models
Recently, it was shown that sub-words, characters, or even bytes can serve as the basic units for language modelling[citation needed]. Several events focus specifically on such models and, more generally, on processing language data at the sub-word level, e.g. SCLeM 2017.
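To make the choice of basic unit concrete, the sketch below segments the same string into words, characters, and UTF-8 bytes. The helper names (`word_units`, `char_units`, `byte_units`) and the sample text are hypothetical; whitespace splitting stands in for a real tokenizer, and learned sub-word segmentation is not shown.

```python
def word_units(text):
    """Whitespace-separated words (a stand-in for a real tokenizer)."""
    return text.split()

def char_units(text):
    """Unicode code points, including the space character."""
    return list(text)

def byte_units(text):
    """UTF-8 byte values, each in the range 0..255."""
    return list(text.encode("utf-8"))

# Sample text "naïve café", written with escapes so the code points are unambiguous.
sample = "na\u00efve caf\u00e9"

words = word_units(sample)
chars = char_units(sample)
raw_bytes = byte_units(sample)

print(len(words), len(chars), len(raw_bytes))
```

Smaller units give a smaller, closed inventory of symbols (at most 256 for bytes) at the cost of longer sequences for the model to process.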
Translation models
IBM models 1-5
Phrase-based models
Factored translation models
Syntax- and tree-based models
Synchronous phrase grammar
Parallel tree-banks
Syntactic rules extraction
Decoding
Beam search
This section is a stub. You can help Wikibooks by expanding it.