Data Mining Algorithms In R/Packages/RWeka/Weka tokenizers
Description
R interfaces to Weka tokenizers.
Usage
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments
x: a character vector of strings to be tokenized.
control: an object of class Weka_control, a character vector of control options, or NULL (the default).
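The control options accepted by a given tokenizer can be inspected with RWeka's Weka Option Wizard, and a Weka_control object passes them through. A short sketch (assumes the RWeka package and a working Java/Weka installation):

```r
library(RWeka)

## List the Weka command-line options the n-gram tokenizer accepts
WOW("NGramTokenizer")

## Build a control object setting the minimal and maximal n-gram sizes
ctrl <- Weka_control(min = 2, max = 3)
```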
Details
AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous sequences of alphabetic characters.
NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.
WordTokenizer is a simple word tokenizer.
Value
A character vector with the tokenized strings.
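Examples
Following the usual R help-page convention, a brief usage sketch of the three tokenizers (assumes the RWeka package and a working Java/Weka installation):

```r
library(RWeka)

x <- "The quick brown fox jumps over the lazy dog"

## Simple word tokenization: splits on whitespace and punctuation
WordTokenizer(x)

## All bigrams and trigrams of the input string
NGramTokenizer(x, Weka_control(min = 2, max = 3))

## Alphabetic sequences only: digits break tokens apart
AlphabeticTokenizer("R2D2 and C3PO")
```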