Data Mining Algorithms In R/Packages/RWeka/Weka tokenizers

From Wikibooks, open books for an open world

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)

NGramTokenizer(x, control = NULL)

WordTokenizer(x, control = NULL)

Arguments

x: a character vector with strings to be tokenized.

control: an object of class Weka_control, a character vector of control options, or NULL (default).
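As a hedged sketch of how the control argument is typically supplied, the call below passes Weka_control options to NGramTokenizer; it assumes the RWeka package and a working Weka/Java installation are available:

```r
library(RWeka)

# Weka_control(min = 2, max = 3) maps to the -min and -max
# options of Weka's NGramTokenizer class, requesting all
# 2-grams and 3-grams of the input string.
NGramTokenizer("the quick brown fox",
               Weka_control(min = 2, max = 3))
```

The same control mechanism applies to the other tokenizers, though WordTokenizer and AlphabeticTokenizer accept fewer options.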

Details

AlphabeticTokenizer is an alphabetic string tokenizer, where tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
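The behavioral difference between the tokenizers can be seen on a string mixing letters, digits, and punctuation. A minimal sketch, again assuming RWeka and a working Weka/Java installation (the input string is an arbitrary illustration):

```r
library(RWeka)

s <- "R 3.0, aka R-3.0!"

# WordTokenizer splits on Weka's default delimiter characters
# (whitespace and common punctuation).
WordTokenizer(s)

# AlphabeticTokenizer keeps only contiguous runs of letters,
# so digits and punctuation never appear in its tokens.
AlphabeticTokenizer(s)
```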

Value

A character vector with the tokenized strings.