public interface TextAnnotationBuilder
illinois-tokenizer
A class that implements this interface must create two views: ViewNames.SENTENCE and ViewNames.TOKENS.
To create a TextAnnotation from pre-tokenized text (e.g. from training corpora), please use BasicTextAnnotationBuilder.
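The two-view contract described above can be illustrated with a minimal sketch. It does not use the library's own classes: `TwoViewSketch`, its `createViews` method, the map-based return type, and the naive splitting rules are all hypothetical stand-ins, and the view-name strings are assumed values used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the library's TextAnnotation:
// an annotation is modelled as a map from view name to text spans.
final class TwoViewSketch {
    // Stand-ins for ViewNames.SENTENCE and ViewNames.TOKENS (values assumed).
    static final String SENTENCE = "SENTENCE";
    static final String TOKENS = "TOKENS";

    // Builds both required views from raw text using naive splitting rules.
    static Map<String, List<String>> createViews(String text) {
        if (text == null || text.trim().isEmpty()) {
            // Assumption: blank input is rejected. The documentation does not
            // state when IllegalArgumentException is actually thrown.
            throw new IllegalArgumentException("text must be non-empty");
        }
        List<String> sentences = new ArrayList<>();
        List<String> tokens = new ArrayList<>();
        // Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            sentences.add(sentence);
            // Naive rule: tokens are whitespace-separated chunks.
            for (String token : sentence.split("\\s+")) {
                tokens.add(token);
            }
        }
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put(SENTENCE, sentences);
        views.put(TOKENS, tokens);
        return views;
    }
}
```

A real implementation would return a TextAnnotation and would normally rely on a proper sentence splitter and tokenizer; the point here is only that both views are produced in one pass over the same text.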
Modifier and Type | Field and Description |
---|---|
static String | SPLIT_ON_DASH: define a configuration flag to specify behavior w.r.t. |
Modifier and Type | Method and Description |
---|---|
TextAnnotation | createTextAnnotation(String text): A method for creating TextAnnotation by tokenizing the given text string. |
TextAnnotation | createTextAnnotation(String corpusId, String textId, String text): An overloaded version of createTextAnnotation(String) which takes in a corpus Id and text Id. |
TextAnnotation | createTextAnnotation(String corpusId, String textId, String text, Tokenizer.Tokenization tokenization): A method for creating TextAnnotation by respecting the pre-tokenization of text passed as an instance of Tokenizer.Tokenization. |
String | getName() |
static final String SPLIT_ON_DASH
String getName()
TextAnnotation createTextAnnotation(String text) throws IllegalArgumentException
A method for creating TextAnnotation by tokenizing the given text string.
Parameters:
text - Raw text string
Throws:
IllegalArgumentException
TextAnnotation createTextAnnotation(String corpusId, String textId, String text) throws IllegalArgumentException
An overloaded version of createTextAnnotation(String) which takes in a corpus Id and text Id. These strings can be used for bookkeeping.
Parameters:
corpusId -
textId -
text - Raw text string.
Throws:
IllegalArgumentException
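The relationship between the one-argument and three-argument overloads can be sketched as a simple delegation. Everything in this block is hypothetical: `BookkeepingSketch`, its nested `Annotation` class, and the choice of empty strings as default ids are assumptions made for illustration, not the library's documented behavior.

```java
// Hypothetical stand-in for a builder with the two raw-text overloads.
final class BookkeepingSketch {
    // Hypothetical stand-in for TextAnnotation: keeps only ids and text.
    static final class Annotation {
        final String corpusId;
        final String textId;
        final String text;

        Annotation(String corpusId, String textId, String text) {
            this.corpusId = corpusId;
            this.textId = textId;
            this.text = text;
        }
    }

    // One-argument form: delegates with empty ids (an assumed default).
    static Annotation create(String text) {
        return create("", "", text);
    }

    // Three-argument form: the ids are stored untouched for bookkeeping.
    static Annotation create(String corpusId, String textId, String text) {
        if (text == null) {
            // Assumption: null text is the rejected case.
            throw new IllegalArgumentException("text must not be null");
        }
        return new Annotation(corpusId, textId, text);
    }
}
```

Carrying the ids on the result lets a caller trace an annotation back to its source document when processing a whole corpus.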
TextAnnotation createTextAnnotation(String corpusId, String textId, String text, Tokenizer.Tokenization tokenization) throws IllegalArgumentException
A method for creating TextAnnotation by respecting the pre-tokenization of text passed as an instance of Tokenizer.Tokenization.
Parameters:
text - Raw text string
tokenization - An instance containing tokens, character offsets, and sentence boundaries.
Throws:
IllegalArgumentException
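To make the three pieces of a pre-tokenization concrete (tokens, character offsets, sentence boundaries), here is a rough sketch that derives them from sentences already split into tokens. The `PreTokenizedSketch` class and its nested `Tokenization` type are hypothetical stand-ins, not the library's Tokenizer.Tokenization; the field layout and the assumption that tokens are joined by single spaces are choices made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

final class PreTokenizedSketch {
    // Hypothetical stand-in for Tokenizer.Tokenization.
    static final class Tokenization {
        final String[] tokens;
        final int[][] charOffsets;     // {start, end} per token, end exclusive
        final int[] sentenceEndTokens; // index one past each sentence's last token

        Tokenization(String[] tokens, int[][] charOffsets, int[] sentenceEndTokens) {
            this.tokens = tokens;
            this.charOffsets = charOffsets;
            this.sentenceEndTokens = sentenceEndTokens;
        }
    }

    // Raw text matching the offsets: all tokens joined by single spaces.
    static String rawText(Tokenization t) {
        return String.join(" ", t.tokens);
    }

    // Derives tokens, offsets, and sentence boundaries from tokenized sentences.
    static Tokenization fromSentences(List<String[]> sentences) {
        List<String> tokens = new ArrayList<>();
        List<int[]> offsets = new ArrayList<>();
        int[] ends = new int[sentences.size()];
        int cursor = 0;
        for (int s = 0; s < sentences.size(); s++) {
            for (String token : sentences.get(s)) {
                if (!tokens.isEmpty()) {
                    cursor++; // one space before every token except the first
                }
                offsets.add(new int[] {cursor, cursor + token.length()});
                cursor += token.length();
                tokens.add(token);
            }
            ends[s] = tokens.size();
        }
        return new Tokenization(
                tokens.toArray(new String[0]),
                offsets.toArray(new int[0][]),
                ends);
    }
}
```

With offsets recorded this way, each token can be recovered from the raw text via `substring(start, end)`, which is the property a builder needs in order to respect an existing tokenization instead of re-tokenizing.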
Copyright © 2017. All rights reserved.