BasicExample.java |
|
---|---|
You can download the Java file here. |
|
Creating a TextAnnotation |
|
In Edison, different annotations over text are called Views, each of which is a graph of Constituents and Relations. All the Views of a given text are managed by an object called a TextAnnotation. One key assumption in Edison is that the views can be defined in terms of the tokens of the text. In other words, the TextAnnotation object fixes a tokenization for the text when it is created, and all the views are defined in terms of this tokenization. |
|
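To make the token-anchored design concrete, here is a minimal sketch of the idea, not Edison's implementation. The class names `TokenViewsSketch`, `ToyAnnotation` and `Span` are hypothetical and exist only for illustration. The sketch assumes whitespace tokenization and models only labeled spans (constituents), leaving out relations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of token-anchored views; these are NOT Edison's classes.
class TokenViewsSketch {

    /** A labeled span over token indices [start, end). */
    static final class Span {
        final String label;
        final int start;
        final int end;

        Span(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Fixes the tokenization once; every view refers to token indices only. */
    static final class ToyAnnotation {
        private final String[] tokens;
        private final Map<String, List<Span>> views = new LinkedHashMap<>();

        ToyAnnotation(String whitespaceTokenizedText) {
            // Assumption: the input is already whitespace tokenized.
            this.tokens = whitespaceTokenizedText.trim().split("\\s+");
        }

        /** Adds a labeled span to the named view, creating the view if needed. */
        void addSpan(String viewName, String label, int start, int end) {
            if (start < 0 || end > tokens.length || start >= end) {
                throw new IllegalArgumentException("span outside the token range");
            }
            views.computeIfAbsent(viewName, k -> new ArrayList<>())
                 .add(new Span(label, start, end));
        }

        List<String> viewNames() {
            return new ArrayList<>(views.keySet());
        }

        List<Span> view(String name) {
            return views.getOrDefault(name, new ArrayList<>());
        }

        /** Recovers the surface string of a span from the shared tokens. */
        String surface(Span s) {
            return String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end));
        }
    }

    public static void main(String[] args) {
        ToyAnnotation ta = new ToyAnnotation("John Smith lives in Urbana .");
        ta.addSpan("NER", "PER", 0, 2);
        ta.addSpan("NER", "LOC", 4, 5);
        ta.addSpan("POS", "VBZ", 2, 3);
        for (String name : ta.viewNames()) {
            for (Span s : ta.view(name)) {
                System.out.println(name + "\t" + s.label + "\t" + ta.surface(s));
            }
        }
    }
}
```

Because every span stores token indices rather than character offsets, spans from different views can be compared directly, which is the practical benefit of fixing the tokenization up front.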
This example shows some ways to create a TextAnnotation.
|
|
We always need to specify the corpus and text identifiers. In the current version of Edison, these identifiers are not used to perform any book-keeping, but this could be introduced in future versions. |
|
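The simplest way to define a TextAnnotation is to pass the raw text to the constructor, in which case the tokenization defaults to the LBJ tokenizer. |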
|
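Another way to create a TextAnnotation is to specify the tokenization manually, passing text that is already whitespace tokenized. |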
|
Print the text. This prints the raw text that was used to create the TextAnnotation object. In the case where the second constructor is used, the text is printed whitespace tokenized. |
|
Print the tokenized text. The tokenization scheme is specified by the constructor, which in the first example defaults to the LBJ tokenizer, and in the second one is specified manually. |
|
Print the de-tokenized text. This uses a normalization scheme to pretty-print text. Because the same scheme can be applied to both tokenized and un-tokenized text, the normalized string can also be used as a key in Maps. Note: The detokenization scheme is an evolving one. It handles several punctuation-related oddities, but not all. |
|
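The idea of using normalized text as a Map key can be sketched as follows. This is an illustrative approximation, not Edison's detokenizer: the class `DetokenizeSketch` and its `normalize` method are hypothetical, and the rules cover only whitespace, common closing punctuation, and opening brackets.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical normalizer for illustration; Edison's actual scheme may differ.
class DetokenizeSketch {

    /** Maps tokenized and un-tokenized variants of a text to one string. */
    static String normalize(String text) {
        // Collapse runs of whitespace into single spaces.
        String s = text.trim().replaceAll("\\s+", " ");
        // Attach closing punctuation to the preceding token.
        s = s.replaceAll(" ([.,;:!?)\\]])", "$1");
        // Attach opening brackets to the following token.
        s = s.replaceAll("([(\\[]) ", "$1");
        return s;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        String tokenized = "Hello , world ( again ) !";
        String raw = "Hello,   world (again)!";
        // Both variants normalize to the same key, so they share one entry.
        counts.merge(normalize(tokenized), 1, Integer::sum);
        counts.merge(normalize(raw), 1, Integer::sum);
        System.out.println(counts);
    }
}
```

Like the scheme described above, a rule set of this kind handles only some punctuation cases; quotes and contractions, for instance, would need additional rules.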
Print the list of views that this text annotation contains.
|
|
Print the sentences. |
|