Data for Entity and Relation Recognition Experiments

Each file consists of several blocks. Each block describes the entities and relations in one sentence.

The format of each block is:

In the table of a sentence, each row represents an element (a single word or consecutive words) in the sentence. Meaningful columns include:

Other columns can simply be ignored.

A relation descriptor has three fields.
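As a rough illustration of reading such a file, here is a minimal Python sketch. It is not the loader used in the papers, and it rests on assumptions that this description does not confirm: that fields are separated by whitespace, that a table row always has more than three fields, and that any line with exactly three fields is a relation descriptor. The name read_blocks is hypothetical, and the three relation fields are kept as raw strings because their meaning is not spelled out here.

```python
from typing import Dict, Iterable, List


def read_blocks(lines: Iterable[str]) -> List[Dict[str, list]]:
    """Group the lines of a .corp file into one dict per sentence.

    ASSUMPTIONS (not confirmed by the description above):
      * fields are whitespace-separated;
      * a table row has more than three fields;
      * a line with exactly three fields is a relation descriptor.
    """
    blocks: List[Dict[str, list]] = []
    current = None
    in_table = False  # True while consecutive table rows are being read

    for raw in lines:
        fields = raw.split()
        if len(fields) > 3:
            # Table row: start a new block if the previous line was not a row.
            if not in_table:
                current = {"rows": [], "relations": []}
                blocks.append(current)
                in_table = True
            current["rows"].append(fields)
        else:
            # Blank line, relation descriptor, or anything else ends the table.
            in_table = False
            if len(fields) == 3 and current is not None:
                # Kept as raw strings; no meaning is assigned to the fields.
                current["relations"].append(tuple(fields))
    return blocks
```

To try it on one of the corpora listed below, pass an open file, for example read_blocks(open("all.corp")), and inspect the "rows" and "relations" entries of each returned block. If the real files break any of the assumptions above (for instance, a column that contains spaces), the splitting rule needs to be adjusted.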


The following corpora were used in D. Roth and W. Yih, "Probabilistic Reasoning for Entity & Relation Recognition," COLING'02, Aug. 2002:

kill.corp, m-kill.corp (updated by Mark Sammons) : sentences that have the kill relation.
birthplace.corp, m-birthplace.corp (updated by Mark Sammons) : sentences that have the born_in relation.
negative.corp : sentences that have NO relation.
all.corp : the three corpora above combined into one file.


The following corpus was used in D. Roth and W. Yih, "A Linear Programming Formulation for Global Inference in Natural Language Tasks," CoNLL'04, May 2004:

conll04.corp : an updated corpus that has more relations, including located in, work for, organization based in, live in, and kill.