Semantic parsing of sentences is an important step toward natural language understanding, and has immediate applications in tasks such as information extraction and question answering. We study the task of semantic role labeling (SRL) in which, for each verb in a sentence, the goal is to identify all constituents that fill a semantic role, and to determine their roles, such as Agent, Patient or Instrument, and their adjuncts, such as Locative, Temporal or Manner. For example, given the sentence "I left my pearls to my daughter-in-law in my will.", the goal is to identify the different arguments of the verb "left", which yields the output:
[A0 I] [V left] [A1 my pearls] [A2 to my daughter-in-law] [AM-LOC in my will].
Here A0 represents the leaver, A1 represents the thing left, A2 represents the benefactor, AM-LOC is an adjunct indicating the location of the action, and V marks the verb. SRL is a difficult task, and one cannot expect high levels of performance from either purely manual classifiers or purely learned classifiers. Rather, supplemental linguistic information must be used to support and correct a learning system. So far, machine learning approaches to SRL have incorporated linguistic information only implicitly, via the classifiers' features. The key innovation in our approach is the development of a principled method to combine machine learning techniques with linguistic and structural constraints by explicitly incorporating inference into the decision process.
The machine learning part of the system we present here is composed of two phases. First, a set of argument candidates is produced using two learned classifiers--one to discover beginning positions and one to discover end positions of each argument type. Ideally, this phase discovers a small superset of all arguments in the sentence (for each verb). In a second learning phase, the candidate arguments from the first phase are re-scored using a classifier designed to determine argument type, given a candidate argument.
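The two phases can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the per-token scores are invented, the 0.5 threshold is arbitrary, the sketch handles a single argument type rather than one pair of classifiers per type, and `toy_type_scorer` is a hypothetical stand-in for the learned argument-type classifier.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def generate_candidates(begin_scores: List[float], end_scores: List[float],
                        threshold: float = 0.5) -> List[Span]:
    """Phase 1: pair each likely begin position with each likely end at or after it."""
    begins = [i for i, s in enumerate(begin_scores) if s >= threshold]
    ends = [j for j, s in enumerate(end_scores) if s >= threshold]
    return [(b, e) for b in begins for e in ends if e >= b]


def rescore(candidates: List[Span],
            type_scorer: Callable[[Span], Dict[str, float]]
            ) -> Dict[Span, Dict[str, float]]:
    """Phase 2: score every candidate with an argument-type classifier."""
    return {span: type_scorer(span) for span in candidates}


def toy_type_scorer(span: Span) -> Dict[str, float]:
    """Hypothetical placeholder: shorter spans look more like arguments."""
    length = span[1] - span[0] + 1
    return {"arg": 1.0 / length, "null": 1.0 - 1.0 / length}


# Invented scores for the tokens:
# I(0) left(1) my(2) pearls(3) to(4) my(5) daughter-in-law(6) in(7) my(8) will(9) .(10)
begin_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.1, 0.6, 0.2, 0.1, 0.0]
end_scores = [0.9, 0.1, 0.1, 0.8, 0.1, 0.2, 0.7, 0.1, 0.1, 0.6, 0.0]

candidates = generate_candidates(begin_scores, end_scores)
rescored = rescore(candidates, toy_type_scorer)

# The four arguments from the example labeling, as token spans.
gold = {(0, 0), (2, 3), (4, 6), (7, 9)}
print(len(candidates), gold <= set(candidates))
```

With these toy scores the first phase proposes ten spans, among them the four spans of the example labeling, which is the "small superset" behavior the first phase aims for. Note that several of the proposed spans overlap one another; the two learning phases alone do nothing to rule that out.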
Unfortunately, it is difficult to incorporate global properties of the sentence into the learning phases. However, at the inference level it is possible to incorporate the fact that the set of possible role-labelings is restricted by both structural and linguistic constraints--for example, arguments cannot structurally overlap, or, given a predicate, some argument structures are illegal. The overall decision problem must produce an outcome that is consistent with these constraints. We encode the constraints as linear inequalities, and use integer linear programming (ILP) as an inference procedure to make a final decision that is both consistent with the constraints and most likely according to the learning system.
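To make the inference step concrete, the sketch below solves a tiny instance of the same kind of constrained 0/1 decision problem: each candidate span receives exactly one label (possibly "null"), no two non-null spans may overlap, and the total classifier score is maximized. It is only an illustration under stated assumptions. The scores are invented, only the no-overlap constraint is modeled, and exhaustive enumeration is swapped in for an ILP solver, which is feasible only because the example is tiny.

```python
from itertools import product
from typing import Dict, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
NULL = "null"           # label meaning "this candidate is not an argument"


def overlaps(a: Span, b: Span) -> bool:
    """True if the two inclusive spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def feasible(spans, labels) -> bool:
    """Structural constraint: spans labeled as arguments must not overlap."""
    chosen = [s for s, l in zip(spans, labels) if l != NULL]
    return all(not overlaps(a, b)
               for i, a in enumerate(chosen) for b in chosen[i + 1:])


def infer(scores: Dict[Span, Dict[str, float]]) -> Dict[Span, str]:
    """Return the highest-scoring labeling that satisfies the constraints."""
    spans = sorted(scores)
    best, best_val = None, float("-inf")
    # Enumerate every assignment of one label per span (brute force, not ILP).
    for labels in product(*(sorted(scores[s]) for s in spans)):
        if not feasible(spans, labels):
            continue
        val = sum(scores[s][l] for s, l in zip(spans, labels))
        if val > best_val:
            best, best_val = labels, val
    return {s: l for s, l in zip(spans, best) if l != NULL}


# Invented scores; span (2, 6) overlaps both (2, 3) and (4, 6).
scores = {
    (0, 0): {"A0": 0.9, NULL: 0.1},
    (2, 3): {"A1": 0.8, NULL: 0.2},
    (2, 6): {"A1": 0.6, NULL: 0.4},
    (4, 6): {"A2": 0.7, NULL: 0.3},
    (7, 9): {"AM-LOC": 0.6, NULL: 0.4},
}

labeling = infer(scores)
print(labeling)
```

Taken independently, every candidate here prefers a non-null label, which would produce overlapping arguments. Under the constraint, the span (2, 6) is set to "null" because keeping both (2, 3) and (4, 6) yields a higher total score. In an ILP encoding, each (span, label) pair would become a binary variable and the no-overlap condition a linear inequality over those variables.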
Previous approaches to the SRL task have made use of a full syntactic parse of the sentence in order to define argument boundaries and to determine the role labels. In this work, following the CoNLL-2004 shared task definition, we assume that the SRL system takes as input only partial syntactic information, and no external lexico-semantic knowledge bases. Specifically, we assume as input resources a part-of-speech tagger and a shallow parser that can process the input to the level of base chunks and clauses. We do not assume a full parse as input.
In this demo, following the definitions of PropBank and the CoNLL-2004 shared task, there are six different types of arguments labelled as A0-A5 and AA. These labels have different semantics for each verb, as specified in the PropBank Frame files. In addition, there are 13 types of adjuncts labelled as AM-XXX, where XXX specifies the adjunct type. In some cases, an argument may span different parts of a sentence; the label C-XXX is then used to mark the continuation of the argument, as shown in the example below.
[A1 The pearls] , [A0 I] [V said] , [C-A1 were left to my daughter-in-law].
Moreover, in some cases, an argument might be a relative pronoun that in fact refers to the actual agent outside the clause. In this case, the actual agent is labeled with the appropriate argument type, XXX, while the relative pronoun is instead labeled as R-XXX. For example,
[A1 The pearls] [R-A1 which] [A0 I] [V left] , [A2 to my daughter-in-law]
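
The bracketed notation used in the examples above is simple enough to read mechanically. The following Python sketch is one possible way to do so; the function names `parse_labeled`, `split_label` and `merge_continuations` are hypothetical helpers introduced here, and the sketch assumes a single verb per sentence and no nested brackets.

```python
import re
from typing import Dict, List, Tuple

# Matches "[LABEL some text]" where LABEL contains no whitespace.
BRACKET = re.compile(r"\[(\S+)\s+([^\]]+)\]")


def parse_labeled(sentence: str) -> List[Tuple[str, str]]:
    """Extract (label, text) pairs in left-to-right order."""
    return [(m.group(1), m.group(2).strip()) for m in BRACKET.finditer(sentence)]


def split_label(label: str) -> Tuple[str, str]:
    """Split a label into (prefix, base): 'C-A1' -> ('C', 'A1'), 'A1' -> ('', 'A1')."""
    for prefix in ("C-", "R-"):
        if label.startswith(prefix):
            return prefix[0], label[len(prefix):]
    return "", label


def merge_continuations(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Attach C-XXX pieces to their XXX argument; R-XXX labels are kept as they are."""
    merged: Dict[str, List[str]] = {}
    for label, text in pairs:
        kind, base = split_label(label)
        key = base if kind == "C" else label
        merged.setdefault(key, []).append(text)
    return {key: " ... ".join(parts) for key, parts in merged.items()}


continued = parse_labeled(
    "[A1 The pearls] , [A0 I] [V said] , [C-A1 were left to my daughter-in-law].")
relative = parse_labeled(
    "[A1 The pearls] [R-A1 which] [A0 I] [V left] , [A2 to my daughter-in-law]")

print(merge_continuations(continued))
print([split_label(label) for label, _ in relative])
```

In the first example the two pieces of the discontinuous A1 argument are joined under a single key, while in the second the R-A1 label is split into its prefix and base so that the relative pronoun can be related back to the A1 argument it refers to.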