Shallow Parsing



If you wish to cite this work, please use this publication



Shallow parsing is the process of identifying syntactic phrases, such as noun phrases, in natural language sentences. In this demonstration, the system identifies several kinds of chunks: phrases derived from the parse trees of English sentences by flattening the structure of those trees. These chunks provide an intermediate step toward natural language understanding. Although identifying whole parse trees can provide a deeper analysis of a sentence than identifying chunks, it is a much harder problem. Given today's technology, machines can identify chunks more accurately than they can identify parse trees.

For example, the sentence "He reckons the current account deficit will narrow to only 1.8 billion in September" has the following parse tree:

(S (NP He)
   (VP reckons
       (S (NP the current account deficit)
          (VP (VP will narrow)
              (PP to
                  (NP only 1.8 billion))
              (PP in
                  (NP September))))))

Here, S represents a clause, NP is a noun phrase, VP is a verb phrase, and PP is a prepositional phrase. By flattening down the parse tree, we have:

(NP He)
(VP reckons)
(NP the current account deficit)
(VP will narrow)
(PP to)
(NP only 1.8 billion)
(PP in)
(NP September)
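
The flattening step can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption, not the procedure used to build the system's training data: every maximal run of words sitting directly under a phrase node becomes one chunk carrying that node's label. The function names (tokenize, parse, flatten) are hypothetical.

```python
def tokenize(text):
    """Split a bracketed tree into '(' , ')' and word/label tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume one bracketed node from the token list; return (label, children)."""
    if tokens.pop(0) != "(":
        raise ValueError("expected '('")
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))   # nested phrase
        else:
            children.append(tokens.pop(0))   # plain word
    tokens.pop(0)                            # drop the closing ')'
    return (label, children)


def flatten(node):
    """Assumed rule: runs of words directly under a node form one chunk."""
    label, children = node
    chunks, run = [], []
    for child in children:
        if isinstance(child, str):
            run.append(child)
        else:
            if run:
                chunks.append((label, run))
                run = []
            chunks.extend(flatten(child))
    if run:
        chunks.append((label, run))
    return chunks


tree = ("(S (NP He) (VP reckons (S (NP the current account deficit) "
        "(VP (VP will narrow) (PP to (NP only 1.8 billion)) "
        "(PP in (NP September))))))")
chunks = flatten(parse(tokenize(tree)))
for label, words in chunks:
    print("(%s %s)" % (label, " ".join(words)))
```

On the example tree this prints the eight chunks listed above, one per line. Real treebank-to-chunk conversion handles many more cases than this rule does.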

This task, therefore, involves identifying these non-overlapping, non-embedded phrases. Given the sentence above, the system produces output like this:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] 
[NP only 1.8 billion] [PP in] [NP September]

Our system consists of two stages. In the first stage, the system uses SNoW, a machine learning architecture, to learn to decide, for each word in a sentence, whether it begins a phrase and whether it ends one. In other words, this stage predicts where to put the opening brackets "[" and closing brackets "]" in the sentence, for example:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to]
[NP only 1.8 billion] [PP in] [NP September]
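
To make the two per-word decisions concrete, the sketch below turns a chunked sentence into the targets such a learner would be trained to predict: for each word, the label of the phrase it opens (if any) and the label of the phrase it closes (if any). It does not use SNoW; the function name bracket_targets and the tuple layout are assumptions made for illustration.

```python
def bracket_targets(chunks):
    """Return one (word, opens, closes) row per word.

    opens/closes hold the phrase label when the word starts/ends a phrase,
    and None otherwise.
    """
    rows = []
    for label, words in chunks:
        for i, word in enumerate(words):
            opens = label if i == 0 else None
            closes = label if i == len(words) - 1 else None
            rows.append((word, opens, closes))
    return rows


chunks = [
    ("NP", ["He"]),
    ("VP", ["reckons"]),
    ("NP", ["the", "current", "account", "deficit"]),
    ("VP", ["will", "narrow"]),
    ("PP", ["to"]),
    ("NP", ["only", "1.8", "billion"]),
    ("PP", ["in"]),
    ("NP", ["September"]),
]
rows = bracket_targets(chunks)
for word, opens, closes in rows:
    print(word, opens, closes)
```

A one-word phrase such as "He" both opens and closes a bracket, while a word in the middle of a phrase such as "current" does neither.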

However, this stage may output imperfect brackets, which result in invalid phrasings:

[NP He] [VP reckons] [NP the current ] account deficit] [VP will narrow] [PP to] 
[NP only [NP 1.8 billion] [PP in] [NP September]

Therefore, in the second stage, an inference process is used to resolve these conflicts. Each bracket carries the confidence of its prediction from the first stage. The inference process chooses among these brackets to produce a final output that is a valid phrasing and maximizes the overall confidence.
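
One way to picture this selection is as a dynamic program over candidate phrases. The sketch below is an assumption-laden illustration, not the system's actual inference procedure: it supposes that each pair of conflicting brackets has already been turned into a candidate phrase with a single confidence score, and the scores themselves are invented for the example. The function name best_phrasing is hypothetical.

```python
def best_phrasing(n_words, candidates):
    """Pick non-overlapping phrases with the highest total confidence.

    candidates: list of (start, end, label, confidence), with end inclusive.
    Returns the chosen (start, end, label) triples in sentence order.
    """
    # best[i] holds (total score, chosen phrases) for the first i words.
    best = [(0.0, [])] + [None] * n_words
    for i in range(1, n_words + 1):
        # Option 1: leave word i-1 outside every phrase.
        best[i] = best[i - 1]
        # Option 2: end one of the candidate phrases at word i-1.
        for start, end, label, conf in candidates:
            if end == i - 1:
                prev_score, prev_chosen = best[start]
                if prev_score + conf > best[i][0]:
                    best[i] = (prev_score + conf,
                               prev_chosen + [(start, end, label)])
    return best[n_words][1]


words = ("He reckons the current account deficit will narrow "
         "to only 1.8 billion in September").split()

# Invented confidences; the two conflicts from the invalid phrasing are
# "the current" vs "the current account deficit" and
# "only 1.8 billion" vs "1.8 billion".
candidates = [
    (0, 0, "NP", 0.9), (1, 1, "VP", 0.9),
    (2, 3, "NP", 0.4), (2, 5, "NP", 0.8),
    (6, 7, "VP", 0.9), (8, 8, "PP", 0.9),
    (9, 11, "NP", 0.7), (10, 11, "NP", 0.5),
    (12, 12, "PP", 0.9), (13, 13, "NP", 0.9),
]

chosen = best_phrasing(len(words), candidates)
output = " ".join("[%s %s]" % (label, " ".join(words[s:e + 1]))
                  for s, e, label in chosen)
print(output)
```

With these made-up scores, the higher-confidence alternatives win both conflicts, and the result is the valid phrasing shown earlier. Because chosen phrases never share a word, the output is guaranteed to be non-overlapping and non-embedded.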