Web-scale surface and syntactic n-gram features for dependency parsing

D Ng, M Bansal, JR Curran - arXiv preprint arXiv:1502.07038, 2015 - arxiv.org
We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface $n$-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks. Surface and syntactic $n$-grams both produce substantial and complementary gains in parsing accuracy across domains. Our best system combines the two feature sets, achieving up to 0.8% absolute UAS improvements on newswire and 1.4% on web text.