
Web-scale surface and syntactic n-gram features for dependency parsing

D Ng, M Bansal, JR Curran - arXiv preprint arXiv:1502.07038, 2015 - arxiv.org



We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface $n$-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks. Surface and syntactic $n$-grams both produce substantial and complementary gains in parsing accuracy across domains. Our best system combines the two feature sets, achieving up to 0.8% absolute UAS improvements on newswire and 1.4% on web text.
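The abstract reports its gains as absolute UAS (unlabeled attachment score) improvements, i.e. differences in percentage points between two parsers. As a minimal sketch of how such a figure is typically computed, the Python below scores predicted head indices against gold ones. The function name `uas` and the toy head lists are hypothetical illustrations, not taken from the paper, and evaluation details such as punctuation handling are not specified in the abstract.

```python
from typing import Sequence


def uas(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Return the percentage of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# Toy data (hypothetical): one head index per token, 0 marks the root.
gold = [2, 0, 2, 5, 3]
baseline = [2, 0, 2, 3, 3]   # one wrong attachment
improved = [2, 0, 2, 5, 3]   # all attachments correct

baseline_uas = uas(gold, baseline)
improved_uas = uas(gold, improved)

# "Absolute" improvement is a plain difference in percentage points,
# not a relative (ratio-based) change.
absolute_gain = improved_uas - baseline_uas
```

Under this reading, "0.8% absolute" means the combined system's UAS is 0.8 percentage points higher than the baseline's on newswire, not 0.8% of the baseline score.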


