2018 Volume E101.D Issue 4 Pages 1180-1188
Large-scale first-hand tweets motivate automatic event detection on Twitter. Previous approaches model events by clustering tweets, words, or segments. Event clusters represented by tweets are easier to understand than those represented by words/segments; however, tweets are sparser than words/segments, which makes clustering less effective. This article proposes to represent events with triple structures called frames, which are as efficient as words/segments yet can be easier to understand. Frames are extracted from the shallow syntactic information of tweets with an unsupervised open information extraction method, originally introduced for domain-independent relation extraction in a single pass over web-scale data. Next, bursty frame element extraction serves as feature selection: a probabilistic model identifies frame elements with bursty frequency patterns, and the remaining elements are filtered out. The selected frame elements are then clustered and ranked to yield high-quality events, which are reported by linking frame elements back to their frames. Experimental results show that frame-based event detection improves precision over a state-of-the-art segment-based event detection baseline. Example outputs demonstrate the superior readability of frame-based events compared with segment-based events.
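The bursty frame element step and the final linking step can be illustrated with a minimal sketch. The abstract says only that a probabilistic model is used, so this sketch assumes a binomial model: in a time window containing `n` frames, the number of frames containing an element is treated as Binomial(`n`, `p`), where `p` is that element's background probability, and an element counts as bursty when its observed count exceeds the mean by more than `k` standard deviations. Frames are assumed to be (subject, relation, object) triples. The function names (`frame_elements`, `bursty_elements`, `link_back`), the background probabilities, the fallback probability, and the threshold `k` are all hypothetical, and clustering and ranking are omitted.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumption: a frame is a (subject, relation, object) triple of strings.
Frame = Tuple[str, str, str]


def frame_elements(frames: Iterable[Frame]) -> Counter:
    """Count, for each element, how many frames in the window contain it."""
    counts: Counter = Counter()
    for frame in frames:
        # Count each distinct element once per frame, so counts never exceed n.
        for element in set(frame):
            counts[element] += 1
    return counts


def bursty_elements(window_counts: Counter, n_window: int,
                    background_prob: Dict[str, float],
                    k: float = 2.0, min_prob: float = 1e-4) -> Dict[str, float]:
    """Keep elements whose count is unusually high under an assumed binomial model.

    Hypothetical model: count ~ Binomial(n_window, p). The returned score is
    (count - mean) / std, and only elements with a score above k are kept.
    Elements missing from background_prob fall back to min_prob.
    """
    bursty: Dict[str, float] = {}
    for element, f in window_counts.items():
        p = background_prob.get(element, min_prob)
        mean = n_window * p
        std = math.sqrt(n_window * p * (1.0 - p))
        score = (f - mean) / std if std > 0 else 0.0
        if score > k:
            bursty[element] = score
    return bursty


def link_back(bursty: Dict[str, float], frames: Iterable[Frame]) -> List[Frame]:
    """Report the distinct frames (in first-seen order) containing a bursty element."""
    reported: List[Frame] = []
    for frame in frames:
        if any(e in bursty for e in frame) and frame not in reported:
            reported.append(frame)
    return reported
```

For example, with five copies of `("police", "arrest", "suspect")` and two unrelated frames in one window, and small background probabilities for `police`, `arrest`, and `suspect`, only those three elements pass the threshold, and `link_back` reports the single readable frame `("police", "arrest", "suspect")` rather than isolated words. A z-score-style threshold is used here only because it is a simple way to flag counts far above a binomial expectation.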