DOI: 10.1145/3152494.3152508
research-article

Syncretic matching: story similarity between documents

Published: 11 January 2018

Abstract

In several document matching applications, such as comparing judgments, patent claims, or movie plots, conventional bag-of-words models are insufficient. Bag-of-words models are useful for computing lexical similarity, but these applications require understanding similarity with respect to the underlying narrative, or "story." We call this the syncretic matching problem. Although bag-of-words models can be enhanced with techniques such as dimensionality reduction or topic models, the syncretic matching problem is more involved: it requires modeling the underlying semantic "story" and comparing structural similarities across stories. In this paper, we address the problem of computing narrative similarity for a given pair of input documents. The approach uses a general knowledge base in the form of a term co-occurrence graph (TCG), computed from all articles in Wikipedia, to help create a story model for comparison.
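
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of a bag-of-words cosine similarity and a toy term co-occurrence graph. It is an illustration under stated assumptions, not the paper's method: the abstract does not say how the TCG is weighted or how stories are compared, so the function names (`tokenize`, `bow_cosine`, `build_tcg`), the crude tokenizer, and the choice of article-level co-occurrence counts as edge weights are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase and keep alphabetic runs; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def bow_cosine(doc_a, doc_b):
    """Cosine similarity between raw term-count vectors of two documents."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_tcg(articles):
    """Toy term co-occurrence graph as a Counter of undirected edges.

    Assumption: an edge (u, v), with u < v alphabetically, is weighted by
    the number of articles in which both terms appear.
    """
    edges = Counter()
    for article in articles:
        terms = sorted(set(tokenize(article)))
        for u, v in combinations(terms, 2):
            edges[(u, v)] += 1
    return edges
```

Two sentences that tell the same story in different words, such as "thief steals jewels" and "burglar robs gems", share no terms, so `bow_cosine` scores them as unrelated. A co-occurrence graph built from a background corpus could in principle link "thief" and "burglar" through the articles they share, which is the kind of general knowledge the abstract says the TCG contributes.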


Published In

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
379 pages
ISBN: 9781450363419
DOI: 10.1145/3152494
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document similarity
  2. narrative similarity
  3. story matching
  4. story model
  5. story similarity
  6. syncretic matching

Qualifiers

  • Research-article

Conference

CoDS-COMAD '18

Acceptance Rates

CODS-COMAD '18 Paper Acceptance Rate: 50 of 150 submissions, 33%
Overall Acceptance Rate: 197 of 680 submissions, 29%
