DOI: 10.1145/2063576.2063588
research-article

Finding relevant information of certain types from enterprise data

Published: 24 October 2011

Abstract

Search over enterprise data is essential to every aspect of an enterprise because it helps users fulfill their information needs. As in Web search, most queries in enterprise search are keyword queries. However, enterprise search is a distinct research problem because, unlike the data in traditional IR applications (e.g., text data), enterprise data includes information stored in different formats. In particular, enterprise data includes both unstructured and structured information, and all of the data centers on a particular enterprise. As a result, the relevant information from these two data sources can be complementary. Intuitively, such integrated data could be exploited to improve enterprise search quality. Despite its importance, this problem has received little attention so far. In this paper, we demonstrate the feasibility of leveraging the integrated information in enterprise data to improve search quality through a case study: finding relevant information of certain types from enterprise data. Enterprise search users often look for types of relevant information other than documents, e.g., the contact information of persons working on a product. When formulating a keyword query, search users may specify both content requirements, i.e., what kind of information is relevant, and type requirements, i.e., what type of information is relevant. Thus, the goal is to find information that satisfies both requirements specified in the query. Specifically, we formulate the problem as keyword search over structured or semistructured data, and then propose to leverage the complementary unstructured information in the enterprise data to solve it. Experimental results on real-world enterprise data and simulated data show that the proposed methods can effectively exploit the unstructured information to find relevant information of certain types from structured and semistructured information in enterprise data.
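To make the distinction between content requirements and type requirements concrete, the following is a toy sketch only, not the method proposed in the paper. It assumes a query such as "contact widgetdb", where "contact" names the type of information wanted and "widgetdb" describes the content. The records, documents, function names, and the naive term-count scoring are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical structured records: each has a type and belongs to an entity.
RECORDS = [
    {"id": "r1", "type": "contact", "value": "alice@example.com", "entity": "Alice"},
    {"id": "r2", "type": "contact", "value": "bob@example.com", "entity": "Bob"},
    {"id": "r3", "type": "manual", "value": "Install guide", "entity": "WidgetDB"},
]

# Hypothetical unstructured documents that mention the same entities.
DOCUMENTS = [
    "Alice leads development of the WidgetDB storage engine",
    "Bob maintains the payroll reporting tool",
]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def split_query(query, known_types):
    """Separate query terms naming a type from terms describing content."""
    terms = tokenize(query)
    type_terms = [t for t in terms if t in known_types]
    content_terms = [t for t in terms if t not in known_types]
    return type_terms, content_terms


def content_score(record, content_terms, documents):
    """Count content-term matches in the record and in documents about its entity."""
    own = Counter(tokenize(record["value"] + " " + record["entity"]))
    score = sum(own[t] for t in content_terms)
    entity = record["entity"].lower()
    for doc in documents:
        tokens = tokenize(doc)
        if entity in tokens:
            # Complementary unstructured evidence for this record.
            counts = Counter(tokens)
            score += sum(counts[t] for t in content_terms)
    return score


def search(query, records, documents):
    """Return ids of records that match the requested type, ranked by content score."""
    known_types = {r["type"] for r in records}
    type_terms, content_terms = split_query(query, known_types)
    scored = []
    for record in records:
        if type_terms and record["type"] not in type_terms:
            continue  # type requirement not met
        score = content_score(record, content_terms, documents)
        if score > 0:
            scored.append((score, record["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]
```

In this toy data, Alice's contact record never mentions WidgetDB itself; it is found only because an unstructured document links Alice to that product. Calling `search` with an empty document list returns nothing for the same query, which is the kind of gap the complementary unstructured information is meant to close.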



    Published In

    CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
    October 2011
    2712 pages
    ISBN:9781450307178
    DOI:10.1145/2063576

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. enterprise search
    2. retrieval
    3. structured information
    4. type requirements
    5. unstructured information

    Qualifiers

    • Research-article

    Conference

    CIKM '11

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
