DOI: 10.1145/1516241.1516356
research-article

PicAChoo: a tool for customizable feature extraction utilizing characteristics of textual data

Published: 15 February 2009

Abstract

Although documents can contain hundreds of thousands of unique words, only a small number of them are significantly useful for intelligent services. For this reason, feature extraction has become an important issue in fields such as information retrieval, text mining, and pattern recognition. Numerous supporting tools for feature extraction are available, but most of them treat text as a simple literal. Text, however, is not just a literal but a semantically significant unit with linguistic characteristics, so we need customized extraction methods that consider the characteristics of the source documents. PicAChoo, which stands for 'Pick And Choose', provides an environment that enables feature extraction methods using the structure of sentences and the part-of-speech information of words. Moreover, we suggest dynamic composition of different extraction methods without hard-coding.
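
The idea of composing part-of-speech-aware extraction methods without hard-coding can be illustrated with a minimal sketch. This is not PicAChoo's actual interface, which the abstract does not describe. The names `REGISTRY`, `extract`, `nouns`, and `adjective_noun_pairs` are hypothetical, and the input is assumed to be already tagged with Penn Treebank-style part-of-speech tags.

```python
from typing import Callable, Dict, List, Tuple

# A token is a (word, part-of-speech tag) pair, e.g. ("text", "NN").
# Assumption: tags follow the Penn Treebank convention (NN* nouns, JJ* adjectives).
Token = Tuple[str, str]
Extractor = Callable[[List[Token]], List[str]]


def nouns(tokens: List[Token]) -> List[str]:
    """Keep every word tagged as a noun, lowercased."""
    return [word.lower() for word, tag in tokens if tag.startswith("NN")]


def adjective_noun_pairs(tokens: List[Token]) -> List[str]:
    """Keep each adjacent adjective + noun pair as one combined feature."""
    return [
        f"{w1.lower()}_{w2.lower()}"
        for (w1, t1), (w2, t2) in zip(tokens, tokens[1:])
        if t1.startswith("JJ") and t2.startswith("NN")
    ]


# Hypothetical registry: extraction methods are looked up by name, so a
# pipeline can be described as data (a list of names) rather than in code.
REGISTRY: Dict[str, Extractor] = {
    "nouns": nouns,
    "adj_noun": adjective_noun_pairs,
}


def extract(tokens: List[Token], pipeline: List[str]) -> List[str]:
    """Run the named extractors in order and concatenate their features."""
    features: List[str] = []
    for name in pipeline:
        features.extend(REGISTRY[name](tokens))
    return features


sentence: List[Token] = [
    ("PicAChoo", "NNP"), ("extracts", "VBZ"), ("useful", "JJ"),
    ("features", "NNS"), ("from", "IN"), ("text", "NN"),
]
print(extract(sentence, ["nouns", "adj_noun"]))
# → ['picachoo', 'features', 'text', 'useful_features']
```

Because the pipeline is a plain list of names, a different combination of extractors can be selected from a configuration file or user interface without changing the extraction code itself.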

Reviews

Julien Velcin

In many applications that use text analysis, feature extraction is one of the most significant tasks. PicAChoo, the tool proposed in this paper, is a user-centered environment that considers characteristics of languages, such as structure and part of speech (POS). Unlike other widely used tools, such as RapidMiner or Weka, this environment is flexible and semi-automatic, and it explicitly supports complex features through the notion of context. It is a shame that the authors do not explain more clearly (for instance, with a concrete illustration) precisely what they mean by those terms. The extraction of such features is of high interest but, unfortunately, it is not sufficiently detailed in this paper.

Online Computing Reviews Service

Information & Contributors

Information

Published In

ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
February 2009
704 pages
ISBN: 9781605584058
DOI: 10.1145/1516241
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 February 2009

Author Tags

  1. complex feature
  2. customizable feature extraction
  3. feature storing model

Qualifiers

  • Research-article

Conference

ICUIMC '09

Acceptance Rates

Overall Acceptance Rate 251 of 941 submissions, 27%
