- Article, July 2002
Transforming classifier scores into accurate multiclass probability estimates
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 694–699
https://doi.org/10.1145/775047.775151
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs ...
- Article, July 2002
Making every bit count: fast nonlinear axis scaling
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 664–669
https://doi.org/10.1145/775047.775146
Existing axis scaling and dimensionality methods focus on preserving structure, usually determined via the Euclidean distance. In other words, they inherently assume that the Euclidean distance is already correct. We instead propose a novel nonlinear ...
- Article, July 2002
Discovery net: towards a grid of knowledge discovery
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 658–663
https://doi.org/10.1145/775047.775145
This paper provides a blueprint for constructing collaborative and distributed knowledge discovery systems within Grid-based computing environments. The need for such systems is driven by the quest for sharing knowledge, information and computing ...
- Article, July 2002
Item selection by "hub-authority" profit ranking
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 652–657
https://doi.org/10.1145/775047.775144
A fundamental problem in business and other applications is ranking items with respect to some notion of profit based on historical transactions. The difficulty is that the profit of one item not only comes from its own sales, but also from its ...
- Article, July 2002
Single-shot detection of multiple categories of text using parametric mixture models
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 626–631
https://doi.org/10.1145/775047.775140
In this paper, we address the problem of detecting multiple topics or categories of text where each text is not assumed to belong to one of a number of mutually exclusive categories. Conventionally, the binary classification approach has been employed, ...
- Article, July 2002
A robust and efficient clustering algorithm based on cohesion self-merging
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 582–587
https://doi.org/10.1145/775047.775133
Data clustering has attracted a lot of research attention in the field of computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids, or the distance ...
- Article, July 2002
SECRET: a scalable linear regression tree algorithm
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 481–487
https://doi.org/10.1145/775047.775117
Developing regression models for large datasets that are both accurate and easy to interpret is a very important data mining problem. Regression trees with linear models in the leaves satisfy both these requirements, but thus far, no truly scalable ...
- Article, July 2002
A new two-phase sampling based algorithm for discovering association rules
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 462–468
https://doi.org/10.1145/775047.775114
This paper introduces FAST, a novel two-phase sampling-based algorithm for discovering association rules in large databases. In Phase I a large initial sample of transactions is collected and used to quickly and accurately estimate the support of each ...
- Article, July 2002
A theoretical framework for learning from a pool of disparate data sources
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 443–449
https://doi.org/10.1145/775047.775111
Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, aiming towards the design of an automated diagnostic tool for some disease, one may wish to integrate data ...
- Article, July 2002
Mining heterogeneous gene expression data with time lagged recurrent neural networks
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 415–421
https://doi.org/10.1145/775047.775106
Heterogeneous types of gene expressions may provide a better insight into the biological role of gene interaction with the environment, disease development and drug effect at the molecular level. In this paper for both exploring and prediction purposes ...
- Article, July 2002
From run-time behavior to usage scenarios: an interaction-pattern mining approach
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 315–324
https://doi.org/10.1145/775047.775095
A key challenge facing IT organizations today is their evolution towards adopting e-business practices that gives rise to the need for reengineering their underlying software systems. Any reengineering effort has to be aware of the functional ...
- Article, July 2002
Efficient handling of high-dimensional feature spaces by randomized classifier ensembles
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 307–313
https://doi.org/10.1145/775047.775093
Handling massive datasets is a difficult problem not only due to prohibitively large numbers of entries but in some cases also due to the very high dimensionality of the data. Often, severe feature selection is performed to limit the number of ...
- Article, July 2002
Predicting rare classes: can boosting make any weak learner strong?
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 297–306
https://doi.org/10.1145/775047.775092
Boosting is a strong ensemble-based learning algorithm with the promise of iteratively improving the classification accuracy using any base learner, as long as it satisfies the condition of yielding weighted accuracy > 0.5. In this paper, we analyze ...
- Article, July 2002
Privacy preserving mining of association rules
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 217–228
https://doi.org/10.1145/775047.775080
We present a framework for mining association rules from transactions consisting of categorical items where the data has been randomized to preserve privacy of individual transactions. While it is feasible to recover association rules and preserve ...
- Article, July 2002
Pattern discovery in sequences under a Markov assumption
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 153–162
https://doi.org/10.1145/775047.775070
In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA sequences. We investigate the fundamental aspects of ...
- Article, July 2002
Optimizing search engines using clickthrough data
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 133–142
https://doi.org/10.1145/775047.775067
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant ...
- Article, July 2002
Query, analysis, and visualization of hierarchically structured data using Polaris
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 112–122
https://doi.org/10.1145/775047.775064
In the last several years, large OLAP databases have become common in a variety of applications such as corporate data warehouses and scientific computing. To support interactive analysis, many of these databases are augmented with hierarchical ...
- Article, July 2002
DualMiner: a dual-pruning algorithm for itemsets with constraints
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 42–51
https://doi.org/10.1145/775047.775054
Constraint-based mining of itemsets for questions such as "find all frequent itemsets where the total price is at least $50" has received much attention recently. Two classes of constraints, monotone and antimonotone, have been identified as very ...
- Article, July 2002
MARK: a boosting algorithm for heterogeneous kernel models
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 24–31
https://doi.org/10.1145/775047.775051
Support Vector Machines and other kernel methods have proven to be very effective for nonlinear inference. Practical issues are how to select the type of kernel including any parameters and how to deal with the computational issues caused by the fact ...
- Article, July 2002
Scalable robust covariance and correlation estimates for data mining
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 14–23
https://doi.org/10.1145/775047.775050
Covariance and correlation estimates have important applications in data mining. In the presence of outliers, classical estimates of covariance and correlation matrices are not reliable. A small fraction of outliers, in some cases even a single outlier, ...