Symbolic and algebraic algorithms

21 Results for: Book/Issue: KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Searched The ACM Guide to Computing Literature (3,802,146 records)

Showing 1 - 20 of 21 Results

Article
July 2002
Transforming classifier scores into accurate multiclass probability estimates
- Bianca Zadrozny,
- Charles Elkan
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 694–699, https://doi.org/10.1145/775047.775151

Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs ...
Metrics
Total Citations: 596
Total Downloads: 4,047
Last 12 Months: 390
Last 6 weeks: 45
Article
July 2002
Making every bit count: fast nonlinear axis scaling
- Leejay Wu,
- Christos Faloutsos
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 664–669, https://doi.org/10.1145/775047.775146

Existing axis scaling and dimensionality methods focus on preserving structure, usually determined via the Euclidean distance. In other words, they inherently assume that the Euclidean distance is already correct. We instead propose a novel nonlinear ...
Metrics
Total Citations: 4
Total Downloads: 374
Last 12 Months: 2
Last 6 weeks: 0
Article
July 2002
Discovery net: towards a grid of knowledge discovery
- V. Ćurčin,
- M. Ghanem,
- Y. Guo,
- M. Köhler,
- A. Rowe,
- J. Syed,
- P. Wendel
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 658–663, https://doi.org/10.1145/775047.775145

This paper provides a blueprint for constructing collaborative and distributed knowledge discovery systems within Grid-based computing environments. The need for such systems is driven by the quest for sharing knowledge, information and computing ...
Metrics
Total Citations: 38
Total Downloads: 1,323
Last 12 Months: 2
Last 6 weeks: 0
Article
July 2002
Item selection by "hub-authority" profit ranking
- Ke Wang,
- Ming-Yen Thomas Su
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 652–657, https://doi.org/10.1145/775047.775144

A fundamental problem in business and other applications is ranking items with respect to some notion of profit based on historical transactions. The difficulty is that the profit of one item not only comes from its own sales, but also from its ...
Metrics
Total Citations: 38
Total Downloads: 537
Last 12 Months: 3
Last 6 weeks: 0
Article
July 2002
Single-shot detection of multiple categories of text using parametric mixture models
- Naonori Ueda,
- Kazumi Saito
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 626–631, https://doi.org/10.1145/775047.775140

In this paper, we address the problem of detecting multiple topics or categories of text where each text is not assumed to belong to one of a number of mutually exclusive categories. Conventionally, the binary classification approach has been employed, ...
Metrics
Total Citations: 31
Total Downloads: 602
Last 12 Months: 1
Last 6 weeks: 0
Article
July 2002
A robust and efficient clustering algorithm based on cohesion self-merging
- Cheng-Ru Lin,
- Ming-Syan Chen
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 582–587, https://doi.org/10.1145/775047.775133

Data clustering has attracted a lot of research attention in the field of computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids, or the distance ...
Metrics
Total Citations: 14
Total Downloads: 819
Last 12 Months: 5
Last 6 weeks: 1
Article
July 2002
SECRET: a scalable linear regression tree algorithm
- Alin Dobra,
- Johannes Gehrke
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 481–487, https://doi.org/10.1145/775047.775117

Developing regression models for large datasets that are both accurate and easy to interpret is a very important data mining problem. Regression trees with linear models in the leaves satisfy both these requirements, but thus far, no truly scalable ...
Metrics
Total Citations: 58
Total Downloads: 940
Last 12 Months: 33
Last 6 weeks: 6
Article
July 2002
A new two-phase sampling based algorithm for discovering association rules
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 462–468, https://doi.org/10.1145/775047.775114

This paper introduces FAST, a novel two-phase sampling-based algorithm for discovering association rules in large databases. In Phase I a large initial sample of transactions is collected and used to quickly and accurately estimate the support of each ...
Metrics
Total Citations: 87
Total Downloads: 1,019
Last 12 Months: 13
Last 6 weeks: 6
Article
July 2002
A theoretical framework for learning from a pool of disparate data sources
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 443–449, https://doi.org/10.1145/775047.775111

Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, aiming towards the design of an automated diagnostic tool for some disease, one may wish to integrate data ...
Metrics
Total Citations: 29
Total Downloads: 599
Last 12 Months: 8
Last 6 weeks: 1
Article
July 2002
Mining heterogeneous gene expression data with time lagged recurrent neural networks
- Yulan Liang,
- Arpad Kelemen
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 415–421, https://doi.org/10.1145/775047.775106

Heterogeneous types of gene expressions may provide a better insight into the biological role of gene interaction with the environment, disease development and drug effect at the molecular level. In this paper for both exploring and prediction purposes ...
Metrics
Total Citations: 2
Total Downloads: 949
Last 12 Months: 0
Last 6 weeks: 0
Article
July 2002
From run-time behavior to usage scenarios: an interaction-pattern mining approach
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 315–324, https://doi.org/10.1145/775047.775095

A key challenge facing IT organizations today is their evolution towards adopting e-business practices that gives rise to the need for reengineering their underlying software systems. Any reengineering effort has to be aware of the functional ...
Metrics
Total Citations: 36
Total Downloads: 818
Last 12 Months: 4
Last 6 weeks: 1
Article
July 2002
Efficient handling of high-dimensional feature spaces by randomized classifier ensembles
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 307–313, https://doi.org/10.1145/775047.775093

Handling massive datasets is a difficult problem not only due to prohibitively large numbers of entries but in some cases also due to the very high dimensionality of the data. Often, severe feature selection is performed to limit the number of ...
Metrics
Total Citations: 7
Total Downloads: 679
Last 12 Months: 1
Last 6 weeks: 0
Article
July 2002
Predicting rare classes: can boosting make any weak learner strong?
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 297–306, https://doi.org/10.1145/775047.775092

Boosting is a strong ensemble-based learning algorithm with the promise of iteratively improving the classification accuracy using any base learner, as long as it satisfies the condition of yielding weighted accuracy > 0.5. In this paper, we analyze ...
Metrics
Total Citations: 60
Total Downloads: 636
Last 12 Months: 26
Last 6 weeks: 0
Article
July 2002
Privacy preserving mining of association rules
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 217–228, https://doi.org/10.1145/775047.775080

We present a framework for mining association rules from transactions consisting of categorical items where the data has been randomized to preserve privacy of individual transactions. While it is feasible to recover association rules and preserve ...
Metrics
Total Citations: 330
Total Downloads: 1,715
Last 12 Months: 25
Last 6 weeks: 2
Article
July 2002
Pattern discovery in sequences under a Markov assumption
- Darya Chudova,
- Padhraic Smyth
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 153–162, https://doi.org/10.1145/775047.775070

In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA sequences. We investigate the fundamental aspects of ...
Metrics
Total Citations: 24
Total Downloads: 1,149
Last 12 Months: 13
Last 6 weeks: 2
Article
July 2002
Optimizing search engines using clickthrough data
- Thorsten Joachims
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 133–142, https://doi.org/10.1145/775047.775067

This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant ...
Metrics
Total Citations: 2,350
Total Downloads: 10,491
Last 12 Months: 259
Last 6 weeks: 31
Article
July 2002
Query, analysis, and visualization of hierarchically structured data using Polaris
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 112–122, https://doi.org/10.1145/775047.775064

In the last several years, large OLAP databases have become common in a variety of applications such as corporate data warehouses and scientific computing. To support interactive analysis, many of these databases are augmented with hierarchical ...
Metrics
Total Citations: 48
Total Downloads: 1,451
Last 12 Months: 12
Last 6 weeks: 0
Article
July 2002
DualMiner: a dual-pruning algorithm for itemsets with constraints
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 42–51, https://doi.org/10.1145/775047.775054

Constraint-based mining of itemsets for questions such as "find all frequent itemsets where the total price is at least $50" has received much attention recently. Two classes of constraints, monotone and antimonotone, have been identified as very ...
Metrics
Total Citations: 76
Total Downloads: 576
Last 12 Months: 4
Last 6 weeks: 0
Article
July 2002
MARK: a boosting algorithm for heterogeneous kernel models
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 24–31, https://doi.org/10.1145/775047.775051

Support Vector Machines and other kernel methods have proven to be very effective for nonlinear inference. Practical issues are how to select the type of kernel including any parameters and how to deal with the computational issues caused by the fact ...
Metrics
Total Citations: 48
Total Downloads: 880
Last 12 Months: 15
Last 6 weeks: 2
Article
July 2002
Scalable robust covariance and correlation estimates for data mining
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 14–23, https://doi.org/10.1145/775047.775050

Covariance and correlation estimates have important applications in data mining. In the presence of outliers, classical estimates of covariance and correlation matrices are not reliable. A small fraction of outliers, in some cases even a single outlier, ...
Metrics
Total Citations: 42
Total Downloads: 1,421
Last 12 Months: 24
Last 6 weeks: 3
