
Aggregation-based feature invention and relational concept classes

Published: 24 August 2003

Abstract

Model induction from relational data requires aggregation of the values of attributes of related entities. This paper makes three contributions to the study of relational learning. (1) It presents a hierarchy of relational concepts of increasing complexity, using relational schema characteristics such as cardinality, and derives classes of aggregation operators that are needed to learn these concepts. (2) Expanding one level of the hierarchy, it introduces new aggregation operators that model the distributions of the values to be aggregated and (for classification problems) the differences in these distributions by class. (3) It demonstrates empirically on a noisy business domain that more-complex aggregation methods can increase generalization performance. Constructing features using target-dependent aggregations can transform relational prediction tasks so that well-understood feature-vector-based modeling algorithms can be applied successfully.
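The target-dependent aggregation idea in the abstract can be sketched concretely. The following is an illustrative toy implementation, not the paper's actual operators: the bag data, the value vocabulary, and the choice of cosine similarity are all assumptions made for the example. Each target case links to a bag of categorical values from a one-to-many relation; bags are pooled by class to form class-conditional reference distributions, and each case's similarity to those references becomes a feature usable by any feature-vector-based learner.

```python
import math
from collections import Counter

# Hypothetical toy data: each target case is linked (one-to-many) to a bag
# of categorical values of related entities; labels are binary classes.
bags = {
    "c1": ["wire", "wire", "atm"],
    "c2": ["atm", "check"],
    "c3": ["wire", "wire", "wire"],
    "c4": ["check", "atm", "atm"],
}
labels = {"c1": 1, "c2": 0, "c3": 1, "c4": 0}
vocab = ["wire", "atm", "check"]  # assumed fixed value vocabulary

def distribution(bag):
    """Empirical distribution of one bag over the vocabulary."""
    counts = Counter(bag)
    return [counts[v] / len(bag) for v in vocab]

def class_reference(cls):
    """Class-conditional reference distribution: pool all bags of one class."""
    pooled = [v for case, bag in bags.items() if labels[case] == cls for v in bag]
    return distribution(pooled)

ref = {c: class_reference(c) for c in (0, 1)}

def cosine(a, b):
    """Cosine similarity between two distribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Target-dependent aggregate features: similarity of each case's bag
# distribution to each class's reference distribution.
features = {
    case: [cosine(distribution(bag), ref[0]), cosine(distribution(bag), ref[1])]
    for case, bag in bags.items()
}
```

Because the reference distributions depend on the class labels, these features encode class-discriminative information about the whole bag in a fixed-length vector, which is the transformation the abstract refers to.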



    Published In

KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2003, 736 pages
ISBN: 1581137370
DOI: 10.1145/956750

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. aggregation
    2. constructive induction
    3. feature construction
    4. propositionalization
    5. relational learning


    Acceptance Rates

    KDD '03 paper acceptance rate: 46 of 298 submissions (15%).
    Overall acceptance rate: 1,133 of 8,635 submissions (13%).
