Markov logic networks

Published: 01 February 2006

Abstract

We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
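To make the ground-network construction concrete, here is a minimal sketch, assuming a hypothetical domain with two constants and a single weighted formula, Smokes(x) => Cancer(x). The predicate names, constants, and weight are illustrative assumptions, not taken from the abstract. The sketch scores each possible world by the exponentiated weighted count of true groundings, then normalizes by brute-force enumeration. That is a stand-in for the MCMC inference the abstract describes and is feasible only for tiny domains.

```python
import itertools
import math

CONSTANTS = ["A", "B"]  # hypothetical domain objects
WEIGHT = 1.5            # hypothetical weight for Smokes(x) => Cancer(x)

# Ground atoms: one Boolean variable per predicate/constant pair.
ATOMS = [(pred, c) for pred in ("Smokes", "Cancer") for c in CONSTANTS]


def count_true_groundings(world):
    """Count the constants x for which Smokes(x) => Cancer(x) holds in `world`."""
    return sum(
        1
        for c in CONSTANTS
        if (not world[("Smokes", c)]) or world[("Cancer", c)]
    )


def unnormalized(world):
    """Log-linear score of a world: exp(weight * number of true groundings)."""
    return math.exp(WEIGHT * count_true_groundings(world))


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def probability(world):
    """Exact probability via brute-force normalization (tiny domains only)."""
    z = sum(unnormalized(w) for w in all_worlds())
    return unnormalized(world) / z


if __name__ == "__main__":
    consistent = {atom: True for atom in ATOMS}
    violating = dict(consistent)
    violating[("Cancer", "A")] = False  # A smokes but has no cancer
    print(probability(consistent), probability(violating))
```

In this sketch a world that violates one grounding is not impossible, only less probable: the two worlds in the usage example differ by exactly one true grounding, so their probabilities differ by a factor of exp(1.5).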

Published In

Machine Learning, Volume 62, Issue 1-2
February 2006
164 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. First-order logic
  2. Graphical models
  3. Inductive logic programming
  4. Knowledge-based model construction
  5. Link prediction
  6. Log-linear models
  7. Markov chain Monte Carlo
  8. Markov networks
  9. Markov random fields
  10. Pseudo-likelihood
  11. Satisfiability
  12. Statistical relational learning
