Open Access (CC BY 4.0 license). Published by De Gruyter Open Access, October 1, 2020

Model Selection by Balanced Identification: the Interplay of Optimization and Distributed Computing

  • Alexander V. Sokolov and Vladimir V. Voloshinov
From the journal Open Computer Science

Abstract

A technology for formal quantitative estimation of how well mathematical models conform to an available dataset is presented. Its main purpose is to simplify the researcher's model selection decisions. The method combines approaches from data analysis, optimization, and distributed computing: cross-validation and regularization methods, algebraic modeling and optimization methods, automatic discretization of differential and integral equations, and optimization REST services. The technology is illustrated by a demo case study. A general mathematical formulation of the method is presented, followed by a description of the main aspects of its algorithmic and software implementation. The presented approach has a substantial list of success stories. Nevertheless, its domain of applicability and important unresolved issues are discussed.
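The interplay of cross-validation and regularization named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple ridge-penalized polynomial fit on synthetic data, and the names fit_ridge, cv_error, and select_lambda, the data, and the grid of regularization weights are all hypothetical. The regularization weight is chosen as the one with the smallest cross-validation error.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of ||X w - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_error(X, y, lam, n_folds=5, seed=0):
    """Mean squared prediction error averaged over held-out folds."""
    rng = np.random.default_rng(seed)
    # Shuffle indices once, then split them into n_folds disjoint test sets.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w = fit_ridge(X[train_mask], y[train_mask], lam)
        residual = X[test_idx] @ w - y[test_idx]
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

def select_lambda(X, y, lambdas, n_folds=5):
    """Return the weight with the smallest cross-validation error, plus all scores."""
    scores = [cv_error(X, y, lam, n_folds) for lam in lambdas]
    return lambdas[int(np.argmin(scores))], scores

# Hypothetical synthetic dataset: noisy samples of a sine curve.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(t.size)
X = np.vander(t, 10, increasing=True)  # polynomial features up to degree 9

lambdas = [10.0 ** k for k in range(-6, 2)]  # assumed search grid
best_lam, scores = select_lambda(X, y, lambdas)
print("selected regularization weight:", best_lam)
```

A grid search is used here only for brevity; the weight could equally be tuned by any one-dimensional minimization of the cross-validation error.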

Received: 2019-08-02
Accepted: 2020-03-03
Published Online: 2020-10-01

© 2020 Alexander V. Sokolov et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 2.2.2025 from https://www.degruyter.com/document/doi/10.1515/comp-2020-0116/html