Technical Note
Peer-Review Record

Uncertainty of Partial Dependence Relationship between Climate and Vegetation Growth Calculated by Machine Learning Models

by Boyi Liang 1, Hongyan Liu 2,*, Elizabeth L. Cressey 3, Chongyang Xu 4, Liang Shi 5,6, Lu Wang 2, Jingyu Dai 2, Zong Wang 1 and Jia Wang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 1 May 2023 / Revised: 27 May 2023 / Accepted: 31 May 2023 / Published: 3 June 2023
(This article belongs to the Special Issue Machine Learning in Global Change Ecology: Methods and Applications)

Round 1

Reviewer 1 Report

This interesting paper examines the uncertainty of partial dependence relationships between climatic factors and the growth of deciduous forests in the Northern Hemisphere based on partial dependence plots (PDPs). The authors found that the PDP of temperature, the dominant factor, showed the smallest relative variation and was stable among the six models. However, the other, non-dominant factors showed much larger variations. It is a topic of interest to researchers in the related area. These results have important implications for the quantitative assessment of vegetation response to climate change at large scales based on machine learning. However, the manuscript may need some improvements. In particular, the methods are not explained clearly enough to be evaluated and reproduced.

 

L43: only one reference; please list more to prove it is a hot topic.

L87: delete the comma after “NDVI”. L90: add a comma before “and windspeed”.

L132: “Xs in” what? 

L130-L144: This section is similar to section 8.1 in the following website, with few changes, making it more difficult to understand. Are these formulas necessary? Is it possible to describe the method in your language to make it easier for readers to understand? 

 https://christophm.github.io/interpretable-ml-book/pdp.html

L158: Providing the names of these packages may help the readers to repeat the experiment.

L154: Change points?

L147-L157. Need to be clearer. It is difficult for me to figure out how each indicator was calculated and what these indicators are used to evaluate. For example, what is the linear trend of PDP? Is it a least squares regression? What are the independent and dependent variables? Here you can provide some formulas.

It is recommended to get a native English speaker to revise the sentences, some of which do not make sense and contain numerous errors of missing or misused crowns.

Author Response


L43: only one reference; please list more to prove it is a hot topic.

Response: Thank you for your advice. We have added six new related references published in influential journals. Please find them at Line 48 of the newest version.

 

L87: delete the comma after “NDVI”. L90: add a comma before “and windspeed”.

Response: Thank you for your advice. We have revised this point based on the comment.

 

L132: “Xs in” what? 

Response: Thank you for your comment. We have deleted the word “in”, which may cause confusion.

 

L130-L144: This section is similar to section 8.1 in the following website, with few changes, making it more difficult to understand. Are these formulas necessary? Is it possible to describe the method in your language to make it easier for readers to understand? 

 https://christophm.github.io/interpretable-ml-book/pdp.html

Response: Thank you for your great advice. We believe it is necessary to keep some of the key equations for calculating and creating the partial dependence plot, because our programming was based on them and they may also be valuable for later researchers. Meanwhile, we added some new sentences at the beginning of this paragraph to summarize the function and principle of the method. We hope the revised part is easier to understand. Please check Lines 141-148 of the newest version.
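As an illustration of the principle summarized in this response, the following minimal Python sketch computes a one-dimensional partial dependence curve: for each grid value of one feature, that feature is fixed in every data row and the model predictions are averaged. The toy model, the data values, and the names `partial_dependence` and `toy_model` are hypothetical and are not taken from the manuscript, whose analysis was carried out in Matlab.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction with column `feature` fixed to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the original data is untouched
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Hypothetical toy model: an NDVI-like response to (temperature, precipitation).
def toy_model(x):
    temperature, precipitation = x
    return 0.02 * temperature + 0.001 * precipitation


# Hypothetical data rows: [temperature, precipitation].
rows = [[10.0, 300.0], [15.0, 500.0], [20.0, 700.0]]
grid = [0.0, 10.0, 20.0]

# Partial dependence of the toy model on temperature (column 0).
pdp_temperature = partial_dependence(toy_model, rows, 0, grid)
```

Because the toy model is additive, the resulting curve is a straight line in temperature, shifted by the average contribution of precipitation. With a fitted machine learning model in place of `toy_model`, the curve would generally be irregular.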

 

L158: Providing the names of these packages may help the readers to repeat the experiment., 

Response: Thank you for your comment. Because we used many packages in Matlab, we have given the names of the two key packages used for building the machine learning models as examples.

 

L154: Change points?

Response: Thank you for your comment. We have revised it in the newest version.

 

L147-L157. Need to be clearer. It is difficult for me to figure out how each indicator was calculated and what these indicators are used to evaluate. For example, what is the linear trend of PDP? Is it a least squares regression? What are the independent and dependent variables? Here you can provide some formulas.

Response: Thank you for your advice. We rewrote this part and added the formula of the linear regression model. Please check Lines 169-186 in the newest version.
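The reviewer's question about the linear trend of a PDP can be made concrete with a small sketch. It assumes an ordinary least squares fit in which the grid values of the climate factor are the independent variable and the corresponding PDP values are the dependent variable. The response states only that a linear regression model was used, so this choice, the function name `linear_trend`, and the numbers below are assumptions made for illustration.

```python
def linear_trend(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical PDP: grid values of a climate factor and the averaged predictions.
grid = [0.0, 5.0, 10.0, 15.0, 20.0]
pdp = [0.30, 0.42, 0.55, 0.61, 0.70]

slope, intercept = linear_trend(grid, pdp)
```

The slope summarizes the overall direction and rate of change of the PDP curve. Slopes fitted to the curves of different models could then be compared as one indicator of how much the curves vary.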

 


Reviewer 2 Report

Dear Authors, thank you for submitting your work to Remote Sensing.

Please find some comments on your work.

Line 43: what do you mean by global change research? What kind of change?

The main comment is related to the statement in lines 82-83: "This study is vital for revealing the uncertainties and assessing the accuracy behind the emerging “black box” models." It is not clear whether this has been achieved. In fact, in the conclusions you stated that "three elements affecting the uncertainty PDP should be fully evaluated." Why were they not evaluated in this study?

While our research is often not directly comparable to other published works, here it seems that the discussion is just a description of the results. I would suggest improving it as much as possible.

Why are you suggesting in lines 237-238 that the index can be Pearson's correlation, partial correlation coefficient, or factor importance? Are the results not affected by the choice of index? Please explain.

lines 231-238: Why are these suggested methodological steps not stated in the conclusions as a main aspect to be considered and tested in future research?

I can not see the authors providing answers to their own research statements. This is reflected in the discussion and conclusions. 

Author Response


Line 43: what do you mean by global change research? What kind of change?

Response: Thank you for the comment. This did cause some confusion in the earlier version. We have changed the expression to “climate change research”. Indeed, it refers more to ecological studies of climate and terrestrial vegetation.

The main comment is related to the statement in lines 82-83: "This study is vital for revealing the uncertainties and assessing the accuracy behind the emerging “black box” models." It is not clear whether this has been achieved. In fact, in the conclusions you stated that "three elements affecting the uncertainty PDP should be fully evaluated." Why were they not evaluated in this study?

Response: Thank you for your comments on our manuscript. We appreciate your feedback and suggestions. Regarding your comment on lines 82-83, we agree that the statement "This study is vital for revealing the uncertainties and assessing the accuracy behind the emerging “black box” models" may have been too strong. Actually, the uncertainty of the interpretation approaches has been revealed in previous studies, so we changed the statement of “revealing the uncertainties” to “understanding the uncertainties”, to reflect the limitations of our study.

In addition, because other factors that were not evaluated in this study may also affect the variation of PDP, we added a new paragraph to the Discussion to explain this. Please check Line 391-326 in the newest version.

 

 

While our research is often not directly comparable to other published works, here it seems that the discussion is just a description of the results. I would suggest improving it as much as possible.

Response: Thank you for your comments on our manuscript. We appreciate your feedback and suggestions. Regarding your comment on the discussion section, we agree that the discussion could be improved. We have revised the discussion section to provide more context and interpretation of our results. Please see the new version of the paper.

Regarding your comment on the comparability of our research to other published works, we acknowledge that our study may not be directly comparable to other studies because of differences in methodology. Indeed, our study is the first to conduct a systematic analysis of the uncertainty of PDP in ecological studies. However, we believe that it makes a valuable contribution to the field by providing new insights into the potential uncertainty of novel interpretation methods for machine learning models. We hope that these revisions address your concerns.

 

Why are you suggesting in lines 237-238 that the index can be Pearson's correlation, partial correlation coefficient, or factor importance? Are the results not affected by the choice of index? Please explain.

Response: Thank you for your comments. We agree that the choice of index (or method) has a great impact on which factor is identified as dominant in a specific case study. For example, if we chose Pearson's correlation as the index for quantitatively analyzing the relationship between climate and NDVI, temperature might be identified as the dominant factor influencing vegetation growth. However, if we used a machine learning model to simulate NDVI from the climate factors and calculated the importance of each factor, precipitation might replace temperature as the dominant factor because of the different index selected. This variation has been demonstrated in previous studies [1,2].

lines 231-238: Why are these suggested methodological steps not stated in the conclusions as a main aspect to be considered and tested in future research?

Response: Thank you for your comment. We have added the relevant content to the Conclusion. Please check lines 343-345 in the newest version.

I can not see the authors providing answers to their own research statements. This is reflected in the discussion and conclusions. 

Response: Thank you for your comments on our manuscript. This question is similar to the third comment. We have revised the discussion section to provide more context and interpretation of our results. Please see the new version of the paper.

Reviewer 3 Report

Title: “Uncertainty of partial dependence relationship between climate and vegetation growth calculated by machine learning models”

 

 

 

This is a great manuscript that deserves to be published. It presents a very sophisticated machine learning technique. Using partial dependence plots (PDPs), the authors have analysed the effect of four environmental variables (temperature, rainfall, wind speed and solar radiation) on NDVI. The basic assumption of PDP is that the features in C are not correlated with the features in S; this assumption is perfectly respected. The seven models show a good level of fit. The article shows that the machine learning methodology achieves a better fit in the interpretation of environmental remote sensing variables. The results show that the PDP of the dominant climate factor (mean air temperature) and the vegetation growth parameter (indicated by the normalized difference vegetation index, NDVI) has the smallest relative variation, and that the overall linear trend of the partial dependence plots was comparatively stable across the different models. The result of the MANOVA is highly significant.

The conclusions are clearly demonstrated by the results obtained from the analysis.

Accepted.

Author Response


Response: Thank you for your comment and review. In addition to addressing the comments of all the reviewers, we also improved the English of the newest version with the help of one of our co-authors, a native English speaker (Dr. Cressey).

Reviewer 4 Report

The fundamental principle at work when using PDP involves assuming that there is a clear-cut relationship between variables; more specifically, we assume that the linkages are purely linear. Unfortunately, outside of theoretical settings or simplistic examples, this is not always true. There are times when relationships have non-linear characteristics, which can make relying solely on linear trends counterproductive.

PDPs rely on linear trends, which assume that there is a consistent relationship between a feature and the target variable outside the observed data range. Nonetheless, this assumption cannot always be upheld, specifically when extending beyond the boundary of the training data. Extrapolation can result in less credible predictions and interpretations.

How do you capture the complex interactions and non-linear patterns present in the data?

Author Response

The fundamental principle at work when using PDP involves assuming that there is a clear-cut relationship between variables; more specifically, we assume that the linkages are purely linear. Unfortunately, outside of theoretical settings or simplistic examples, this is not always true. There are times when relationships have non-linear characteristics, which can make relying solely on linear trends counterproductive.

PDPs rely on linear trends, which assume that there is a consistent relationship between a feature and the target variable outside the observed data range. Nonetheless, this assumption cannot always be upheld, specifically when extending beyond the boundary of the training data. Extrapolation can result in less credible predictions and interpretations.

How do you capture the complex interactions and non-linear patterns present in the data?

Response: Thank you for your comments. As you said, it is often assumed that the relationship between the independent features and the output of a machine learning model is linear, although non-linearity is more commonly seen in research. In our case, the partial dependence plot describes the quantitative relationship between each independent variable and the dependent variable as an irregular curve (Figure 2). The PDP curve is produced by connecting discrete points, each of which represents the correspondence between x and y at a given value of x. Nevertheless, we still fitted a linear regression to each PDP curve in order to estimate the overall linear trend of y as x increases.

 

In terms of interactions, PDP cannot demonstrate this relationship well. However, other methods such as Friedman’s H-statistic were designed to investigate the interactions among different independent variables in machine learning models [3]. Friedman’s H-statistic provides two measures: first, a two-way interaction measure that tells us whether and to what extent two features in the model interact with each other; second, a total interaction measure that tells us whether and to what extent a feature interacts in the model with all the other features. Another method is the ALE plot, which shows how the model predictions change in a small “window” of the feature around v for data instances in that window, while excluding the interactions among the features [4]. However, these methods are not the focus of our study, although we believe that their uncertainty is also valuable for future research.
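To illustrate the two-way interaction measure mentioned in this response, the sketch below computes the squared H-statistic for a pair of features from mean-centered partial dependence values: the squared difference between the joint partial dependence and the sum of the two individual partial dependences, summed over the data points and divided by the summed squared joint partial dependence. This is a minimal sketch of the commonly stated definition, not the authors' code. The function names and toy models are hypothetical.

```python
def centered(values):
    """Subtract the mean so that the partial dependence values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pd_at(predict, rows, fixed):
    """Mean prediction with the columns in `fixed` (index -> value) held constant."""
    total = 0.0
    for row in rows:
        modified = list(row)
        for index, value in fixed.items():
            modified[index] = value
        total += predict(modified)
    return total / len(rows)


def h_squared(predict, rows, j, k):
    """Two-way squared H-statistic for features j and k, evaluated at the data points."""
    pd_j = centered([pd_at(predict, rows, {j: r[j]}) for r in rows])
    pd_k = centered([pd_at(predict, rows, {k: r[k]}) for r in rows])
    pd_jk = centered([pd_at(predict, rows, {j: r[j], k: r[k]}) for r in rows])
    numerator = sum((ab - a - b) ** 2 for ab, a, b in zip(pd_jk, pd_j, pd_k))
    denominator = sum(ab ** 2 for ab in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical data with two features, each averaging to zero.
rows = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

# An additive toy model has no interaction; a pure product model is all interaction.
h_additive = h_squared(lambda x: 2.0 * x[0] + 3.0 * x[1], rows, 0, 1)
h_product = h_squared(lambda x: x[0] * x[1], rows, 0, 1)
```

In this toy setting, the additive model yields a value of zero and the product model yields a value of one, which are the two ends of the scale that the statistic is meant to capture.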

 

  1. Gevrey, M.; Dimopoulos, I.; Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling 2003, 160, 249-264.
  2. Olden, J.D.; Joy, M.K.; Death, R.G. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling 2004, 178, 389-397.
  3. Inglis, A.; Parnell, A.; Hurley, C.B. Visualizing variable importance and variable interaction effects in machine learning models. Journal of Computational and Graphical Statistics 2022, 31, 766-778.
  4. Apley, D.W.; Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B: Statistical Methodology 2020, 82, 1059-1086.