Stepwise AIC - Does there exist controversy surrounding this topic?





I've read countless posts on this site that are strongly against the use of stepwise variable selection with any sort of criterion, whether p-value-based, AIC, BIC, etc.



I understand why these procedures are, in general, quite poor for variable selection. gung's famous post here clearly illustrates why: ultimately, we are verifying a hypothesis on the same dataset we used to come up with it, which is just data dredging. Furthermore, p-values are affected by issues such as collinearity and outliers, which heavily skew results.



However, I've been studying time series forecasting quite a bit lately and have come across Hyndman's well-respected textbook, in which he mentions here the use of stepwise selection to find the optimal order of ARIMA models in particular. In fact, the well-known auto.arima algorithm in R's forecast package uses stepwise selection by default (with AIC, not p-values). He also criticizes p-value-based feature selection, which aligns well with multiple posts on this website.



Ultimately, we should always cross-validate in some way at the end if the goal is to develop good models for forecasting/prediction. However, there does seem to be some disagreement here about the procedure itself when the selection criterion is something other than p-values.



Does anyone have any opinions on the use of stepwise AIC in this context, and on stepwise selection in general outside of it? I have been taught that any stepwise selection is poor, but, to be honest, auto.arima(stepwise = TRUE) has been giving me better out-of-sample results than auto.arima(stepwise = FALSE), though perhaps this is just coincidence.
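
For reference, here is roughly how such an out-of-sample comparison can be run; a minimal sketch (not from the original post) using tsCV from the forecast package, with AirPassengers standing in for an actual series of interest:

    library(forecast)

    # Rolling-origin ("time series") cross-validation: refit and forecast
    # at each origin, once per search strategy.
    fc_stepwise <- function(y, h) forecast(auto.arima(y, stepwise = TRUE),  h = h)
    fc_full     <- function(y, h) forecast(auto.arima(y, stepwise = FALSE), h = h)

    e_step <- tsCV(AirPassengers, fc_stepwise, h = 1)  # 1-step-ahead errors
    e_full <- tsCV(AirPassengers, fc_full,     h = 1)  # slow: full search at every origin

    sqrt(mean(e_step^2, na.rm = TRUE))  # out-of-sample RMSE, stepwise = TRUE
    sqrt(mean(e_full^2, na.rm = TRUE))  # out-of-sample RMSE, stepwise = FALSE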










forecasting predictive-models arima aic stepwise-regression

asked 2 days ago, edited yesterday
– aranglol
  • Why do you call gung's answer "infamous"? – amoeba, 2 days ago

  • I thought it was a well-respected answer, and it has a lot of upvotes, which I thought would imply that many who frequent here would be familiar with the post. It wasn't meant to say that his post is in any way controversial or bad. Perhaps I hold that post too highly, since I learned a lot from it personally. – aranglol, yesterday

  • So you probably meant "famous", not "infamous"? "Infamous" means "well known for some bad quality". – amoeba, yesterday

  • One of the few things that forecasters can agree on is that selecting one "best" model usually works less well than combining multiple different models. – Stephan Kolassa, yesterday

  • @amoeba I changed it to "famous", as suggested. – aranglol, yesterday








1 Answer
There are a few different issues here.




  • Probably the main issue is that model selection (whether using p-values or AICs, stepwise or all-subsets or something else) is primarily problematic for inference (e.g. getting p-values with appropriate type I error, confidence intervals with appropriate coverage). For prediction, model selection can indeed pick a better spot on the bias-variance tradeoff axis and improve out-of-sample error.

  • For some classes of models, AIC is asymptotically equivalent to leave-one-out CV error [see e.g. http://www.petrkeil.com/?p=836 ], so using AIC as a computationally efficient proxy for CV is reasonable (a small sketch after this list illustrates the agreement).

  • Stepwise selection is often dominated by other model selection (or averaging) methods: all-subsets if computationally feasible, or shrinkage methods. But it's simple and easy to implement, and if the answer is clear enough (some parameters corresponding to strong signals, others weak, few intermediate), then it will give reasonable results. Again, there's a big difference between inference and prediction: if you have a couple of strongly correlated predictors, picking the incorrect one (from a "truth"/causal point of view) is a big problem for inference, but picking the one that happens to give you the best AIC is a reasonable strategy for prediction (albeit one that will fail if you try to forecast a situation where the correlation between the predictors changes ...); a toy simulation of this appears after the bottom line below.
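
To make the AIC/LOO-CV point concrete, here is a minimal sketch for Gaussian linear models, where the leave-one-out error has a closed form; the data and variable names are simulated and illustrative, not from the thread:

    # For lm fits, the exact LOO residual is residual_i / (1 - hat_i),
    # so LOO-CV needs no refitting. Compare AIC and LOO-MSE rankings.
    set.seed(1)
    n  <- 200
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is pure noise
    y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

    models <- list(
      m1 = lm(y ~ x1),
      m2 = lm(y ~ x1 + x2),
      m3 = lm(y ~ x1 + x2 + x3)
    )

    loo_mse <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)

    data.frame(AIC = sapply(models, AIC), LOO_MSE = sapply(models, loo_mse))
    # Both criteria should prefer m2 here; the two orderings typically agree.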


Bottom line: for moderately sized data with a reasonable signal-to-noise ratio, AIC-based stepwise selection can indeed produce a defensible predictive model; see Murtaugh (2009) for an example.
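
And a toy illustration of the correlated-predictors point (simulated data; base R's step() on a linear model stands in for stepwise AIC generally, not for auto.arima specifically):

    # x2 is nearly a copy of x1; only x1 drives y. Backward stepwise AIC may
    # keep either twin, which is bad for causal interpretation but largely
    # harmless for prediction while the correlation structure holds.
    set.seed(2)
    n  <- 300
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.1)
    y  <- 3 * x1 + rnorm(n)

    full <- lm(y ~ x1 + x2)
    sel  <- step(full, direction = "backward", trace = 0)  # AIC-based stepwise
    formula(sel)  # which twin survives depends on the sample

    nd <- data.frame(x1 = rnorm(50))
    nd$x2 <- nd$x1 + rnorm(50, sd = 0.1)
    cor(predict(sel, nd), predict(full, nd))  # predictions nearly identical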



Murtaugh, Paul A. "Performance of several variable-selection methods applied to real ecological data." Ecology Letters 12, no. 10 (2009): 1061-1068.






answered 2 days ago, edited yesterday
– Ben Bolker
  • (+1) Very informative. According to Burnham & Anderson's book "Model selection and multimodel inference: A practical information-theoretic approach", the AIC/BIC (or other information-criterion) approach shouldn't in any case be mixed with inferential statistics using $p$-values. – COOLSerdash, 2 days ago

  • Please don't get me started on Burnham and Anderson. github.com/bbolker/discretization – Ben Bolker, 2 days ago










