Humans possess a repertoire of decision strategies. This raises the question how we decide how to decide. Behavioral experiments suggest that the answer includes metacognitive reinforcement learning: rewards reinforce not only our behavior but also the cognitive processes that lead to it. Previous theories of strategy selection, namely SSL and RELACS, assumed that model-free reinforcement learning identifies the cognitive strategy that works best on average across all problems in the environment. Here we explore the alternative: model-based reinforcement learning about how the differential effectiveness of cognitive strategies depends on the features of individual problems. Our theory posits that people learn a predictive model of each strategy’s accuracy and execution time and choose strategies according to their predicted speed-accuracy tradeoff for the problem to be solved. We evaluate our theory against previous accounts by fitting published data on multi-attribute decision making, conducting a novel experiment, and demonstrating that our theory can account for people’s adaptive flexibility in risky choice. We find that while SSL and RELACS are sufficient to explain people’s ability to adapt to a homogeneous environment in which all decision problems are of the same type, model-based strategy selection learning can also explain people’s ability to adapt to heterogeneous environments and flexibly switch to a different decision-strategy when the situation changes.