Angrist and Pischke (available for free) argue that the major improvement is the introduction of better research design, mostly experiments and experiment-like methods. They cite a decent sample of macroeconomic papers that head in this direction, but the fact remains that this is difficult advice to follow at the macro level. In responding to the backlash against RCTs, they contend
Empirical evidence on any given causal effect is always local, derived from a particular time, place, and research design. ... Extrapolation of causal effects to new settings is always speculative. ... Accumulating empirical evidence ... is the necessary road along which results become more general. ... [Generally] the channels ... are less clear than the finding that there is an effect. ... But inconclusive or incomplete evidence on mechanisms does not void empirical evidence of predictive value.

Leamer (ungated), however, complains that they missed his main point and that the profession has not picked up on it either. Most of our methods rely on theories that hold only asymptotically, but however much empirical evidence we accumulate, we never actually reach the asymptote. "The only way to create credible inferences with doubtful assumptions is to perform a sensitivity analysis that separates the fragile inferences from the sturdy ones..." He speaks out against nonparametric estimation and consistent standard errors for "disguising the assumptions." He agrees with Angrist and Pischke in complaining about instrumental variables "thoughtlessly chosen. I think we would make progress if we stopped using the words 'instrumental variables' and used instead 'surrogates' - meaning surrogates for the experiment that we wish we could have conducted."
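To make the sensitivity-analysis idea concrete, here is a minimal sketch in the spirit of Leamer's extreme bounds analysis (my own toy construction, not code from his paper): re-estimate a regression under every subset of doubtful controls and report the range of the coefficient of interest. All variable names and the simulated data are hypothetical.

```python
# Toy Leamer-style sensitivity analysis: how much does the estimated
# coefficient on x move as we vary which doubtful controls are included?
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n = 500
z1, z2, z3 = rng.normal(size=(3, n))         # doubtful controls
x = 0.5 * z1 + rng.normal(size=n)            # regressor of interest
y = 1.0 * x + 0.8 * z1 + rng.normal(size=n)  # true effect of x is 1.0; z1 confounds

doubtful = {"z1": z1, "z2": z2, "z3": z3}
estimates = []
for k in range(len(doubtful) + 1):
    for subset in combinations(doubtful, k):
        X = np.column_stack([np.ones(n), x] + [doubtful[c] for c in subset])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        estimates.append(beta[1])            # OLS coefficient on x

lo, hi = min(estimates), max(estimates)
print(f"coefficient on x ranges from {lo:.2f} to {hi:.2f}")
```

A narrow range marks the inference as sturdy; a wide one marks it as fragile, since the answer then hinges on an assumption (which controls belong in the model) the data cannot settle.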
Leamer then argues against RCTs as well. He discusses additive and interactive confounders that experiments with finite data cannot overcome, though he also admits that our concerns about extrapolating RCTs are a sign of progress. He attacks the "myth of the data generating process," and insists that we take our ignorance seriously and admit it openly. He also believes Angrist and Pischke are too optimistic about what experiments can do for macro: all we can do is "seek patterns and tell stories." "Mission Accomplished ... [is] never gonna happen."
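The interactive-confounder worry can be illustrated with a short simulation (again my own toy construction, not from Leamer's paper): if the treatment effect interacts with a covariate, a perfectly randomized trial still recovers only the average effect over its own sample's covariate mix, so the same design gives a different answer in a setting with a different mix.

```python
# Toy RCT with an interactive confounder: effect is 2 when w = 1, 0 when w = 0,
# so the trial's answer depends on the share of w = 1 units in its sample.
import numpy as np

rng = np.random.default_rng(1)

def rct_estimate(n, p_w):
    """Difference-in-means estimate from a randomized trial of size n."""
    w = rng.random(n) < p_w                # covariate interacting with treatment
    t = rng.random(n) < 0.5                # treatment, randomly assigned
    y = 2.0 * t * w + rng.normal(0, 1, n)  # outcome
    return y[t].mean() - y[~t].mean()

est_site_a = rct_estimate(100_000, p_w=0.8)  # trial site: 80% of units have w = 1
est_site_b = rct_estimate(100_000, p_w=0.2)  # new setting: only 20% have w = 1
print(est_site_a, est_site_b)  # roughly 1.6 vs 0.4
```

Randomization is not at fault here; both estimates are internally valid. The point is Leamer's and Angrist-Pischke's shared one that any such estimate is local, and extrapolation requires knowing the mechanism (here, the role of w) that the experiment alone does not reveal.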
Keane (ungated) notes that economists can learn from marketers, who have spent a good deal of time developing natural experiments. "Where I most strongly disagree ... is their notion that empirical work can exist independently from, or occur prior to, economic theory." He returns to Leamer's question of how we can produce credible results in a subjective world: "The idea is that the researcher ... provides evidence of value to all ... audiences, given their prior views, and also reveals to what extent estimates are determined by a set of prior beliefs. ... We know perfectly well that our models aren't true. Validation exercises are used purely as a way to persuade the audience (and ourselves) that a model may be a useful tool for prediction and policy evaluation."
Sims (ungated) takes on the question at a deeper level, contending that "economics is not an experimental science and cannot be. 'Natural' experiments and 'quasi' experiments are not in fact experiments ... They are rhetorical devices that are often invoked to avoid having to confront real econometric difficulties." The best we can do is "to use data to narrow the range of substantive disagreement." He also argues that experiments have done nothing for macro. Instead, he believes that VAR and DSGE models (which Leamer laughed at) have done the most good in bringing about broad policy consensus. Experiments, and IV estimates in particular, do a poor job of handling nonlinearity.
Stock (ungated) frames the macroeconomics issue very well by distinguishing three types of questions macroeconomics addresses: Why do we observe the dynamics we see? How do changes in rules, institutions, and preferences affect those dynamics? What are the effects of one-off policy interventions within the existing institutional context? Stock argues that experiments do best with the third question, while their ability to handle the other two is much more limited.