Zhitao Zhang
University College London

Regression analysis uses mathematical equations to express economic hypotheses and estimates how a policy affects a population based on observational data. For certain estimation methods to yield unbiased estimates, the zero conditional mean assumption must hold. Endogeneity occurs when this assumption is violated. To address this problem, an instrumental variable (IV) is usually introduced. Unbiased estimation in econometrics is both defined and proven statistically, whereas regression analysis is often used for causal inference. The lack of a proper causal interpretation limits econometricians’ use of causal terms. Specifically, they must rely on economic behaviour or introspection to identify suitable IVs (Wooldridge 2018, p.497).
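As a minimal illustration of the conditions named above, consider a simple linear model with one regressor. The notation (outcome y, regressor x, error term u, candidate instrument z) is an assumption of this sketch and is not taken from the cited sources:

```latex
\begin{align*}
  y &= \beta_0 + \beta_1 x + u, \\
  \operatorname{E}[u \mid x] &= 0
    && \text{(zero conditional mean)}, \\
  \operatorname{Cov}(x, u) &\neq 0
    && \text{(a typical case of endogeneity)}, \\
  \operatorname{Cov}(z, u) &= 0, \quad \operatorname{Cov}(z, x) \neq 0
    && \text{(instrument exogeneity and relevance)}, \\
  \beta_1 &= \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
    && \text{(IV identification of the slope)}.
\end{align*}
```

The last line follows from taking the covariance of both sides of the first equation with z: Cov(z, y) = β1 Cov(z, x) + Cov(z, u), where the final term vanishes under instrument exogeneity. Under endogeneity, by contrast, the ordinary least squares slope converges to β1 + Cov(x, u)/Var(x) rather than to β1.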
The Rubin Causal Model (RCM) is a currently adopted causal interpretation of regression analysis. By viewing regression analysis as a type of ‘natural experiment’ (Angrist and Krueger 1991, p.979), the RCM offers the following key interpretations. 1) Zero conditional mean is necessary since randomization is needed for unbiased results in experiments. 2) Violation of zero conditional mean can be understood as analogous to noncompliance. 3) IVs are the assignment of treatments, while IV estimations give the Local Average Treatment Effect (LATE), the average causal effect for the subpopulation that complies with the assignment (Angrist et al. 1996). I argue that the RCM explains statistical settings in regression analysis only through experimental analogies and focuses on mathematical reasoning. It fails to provide causal principles for causal reasoning. As such, it can hardly serve as a causal interpretation.
I propose an alternative interpretation using (Markovian) Structural Equation Models (SEMs). Because they share the same functional form, regression models correspond directly to SEMs. I introduce the Causal Markov Condition (CMC) as the causal principle. The CMC not only explains zero conditional mean but also offers a way to detect how observations might deviate from reality, which in turn allows us to distinguish the possible meanings of our estimation under different causal structures. I then interpret IVs as intervention variables, following Woodward’s account. Intervention variables secure valid IVs and provide a further guideline for identifying IVs. On my interpretation, regression analysis is firmly grounded in causal models.
References
Angrist, Joshua, Imbens, Guido and Rubin, Donald. “Identification of Causal Effects Using Instrumental Variables”. Journal of the American Statistical Association, Vol. 91, No. 434 (1996): 444-455.
Angrist, Joshua and Krueger, Alan. “Does Compulsory School Attendance Affect Schooling and Earnings?”. The Quarterly Journal of Economics, Vol. 106, No. 4 (1991): 979-1014.
Scheines, Richard. “An Introduction to Causal Inference”. Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA. Accessed Aug 14, 2024. https://www.cmu.edu/dietrich/philosophy/docs/scheines/introtocausalinference.pdf
Woodward, James. “Causation and Manipulability”. The Stanford Encyclopedia of Philosophy (Summer 2023 Edition), edited by Edward N. Zalta and Uri Nodelman. https://plato.stanford.edu/archives/sum2023/entries/causation-mani/
Wooldridge, Jeffrey. Introductory Econometrics: A Modern Approach (7th edition). Cengage, 2018.

Chair: Celine Lechaux
Time: 04 September, 10:00 – 10:30
Location: SR 1.005
