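Causality and Causal Inference
Let's look at the concepts of causality and causal inference and the related theory (the back-door criterion and do-calculus).
A structural causal model (SCM) helps describe causal relationships in terms of observation and intervention.
The back-door criterion helps identify causal effects by adjusting for confounding, and do-calculus provides rules for manipulating observational and interventional probabilities.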
Causality
- Influence by which one event, process, state, or object (a cause) contributes to the production of another event, process, state, or object (an effect), where the cause is partly responsible for the effect, and the effect is partly dependent on the cause.
- Causality in various academic disciplines
- Physics, chemistry, biology, climate science
- Psychology, social science, economics
- Epidemiology, public health
- Relation to AI, ML, DS
- AI : a rational agent performing actions to achieve a goal (reinforcement learning)
- ML : currently focused on learning correlations
- DS : capture, process, analyze, communicate with data
Structural Causal Model (SCM)
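- An SCM \(M = \langle U, V, F, P(U) \rangle\) provides a formal framework: exogenous variables \(U\), endogenous variables \(V\), structural functions \(F\), and a distribution \(P(U)\) over the exogenous variables.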
- SCM induces observational, interventional, and counterfactual distributions.
- SCM induces a causal graph \(g\), which implies conditional independencies testable via d-separation (blockage).
- The underlying model \(M\) is unknown but the causal graph \(g\) can be given from common sense or domain knowledge.
- An intervention \(do(X=x)\) is represented as a submodel \(M_x\), which induces a manipulated causal graph \(g_{\bar{x}}\) (the graph \(g\) with all arrows into \(X\) removed).
- Causal effect of \(X=x\) on \(Y=y\) is defined as \(P(y\mid{do(x)})\).
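As a rough illustration of how one SCM induces both an observational and an interventional distribution, the sketch below samples from a toy model. The variables `Z`, `X`, `Y`, the structural functions, and every probability are invented for this example and do not come from the lecture.

```python
import random


def sample_scm(n, do_x=None, seed=0):
    """Draw n samples (x, y) from a toy SCM; do_x replaces the mechanism of X."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Exogenous variables U, drawn independently from P(U).
        u_z, u_x, u_y = rng.random(), rng.random(), rng.random()
        # Structural functions F. Z is a common cause (confounder) of X and Y.
        z = int(u_z < 0.5)
        if do_x is None:
            x = int(u_x < (0.8 if z else 0.2))  # natural mechanism of X
        else:
            x = do_x                            # submodel M_x: X is set by do(X=x)
        y = int(u_y < 0.1 + 0.3 * x + 0.5 * z)
        samples.append((x, y))
    return samples


def prob_y1(samples, x_value):
    """Empirical probability of Y=1 among samples with X=x_value."""
    ys = [y for x, y in samples if x == x_value]
    return sum(ys) / len(ys)


observational = sample_scm(100_000)
interventional = sample_scm(100_000, do_x=1)

p_obs = prob_y1(observational, 1)   # estimates P(Y=1 | X=1)
p_do = prob_y1(interventional, 1)   # estimates P(Y=1 | do(X=1))
print(f"P(Y=1 | X=1)     ~ {p_obs:.3f}")
print(f"P(Y=1 | do(X=1)) ~ {p_do:.3f}")
```

With these made-up parameters the exact values are \(P(Y=1\mid X=1) = 0.8\) and \(P(Y=1\mid{do(X=1)}) = 0.65\); the gap between them is the confounding introduced by `Z`.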
Remark
- Identifiability : causal effect may be computable from existing observational data for some causal graphs.
- In the Markovian case with a singleton \(X\), the causal effect can be derived easily by canceling out the factor \(P(x\mid{pa_x})\) from the factorization of the joint distribution.
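The cancellation mentioned in the remark can be written out as follows. This is the standard truncated factorization for a Markovian model, with \(pa_i\) denoting the values of the parents of \(V_i\) and \(v\) assumed to be consistent with \(X=x\):

\[
P(v) = \prod_{i} P(v_i \mid pa_i)
\quad\Longrightarrow\quad
P(v \mid do(x)) = \prod_{i:\, V_i \neq X} P(v_i \mid pa_i) = \frac{P(v)}{P(x \mid pa_x)}
\]

Summing out every variable other than \(Y\) then gives an adjustment over the parents of \(X\):

\[
P(y \mid do(x)) = \sum_{pa_x} P(y \mid x, pa_x)\, P(pa_x)
\]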
Back-door Criterion
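Definition | Back-door
- Find a set \(Z\) that sufficiently explains the ‘confounding’ between \(X\) and \(Y\). Then the causal effect is given by the back-door adjustment formula \(P(y\mid{do(x)}) = \sum_z P(y\mid{x,z})\,P(z)\).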
Definition | Back-door criterion
- A set \(Z\) satisfies the back-door criterion with respect to a pair of variables \(X, Y\) in causal diagram \(g\) if:
- (i) no node in \(Z\) is a descendant of \(X\); and
- (ii) \(Z\) blocks every path between \(X\) and \(Y\) that contains an arrow into \(X\).
The back-door adjustment formula is simple and widely used, but it is limited: not every identifiable causal effect can be obtained by back-door adjustment.
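The two conditions of the back-door criterion can be checked mechanically on a graph. The sketch below is one possible way to do so, assuming the DAG is given as a `{child: [parents]}` dictionary; it tests d-separation through the moralized ancestral graph, and condition (ii) is checked by removing the arrows that leave \(X\). The function names and the example graph (`Z -> X`, `Z -> Y`, `X -> M`, `M -> Y`) are hypothetical.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, x, y, z):
    """True if the set z d-separates x from y in the DAG {child: [parents]}."""
    keep = ancestors(parents, {x, y} | set(z))
    # Moralize the ancestral subgraph: connect each node to its parents
    # and connect every pair of parents that share a child.
    adj = {node: set() for node in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pas, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff removing z disconnects them in the moral graph.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in z:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return True


def satisfies_backdoor(parents, x, y, z):
    """Check conditions (i) and (ii) of the back-door criterion for the set z."""
    z = set(z)
    # (i) no node in z is a descendant of x.
    if any(x in ancestors(parents, {node}) for node in z):
        return False
    # (ii) z blocks all paths with an arrow into x: delete x's outgoing
    # edges, so only back-door paths remain, then test d-separation.
    pruned = {child: [p for p in pas if p != x] for child, pas in parents.items()}
    return d_separated(pruned, x, y, z)


# Hypothetical graph: Z -> X, Z -> Y, X -> M, M -> Y.
dag = {"X": ["Z"], "M": ["X"], "Y": ["Z", "M"]}
print(satisfies_backdoor(dag, "X", "Y", {"Z"}))   # Z blocks X <- Z -> Y
print(satisfies_backdoor(dag, "X", "Y", set()))   # back-door path stays open
print(satisfies_backdoor(dag, "X", "Y", {"M"}))   # M is a descendant of X
```

In this example only `{"Z"}` passes: the empty set leaves the back-door path open, and `{"M"}` violates condition (i).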
Back-door sets as substitutes of the direct parents of X
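- Rain satisfies the back-door criterion relative to Sprinkler and Wet:
- (i) Rain is not a descendant of Sprinkler, and
- (ii) Rain blocks the only back-door path from Sprinkler to Wet.
- Adjusting for the direct parents \(pa\) of Sprinkler, we have \(P(wet\mid{do(sprinkler)}) = \sum_{pa} P(wet\mid{sprinkler, pa})\,P(pa)\). Because Rain satisfies the back-door criterion, it can be substituted for the parents: \(P(wet\mid{do(sprinkler)}) = \sum_{rain} P(wet\mid{sprinkler, rain})\,P(rain)\).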
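The substitution can be checked numerically. The sketch below assumes the classic sprinkler graph (Season -> Sprinkler, Season -> Rain, Sprinkler -> Wet, Rain -> Wet), which is not spelled out in these notes, so Season as the direct parent of Sprinkler is an assumption. All conditional probabilities are made up. It computes \(P(wet\mid{do(sprinkler)})\) by adjusting for Season and by adjusting for Rain, and compares both with the ground truth obtained by dropping the Sprinkler mechanism.

```python
from itertools import product

# Hypothetical conditional probability tables (all variables are binary).
P_SEASON = {0: 0.6, 1: 0.4}        # P(season)
P_SPRINKLER_ON = {0: 0.7, 1: 0.1}  # P(sprinkler=1 | season)
P_RAIN = {0: 0.1, 1: 0.8}          # P(rain=1 | season)
P_WET = {(0, 0): 0.05, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}  # P(wet=1 | sprinkler, rain)

NAMES = ("season", "sprinkler", "rain", "wet")


def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(season, sprinkler, rain, wet):
    """Observational joint distribution, factorized along the assumed graph."""
    return (P_SEASON[season]
            * bern(P_SPRINKLER_ON[season], sprinkler)
            * bern(P_RAIN[season], rain)
            * bern(P_WET[(sprinkler, rain)], wet))


def prob(**fixed):
    """Observational probability of the event described by the keyword arguments."""
    total = 0.0
    for values in product((0, 1), repeat=4):
        world = dict(zip(NAMES, values))
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(*values)
    return total


def adjust(z_name):
    """Back-door adjustment: sum_z P(wet=1 | sprinkler=1, z) P(z)."""
    total = 0.0
    for z in (0, 1):
        conditional = (prob(wet=1, sprinkler=1, **{z_name: z})
                       / prob(sprinkler=1, **{z_name: z}))
        total += conditional * prob(**{z_name: z})
    return total


# Ground truth under do(sprinkler=1): drop the factor P(sprinkler | season).
truth = sum(P_SEASON[s] * bern(P_RAIN[s], r) * P_WET[(1, r)]
            for s, r in product((0, 1), repeat=2))
naive = prob(wet=1, sprinkler=1) / prob(sprinkler=1)  # plain conditioning
via_season = adjust("season")
via_rain = adjust("rain")

print(f"truth={truth:.4f} season={via_season:.4f} rain={via_rain:.4f} naive={naive:.4f}")
```

Both adjustment sets reproduce the interventional quantity, while plain conditioning on Sprinkler does not, because Season confounds Sprinkler and Wet through Rain.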
Rules of Do-calculus
The back-door criterion results in a very specific form of identification formula.
Do-calculus (Pearl, 1995) provides general machinery to manipulate observational and interventional distributions.
Theorem | Rules of Do-calculus (simplified)
- Rule 1 : Adding/removing observations
- Rule 2 : Action/observation exchange
- Rule 3 : Adding/removing actions
- Do-calculus is sound and complete, but by itself it gives no algorithmic insight into which rule to apply at each step.
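For reference, the three rules named above are usually stated as follows (Pearl, 1995). Here \(X, Y, Z, W\) are disjoint sets of variables, \(g_{\overline{X}}\) is \(g\) with the arrows into \(X\) removed, and \(g_{\underline{Z}}\) is \(g\) with the arrows out of \(Z\) removed. This is a restatement in this post's notation, not a copy of the lecture slide.

\[
\begin{aligned}
\text{Rule 1:}\quad & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}} \\
\text{Rule 2:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}\,\overline{Z(W)}}
\end{aligned}
\]

In Rule 3, \(Z(W)\) is the set of nodes in \(Z\) that are not ancestors of any node in \(W\) in \(g_{\overline{X}}\).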
A graphical condition and an efficient algorithmic procedure have been developed for identifiability.
- Do-calculus is a set of rules for manipulating observational and interventional probabilities, and it is complete.
Modern Identification Tasks
Experimental conditions ➔ Generalized identification
Combining datasets of different experimental conditions
- The identifiability of any expression of the form \(P(y\mid{do(x), z})\) can be determined given any causal graph \(g\) and an arbitrary combination of observational and experimental studies.
- If the query is identifiable, then its estimand can be derived in polynomial time.
Environmental conditions ➔ Transportability
Combining datasets from different sources
- Non-parametric transportability can be determined provided that the problem instance is encoded in selection diagrams.
- When transportability is feasible, the transport formula can be derived in polynomial time.
- The causal calculus and the corresponding transportation algorithm are complete.
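As a small illustration of what a transport formula looks like, consider the simplest case often used in the transportability literature. Assume the source and target populations differ only in the distribution of a set of covariates \(Z\), and that \(Z\) separates the selection node \(S\) from \(Y\) in the selection diagram once \(X\) is intervened on. These assumptions, and the notation \(P^*\) for the target population, are introduced here for illustration only.

\[
P^*(y \mid do(x)) = \sum_{z} P(y \mid do(x), z)\, P^*(z)
\]

The experimental quantity \(P(y \mid do(x), z)\) is taken from the source population, and only the observational distribution \(P^*(z)\) has to be measured in the target population.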
Sampling conditions ➔ Recovering from selection bias
- Nonparametric recoverability of selection bias from causal and statistical settings can be determined provided that an augmented causal graph is available.
- When recoverability is feasible, the estimand can be derived in polynomial time.
- The result is complete for pure recoverability, and sufficient for recoverability with external information.
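A minimal example of recoverability, assuming an augmented graph with a selection node \(S\) where \(S=1\) marks the units that end up in the sample: if \(Y\) is independent of \(S\) given \(X\) in that graph, the conditional distribution is unaffected by selection.

\[
(Y \perp S \mid X) \quad\Longrightarrow\quad P(y \mid x) = P(y \mid x, S=1)
\]

The right-hand side can be estimated directly from the selection-biased data.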
Responding conditions ➔ Recovering from missingness
Reference
This post is based on material studied in the LG Aimers program. (It does not cover the full content.)
- LG Aimers AI Essential Course Module 5. Causal Inference, Prof. Sanghack Lee (이상학), Seoul National University