m-LiNGAM
Model
Missingness-LiNGAM (m-LiNGAM) [1] extends the basic LiNGAM model [2] to handle datasets affected by missing values, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) cases. It enables the identification of the true underlying causal structure and provides unbiased parameter estimates even when data are not fully observed.
The model combines the principles of LiNGAM and the graphical representation of missingness mechanisms using missingness graphs (m-graphs) [3]. In this framework, variables can be fully observed or partially observed, and each partially observed variable is associated with a missingness mechanism and a proxy variable.
Let the set of variables be:
\[ V = V_o \cup V_m \cup U \cup V^* \cup R \]
where:
\(V_o\) are fully observed variables,
\(V_m\) are partially observed variables,
\(U\) are latent variables (here assumed empty),
\(V^*\) are proxy variables (what is actually observed, corresponding to dataset columns with missing values),
\(R\) are the missingness mechanisms.
The induced subgraph \(G[V_o \cup V_m]\) follows a LiNGAM model, meaning that for every variable \(X_i \in (V_o \cup V_m)\):
\[ X_i = \sum_{k(j) < k(i)} b_{ij} X_j + e_i \]
where \(i\in\{1,\dots,n\}\mapsto k(i)\) denotes a causal order, \(b_{ij}\) is the direct causal effect of \(X_j\) on \(X_i\), and the non-Gaussian error terms \(e_i\) are independent.
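As an illustration, the following minimal sketch samples from a linear non-Gaussian acyclic model of this kind, in which each variable is a weighted sum of the variables that precede it in the causal order plus an independent non-Gaussian error. The function name `simulate_lingam`, the coefficient matrix `B`, and the choice of uniform noise are assumptions made for the example; they are not part of the m-LiNGAM implementation.

```python
import random


def simulate_lingam(B, order, n_samples, seed=0):
    """Draw samples from X_i = sum_j B[i][j] * X_j + e_i.

    B[i][j] is the direct effect of X_j on X_i. `order` lists variable
    indices from causes to effects, so B[i][j] should be non-zero only
    when j comes before i in `order`.
    """
    rng = random.Random(seed)
    n_vars = len(B)
    samples = []
    for _ in range(n_samples):
        x = [0.0] * n_vars
        for i in order:
            # Uniform noise is one simple non-Gaussian choice (an assumption).
            e_i = rng.uniform(-1.0, 1.0)
            x[i] = sum(B[i][j] * x[j] for j in range(n_vars)) + e_i
        samples.append(x)
    return samples


# Hypothetical three-variable chain: X0 -> X1 -> X2.
B = [
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, -0.5, 0.0],
]
data = simulate_lingam(B, order=[0, 1, 2], n_samples=1000)
```

Each row of `data` is one complete sample, before any values are masked by a missingness mechanism.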
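The induced subgraph \(G[V_o \cup V_m \cup R]\) follows a LiM model. The missingness mechanisms \(R_i \in R\) follow a logistic model, as binary variables do in LiM [4]: the probability that \(R_i = 1\) is a logistic (sigmoid) function of a linear combination of its parents \(Pa(R_i)\).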
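As an illustration, the following minimal sketch shows how a logistic missingness mechanism could turn a fully simulated column into a proxy column with missing entries. The helper `apply_missingness`, its parameters, and the convention that \(R_i = 1\) marks a missing entry are assumptions made for the example; the exact parameterization used by m-LiNGAM is not stated here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def apply_missingness(samples, target, parents, weights, intercept, seed=0):
    """Return (proxy_column, r_column) for the variable at index `target`.

    P(R = 1 | parents) = sigmoid(intercept + sum_k weights[k] * parent_k).
    Assumed convention: R = 1 means the entry is missing (proxy is None).
    """
    if target in parents:
        # Direct self-masking is excluded by the model assumptions.
        raise ValueError("target must not be a parent of its own mechanism")
    rng = random.Random(seed)
    proxy, r = [], []
    for row in samples:
        z = intercept + sum(w * row[p] for p, w in zip(parents, weights))
        missing = rng.random() < sigmoid(z)
        r.append(1 if missing else 0)
        proxy.append(None if missing else row[target])
    return proxy, r


# Hypothetical data: column 0 is fully observed, column 1 is partially observed.
rows = [[0.5, 1.0], [-2.0, 0.3], [1.5, -0.7], [0.0, 2.2]]
proxy, r = apply_missingness(rows, target=1, parents=[0], weights=[1.0], intercept=0.0)
```

Because the mechanism depends only on the fully observed column 0, this example corresponds to a MAR case; making another partially observed variable a parent would give an MNAR case.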
Assumptions
The following assumptions are made to ensure identifiability:
No latent confounders (\(U = \emptyset\)).
No causal interactions between missingness mechanisms (\(R_i \notin Pa(R_j)\) for all \(i \neq j\)).
No self-masking (\(X_i \notin Pa(R_i)\) for any \(X_i \in V_m\)).
Although direct self-masking is not allowed, indirect self-masking is: a partially observed variable can be an indirect cause (an ancestor) of its own missingness mechanism, for example through a chain \(X_i \to X_j \to R_i\). Under these assumptions, m-LiNGAM guarantees identifiability of both the causal structure and the parameters from observational data in the large-sample limit.
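To make the two structural assumptions on missingness mechanisms concrete, here is a minimal sketch of a checker for a graph given as a parent map. The function `check_m_graph` and its input format are hypothetical and serve only to illustrate the conditions; the sketch does not test for latent confounders.

```python
def check_m_graph(parents, mechanism_of):
    """List violations of the missingness-mechanism assumptions.

    parents: dict mapping each node to the set of its direct parents.
    mechanism_of: dict mapping each partially observed variable to the
        name of its missingness mechanism node.
    """
    mechanisms = set(mechanism_of.values())
    violations = []
    # No causal interactions between missingness mechanisms.
    for r in sorted(mechanisms):
        for p in sorted(parents.get(r, set())):
            if p in mechanisms:
                violations.append(f"{p} -> {r}: edge between mechanisms")
    # No direct self-masking: X_i must not be a parent of R_i.
    for x, r in sorted(mechanism_of.items()):
        if x in parents.get(r, set()):
            violations.append(f"{x} -> {r}: self-masking")
    return violations


# Indirect self-masking (X1 -> X2 -> R1) is allowed.
indirect = check_m_graph(
    parents={"X2": {"X1"}, "R1": {"X2"}},
    mechanism_of={"X1": "R1"},
)

# Direct self-masking (X1 -> R1) is not.
direct = check_m_graph(
    parents={"R1": {"X1"}},
    mechanism_of={"X1": "R1"},
)
```

In this sketch `indirect` is an empty list, while `direct` contains a single self-masking violation.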
An example Python notebook demonstrating m-LiNGAM is available here.