Data Missing Not At Random, Illustrated

Data can be missing in three ways. First, it can be Missing Completely At Random (MCAR). This means the probability that a value is missing is unrelated to anything in the data, observed or unobserved. We can analyze the data as we please: we lose some efficiency, but standard statistical solutions will help (see King et al. [PDF]).

Second, it can be Missing At Random (MAR). This means missing in a “knowable” way: the probability that a value is missing depends only on data you do observe. If you can plausibly model the mechanism that leads the data to be missing using other data on hand (the observables), then you can still make inferences. King et al.’s methods work here too.

Third, it can be Missing Not At Random (MNAR). That means, loosely, that even with the observables on hand, the value of the missing observation itself is still related to the probability that it is missing. You cannot model the missingness mechanism using observables alone, so still more assumptions are required to make inferences.
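
To make the three mechanisms concrete, here is a small simulation sketch in Python (standard library only). Everything in it is made up for illustration: the variables x and y, the logistic missingness probabilities, and the helper names are assumptions, not anything taken from King et al. or from PODES. The variable x is always observed, y is the one that goes missing, and for each mechanism we compare the naive mean of the observed y with an estimate that models y using x.

```python
import math
import random
import statistics

random.seed(0)
N = 20_000

# Hypothetical data: x is always observed; y is the variable that may go missing.
x = [random.gauss(0, 1) for _ in range(N)]
y = [xi + random.gauss(0, 1) for xi in x]
true_mean = statistics.fmean(y)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def keep_observed(p_missing):
    """Drop each row independently with its own probability of being missing."""
    return [(xi, yi) for xi, yi, p in zip(x, y, p_missing) if random.random() >= p]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


def summarize(pairs):
    # Naive: average whatever y values happen to be observed.
    naive = statistics.fmean([b for _, b in pairs])
    # Adjusted: fit y on x among observed rows, then predict for every x in the sample.
    slope, intercept = fit_line(pairs)
    adjusted = intercept + slope * statistics.fmean(x)
    return {"naive": naive, "adjusted": adjusted}


mechanisms = {
    "MCAR": [0.5] * N,                       # unrelated to anything
    "MAR": [logistic(2 * xi) for xi in x],   # depends only on the observed x
    "MNAR": [logistic(2 * yi) for yi in y],  # depends on the missing value itself
}

results = {name: summarize(keep_observed(p)) for name, p in mechanisms.items()}

print(f"true mean of y: {true_mean:.3f}")
for name, r in results.items():
    print(f"{name}: naive={r['naive']:.3f} adjusted={r['adjusted']:.3f}")
```

In this toy setup, the naive mean should be fine only under MCAR. Under MAR it is biased, but the estimate that uses x recovers the truth. Under MNAR both are off, because nothing observed captures why the values are missing.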

Without any further comment, here is an illustration of some MNAR data, from Indonesia’s PODES 2008.

[Graph: violence variables in PODES 2008]

We are looking at variables that code each village in Indonesia as having experienced violence of some sort (blue bars). For each village with violence, a long series of further questions classifies the type of violence (between villages, among ethnic groups, with security forces, etc.), the number killed, and so forth. In some cases these questions were not completed (red bar).