
Double Machine Learning: Advancing Causal Inference with Machine Learning Techniques
Inspired by Brady Neal’s video, in which he humorously suggests that Double Machine Learning (DML) “is maybe twice as cool” as regular machine learning, we dive into this advanced causal framework. DML offers a robust method for causal inference by combining the flexibility of machine learning with low bias and valid confidence intervals. Let’s explore what makes it such a compelling approach.
What is Double Machine Learning?
DML, also known as debiased machine learning or orthogonal machine learning, is a causal inference framework that uses two separate machine learning models to estimate different parts of the relationships in the data. Proposed by MIT statistician and economist Victor Chernozhukov and colleagues, DML aims to combine the flexibility of non-parametric machine learning models with the statistical properties necessary for causal inference, such as low bias and valid confidence intervals.
The Motivation Behind DML
The primary motivation behind DML is to create a causal estimator that can achieve root-n consistency, where the estimation error decreases at a rate of 1/√n as the sample size increases. This consistency ensures that the estimator converges to the true value as more data becomes available, making it a reliable tool for causal inference.
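In symbols (standard asymptotic notation, not from the source), root-n consistency says that the scaled estimation error of the causal parameter is asymptotically normal, so the error shrinks at the rate 1/√n:

```latex
\sqrt{n}\,\bigl(\hat{\theta} - \theta_0\bigr) \;\xrightarrow{\,d\,}\; \mathcal{N}(0, \sigma^2),
\qquad \text{equivalently} \qquad
\hat{\theta} - \theta_0 = O_p\!\bigl(n^{-1/2}\bigr).
```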
Addressing Bias in DML
DML tackles two main sources of bias: overfitting bias and regularization bias. The former is handled by cross-fitting, a sample-splitting technique reminiscent of cross-validation; the latter by orthogonalization.
Overfitting Bias
Cross-fitting involves the following steps:
- Split the data into two partitions, D0 and D1.
- Fit models on D0 and estimate the quantity of interest on D1.
- Fit models on D1 and estimate on D0.
- Average the estimates from both partitions to get the final estimate.
Because no model is ever evaluated on the same data it was fit on, cross-fitting corrects the overfitting bias and lets the nuisance models generalize better.
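The steps above can be sketched in plain Python. This is a minimal two-fold cross-fitting illustration for a partially linear model, not EconML’s implementation; the simulated data-generating process and the random-forest nuisance models are assumptions made for the demo.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
# Treatment and outcome both depend on X; the true effect of T on Y is 1.0.
T = X[:, 0] + rng.normal(size=n)
Y = 1.0 * T + np.sin(X[:, 0]) + rng.normal(size=n)

# Step 1: split the data into two partitions, D0 and D1.
idx = rng.permutation(n)
d0, d1 = idx[: n // 2], idx[n // 2:]

thetas = []
# Steps 2-3: fit nuisance models on one partition, estimate on the other.
for fit_idx, est_idx in [(d0, d1), (d1, d0)]:
    m_y = RandomForestRegressor(n_estimators=100, random_state=0)
    m_t = RandomForestRegressor(n_estimators=100, random_state=0)
    m_y.fit(X[fit_idx], Y[fit_idx])
    m_t.fit(X[fit_idx], T[fit_idx])
    res_y = Y[est_idx] - m_y.predict(X[est_idx])  # outcome residual
    res_t = T[est_idx] - m_t.predict(X[est_idx])  # treatment residual
    thetas.append(np.sum(res_t * res_y) / np.sum(res_t ** 2))

# Step 4: average the estimates from both partitions.
theta_hat = float(np.mean(thetas))
print(theta_hat)  # should land close to the true effect of 1.0
```

The residual-on-residual step at the end is the orthogonalization discussed in the next subsection; cross-fitting only governs which data each model is fit and evaluated on.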
Regularization Bias
Regularization bias is addressed using orthogonalization, inspired by the Frisch-Waugh-Lovell (FWL) theorem. Orthogonalization separates the estimation of the causal parameter from the nuisance parameter, allowing for flexible modeling of complex non-linear relationships in the data.
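As a minimal illustration of the FWL idea with purely linear models (a deliberately simplified setting, not DML itself), regressing the residual of the outcome on the residual of the treatment recovers exactly the treatment coefficient from the full joint regression; the simulated data below is an assumption for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
T = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
Y = 2.0 * T + X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

# Full regression: Y on T and X jointly; the coefficient on T is the target.
full = LinearRegression().fit(np.column_stack([T, X]), Y)

# FWL: residualize T and Y on X separately, then regress residual on residual.
res_t = T - LinearRegression().fit(X, T).predict(X)
res_y = Y - LinearRegression().fit(X, Y).predict(X)
partial = LinearRegression(fit_intercept=False).fit(res_t.reshape(-1, 1), res_y)

# The two coefficients agree exactly (up to floating-point error).
print(full.coef_[0], partial.coef_[0])
```

DML’s insight is that the two residualizing regressions need not be linear: they can be replaced by any flexible machine learning model, which is where the tolerance for complex non-linear nuisance relationships comes from.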
Implementing DML with DoWhy and EconML
To demonstrate DML, we use DoWhy’s API and the linear DML estimator from EconML. The process involves passing the ‘backdoor.econml.dml.LinearDML’ method to the estimation function, specifying that the treatment is discrete. Here’s a simplified implementation:
from lightgbm import LGBMRegressor
from sklearn.linear_model import LogisticRegression

estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.LinearDML',
    target_units='ate',
    method_params={
        'init_params': {
            'model_y': LGBMRegressor(n_estimators=500, max_depth=10),
            'model_t': LogisticRegression(),
            'discrete_treatment': True
        },
        'fit_params': {}
    }
)
After fitting the model, we predict on test data and compute the error. The initial error may be relatively high, but reducing the complexity of the outcome model and increasing the number of cross-fitting folds can improve performance significantly.
Hyperparameter Tuning
Hyperparameter tuning is a crucial step to optimize the models further. This can be done using cross-validation classes like GridSearchCV, HalvingGridSearchCV, or RandomizedSearchCV. Here’s an example of how to wrap the models in a grid search wrapper:
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.model_selection import GridSearchCV

model_y = GridSearchCV(
    estimator=LGBMRegressor(),
    param_grid={
        'max_depth': [3, 10, 20, 100],
        'n_estimators': [10, 50, 100]
    },
    cv=10, n_jobs=-1, scoring='neg_mean_squared_error'
)
model_t = GridSearchCV(
    estimator=LGBMClassifier(),
    param_grid={
        'max_depth': [3, 10, 20, 100],
        'n_estimators': [10, 50, 100]
    },
    cv=10, n_jobs=-1, scoring='accuracy'
)
estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.LinearDML',
    target_units='ate',
    method_params={
        'init_params': {
            'model_y': model_y,
            'model_t': model_t,
            'discrete_treatment': True,
            'cv': 4
        },
        'fit_params': {}
    }
)
By passing these wrapped models into the estimation function, we achieve a significantly lower error rate, demonstrating the effectiveness of hyperparameter tuning.
Limitations and Considerations
While DML is a powerful method, it’s not without limitations:
- Hidden Confounding: Like many causal inference methods, DML assumes no hidden confounding. Violations of this assumption can lead to significant biases.
- Extrapolation: The performance of DML on out-of-distribution data depends on the base learners used. Tree-based models, for instance, don’t extrapolate beyond the range of the training data.
- Feature Selection: Unlike in predictive modeling, adding more features to a causal model can sometimes introduce bias by opening non-causal paths in the underlying causal graph.
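The feature-selection caveat can be made concrete with a toy simulation (assumed data, not from the source): conditioning on a collider, a variable caused by both the treatment and the outcome, biases an estimate that would otherwise be clean.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 20000
T = rng.normal(size=n)
Y = 2.0 * T + rng.normal(size=n)   # true causal effect of T on Y is 2.0
C = T + Y + rng.normal(size=n)     # a collider: caused by both T and Y

# Without the collider, the regression recovers the causal effect.
coef_plain = LinearRegression().fit(T.reshape(-1, 1), Y).coef_[0]
# "Controlling for" the collider opens a non-causal path and biases the estimate.
coef_collider = LinearRegression().fit(np.column_stack([T, C]), Y).coef_[0]

print(coef_plain)     # close to the true effect of 2.0
print(coef_collider)  # badly biased by conditioning on the collider
```

The same logic applies to the nuisance models in DML: throwing every available feature into `model_y` and `model_t` is not automatically safe, because some features may sit on non-causal paths in the underlying graph.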
DML vs. Doubly Robust Methods
Both DML and Doubly Robust (DR) methods like DR-Learner have their strengths:
- DML works for both categorical and continuous treatments, while DR methods are typically limited to categorical treatments.
- DR methods might perform better when the outcome model is misspecified.
- DML often has lower variance, especially in regions where some treatments have a small probability of being assigned.
- DML might outperform DR methods under sparsity in high-dimensional settings.
Conclusion
Double Machine Learning represents a significant advancement in causal inference, allowing us to leverage the power of machine learning while maintaining the statistical properties necessary for causal estimation. By addressing key sources of bias and allowing for flexible modeling, DML opens up new possibilities for accurate causal effect estimation in complex, high-dimensional settings.
However, like all methods, DML is not a silver bullet. It requires careful consideration of assumptions, particularly regarding hidden confounding, and thoughtful application of machine learning techniques. When applied correctly, though, DML can provide robust, accurate estimates of causal effects, making it a valuable tool in the modern causal inference toolkit.
As you apply these methods in your own work, remember that the choice between DML, DR methods, or other approaches often depends on the specific characteristics of your data and research question. Always validate your results, compare different methods, and consider the underlying causal structure of your problem. With these considerations in mind, DML can be a powerful addition to your causal inference arsenal.