Handling Missing Data in Decision Trees: A Probabilistic Approach
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, and Guy Van den Broeck.
In The Art of Learning with Missing Values Workshop at ICML (Artemiss), 2020
TL;DR
We address the problem of handling missing data in decision trees via a probabilistic approach, using tractable density estimators to compute the “expected prediction” of the model.
Abstract
Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the “expected prediction” of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their “expected prediction loss” w.r.t. our density estimators. We provide brief experiments showcasing effectiveness of our methods compared to few baselines.
Citation
@inproceedings{KhosraviArtemiss20, author = {Khosravi, Pasha and Vergari, Antonio and Choi, YooJung and Liang, Yitao and Van den Broeck, Guy}, title = {Handling Missing Data in Decision Trees: A Probabilistic Approach}, booktitle = {The Art of Learning with Missing Values Workshop at ICML (Artemiss)}, month = {jul}, year = {2020}, }