Team Early Stopping presents its recidivism predictions for years 2 and 3 of the parole period for the NIJ Recidivism 2021 Challenge.
In selecting variables for the recidivism predictions, the team added few new features and instead converted existing features to ordered numeric values wherever possible. Before fitting the final model, the team fit a series of preliminary boosted tree models to impute missing data and to predict secondary outcomes; for example, to predict Year 2, the team first predicted Year 3 and Any Year and then used the out-of-fold predictions for them as input features. The two imputed supplementary targets were the most important variables by a wide margin, which is unsurprising given their close relationship with the primary target. The team used ensembles of gradient boosted tree models from three packages: XGBoost, LightGBM, and CatBoost. It used five-fold cross-validation to create out-of-fold predictions and then fit a linear model as the second-level model to determine the final blending weights. The team used the Brier score and binary cross-entropy as loss functions and made no special adjustment for male/female or for thresholding. Future considerations are discussed.
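
The out-of-fold and blending steps described above can be illustrated with a minimal, dependency-free sketch. It is not the team's code. The two toy base models (`fit_base_rate`, `fit_group_rate`) are hypothetical stand-ins for the XGBoost, LightGBM, and CatBoost models, and the single-weight blend in `blend_weight` is an assumed simplification of the second-level linear model, whose exact form the text does not give. All function names are illustrative.

```python
import math
import random

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def out_of_fold_predictions(fit, X, y, k=5, seed=0):
    """Predict every row with a model that never saw that row in training."""
    oof = [None] * len(X)
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(X[i])
    return oof

# Hypothetical toy base models; each fit returns a predict(x) callable.
def fit_base_rate(X, y):
    rate = sum(y) / len(y)
    return lambda x: rate

def fit_group_rate(X, y):
    """Smoothed outcome rate per value of the first feature."""
    overall = sum(y) / len(y)
    counts = {}
    for x, t in zip(X, y):
        pos, n = counts.get(x[0], (0, 0))
        counts[x[0]] = (pos + t, n + 1)
    def predict(x):
        pos, n = counts.get(x[0], (0, 0))
        return (pos + overall) / (n + 1)
    return predict

def blend_weight(y, p1, p2):
    """Weight w in [0, 1] minimizing the Brier score of w*p1 + (1-w)*p2.

    Assumed simplification: one-parameter least squares, clipped to [0, 1].
    """
    num = sum((a - b) * (t - b) for t, a, b in zip(y, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = num / den if den > 0 else 0.5
    return min(max(w, 0.0), 1.0)
```

In use, each base model's out-of-fold predictions would be computed once with `out_of_fold_predictions`, passed to `blend_weight` together with the true labels, and the resulting blend scored with `brier_score` or `binary_cross_entropy`. The same out-of-fold mechanism is what lets predictions of a secondary target, such as Year 3, serve as input features without leaking the training labels.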
