This is the report submitted by Team PASDA in the 2021 NIJ Recidivism Forecasting Challenge, a competition hosted by the U.S. Justice Department’s National Institute of Justice (NIJ), with the goal of “increasing public safety and improving the fair administration of justice across the United States.”
The challenge focused on data from the State of Georgia on individuals released from prison to parole supervision between January 1, 2013, and December 31, 2015. Challenge participants were tasked with building predictive models of recidivism at 1, 2, and 3 years after release from prison, using variables such as age, gender, race, education, prior arrests and convictions, and others.

Some of the variables the team reports as statistically significant came from its top-performing CatBoost model. Several of the team's handcrafted features were among the top predictors, including total arrests normalized by age at release and the difference between the percentage of days employed and the number of jobs per year. Over the course of the competition, the team considered several model families: unpenalized and penalized linear models (lasso, ridge, elastic net, and relaxed lasso), generalized additive models (GAMs), boosted trees (GBM, XGBoost, and CatBoost), and bagged trees (random forest). Selected interaction effects were considered in the linear models.

The report concludes by considering whether the team's work yields practical, applied findings that could help the field. The team notes that event-level data available after release to parole appeared to make stronger features than static demographic data, which suggests that constructing a good feature set is important for building accurate forecasts. This is consistent with recent research showing that humans can outperform models when the feature set is limited, but that algorithms outperform humans once the feature set is expanded.
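As an illustration of the two handcrafted features named above, here is a minimal sketch of how they might be computed for a single person. It assumes numeric per-person fields whose names (`total_prior_arrests`, `release_age`, `percent_days_employed`, `jobs_per_year`) are hypothetical; the report's exact definitions (for example, how arrests were totaled, whether age at release was binned, or whether employment was expressed as a fraction or a percentage) may differ.

```python
from dataclasses import dataclass


@dataclass
class ParoleRecord:
    # Hypothetical field names; the challenge dataset's actual columns may differ.
    total_prior_arrests: int      # count of prior arrests (assumed already summed)
    release_age: float            # age in years at release (assumed numeric, not binned)
    percent_days_employed: float  # share of supervision days employed, assumed in [0, 1]
    jobs_per_year: float          # number of jobs held per year on supervision


def arrests_per_release_age(rec: ParoleRecord) -> float:
    """Total prior arrests divided by age at release."""
    if rec.release_age <= 0:
        raise ValueError("release_age must be positive")
    return rec.total_prior_arrests / rec.release_age


def employed_minus_jobs_per_year(rec: ParoleRecord) -> float:
    """Percentage of days employed minus jobs per year (scale of the first term is assumed)."""
    return rec.percent_days_employed - rec.jobs_per_year


def handcrafted_features(rec: ParoleRecord) -> dict:
    """Collect both illustrative features into one dictionary."""
    return {
        "arrests_per_release_age": arrests_per_release_age(rec),
        "employed_minus_jobs_per_year": employed_minus_jobs_per_year(rec),
    }


if __name__ == "__main__":
    example = ParoleRecord(
        total_prior_arrests=6,
        release_age=30.0,
        percent_days_employed=0.75,
        jobs_per_year=0.5,
    )
    print(handcrafted_features(example))
```

Because the second feature subtracts two quantities on different scales, its values depend on whether employment is stored as a fraction or as a percentage from 0 to 100; this sketch assumes a fraction.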
