The authors report on their winning use of CATBOOST models to predict recidivism in the U.S. Department of Justice's National Institute of Justice (NIJ) Recidivism Forecasting Challenge.
In 2021, NIJ hosted a competition to forecast recidivism using person- and place-based variables, with the goal of improving outcomes for people serving a community supervision sentence. The contest used data from Georgia on persons released from prison to parole supervision between January 1, 2013, and December 31, 2015. Contestants in the Challenge submitted forecasts (percent likelihoods) of whether each individual in the dataset recidivated within 1 year, 2 years, or 3 years after release. Each of these three forecast periods was judged separately for each gender and for the average across genders, yielding nine categories scored on raw accuracy, with separate rankings by team size. Another six categories judged the forecasts with a metric that penalized racial bias.

The current report describes the authors' models, results, and observations. The literature section reviews relevant prior work. The section on variables and models describes how the machine learning algorithm CATBOOST was implemented to forecast recidivism. This is followed by a discussion of modeling issues associated with racial bias. Results and conclusions are then presented, along with ethical issues, which are critical when considering a forecasting model of this nature.

Among large teams and businesses, the authors finished in the top five positions in seven of the nine categories for raw accuracy accounting for racial bias. The models identified employment-related variables as highly influential. This suggests that employment-related policies and other interventions should be evaluated to determine their effectiveness in reducing recidivism.
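To make the idea of accuracy scoring with a racial-bias penalty more concrete, the following is a minimal sketch under stated assumptions. It assumes that raw accuracy is measured with the Brier score (mean squared error of the probability forecasts) and that bias is measured as the absolute gap in false-positive rates between two racial groups at a 0.5 threshold, with the two combined as (1 - Brier) * (1 - FP gap). These formula details are not given in this report and should be checked against NIJ's official Challenge rules. The function names and the generic group labels "A" and "B" are hypothetical, chosen only for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def false_positive_rate(probs: Sequence[float], outcomes: Sequence[int],
                        threshold: float = 0.5) -> float:
    """Share of non-recidivists (y == 0) whose forecast is at or above the threshold.

    The 0.5 threshold is an assumption for this sketch.
    """
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    if not negatives:
        return 0.0
    return sum(1 for p in negatives if p >= threshold) / len(negatives)


def _subset(probs: Sequence[float], outcomes: Sequence[int],
            groups: Sequence[Hashable], group: Hashable) -> Tuple[List[float], List[int]]:
    """Return the forecasts and outcomes belonging to one group."""
    pairs = [(p, y) for p, y, g in zip(probs, outcomes, groups) if g == group]
    return [p for p, _ in pairs], [y for _, y in pairs]


def fairness_penalized_score(probs: Sequence[float], outcomes: Sequence[int],
                             groups: Sequence[Hashable], group_a: Hashable,
                             group_b: Hashable, threshold: float = 0.5) -> float:
    """Hypothetical combined score: (1 - Brier) * (1 - |FPR_a - FPR_b|); higher is better."""
    bs = brier_score(probs, outcomes)
    fpr_a = false_positive_rate(*_subset(probs, outcomes, groups, group_a), threshold)
    fpr_b = false_positive_rate(*_subset(probs, outcomes, groups, group_b), threshold)
    return (1 - bs) * (1 - abs(fpr_a - fpr_b))


if __name__ == "__main__":
    # Toy forecasts for four people in two generically labeled groups.
    probs = [0.8, 0.3, 0.6, 0.2]
    outcomes = [1, 0, 0, 0]
    groups = ["A", "A", "B", "B"]
    print("Brier:", brier_score(probs, outcomes))
    print("Penalized:", fairness_penalized_score(probs, outcomes, groups, "A", "B"))
```

In the toy data, group A has no false positives while half of group B's non-recidivists are forecast at or above 0.5, so the penalty halves the accuracy term. Under this kind of metric, a model can score worse than a slightly less accurate one whose errors are spread more evenly across groups, which is why modeling for the bias-penalized categories can differ from modeling for raw accuracy.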