This is the report submitted by Team Smith (Andy J. Smith) for the NIJ Recidivism Challenge.
For the competition, contestants were given a training data set and three test sets. The objective was to use the training data to develop and train a machine learning (ML) model that predicts recidivism for the individuals in the data set. The training data were intended to represent the information a parole officer would have at the time an individual was released on parole, and they covered a wide range of inputs, from education and mental health to previous arrests and convictions. Using the training data, Smith developed and tested a variety of traditional ML models to predict each person’s recidivism. The final model selected was an ensemble of four traditional ML models. In addition to the data provided by the NIJ, Smith brought in geographic data describing the environment each person was returning to on parole, drawn from the individual’s PUMA (Public Use Microdata Area) as part of the normalization process. Smith had initially assumed that data about the PUMA, such as average income and average lot size, would be informative about an individual’s recidivism, but none of these variables were statistically significant, and they contributed little to the final model. Based on how the variables performed, Smith concludes that maintaining employment throughout parole is key to preventing recidivism.
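To illustrate what an ensemble of four models can look like, here is a minimal sketch. The summary above does not say how the four models were combined, so equal-weight averaging of predicted probabilities (soft voting) is an assumption, as are the function names `soft_vote` and `brier_score` and the made-up probabilities. The Brier score is included only as a common way to score probability forecasts, not as the report's stated metric.

```python
from statistics import fmean
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average each person's predicted probability across models.

    Assumption: equal weights for every model; the report does not
    state how its ensemble was actually combined.
    """
    if not model_probs:
        raise ValueError("need predictions from at least one model")
    n_people = len(model_probs[0])
    if any(len(p) != n_people for p in model_probs):
        raise ValueError("every model must score the same individuals")
    # zip(*...) regroups the values so each tuple holds one person's
    # probabilities from all models.
    return [fmean(person) for person in zip(*model_probs)]


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return fmean([(p - y) ** 2 for p, y in zip(probs, outcomes)])


# Hypothetical predictions from four models for three individuals.
example_probs = [
    [0.25, 0.75, 0.5],
    [0.5, 0.875, 0.5],
    [0.25, 0.625, 0.25],
    [0.5, 0.75, 0.75],
]
example_outcomes = [0, 1, 1]  # 1 = recidivated, 0 = did not (made-up labels)

ensemble_probs = soft_vote(example_probs)
ensemble_brier = brier_score(ensemble_probs, example_outcomes)
```

Averaging tends to smooth out the errors of individual models, which is one common reason for preferring an ensemble over any single model. A weighted average or a stacked meta-model would be equally plausible alternatives.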