This research paper was prepared by the listed authors for the National Institute of Justice’s (NIJ’s) Recidivism Forecasting Challenge.
The dataset provided for the challenge consists of individuals who were released from state prison to parole supervision in Georgia between 2013 and 2015. The data include a wide array of individual-level characteristics: prerelease characteristics such as demographics, detailed criminal history, supervision history, and current case information, and post-release characteristics such as parole conditions and violations. The prediction targets are a new in-state arrest for a felony or misdemeanor offense within 1, 2, and 3 years of release. For this challenge, the team first explored the original dataset and then applied targeted preprocessing. It also augmented the dataset with external census data. The team then selected a set of well-performing machine learning models and used grid search to determine the best hyperparameters for each. Based on the experimental results, the team took a weighted sum of the outputs of logistic regression, gradient-boosted decision trees (GBDT), and XGBoost as the final prediction. Some limitations of the work, which stem from limited time and insufficient efficiency, are also noted.
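The weighted-sum step can be illustrated with a minimal sketch. It assumes that each of the three models produces one predicted arrest probability per individual and that the weights are normalized to sum to 1, so that the blended score remains a probability. The function name `blend_probabilities`, the weights, and the probabilities below are hypothetical placeholders chosen for illustration; they are not the team's actual code, weights, or results.

```python
from typing import Dict, List


def blend_probabilities(
    model_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the per-individual weighted sum of model probabilities."""
    if set(model_probs) != set(weights):
        raise ValueError("model_probs and weights must name the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lengths = {len(p) for p in model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same individuals")
    n = lengths.pop()
    # Assumption of this sketch: normalize the weights so that the blended
    # score stays within [0, 1].
    return [
        sum(weights[name] / total * model_probs[name][i] for name in model_probs)
        for i in range(n)
    ]


# Placeholder probabilities for two individuals (illustrative values only).
probs = {
    "logistic_regression": [0.2, 0.7],
    "gbdt": [0.4, 0.9],
    "xgboost": [0.3, 0.8],
}
# Hypothetical weights; in practice they would be chosen on validation data.
weights = {"logistic_regression": 0.2, "gbdt": 0.4, "xgboost": 0.4}

blended = blend_probabilities(probs, weights)
```

Because the weights are normalized inside the function, only their relative sizes matter, which makes it straightforward to search over candidate weightings in the same way as over model hyperparameters.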