This paper compares regression models with more advanced machine learning techniques for developing risk assessment algorithms, and reports that sample size trumps algorithm type.
Risk assessments have been constructed using a variety of algorithms, from bivariate associations, to regression, to advanced machine learning (ML) approaches. While promising greater accuracy, agencies are hesitant to adopt tools using newer ML approaches, noting concerns of bias and transparency. Research is needed to identify optimal scenarios for algorithm use in assessment development. We compared regression models (logistic, boosted, and penalized) to more advanced techniques (neural networks, support vector machines, random forests, and K-nearest neighbors), while also introducing 'stacking', a method that combines algorithms to create an optimized model. Using a multi-state sample of 258,464 youth assessments, we varied prediction scenarios by sample size and base rate. While performance generally improved with greater sample size, a set of 'top performing' algorithms was identified. Among top performers, a 'saturation point' was observed, where algorithm type had little impact when samples exceeded 5000 subjects. In an era of big data and artificial intelligence, it is tantalizing to explore new approaches. While we do not hasten exploration, our findings demonstrate that sample size trumps algorithm type. Agencies and providers should consider this finding when adopting or developing tools, as algorithms that offer transparency may also be top performers. (Published Abstract Provided)
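The 'stacking' idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure or data: the synthetic sample, the choice of two base learners (a logistic regression and a K-nearest-neighbors classifier), the fold count, and all function names (`make_data`, `fit_logistic`, `fit_knn`, `fit_stack`, `auc`) are assumptions made for illustration. Each base learner produces out-of-fold risk scores, and a second logistic regression combines those scores into one prediction.

```python
import math
import random

def make_data(n, base_rate, seed):
    """Synthetic (features, label) pairs; illustrative only, not the study sample."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < base_rate else 0
        # Two noisy risk indicators, shifted upward for positive cases.
        data.append(([rng.gauss(1.0 * y, 1.0), rng.gauss(0.8 * y, 1.0)], y))
    return data

def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(X, y, epochs=300, lr=0.5):
    """Logistic regression fitted by batch gradient descent; returns a scorer."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return lambda x: sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))

def fit_knn(X, y, k=15):
    """K-nearest neighbors; the score is the share of positive neighbors."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(p[0], x))[:k]
        return sum(label for _, label in nearest) / len(nearest)
    return predict

BASE_FITTERS = [fit_logistic, fit_knn]  # assumed pair of base learners

def fit_stack(data, folds=5):
    """Stack the base learners using out-of-fold scores as meta-features."""
    X = [x for x, _ in data]
    y = [label for _, label in data]
    oof = [None] * len(data)
    for f in range(folds):
        train_idx = [i for i in range(len(data)) if i % folds != f]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in BASE_FITTERS]
        # Score only the held-out fold, so the meta-learner never sees
        # a prediction made by a model trained on that same case.
        for i in range(f, len(data), folds):
            oof[i] = [m(X[i]) for m in models]
    meta = fit_logistic(oof, y)
    base = [fit(X, y) for fit in BASE_FITTERS]  # refit on all training data
    return lambda x: meta([m(x) for m in base])

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

train = make_data(400, base_rate=0.3, seed=1)
test = make_data(200, base_rate=0.3, seed=2)
stacked = fit_stack(train)
scores = [stacked(x) for x, _ in test]
test_auc = auc(scores, [y for _, y in test])
print(round(test_auc, 3))
```

Changing the `n` and `base_rate` arguments of `make_data` gives a rough way to vary sample size and base rate, the two factors the abstract describes; results on this toy data say nothing about the study's findings.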
Similar Publications
- Development and validation of a systematic approach for the quantitative assessment of the quality of duct tape physical fits
- Evaluating Machine Learning Methods on a Large-scale of in Silico Fire Debris Data
- Prevalence Estimates and Correlates of Elder Abuse in the United States: The National Intimate Partner and Sexual Violence Survey