The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature using reported accuracy from a paper by Alex Bowers:
Bowers, A.J., Sprott, R.*, Taff, S.A.* (2013) Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The
High School Journal, 96(2), 77-100. doi:10.1353/hsj.2013.0000. This article serves as good background and grounds the benchmarking of the models built in Wisconsin and for others when benchmarking their own models.
The state of Wisconsin has one of the highest four year graduation rates in the nation, but deep disparities among student subgroups remain. To address this the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.
Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade off between correct classification of likely dropouts and false-alarm (Bowers et al., 2013). Building on this work, DEWS uses the receiver-operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students.
This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables to not only predict dropout, but also impute key predictors as well. The design and implementation of each of these modules is described in detail as well as the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014).
The code that powers the EWS is an open source R extension of the caret package which is available on GitHub: EWStools on GitHub