Mid-Atlantic Opioid Task Force

Goal

Use machine learning to classify opioid overdose incidents that occurred in Pennsylvania.

Dataset

There are two classification models: one predicts survival, and the other predicts Naloxone administration. The two models are similar, so their input data is similar as well. Below are the features for the survival classification model:

Fig.1 - Data Sample

Pseudo-code

  1. Create dataset
  2. Split data
  3. Initialize pipelines with the desired estimators
  4. Fit pipelines
  5. Cross-validate using StratifiedKFold (sklearn)
  6. Plot the scores against the decision threshold to find the best threshold (the one that maximizes both the AUC and the Matthews correlation coefficient)
  7. Use that threshold to predict
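
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification`, the `StandardScaler` step, the Random Forest settings, and the placeholder threshold of 0.6 are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: synthetic stand-in for the overdose dataset (assumption: the real
# features and class ratio differ).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.2, 0.8], random_state=0
)

# Step 2: stratified split so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: one pipeline per estimator; the scaler is a placeholder
# preprocessing step.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    )),
])

# Step 4: fit on the training data.
pipeline.fit(X_train, y_train)

# Step 5: stratified k-fold cross-validation, scored with ROC AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="roc_auc")
print("CV ROC AUC per fold:", np.round(cv_auc, 3))

# Steps 6-7: apply a decision threshold to the positive-class probability.
# 0.6 is a placeholder; in practice it would come from the threshold plot.
threshold = 0.6
proba = pipeline.predict_proba(X_test)[:, 1]
y_pred = (proba >= threshold).astype(int)
```

Wrapping the estimator in a `Pipeline` keeps any preprocessing inside each cross-validation fold, which avoids leaking information from the validation folds into the fit.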

Class Balance

As can be seen below, the classes are imbalanced, so class weights are taken into account and the models are evaluated with metrics that are robust to imbalance, such as ROC AUC and the Matthews correlation coefficient.

Fig.2 - Class Balance
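
As an illustration of how class weights can compensate for imbalance, the sketch below computes "balanced" weights with scikit-learn. The 80/20 label split is made up for the example and is not the ratio in the real data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (assumption): 80 positives and 20 negatives.
y = np.array([1] * 80 + [0] * 20)

classes = np.unique(y)
# "balanced" weights are n_samples / (n_classes * count_per_class).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)  # {0: 2.5, 1: 0.625}
```

The resulting dictionary can be passed as `class_weight` to estimators that support it; passing the string `"balanced"` directly has the same effect.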

Baseline

The baseline predicts survival if Naloxone was administered and multiple drugs were not consumed:

Fig.3 - Baseline Results
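
A rule-based baseline of this kind could be written as shown below. The column names `naloxone_administered`, `multiple_drugs`, and `survived` are hypothetical, and the five rows are toy data, not records from the dataset.

```python
import pandas as pd

# Toy rows with hypothetical column names (assumption).
df = pd.DataFrame({
    "naloxone_administered": [1, 1, 0, 0, 1],
    "multiple_drugs":        [0, 1, 0, 1, 0],
    "survived":              [1, 0, 0, 0, 1],
})

def baseline_predict(frame: pd.DataFrame) -> pd.Series:
    """Predict survival (1) when Naloxone was given and multiple drugs were not consumed."""
    rule = (frame["naloxone_administered"] == 1) & (frame["multiple_drugs"] == 0)
    return rule.astype(int)

df["baseline_pred"] = baseline_predict(df)
print(df)
```

Any model worth keeping should beat this rule on the same evaluation metrics.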

Correlation Heatmap

The heatmap below shows the Spearman correlation between features, drawn with the Seaborn package:

Fig.4 - Correlation Heatmap
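
A heatmap like this can be produced along the following lines. This is a sketch under assumptions: the toy columns and the helper names `spearman_matrix` and `plot_heatmap` are invented for the example, and the plotting libraries are imported inside the plotting helper so the correlation step can be run on its own.

```python
import numpy as np
import pandas as pd

def spearman_matrix(frame: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Spearman rank correlation between numeric columns."""
    return frame.corr(method="spearman")

def plot_heatmap(corr: pd.DataFrame, path: str = "heatmap.png") -> None:
    """Draw an annotated heatmap of a correlation matrix and save it to disk."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)

# Toy data (assumption): x_cubed is a monotonic function of x, so their
# Spearman correlation is 1 even though the relationship is not linear.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
toy = pd.DataFrame({"x": x, "x_cubed": x ** 3, "noise": rng.normal(size=50)})

corr = spearman_matrix(toy)
print(corr.round(2))
# plot_heatmap(corr)  # uncomment to write heatmap.png
```

Spearman correlation works on ranks, so it captures monotonic relationships that Pearson correlation can understate.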

Methods

Performance

A graph shows the AUC and Matthews correlation coefficient scores as a function of the decision threshold. The graph is generated by a function, so it can be produced for any model that was used. Below is the result for Random Forest:

Fig.5 - AUC and Matthews scores as a function of the threshold
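
A model-agnostic threshold sweep might look like the sketch below. The function name `threshold_scores`, the toy labels and probabilities, and the choice to pick the threshold with the highest sum of the two scores are assumptions for illustration; the original selection was made by reading the graph.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score

def threshold_scores(y_true, proba, thresholds):
    """Return rows of (threshold, ROC AUC, MCC) for hard predictions at each threshold."""
    rows = []
    for t in thresholds:
        y_pred = (proba >= t).astype(int)
        rows.append((t,
                     roc_auc_score(y_true, y_pred),
                     matthews_corrcoef(y_true, y_pred)))
    return np.array(rows)

# Toy labels and positive-class probabilities (assumption); any fitted model's
# predict_proba(X)[:, 1] output could be used instead.
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])
proba = np.array([0.12, 0.21, 0.37, 0.43, 0.56, 0.62, 0.71, 0.83, 0.91, 0.96])
thresholds = np.linspace(0.3, 0.7, 5)

scores = threshold_scores(y_true, proba, thresholds)

# One simple way to balance both metrics: maximize their sum.
best_row = scores[np.argmax(scores[:, 1] + scores[:, 2])]
print("best threshold:", best_row[0])
```

The columns of `scores` can be plotted against the first column to reproduce a threshold graph for any classifier that exposes probabilities.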

Starting from a threshold of 0.6 chosen from the graph, the threshold was then adjusted manually to improve the confusion matrix results, arriving at 0.64:

Fig.6 - Random Forest Final Results with threshold = 0.64
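
The manual fine-tuning step can be approximated by printing the confusion matrix for a few thresholds near the starting point. The helper `confusion_at` and the toy labels and probabilities below are assumptions for the example, not the project's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_at(y_true, proba, threshold):
    """Confusion matrix [[TN, FP], [FN, TP]] for predictions at a given threshold."""
    y_pred = (np.asarray(proba) >= threshold).astype(int)
    return confusion_matrix(y_true, y_pred, labels=[0, 1])

# Toy labels and positive-class probabilities (assumption).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
proba = np.array([0.30, 0.61, 0.63, 0.66, 0.90, 0.10, 0.58, 0.75])

# Step through thresholds around 0.6 and inspect each confusion matrix.
for t in (0.60, 0.62, 0.64, 0.66):
    tn, fp, fn, tp = confusion_at(y_true, proba, t).ravel()
    print(f"threshold={t:.2f}  TN={tn} FP={fp} FN={fn} TP={tp}")
```

Raising the threshold trades false positives for false negatives, so the preferred value depends on which error is costlier for the task.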