Dataset:
mstz/compas
The COMPAS dataset for recidivism prediction. The dataset is known to have racial bias issues; see the ProPublica article on the topic.
| Configuration | Task | Description |
|---|---|---|
| encoding | | Encoding dictionary showing original values of encoded features. |
| two-years-recidividity | Binary classification | Will the defendant be a violent recidivist? |
| two-years-recidividity-no-race | Binary classification | As above, but the race feature is removed. |
| priors-prediction | Regression | How many prior crimes has the defendant committed? |
| priors-prediction-no-race | Regression | As above, but the race feature is removed. |
| race | Multiclass classification | What is the race of the defendant? |
from datasets import load_dataset
dataset = load_dataset("mstz/compas", "two-years-recidividity")["train"]
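The configuration names from the table are passed as the second argument to `load_dataset`. Below is a minimal sketch of a wrapper that checks the name before loading. The helper `load_compas` and the `CONFIGURATIONS` tuple are hypothetical conveniences introduced here for illustration; they are not part of the dataset or of the `datasets` library.

```python
# Configuration names copied verbatim from the configuration table.
CONFIGURATIONS = (
    "encoding",
    "two-years-recidividity",
    "two-years-recidividity-no-race",
    "priors-prediction",
    "priors-prediction-no-race",
    "race",
)


def load_compas(configuration: str, split: str = "train"):
    """Hypothetical helper: validate the configuration name, then load it."""
    if configuration not in CONFIGURATIONS:
        raise ValueError(
            f"Unknown configuration {configuration!r}; "
            f"expected one of {', '.join(CONFIGURATIONS)}"
        )
    # Imported lazily so the name check works even without the library installed.
    from datasets import load_dataset

    return load_dataset("mstz/compas", configuration, split=split)
```

For example, `load_compas("priors-prediction")` would request the regression configuration, while a misspelled name raises a `ValueError` before any download is attempted.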
| Feature | Type | Description |
|---|---|---|
| sex | int64 | |
| age | int64 | |
| race | int64 | |
| number_of_juvenile_fellonies | int64 | |
| decile_score | int64 | Criminality score |
| number_of_juvenile_misdemeanors | int64 | |
| number_of_other_juvenile_offenses | int64 | |
| number_of_prior_offenses | int64 | |
| days_before_screening_arrest | int64 | |
| is_recidivous | int64 | |
| days_in_custody | int64 | Days spent in custody |
| is_violent_recidivous | int64 | |
| violence_decile_score | int64 | Criminality score for violent crimes |
| two_years_recidivous | int64 | |
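To illustrate what the `-no-race` configurations describe, here is a rough sketch that separates a target column from the remaining features and optionally drops `race`. It assumes that `two_years_recidivous` is the target of the `two-years-recidividity` configuration, which the card does not state explicitly. The helper `split_features_target` is hypothetical, and the rows below are made up for illustration; they are not records from the dataset.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


def split_features_target(
    rows: Sequence[Row],
    target: str = "two_years_recidivous",  # assumed target column
    drop: Sequence[str] = (),
) -> Tuple[List[Row], List[int]]:
    """Hypothetical helper: split rows into feature dicts and target values."""
    features: List[Row] = []
    targets: List[int] = []
    for row in rows:
        targets.append(row[target])
        # Keep every column except the target and the explicitly dropped ones.
        features.append(
            {k: v for k, v in row.items() if k != target and k not in drop}
        )
    return features, targets


# Made-up rows using a few column names from the feature table.
toy_rows = [
    {"age": 30, "race": 0, "number_of_prior_offenses": 2, "two_years_recidivous": 1},
    {"age": 45, "race": 1, "number_of_prior_offenses": 0, "two_years_recidivous": 0},
]

X, y = split_features_target(toy_rows, drop=("race",))
```

With the real data, `rows` could be the loaded `dataset` object itself, since iterating over a `datasets.Dataset` yields one dict per row.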