Dataset:
mstz/adult
The Adult dataset from the UCI ML repository. Census dataset including personal characteristics of a person and their income threshold.
| Configuration | Task | Description |
|---|---|---|
| encoding | | Encoding dictionary showing original values of encoded features. |
| income | Binary classification | Classify the person's income as over or under the threshold. |
| income-no race | Binary classification | As `income`, but the race feature is removed. |
| race | Multiclass classification | Predict the race of the individual. |
```python
from datasets import load_dataset

# Load the "income" configuration and take its training split.
dataset = load_dataset("mstz/adult", "income")["train"]
```
The target feature changes according to the selected configuration and is always in the last position in the dataset.
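Because the target is always the last column, features and target can be separated without hard-coding a column name for each configuration. The following is a minimal sketch under that assumption. `split_columns` and `load_adult_split` are hypothetical helper names, not part of the dataset or of the `datasets` library.

```python
from typing import List, Sequence, Tuple


def split_columns(column_names: Sequence[str]) -> Tuple[List[str], str]:
    """Return (feature_names, target_name), assuming the target is the last column."""
    if not column_names:
        raise ValueError("expected at least one column")
    *features, target = column_names
    return list(features), target


def load_adult_split(config: str = "income"):
    """Hypothetical helper: load one configuration and split its columns.

    Needs network access and the `datasets` package, so it is defined
    here but not called.
    """
    from datasets import load_dataset

    dataset = load_dataset("mstz/adult", config)["train"]
    features, target = split_columns(dataset.column_names)
    return dataset, features, target
```

Calling `load_adult_split("race")` would, under the same assumption, return the target column for the `race` configuration in place of the income one.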
| Feature | Type | Description |
|---|---|---|
| age | [int64] | Age of the person. |
| capital_gain | [float64] | Capital gained by the person. |
| capital_loss | [float64] | Capital lost by the person. |
| education | [int8] | Education level: the higher, the more educated the person. |
| final_weight | [int64] | |
| hours_worked_per_week | [int64] | Hours worked per week. |
| marital_status | [string] | Marital status of the person. |
| native_country | [string] | Native country of the person. |
| occupation | [string] | Job of the person. |
| race | [string] | Race of the person. |
| relationship | [string] | |
| is_male | [bool] | Whether the person is male. |
| workclass | [string] | Type of job of the person. |
| over_threshold | [int8] | 1 for income >= 50k$, 0 otherwise. |
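Several features in the table are strings and usually need encoding before they can be fed to a numeric model. As a small sketch, the types listed above can be copied into a dictionary and grouped by type. `ADULT_FEATURE_TYPES` and `columns_of_type` are illustrative names introduced here, and the mapping mirrors the table for the `income` configuration rather than being read from the dataset itself.

```python
from typing import Dict, List

# Feature types copied from the table above (target column omitted).
ADULT_FEATURE_TYPES: Dict[str, str] = {
    "age": "int64",
    "capital_gain": "float64",
    "capital_loss": "float64",
    "education": "int8",
    "final_weight": "int64",
    "hours_worked_per_week": "int64",
    "marital_status": "string",
    "native_country": "string",
    "occupation": "string",
    "race": "string",
    "relationship": "string",
    "is_male": "bool",
    "workclass": "string",
}


def columns_of_type(feature_types: Dict[str, str], dtype: str) -> List[str]:
    """Return the feature names whose declared type equals `dtype`, in table order."""
    return [name for name, kind in feature_types.items() if kind == dtype]


# String-valued columns are the candidates for categorical encoding.
categorical = columns_of_type(ADULT_FEATURE_TYPES, "string")
```

With a loaded dataset, a similar mapping could likely be built from `dataset.features` instead of being typed by hand, which would avoid drift if the schema changes.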