Asked By – pir
I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:
X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo)
However, I’d like to stratify my training dataset. How do I do that? I’ve been looking into the
StratifiedKFold method, but it doesn’t let me specify the 75%/25% split and only stratify the training dataset.
Now we will see the solution for the issue: Stratified Train/Test-split in scikit-learn
[update for 0.17]
See the docs of sklearn.model_selection.train_test_split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)
[/update for 0.17]
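To check that stratify=y really preserves the class proportions, here is a minimal sketch on made-up data. The toy arrays, the 80/20 class balance, and random_state=0 are assumptions chosen for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the class ratio the same in both parts;
# test_size=0.25 gives the 75%/25% split asked for in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Fraction of class 1 in each part (y is 0/1, so the mean is the fraction).
train_ratio = y_train.mean()
test_ratio = y_test.mean()
print(len(X_train), len(X_test), train_ratio, test_ratio)
```

Both parts should end up with the same share of class 1 as the full dataset, which is the point of stratifying.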
There is a pull request here.
But you can simply do
train, test = next(iter(StratifiedKFold(...)))
and use the train and test indices if you want.
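As a sketch of that workaround with the current sklearn.model_selection API: there, StratifiedKFold is no longer iterable on its own, so the indices come from its split(X, y) method. The arguments elided in the line above are not known. n_splits=4 is an assumption made here because taking one fold out of four as the test set gives roughly a 75%/25% split. The toy data is likewise hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# Assumption: 4 folds, so one fold (about 25%) serves as the test set.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Take only the first (train, test) pair of index arrays.
train_idx, test_idx = next(skf.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(len(train_idx), len(test_idx), y_train.mean(), y_test.mean())
```

This only produces splits of the form 1/n_splits, so train_test_split with stratify is the more direct choice when an arbitrary test_size is needed.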
This question is answered By – Andreas Mueller
This answer was collected from Stack Overflow and reviewed by FixPython community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.