Fix Python – UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

I’m getting this weird error:
classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)

but then it also prints the f-score the first time I run:
metrics.f1_score(y_test, y_pred, average='weighted')

The second time I run, it prov….
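The warning follows from how precision is defined: for a given label it is TP / (TP + FP), and a label that never appears in y_pred has TP + FP = 0. The sketch below is a plain-Python illustration of that zero denominator, not scikit-learn's implementation; the function name f1_per_label and the sample data are made up for this example.

```python
def f1_per_label(y_true, y_pred, label):
    """Return (f1, defined); defined is False when the label was never predicted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)

    predicted = tp + fp
    if predicted == 0:
        # Precision would be 0/0 here, so the score is undefined.
        # Falling back to 0.0 mirrors what the warning text describes.
        return 0.0, False

    precision = tp / predicted
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0, True
    return 2 * precision * recall / (precision + recall), True


# Hypothetical data: label 2 occurs in y_test but is never predicted.
y_test = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 0, 0, 1, 1]

for label in sorted(set(y_test)):
    print(label, f1_per_label(y_test, y_pred, label))
```

In scikit-learn itself, one option is to pass labels= to f1_score so that only labels that were actually predicted are averaged; recent versions also accept a zero_division argument that controls the fallback value. Which of the two fits depends on whether the missing predictions should count against the model.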

Fix Python – sklearn plot confusion matrix with labels

I want to plot a confusion matrix to visualize the classifier’s performance, but it shows only the numbers of the labels, not the labels themselves:
from sklearn.metrics import confusion_matrix
import pylab as pl
y_test = ['business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business….

Fix Python – Stratified Train/Test-split in scikit-learn

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:
X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo)

However, I’d like to stratify my training dataset. How do I do that? I’ve been looking into the StratifiedKFold method, but it doesn’t let me specif….
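To make concrete what "stratified" means here, the following is a plain-Python sketch that splits indices so each label keeps roughly the same share in both sets. The helper stratified_split and the toy labels are hypothetical and only illustrate the idea; this is not how scikit-learn implements it.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with each label split in the same proportion."""
    rng = random.Random(seed)

    # Group sample indices by their label.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed): 8 samples of class "a", 4 of class "b".
labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9 3
```

In scikit-learn, train_test_split accepts a stratify argument, so something like train_test_split(X, y, test_size=0.25, stratify=y) should give the same kind of split. The function now lives in sklearn.model_selection; the sklearn.cross_validation module used in the question was removed in later releases.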

Fix Python – What are the different use cases of joblib versus pickle?

Background: I’m just getting started with scikit-learn, and read at the bottom of the page about joblib, versus pickle.

it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string

I read this Q&A on Pickle,
Common use-case….
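The "to the disk and not to a string" distinction in the quote can be seen with the standard pickle module alone, which supports both. This is a small sketch with a made-up stand-in object rather than a real fitted estimator; the joblib calls appear only in a comment and are not executed.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted estimator (hypothetical data).
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# In memory: dumps() returns a bytes object, loads() restores it.
blob = pickle.dumps(model)
restored_from_bytes = pickle.loads(blob)

# On disk: dump() and load() work on an open binary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:
        restored_from_disk = pickle.load(fh)
    # With joblib (third-party) the disk variant would be
    # joblib.dump(model, path) followed by joblib.load(path).

print(restored_from_bytes == model, restored_from_disk == model)
```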

Fix Python – how to check which version of nltk, scikit learn installed?

In a shell script I am checking whether these packages are installed, and installing them if they are not. So within the shell script:
import nltk
echo nltk.__version__

but the shell script stops at the import line.
In a Linux terminal I tried to check in this manner:
which nltk

which prints nothing even though it is installed.
Is there any other way to verify thi….
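One way to check from Python, rather than from the shell directly, is the standard importlib.metadata module (Python 3.8 and later). A minimal sketch follows; the helper name installed_version is made up for this example, and note that the distribution name for scikit-learn is "scikit-learn" while its import name is sklearn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("nltk", "scikit-learn"):
    found = installed_version(name)
    print(name, found if found is not None else "not installed")
```

From a shell script, Python statements have to go through the interpreter, for example python -c "import nltk; print(nltk.__version__)". The which command finds nothing because nltk is a Python package, not an executable on the PATH.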

Fix Python – How to use sklearn fit_transform with pandas and return dataframe instead of numpy array?

I want to apply scaling (using StandardScaler() from sklearn.preprocessing) to a pandas dataframe. The following code returns a numpy array, so I lose all the column names and indices. This is not what I want.
features = df[["col1", "col2", "col3", "col4"]]
autoscaler = StandardScaler()
features = autoscaler.fit_transform(features)

A “solution” I….