Fix Python – Sample datasets in Pandas


Asked By – canyon289

When using R it’s handy to load “practice” datasets using




Is there something similar for Pandas? I know I can load using any other method, just curious if there’s anything builtin.

Now we will see the solution for the issue: Sample datasets in Pandas


Since I originally wrote this answer, I have updated it with the many ways that are now available for accessing sample data sets in Python. Personally, I tend to stick with whatever package I am already using (usually seaborn or pandas). If you need offline access, installing the data set with Quilt seems to be the only option.


The brilliant plotting package seaborn has several built-in sample data sets.

import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


If you do not want to import seaborn but still want to access its sample data sets, you can use @andrewwowens’s approach for the seaborn sample data:

import pandas as pd
iris = pd.read_csv('')

Note that sns.load_dataset() modifies the column type of sample data sets that contain categorical columns, so the result might differ from what you get by reading the URL directly. The iris and tips sample data sets are also available in the pandas GitHub repository.
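The dtype difference described above can be illustrated without seaborn or a network connection. This is a minimal sketch under stated assumptions: the inline CSV is a hypothetical stand-in for a file read by URL, and the category order for `day` is chosen for illustration, roughly the kind of conversion a loader might apply.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV fetched by URL (a few tips-like rows).
csv_text = """total_bill,tip,day
16.99,1.01,Sun
10.34,1.66,Sun
21.01,3.50,Sat
23.68,3.31,Thur
"""

# A plain read_csv leaves the text column as ordinary strings.
raw = pd.read_csv(io.StringIO(csv_text))

# Convert the column to a categorical with an explicit (assumed) order.
converted = raw.copy()
converted["day"] = pd.Categorical(
    converted["day"], categories=["Thur", "Fri", "Sat", "Sun"]
)

print(raw["day"].dtype)
print(converted["day"].dtype)
```

If code relies on the categorical type (for example, for a fixed ordering in plots), applying a conversion like this after `pd.read_csv()` keeps the two loading paths consistent.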

R sample datasets

Since any dataset can be read via pd.read_csv(), it is possible to access all of
R’s sample data sets by copying the URLs from this R data set repository.

Additional ways of loading the R sample data sets include statsmodels

import statsmodels.api as sm

iris = sm.datasets.get_rdataset('iris').data

and PyDataset

from pydataset import data

iris = data('iris')


scikit-learn returns sample data as numpy arrays rather than a pandas data frame.

from sklearn.datasets import load_iris

iris = load_iris()
# `iris.data` holds the numerical values
# `iris.feature_names` holds the numerical column names
# `iris.target` holds the categorical (species) values (as ints)
# `iris.target_names` holds the unique categorical names
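The four pieces listed in the comments above can be combined into a single data frame. The following is a sketch rather than part of the original answer: `arrays_to_frame` and the `species` column name are hypothetical, and tiny hand-made arrays stand in for the real scikit-learn output so the example runs without scikit-learn installed.

```python
import numpy as np
import pandas as pd


def arrays_to_frame(data, target, feature_names, target_names,
                    label_column="species"):
    """Combine scikit-learn style arrays into one pandas DataFrame."""
    frame = pd.DataFrame(data, columns=list(feature_names))
    # Map the integer codes in `target` to their names.
    frame[label_column] = pd.Categorical.from_codes(
        target, categories=list(target_names)
    )
    return frame


# Hand-made arrays standing in for iris.data, iris.target,
# iris.feature_names and iris.target_names.
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])
target = np.array([0, 1, 2])
feature_names = ["sepal length (cm)", "sepal width (cm)",
                 "petal length (cm)", "petal width (cm)"]
target_names = np.array(["setosa", "versicolor", "virginica"])

iris_df = arrays_to_frame(data, target, feature_names, target_names)
print(iris_df)
```

With scikit-learn installed, the same helper could be called as `arrays_to_frame(iris.data, iris.target, iris.feature_names, iris.target_names)`. Recent scikit-learn versions also accept `load_iris(as_frame=True)`, which returns pandas objects directly.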


Quilt is a dataset manager created to facilitate dataset management. It includes many common sample datasets, such as several from the uciml sample repository. The quick start page shows how to install and import the iris data set:

# In your terminal
$ pip install quilt
$ quilt install uciml/iris

After installing a dataset, it is accessible locally, so this is the best option if you want to work with the data offline.

import quilt.data.uciml.iris as ir

iris = ir.tables.iris()
iris.head()
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Quilt also supports dataset versioning and includes a short description of each dataset.
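If installing a separate tool is not an option, offline access can be approximated with pandas alone by keeping a local copy of a data set after the first read. This is a sketch and not part of the Quilt workflow above: `load_cached`, the cache directory, and the demo file are all hypothetical, and a local file stands in for a remote URL so the example runs without a network connection.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_cached(name, source, cache_dir):
    """Read `source` once, keep a CSV copy in `cache_dir`, and reuse it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{name}.csv"
    if not cached.exists():
        # `source` can be a URL or a path: anything pd.read_csv accepts.
        pd.read_csv(source).to_csv(cached, index=False)
    return pd.read_csv(cached)


# Demo: a local file plays the role of the remote data set.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("sepal_length,species\n5.1,setosa\n4.9,setosa\n")

first = load_cached("iris_demo", source, workdir / "cache")
source.unlink()  # simulate losing access to the original source
second = load_cached("iris_demo", source, workdir / "cache")
print(first.equals(second))
```

The second call succeeds even though the source is gone, because it reads only the cached copy. Unlike Quilt, this sketch does no versioning and stores no description of the data.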

This question is answered By – joelostblom

This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0