Asked By – Sean McCarthy
I’ve noticed three methods of selecting a column in a Pandas DataFrame:
First method of selecting a column using loc:
df_new = df.loc[:, 'col1']
Second method – seems simpler and faster:
df_new = df['col1']
Third method – most convenient:
df_new = df.col1
Is there a difference between these three methods? I don’t think so, in which case I’d rather use the third method.
I’m mostly curious as to why there appear to be three methods for doing the same thing.
Now we will see the solution for the issue: What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
In the following situations, they behave the same:
- Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
- Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
- Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex. See details here.)
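A quick check of these three equivalences, as a minimal sketch on a small made-up DataFrame (the column names A to D and the values are illustrative assumptions, and pandas is assumed to be installed):

```python
import pandas as pd

# Small illustrative DataFrame with the default RangeIndex (0, 1, 2, 3, 4)
df = pd.DataFrame({
    "A": [10, 11, 12, 13, 14],
    "B": [20, 21, 22, 23, 24],
    "C": [30, 31, 32, 33, 34],
    "D": [40, 41, 42, 43, 44],
})

# 1. A single column: both return the same Series
single_bracket = df["A"]
single_loc = df.loc[:, "A"]

# 2. A list of columns: both return the same DataFrame
cols_bracket = df[["A", "B", "C"]]
cols_loc = df.loc[:, ["A", "B", "C"]]

# 3. Slicing rows: [] with integers is positional like iloc (end excluded)
rows_bracket = df[1:3]
rows_iloc = df.iloc[1:3]
# .loc slices by label and includes the end point
rows_loc = df.loc[1:3]

print(list(rows_bracket.index))  # [1, 2]
print(list(rows_loc.index))      # [1, 2, 3]
```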
[] does not work in the following situations, all of which need .loc:
- Selecting a single row
- Selecting a list of rows
- Slicing columns
These three cannot be done with [].
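To see these limits in practice, here is a minimal sketch on a small illustrative DataFrame (names and values are assumptions; `fails` is a hypothetical helper written only for this demo). With plain [], a scalar or a list is looked up among the column labels and a string slice is applied to the row index, so all three attempts raise, while .loc handles each case:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 11, 12], "B": [20, 21, 22], "C": [30, 31, 32], "D": [40, 41, 42]}
)

def fails(func):
    """Hypothetical helper: True if calling func raises KeyError or TypeError."""
    try:
        func()
    except (KeyError, TypeError):
        return True
    return False

# With [], scalars and lists are treated as column labels,
# and a string slice is applied to the (integer) row index.
single_row_fails = fails(lambda: df[1])       # there is no column named 1
row_list_fails = fails(lambda: df[[0, 2]])    # there are no columns 0 or 2
col_slice_fails = fails(lambda: df["B":"C"])  # 'B' is not a row label

# .loc takes the row selector first and an optional column selector second
one_row = df.loc[1]             # Series holding the row labelled 1
some_rows = df.loc[[0, 2]]      # DataFrame with rows 0 and 2
col_slice = df.loc[:, "B":"C"]  # columns B through C, end included
```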
More importantly, if your selection involves both rows and columns, then assignment becomes problematic.
df[1:3]['A'] = 5
This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the original DataFrame. This raises a SettingWithCopyWarning. The correct way of making this assignment is:
df.loc[1:3, 'A'] = 5
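A minimal sketch of the single-call .loc pattern on illustrative data (the values are assumptions). The chained df[1:3]['A'] form is deliberately left out, since whether it touches df depends on the pandas version and its copy-on-write setting. The sketch also shows that label slices include their end point:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": list("abcde")})

# Row and column selection in one .loc call writes into df itself.
# Label slices are inclusive: 1:2 covers labels 1 and 2 only,
# i.e. the same rows that the positional df[1:3] selects.
df.loc[1:2, "A"] = 5
after_first = df["A"].tolist()
print(after_first)       # [0, 5, 5, 0, 0]

# 1:3 also includes label 3
df.loc[1:3, "A"] = 7
print(df["A"].tolist())  # [0, 7, 7, 7, 0]
```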
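With .loc, you are guaranteed to modify the original DataFrame. (Because .loc slices by label and includes the end point, with a RangeIndex this assigns to rows 1, 2 and 3; use df.loc[1:2, 'A'] = 5 to target exactly the rows that df[1:3] selects.) It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (by passing one row label to df.loc), and select a list of rows (df.loc[[1, 2, 5]]).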
Also note that these two were not added to the API at the same time: .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.
Note: Getting columns with [] vs . is a completely different topic.
Attribute access (.) is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (e.g. they cannot contain spaces or start with a digit). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used for non-existing columns (i.e. the assignment df.a = 1 won't create a column a if there is none; it just sets a plain attribute on the object). Other than that, . and [] are the same.
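These limits of attribute access can be checked with a small illustrative DataFrame; the column names below are made-up examples chosen to trigger each case:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "my col": [3, 4], "count": [5, 6]})

# A valid identifier works with both styles
same = df.col1.equals(df["col1"])

# "my col" contains a space, so only bracket access is possible
spaced = df["my col"]

# "count" collides with the DataFrame.count method:
# attribute access returns the bound method, not the column
count_attr = df.count
col_count = df["count"]

# Assigning to a new attribute name does not create a column;
# it only sets a plain Python attribute on the object
df.a = 1
has_column_a = "a" in df.columns
```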