Fix Python – How to split text in a column into multiple rows

Question

Asked By – Bradley

I’m working with a large csv file and the next to last column has a string of text that I want to split by a specific delimiter. I was wondering if there is a simple way to do this using pandas or python?

CustNum  CustomerName     ItemQty  Item   Seatblocks                 ItemExt
32363    McCartney, Paul      3     F04    2:218:10:4,6                   60
31316    Lennon, John        25     F01    1:13:36:1,12 1:13:37:1,13     300

I want to split by the space (' ') and then the colon (':') in the Seatblocks column, but each cell would result in a different number of columns. I have a function to rearrange the columns so the Seatblocks column is at the end of the sheet, but I’m not sure what to do from there. I can do it in excel with the built in text-to-columns function and a quick macro, but my dataset has too many records for excel to handle.

Ultimately, I want to take records such John Lennon’s and create multiple lines, with the info from each set of seats on a separate line.

Now we will see solution for issue: How to split text in a column into multiple rows


Answer

This splits the Seatblocks by space and gives each its own row.

In [43]: df
Out[43]: 
   CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

In [44]: s = df['Seatblocks'].str.split(' ').apply(Series, 1).stack()

In [45]: s.index = s.index.droplevel(-1) # to line up with df's index

In [46]: s.name = 'Seatblocks' # needs a name to join

In [47]: s
Out[47]: 
0    2:218:10:4,6
1    1:13:36:1,12
1    1:13:37:1,13
Name: Seatblocks, dtype: object

In [48]: del df['Seatblocks']

In [49]: df.join(s)
Out[49]: 
   CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
1    31316     Lennon, John       25  F01      300  1:13:36:1,12
1    31316     Lennon, John       25  F01      300  1:13:37:1,13

Or, to give each colon-separated string in its own column:

In [50]: df.join(s.apply(lambda x: Series(x.split(':'))))
Out[50]: 
   CustNum     CustomerName  ItemQty Item  ItemExt  0    1   2     3
0    32363  McCartney, Paul        3  F04       60  2  218  10   4,6
1    31316     Lennon, John       25  F01      300  1   13  36  1,12
1    31316     Lennon, John       25  F01      300  1   13  37  1,13

This is a little ugly, but maybe someone will chime in with a prettier solution.

This question is answered By – Dan Allan

This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0