Question
Asked By – Hellnar
Assume that I have a set of data pair where index 0 is the value and index 1 is the type:
input = [
('11013331', 'KAT'),
('9085267', 'NOT'),
('5238761', 'ETH'),
('5349618', 'ETH'),
('11788544', 'NOT'),
('962142', 'ETH'),
('7795297', 'ETH'),
('7341464', 'ETH'),
('9843236', 'KAT'),
('5594916', 'ETH'),
('1550003', 'ETH')
]
I want to group them by their type (by the 1st indexed string) as such:
result = [
{
'type': 'KAT',
'items': ['11013331', '9843236']
},
{
'type': 'NOT',
'items': ['9085267', '11788544']
},
{
'type': 'ETH',
'items': ['5238761', '962142', '7795297', '7341464', '5594916', '1550003']
}
]
How can I achieve this in an efficient way?
Now we will see solution for issue: Python group by
Answer
Do it in 2 steps. First, create a dictionary.
>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...
Then, convert that dictionary into the expected format.
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]
It is also possible with itertools.groupby but it requires the input to be sorted first.
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]
Note both of these do not respect the original order of the keys. You need an OrderedDict if you need to keep the order.
>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
... if k in res: res[k].append(v)
... else: res[k] = [v]
...
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
This question is answered By – kennytm
This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0