Mapping values inside pandas column




I used the code below to map the value 2 in the S column to 0, but it didn't work. Any suggestions on how to solve this?
N.B.: I want to use an external function inside map.


import pandas as pd

df = pd.DataFrame({'Age': [30, 40, 50, 60, 70, 80],
                   'Sex': ['F', 'M', 'M', 'F', 'M', 'F'],
                   'S': [1, 1, 2, 2, 1, 2]})

def app(value):
    for n in df['S']:
        if n == 1:
            return 1
        if n == 2:
            return 0

df["S"] = df.S.map(app)




8 Answers



Use eq to create a boolean Series and convert that boolean Series to int with astype:



df['S'] = df['S'].eq(1).astype(int)



OR


df['S'] = (df['S'] == 1).astype(int)



Output:


   Age Sex  S
0   30   F  1
1   40   M  1
2   50   M  0
3   60   F  0
4   70   M  1
5   80   F  0





Hmm, this is much faster than assigning via loc
– user3483203
yesterday





@user3483203 You can try mask, which should be faster :-) df.S.mask(df.S>1, 0)
– Wen
yesterday





Yep, much faster, I need to use mask more :D
– user3483203
yesterday



Don't use apply, simply use loc to assign the values:



df.loc[df.S.eq(2), 'S'] = 0

   Age Sex  S
0   30   F  1
1   40   M  1
2   50   M  0
3   60   F  0
4   70   M  1
5   80   F  0



If you need a more performant option, use np.select. This is also more scalable, as you can always add more conditions:



df['S'] = np.select([df.S.eq(2)], [0], 1)
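
To illustrate the point about adding more conditions, here is a minimal sketch with two conditions. The extra value 3 and the output code -1 are hypothetical and do not come from the question; they only show how the lists grow.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value 3 does not appear in the original question.
s = pd.Series([1, 1, 2, 2, 1, 3])

# Conditions are checked in order; the first one that matches wins.
conditions = [s.eq(2), s.eq(3)]
# One output value per condition (the -1 is an assumed code for illustration).
choices = [0, -1]

# Rows matching no condition get the default value.
result = pd.Series(np.select(conditions, choices, default=1), index=s.index)
print(result.tolist())
```

Adding another rule means appending one entry to each list, which is why this tends to scale better than chaining several loc assignments.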



You're close, but you need a few corrections. Since you want to use a function, remove the for loop and replace n with value, so the function only looks at the element it receives. You can also use apply instead of map; on a Series, apply calls the function once per element. See this answer for how to properly use apply vs applymap vs map.



def app(value):
    if value == 1:
        return 1
    elif value == 2:
        return 0

df['S'] = df.S.apply(app)

   Age Sex  S
0   30   F  1
1   40   M  1
2   50   M  0
3   60   F  0
4   70   M  1
5   80   F  0
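
To see why the original function did not work, the sketch below runs the original and the corrected version side by side. The names broken_app, broken and fixed are made up for this illustration. The original ignores its argument and loops over the whole column, so it returns as soon as it reaches the first element, which is 1 in this data.

```python
import pandas as pd

df = pd.DataFrame({'S': [1, 1, 2, 2, 1, 2]})

def broken_app(value):
    # Ignores `value`: loops over the whole column and returns
    # on the first element, which is 1 for this data.
    for n in df['S']:
        if n == 1:
            return 1
        if n == 2:
            return 0

def app(value):
    # Looks only at the single element it was given.
    if value == 1:
        return 1
    elif value == 2:
        return 0

broken = df['S'].map(broken_app).tolist()
fixed = df['S'].map(app).tolist()
print(broken)
print(fixed)
```

With the corrected function, map gives the expected result here as well, so the loop was the actual problem.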



If you only wish to change values equal to 2, you can use pd.DataFrame.loc:


df.loc[df['S'] == 2, 'S'] = 0



pd.Series.apply is not recommended here, as it is just a thinly veiled, inefficient loop.
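
One way to check this on your own machine is a rough timing sketch like the one below. The synthetic data, its size, and the repeat count are arbitrary assumptions, and the timings will vary by machine and pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic column of 1s and 2s (size chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 3, size=100_000))

def app(value):
    return 1 if value == 1 else 0

# Time the element-by-element Python function against the vectorized comparison.
t_apply = timeit.timeit(lambda: s.apply(app), number=5)
t_vector = timeit.timeit(lambda: s.eq(1).astype(int), number=5)

print(f"apply:      {t_apply:.4f} s")
print(f"vectorized: {t_vector:.4f} s")
```

Both expressions produce the same values; only the time taken differs.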




You could use .replace as follows:
df["S"] = df["S"].replace([2], 0)
This replaces every 2 with 0 in one line.
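
If several values need remapping, a dict can be passed instead of a list. The sketch below is only an illustration; the variable names are made up. It also shows map with a dict, which behaves differently for values that are missing from the dict.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2, 1, 2])

# replace: values that are not keys in the dict are left unchanged.
replaced = s.replace({2: 0})

# map: values that are not keys in the dict would become NaN,
# so every value that should survive must be listed.
mapped = s.map({1: 1, 2: 0})

print(replaced.tolist())
print(mapped.tolist())
```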



Go with a vectorized NumPy operation:


df['S'] = np.abs(df['S'] - 2)



and stand out from the competition in interviews and SO answers :)


>>> df = pd.DataFrame({'Age': [30, 40, 50, 60, 70, 80],
...                    'Sex': ['F', 'M', 'M', 'F', 'M', 'F'],
...                    'S': [1, 1, 2, 2, 1, 2]})

>>> def app(value):
...     return 1 if value == 1 else 0
...
>>> # or: app = lambda value: 1 if value == 1 else 0

>>> df["S"] = df["S"].map(app)

>>> df
   Age  S Sex
0   30  1   F
1   40  1   M
2   50  0   M
3   60  0   F
4   70  1   M
5   80  0   F



You can do:


import numpy as np

df['S'] = np.where(df['S'] == 2, 0, df['S'])





