Split string from a preset list of strings from pandas df column

I have a pandas dataframe that looks like below. It has about a million rows.

name = ['Jake', 'Matt', 'Henry']

0   A
1   Jake Hill
2   Matt Dawn
3   Matt King
4   White Henry
5   Hyde Jake

I want to iterate over the list and the df['A'] column and return only the first names. For example, the final dataframe should look like this.

0   A
1   Jake
2   Matt
3   Matt
4   Henry
5   Jake

Thanks in advance. I am new to Python, so I am still figuring out the easiest way to do this.

edited Nov 20 '18 at 5:45

asked Nov 20 '18 at 5:29

Matt

546

2

What if a value in column A doesn't exist in the list?

– AkshayNevrekar
Nov 20 '18 at 5:31

1

What about first names that aren't Jake, Matt, or Henry? Do you want to filter them out?

– CIsForCookies
Nov 20 '18 at 5:31

Then the original name should be retained. For example, if the name is Dave Atkins, it should stay as Dave Atkins. However, I have made sure that my list contains all the names, so that should not be a problem.

– Matt
Nov 20 '18 at 5:33


python python-3.x pandas python-2.7


7 Answers

You need:

import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde', 'Dwayne John']})

def func(x):
    # Return the first listed name that occurs anywhere in the string
    for k in first_name:
        if k in x:
            return k
    # No listed name found: keep the original value
    return x

df['A'] = df['A'].apply(func)

Output:

             A
0         Jake
1         Matt
2         Matt
3        Henry
4         Jake
5  Dwayne John

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48


You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.

df.A.str.extract(r'({})'.format('|'.join(name)))

       0
0   Jake
1   Matt
2   Matt
3  Henry
4   Jake
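
As written, the pattern also matches inside longer words (for example "Jakeson") and returns NaN for rows that contain none of the listed names. Below is a minimal sketch of one way to handle both cases, assuming the `name` list and column `A` from the question. The sample row 'Dave Atkins' and the variable names `pattern` and `extracted` are illustrative additions, not part of the original answer.

```python
import re

import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King',
                         'White Henry', 'Hyde Jake', 'Dave Atkins']})

# \b anchors each alternative to word boundaries; re.escape guards against
# names that contain regex metacharacters.
pattern = r'\b({})\b'.format('|'.join(re.escape(n) for n in name))

# expand=False returns a Series; rows without a match come back as NaN,
# so fall back to the original value for those rows.
extracted = df['A'].str.extract(pattern, expand=False)
df['A'] = extracted.fillna(df['A'])
print(df)
```

With this fallback, a row such as 'Dave Atkins' keeps its full value, which is the behaviour the asker described in the comments.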

answered Nov 20 '18 at 5:59

user3483203

30.8k82454


Here is one method to achieve this:

import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})

# Split each value into words; keep the first word if it is a listed name,
# otherwise rejoin the words
df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

and you get:

             A      B
0    Jake Hill   Jake
1    Matt Dawn   Matt
2    Matt King   Matt
3  Henry White  Henry
4    Jake Hyde   Jake

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46


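import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.read_csv("file.csv")

# Fill NaN values, in case there are any, so the lambda can detect them
df.fillna(0, inplace=True)
# Intersect each row's words with the name list; note that [0] raises an
# IndexError for a row that contains none of the listed names
df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0] if x != 0 else "Not Found")

Output:

             A First Name
0    Jake Hill       Jake
1    Matt Dawn       Matt
2    Matt King       Matt
3  Henry White      Henry
4    Hyde Jake       Jake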
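
The set intersection above does not preserve word order and fails on rows that contain no listed name. Below is a rough sketch of a token-based alternative that keeps the original value when nothing matches, assuming the `name` list from the question. The helper `pick_first_name`, the `name_set` variable, and the extra rows 'Dave Atkins' and None are hypothetical additions for illustration.

```python
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
name_set = set(name)  # set membership tests are O(1) on average

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King',
                         'White Henry', 'Hyde Jake', 'Dave Atkins', None]})


def pick_first_name(value):
    """Return the first whole word found in name_set, else the input unchanged."""
    if not isinstance(value, str):  # leave NaN/None untouched
        return value
    for token in value.split():
        if token in name_set:
            return token
    return value


df['First Name'] = df['A'].map(pick_first_name)
print(df)
```

Because the comparison is done on whole words, a surname such as "Jakes" would not be mistaken for "Jake" here.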

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47


Try using:

A_final = df['A'].str.split(' ', n=1).str.get(0)

A_final then holds the first word of each row.

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

What is this doing?

– pygo
Nov 20 '18 at 6:03


In addition to the earlier edit: I now understand that you want an in-place replacement. This can be done by splitting column A, taking the first element of each split, and checking it against the name list.

DataFrame structure:

df

             A
0    Jake Hill
1    Matt Dawn
2    Matt King
3  Henry White
4    Jake Hyde

Your name variable:

$ name

['Jake', 'Matt', 'Henry']

Your final desired dataset:

The parameter n can be used to limit the number of splits in the output.

# Keep the first word if it is a listed name; otherwise keep the original value
first = df['A'].str.split(n=1, expand=True)[0]
df['A'] = first.where(first.isin(name), df['A'])

print(df)

       A
0   Jake
1   Matt
2   Matt
3  Henry
4   Jake

It is simpler if you do not need to take the names from a variable and the end goal is just to get the first name from the dataframe:

>>> df
             A
0    Jake Hill
1    Matt Dawn
2    Matt King
3  Henry White
4    Jake Hyde

>>> df['A'].str.split(n=1, expand=True)[0]
0     Jake
1     Matt
2     Matt
3    Henry
4     Jake
Name: 0, dtype: object

Or, in case you want an in-place replacement for column A:

df['A'] = df['A'].str.split(n=1, expand=True)[0]

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

Your input df is different from the one in the question. In this problem, the first names come from a custom list.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thanks for the feedback. I did not quite follow you, but the intent is the same.

– pygo
Nov 20 '18 at 6:00

At index 3 of your input df, the user has White Henry, but you used Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02


This method won't be fooled by a last name containing one of the first name strings, such as "Matten" or "Jakes", and will combine a first and last name if they are both found in the first names list, such as "Matt Henry" (shows "MattHenry" in the output dataframe).

import numpy as np
import pandas as pd

# split the name strings into columns as a new dataframe
df1 = df.A.str.split(' ', expand=True)
# keep the first names in the new dataframe and fill the rest with
# empty strings, then sum the df1 column string values to make a new array
names_result = np.where(df1.isin(name), df1, '').sum(axis=1)
# find the array indexes where no first names were found
no_match_idx = np.where(names_result == '')[0]
# fill the no-first-name index locations with the original dataframe values
names_result[no_match_idx] = df.A.values[no_match_idx]
# make a dataframe using the results
df_out = pd.DataFrame(names_result, columns=['A'])

# to find names with a first and last name that are both found in the
# first names list:
# df_out['dups'] = df1.isin(name).sum(axis=1) > 1

edited Nov 21 '18 at 2:38

answered Nov 21 '18 at 2:00

b2002

546148


Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53386763%2fsplit-string-from-a-preset-list-of-strings-from-pandas-df-column%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

7 Answers
7

active

oldest

votes

7 Answers
7

active

oldest

votes

You need:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White','Jake Hyde','Dwayne John']})



def func(x):

    for k in first_name:

        if k in x:

            return k 

    return x



df['A'] = df['A'].apply(lambda x: func(x))

Output:

            A

0           Jake

1           Matt

2           Matt

3          Henry

4           Jake

5    Dwayne John

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48

add a comment |

You need:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White','Jake Hyde','Dwayne John']})



def func(x):

    for k in first_name:

        if k in x:

            return k 

    return x



df['A'] = df['A'].apply(lambda x: func(x))

Output:

            A

0           Jake

1           Matt

2           Matt

3          Henry

4           Jake

5    Dwayne John

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48

add a comment |

You need:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White','Jake Hyde','Dwayne John']})



def func(x):

    for k in first_name:

        if k in x:

            return k 

    return x



df['A'] = df['A'].apply(lambda x: func(x))

Output:

            A

0           Jake

1           Matt

2           Matt

3          Henry

4           Jake

5    Dwayne John

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

You need:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White','Jake Hyde','Dwayne John']})



def func(x):

    for k in first_name:

        if k in x:

            return k 

    return x



df['A'] = df['A'].apply(lambda x: func(x))

Output:

            A

0           Jake

1           Matt

2           Matt

3          Henry

4           Jake

5    Dwayne John

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

edited Nov 20 '18 at 5:53

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

answered Nov 20 '18 at 5:37

AkshayNevrekar

4,14491735

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48

add a comment |

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48

Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:48

add a comment |

You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.

df.A.str.extract(r'({})'.format('|'.join(name)))

       0

0   Jake

1   Matt

2   Matt

3  Henry

4   Jake

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

add a comment |

You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.

df.A.str.extract(r'({})'.format('|'.join(name)))

       0

0   Jake

1   Matt

2   Matt

3  Henry

4   Jake

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

add a comment |

You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.

df.A.str.extract(r'({})'.format('|'.join(name)))

       0

0   Jake

1   Matt

2   Matt

3  Henry

4   Jake

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.

df.A.str.extract(r'({})'.format('|'.join(name)))

       0

0   Jake

1   Matt

2   Matt

3  Henry

4   Jake

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

answered Nov 20 '18 at 5:59

user3483203

30.8k82454

add a comment |

Here is one method to achieve this:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})



df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

and you get:

             A      B

0    Jake Hill   Jake

1    Matt Dawn   Matt

2    Matt King   Matt

3  Henry White  Henry

4    Jake Hyde   Jake

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46

add a comment |

Here is one method to achieve this:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})



df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

and you get:

             A      B

0    Jake Hill   Jake

1    Matt Dawn   Matt

2    Matt King   Matt

3  Henry White  Henry

4    Jake Hyde   Jake

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46

add a comment |

Here is one method to achieve this:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})



df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

and you get:

             A      B

0    Jake Hill   Jake

1    Matt Dawn   Matt

2    Matt King   Matt

3  Henry White  Henry

4    Jake Hyde   Jake

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

Here is one method to achieve this:

first_name = ['Jake','Matt', 'Henry']



df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})



df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

and you get:

             A      B

0    Jake Hill   Jake

1    Matt Dawn   Matt

2    Matt King   Matt

3  Henry White  Henry

4    Jake Hyde   Jake

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

answered Nov 20 '18 at 5:37

Gerges Dib

3,0181820

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46

add a comment |

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46

Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:46

add a comment |

name = ['Jake','Matt', 'Henry']

df = pd.read_csv("file.csv")



#filling nan values in-case if it is there

df.fillna(0, inplace = True)

df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0]  if x != 0 else "Not Found")

Output:

             A First Name

0    Jake Hill       Jake

1    Matt Dawn       Matt

2    Matt King       Matt

3  Henry White      Henry

4    Hyde Jake       Jake

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47

add a comment |

name = ['Jake','Matt', 'Henry']

df = pd.read_csv("file.csv")



#filling nan values in-case if it is there

df.fillna(0, inplace = True)

df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0]  if x != 0 else "Not Found")

Output:

             A First Name

0    Jake Hill       Jake

1    Matt Dawn       Matt

2    Matt King       Matt

3  Henry White      Henry

4    Hyde Jake       Jake

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47

add a comment |

name = ['Jake','Matt', 'Henry']

df = pd.read_csv("file.csv")



#filling nan values in-case if it is there

df.fillna(0, inplace = True)

df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0]  if x != 0 else "Not Found")

Output:

             A First Name

0    Jake Hill       Jake

1    Matt Dawn       Matt

2    Matt King       Matt

3  Henry White      Henry

4    Hyde Jake       Jake

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

name = ['Jake','Matt', 'Henry']

df = pd.read_csv("file.csv")



#filling nan values in-case if it is there

df.fillna(0, inplace = True)

df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0]  if x != 0 else "Not Found")

Output:

             A First Name

0    Jake Hill       Jake

1    Matt Dawn       Matt

2    Matt King       Matt

3  Henry White      Henry

4    Hyde Jake       Jake

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

edited Nov 20 '18 at 5:51

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

answered Nov 20 '18 at 5:40

Srce Cde

1,164511

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47

add a comment |

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47

Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.

– Matt
Nov 20 '18 at 5:47

add a comment |

Try using:

A_final=A[0].str.split(' ',expand=True, n=1).str.get(0) A_final[0]
, your problem is resolved.

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

What is this doing?

– pygo
Nov 20 '18 at 6:03

add a comment |

Try using:

A_final=A[0].str.split(' ',expand=True, n=1).str.get(0) A_final[0]
, your problem is resolved.

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

What is this doing?

– pygo
Nov 20 '18 at 6:03

add a comment |

Try using:

A_final=A[0].str.split(' ',expand=True, n=1).str.get(0) A_final[0]
, your problem is resolved.

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

Try using:

A_final=A[0].str.split(' ',expand=True, n=1).str.get(0) A_final[0]
, your problem is resolved.

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

edited Nov 20 '18 at 6:05

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

answered Nov 20 '18 at 6:01

Jeet Bhattachariya

What is this doing?

– pygo
Nov 20 '18 at 6:03

add a comment |

What is this doing?

– pygo
Nov 20 '18 at 6:03

What is this doing?

– pygo
Nov 20 '18 at 6:03

add a comment |

DataFrame Structure:

df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde

Your name Var..

$ name

['Jake', 'Matt', 'Henry']

Your Final desired Dataset:

Parameter n can be used to limit the number of splits in the output.

df['A'] = df['A'].str.split(n=1, expand=True)[0].apply(lambda x: x if x in name else ' '.join(x))



   print(df)

           A

    0   Jake

    1   Matt

    2   Matt

    3  Henry

    4   Jake

It should be simple if you not pressed to take names from a Var and end goal is to get the First name from the dataframe :

>>> df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde





>>> df['A'].str.split(n=1, expand=True)[0]

0     Jake

1     Matt

2     Matt

3    Henry

4     Jake

Name: 0, dtype: object

OR In case you want inplace replacement for column A ..

df['A'] = df['A'].str.split(n=1, expand=True)[0]

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

your input df is different from the user input. In this problem first name is customised.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.

– pygo
Nov 20 '18 at 6:00

In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02

add a comment |

DataFrame Structure:

df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde

Your name Var..

$ name

['Jake', 'Matt', 'Henry']

Your Final desired Dataset:

Parameter n can be used to limit the number of splits in the output.

df['A'] = df['A'].str.split(n=1, expand=True)[0].apply(lambda x: x if x in name else ' '.join(x))



   print(df)

           A

    0   Jake

    1   Matt

    2   Matt

    3  Henry

    4   Jake

It should be simple if you not pressed to take names from a Var and end goal is to get the First name from the dataframe :

>>> df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde





>>> df['A'].str.split(n=1, expand=True)[0]

0     Jake

1     Matt

2     Matt

3    Henry

4     Jake

Name: 0, dtype: object

OR In case you want inplace replacement for column A ..

df['A'] = df['A'].str.split(n=1, expand=True)[0]

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

your input df is different from the user input. In this problem first name is customised.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.

– pygo
Nov 20 '18 at 6:00

In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02

add a comment |

DataFrame Structure:

df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde

Your name Var..

$ name

['Jake', 'Matt', 'Henry']

Your Final desired Dataset:

Parameter n can be used to limit the number of splits in the output.

df['A'] = df['A'].str.split(n=1, expand=True)[0].apply(lambda x: x if x in name else ' '.join(x))



   print(df)

           A

    0   Jake

    1   Matt

    2   Matt

    3  Henry

    4   Jake

It should be simple if you not pressed to take names from a Var and end goal is to get the First name from the dataframe :

>>> df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde





>>> df['A'].str.split(n=1, expand=True)[0]

0     Jake

1     Matt

2     Matt

3    Henry

4     Jake

Name: 0, dtype: object

OR In case you want inplace replacement for column A ..

df['A'] = df['A'].str.split(n=1, expand=True)[0]

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

DataFrame Structure:

df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde

Your name Var..

$ name

['Jake', 'Matt', 'Henry']

Your Final desired Dataset:

Parameter n can be used to limit the number of splits in the output.

df['A'] = df['A'].str.split(n=1, expand=True)[0].apply(lambda x: x if x in name else ' '.join(x))



   print(df)

           A

    0   Jake

    1   Matt

    2   Matt

    3  Henry

    4   Jake

It should be simple if you not pressed to take names from a Var and end goal is to get the First name from the dataframe :

>>> df

             A

0    Jake Hill

1    Matt Dawn

2    Matt King

3  Henry White

4    Jake Hyde





>>> df['A'].str.split(n=1, expand=True)[0]

0     Jake

1     Matt

2     Matt

3    Henry

4     Jake

Name: 0, dtype: object

OR In case you want inplace replacement for column A ..

df['A'] = df['A'].str.split(n=1, expand=True)[0]

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

edited Nov 20 '18 at 6:55

answered Nov 20 '18 at 5:44

pygo

3,0551619

answered Nov 20 '18 at 5:44

pygo

3,0551619

answered Nov 20 '18 at 5:44

pygo

3,0551619

your input df is different from the user input. In this problem first name is customised.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.

– pygo
Nov 20 '18 at 6:00

In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02

add a comment |

your input df is different from the user input. In this problem first name is customised.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.

– pygo
Nov 20 '18 at 6:00

In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02

your input df is different from the user input. In this problem first name is customised.

– Mohamed Thasin ah
Nov 20 '18 at 5:59

@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.

– pygo
Nov 20 '18 at 6:00

In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.

– Mohamed Thasin ah
Nov 20 '18 at 6:02

add a comment |

import numpy as np
import pandas as pd

# `df` and `name` are the dataframe and the first-name list from the question

# split the name strings into columns as a new dataframe
df1 = df.A.str.split(' ', expand=True)

# keep the first names in the new dataframe and fill the rest with
# empty strings, then concatenate the strings in each row into a new array
names_result = np.where(df1.isin(name), df1, '').sum(axis=1)

# find the array indexes where no first name was found
no_match_idx = np.where(names_result == '')[0]

# fill those index locations with the original dataframe values
names_result[no_match_idx] = df.A.values[no_match_idx]

# make a dataframe using the results
df_out = pd.DataFrame(names_result, columns=['A'])

# to flag names whose first and last name are both found in the
# first names list:
# df_out['dups'] = df1.isin(name).sum(axis=1) > 1
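For anyone who wants to try this on the data from the question, here is a minimal self-contained sketch of the same split / mask / fallback idea. The function name `keep_first_names`, its `col` parameter, and the extra `Dave Atkins` row (taken from the asker's comment about unmatched names) are my own assumptions for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def keep_first_names(df, names, col="A"):
    """Return a one-column frame holding the listed first name found in each
    row of `col`, or the original string when no listed name is present.
    (Hypothetical helper, written only to illustrate the approach.)"""
    # One column per whitespace-separated token; short rows are padded with missing values.
    parts = df[col].str.split(" ", expand=True)
    # Boolean mask marking the tokens that appear in the first-name list.
    is_first = parts.isin(names).to_numpy()
    tokens = parts.to_numpy(dtype=object)
    # Blank out every non-matching token, then concatenate each row's strings.
    kept = np.where(is_first, tokens, "").sum(axis=1)
    # Rows with no match fall back to the original value.
    originals = df[col].to_numpy(dtype=object)
    result = np.where(kept == "", originals, kept)
    return pd.DataFrame({col: result}, index=df.index)


names = ["Jake", "Matt", "Henry"]
df = pd.DataFrame({"A": ["Jake Hill", "Matt Dawn", "Matt King",
                         "White Henry", "Hyde Jake", "Dave Atkins"]})
out = keep_first_names(df, names)
print(out)
```

The match does not depend on token position, so `White Henry` and `Hyde Jake` are handled the same way as `Jake Hill`. One caveat: if two tokens in a row are both in the list, they are concatenated without a separator (for example `Jake Henry` would become `JakeHenry`), which is what the commented-out `dups` check in the answer is meant to flag.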

edited Nov 21 '18 at 2:38

answered Nov 21 '18 at 2:00

b2002

546148

