How to assign the count of each unique value to the records in a data frame in Python
I have a data frame like this:
IP_address
IP1
IP1
IP1
IP4
IP4
IP4
IP4
IP4
IP7
IP7
IP7
I would like to count how many times each unique value occurs in this column and add that count as a new column. At the end, it should look like this:
IP_address IP_address_Count
IP1 3
IP1 3
IP1 3
IP4 5
IP4 5
IP4 5
IP4 5
IP4 5
IP7 3
IP7 3
IP7 3
I am able to get the unique values of the column, with a count for each, as a dictionary using the code below:
unique_ip_address_count = (df_c_train.drop_duplicates().IP_address.value_counts()).to_dict()
However, I am not sure how to match these counts back to the rows in a loop so that I get the desired result. Any help is much appreciated.
I could not find an equivalent answer on Stack Overflow. If there is one, please direct me there. Thank you.
python pandas
asked Sep 20 '17 at 20:25
Doubt Dhanabalu
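For anyone who wants to reproduce the question, here is a minimal sketch of the sample frame. The name df is an assumption (the question calls its frame df_c_train), and the sketch assumes the frame holds only the IP_address column. Under that assumption, calling drop_duplicates() before value_counts() leaves one row per address, so every count comes out as 1. With more columns, drop_duplicates() would only remove fully duplicated rows and the numbers would differ.

```python
import pandas as pd

# Sample frame rebuilt from the question; the name df is an assumption
# (the question calls its frame df_c_train).
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# With only this one column, drop_duplicates() keeps one row per address,
# so every count collapses to 1.
after_dedup = df.drop_duplicates().IP_address.value_counts().to_dict()

# Counting on the full column keeps the real frequencies.
full_counts = df.IP_address.value_counts().to_dict()

print(after_dedup)
print(full_counts)
```

So the dictionary probably needs to be built from the full column, without drop_duplicates(), before it can be matched back to the rows.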
5 Answers
You can use value_counts() with map:
df['count'] = df['IP_address'].map(df['IP_address'].value_counts())
IP_address count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
edited Nov 21 '18 at 17:55
answered Sep 20 '17 at 20:29


Vaishali
I like your solution more, compared to mine... :)
– MaxU
Sep 20 '17 at 20:32
@Vaishali - Thanks a lot. This has worked.
– Doubt Dhanabalu
Sep 20 '17 at 20:33
@Vaishali - I have one question. The resulting value is a float. Should I convert it to an integer here, or do that in a separate step?
– Doubt Dhanabalu
Sep 20 '17 at 20:39
Shouldn't be. When I try df.dtypes, I get IP_address object, count int64
– Vaishali
Sep 20 '17 at 20:40
Oh, OK. I got float64.
– Doubt Dhanabalu
Sep 20 '17 at 20:42
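One possible cause of the float64 result discussed in the comments above, sketched on a hypothetical frame (the thread does not say whether the asker's data had missing values, so this is an assumption): value_counts() skips NaN by default, so a row whose address is NaN finds no match in map() and gets NaN, which turns the whole column into float64.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing address, to illustrate the dtype change.
df = pd.DataFrame({"IP_address": ["IP1", "IP1", np.nan, "IP4"]})

# value_counts() skips NaN by default, so the NaN row finds no match in map()
# and receives NaN, which forces the whole column to float64.
as_float = df["IP_address"].map(df["IP_address"].value_counts())

# One option: fill the unmatched rows with 0 and cast back to integers.
as_int = as_float.fillna(0).astype("int64")

print(as_float.dtype)
print(as_int.tolist())
```

Whether 0 is the right fill value depends on what a missing address should mean in your data.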
Using pd.factorize
This should be a very fast solution that scales well to large data:
f, u = pd.factorize(df.IP_address.values)
df.assign(IP_address_Count=np.bincount(f)[f])
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
answered Sep 20 '17 at 20:48


piRSquared
Yes, it is quick... currently, I am using this method to count unique values ;-)
– Wen-Ben
Sep 20 '17 at 20:50
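A minimal sketch of what the factorize answer does step by step, using the sample data from the question (the names codes, per_code and result are mine, not from the answer). Two things worth noting: df.assign() returns a new frame rather than changing df, so the result has to be kept, and as far as I know factorize() labels missing values as -1, which np.bincount() rejects, so this sketch assumes the column has no NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# factorize() returns integer codes (one per row) and the unique values.
codes, uniques = pd.factorize(df["IP_address"])

# bincount() tallies how often each code occurs; indexing the tallies with
# the codes spreads each count back onto its rows.
per_code = np.bincount(codes)

# assign() returns a new frame, so keep the result to retain the column.
result = df.assign(IP_address_Count=per_code[codes])

print(codes.tolist())
print(per_code.tolist())
```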
The NumPy way:
tags, C = np.unique(df.IP_address, return_counts=True, return_inverse=True)[1:]
df['IP_address_Count'] = C[tags]
Sample output:
In [275]: df
Out[275]:
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
answered Sep 20 '17 at 20:28


Divakar
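To make the indexing trick in the NumPy answer easier to follow, here is a sketch that keeps all three return values of np.unique instead of slicing them. The variable names uniques, inverse and counts are my own. The idea is that inverse holds, for each row, the position of its value in uniques, so counts[inverse] picks the right count for every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

uniques, inverse, counts = np.unique(
    df["IP_address"].to_numpy(), return_inverse=True, return_counts=True
)

# inverse[i] is the position of row i's value inside uniques,
# so counts[inverse] looks up the matching count for every row.
df["IP_address_Count"] = counts[inverse]

print(uniques.tolist())
print(inverse.tolist())
print(counts.tolist())
```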
In [75]: df['IP_address_Count'] = df.groupby('IP_address')['IP_address'].transform('size')
In [76]: df
Out[76]:
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
answered Sep 20 '17 at 20:28


MaxU
Thank you, Max, for taking the time to answer.
– Doubt Dhanabalu
Sep 20 '17 at 20:34
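The groupby answer comes without explanation, so here is a small sketch of why transform is used rather than a plain aggregation. The names grouped, per_group and per_row are mine. size() returns one value per group, while transform("size") repeats that value for every row of the group and keeps the original index, which is what makes it assignable as a column.

```python
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

grouped = df.groupby("IP_address")["IP_address"]

# size() aggregates: one value per group, indexed by the group key.
per_group = grouped.size()

# transform("size") computes the same numbers but repeats them per row,
# keeping the original index so the result can be assigned as a column.
per_row = grouped.transform("size")

print(len(per_group), len(per_row))
```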
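ip_set = df.IP_address.unique()
dict_temp = {}
for ip in ip_set:
    # Count the rows holding this address; .iloc[0] reads that single count by position
    dict_temp[ip] = df[df.IP_address == ip].IP_address.value_counts().iloc[0]
# Look up each row's address in the dictionary
df['counts'] = [dict_temp[ip] for ip in df.IP_address]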
This seems to give the sort of output that you want.
EDIT: Vaishali's use of map is perfect.
answered Sep 20 '17 at 20:41
NRK
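Since the question asked specifically about matching a dictionary of counts in a loop, here is a shorter sketch of that idea. It is a variation of my own, not code from the answer above: the dictionary is built once with value_counts() on the full column, so the frame is not filtered again for every address.

```python
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# One pass builds the lookup table: {address: number of occurrences}.
counts = df["IP_address"].value_counts().to_dict()

# A plain Python loop (here a list comprehension) matches each row to its count.
df["IP_address_Count"] = [counts[ip] for ip in df["IP_address"]]

print(df["IP_address_Count"].tolist())
```

This assumes no missing addresses; a NaN in the column would not be a key in the dictionary.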