Pandas GroupBy and SQL Where Clause parameters
I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?
import pandas as pd
df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
From the above example, I need to get 3 different SQL queries that go as follows:
SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')
python pandas
add a comment |
I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?
import pandas as pd
df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
From the above example, I need to get 3 different SQL queries that go as follows:
SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')
python pandas
1
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28
add a comment |
I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?
import pandas as pd
df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
From the above example, I need to get 3 different SQL queries that go as follows:
SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')
python pandas
I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?
import pandas as pd
df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
From the above example, I need to get 3 different SQL queries that go as follows:
SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')
python pandas
python pandas
asked Jan 2 at 22:25
ShantanuShantanu
10310
10310
1
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28
add a comment |
1
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28
1
1
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28
add a comment |
2 Answers
2
active
oldest
votes
Let's use parametrized sql to give hackers one less entryway into our databases:
sqls =
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)
for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
To execute the parametrized sql, use the 2-argument form of cursor.execute
:
for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Theprint(sql, arg)
is there only to show you the values insqls
andargs
. In practice, to avoid SQL Injection, you would callcursor.execute(sql, arg)
exactly the way they have been defined -- withsql
containing those%s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.
– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use%s
placemarkers, butsqlite3
uses?
placemarkers.
– unutbu
Jan 2 at 23:21
Thank you. I replaced%s
with?
to work forpyodbc
andmssql
.
– Shantanu
Jan 3 at 2:02
add a comment |
Figured out the solution. I just needed to use a lambda function alongwith groupby.
import pandas as pd
df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()
for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54013971%2fpandas-groupby-and-sql-where-clause-parameters%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Let's use parametrized sql to give hackers one less entryway into our databases:
sqls =
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)
for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
To execute the parametrized sql, use the 2-argument form of cursor.execute
:
for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Theprint(sql, arg)
is there only to show you the values insqls
andargs
. In practice, to avoid SQL Injection, you would callcursor.execute(sql, arg)
exactly the way they have been defined -- withsql
containing those%s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.
– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use%s
placemarkers, butsqlite3
uses?
placemarkers.
– unutbu
Jan 2 at 23:21
Thank you. I replaced%s
with?
to work forpyodbc
andmssql
.
– Shantanu
Jan 3 at 2:02
add a comment |
Let's use parametrized sql to give hackers one less entryway into our databases:
sqls =
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)
for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
To execute the parametrized sql, use the 2-argument form of cursor.execute
:
for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Theprint(sql, arg)
is there only to show you the values insqls
andargs
. In practice, to avoid SQL Injection, you would callcursor.execute(sql, arg)
exactly the way they have been defined -- withsql
containing those%s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.
– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use%s
placemarkers, butsqlite3
uses?
placemarkers.
– unutbu
Jan 2 at 23:21
Thank you. I replaced%s
with?
to work forpyodbc
andmssql
.
– Shantanu
Jan 3 at 2:02
add a comment |
Let's use parametrized sql to give hackers one less entryway into our databases:
sqls =
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)
for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
To execute the parametrized sql, use the 2-argument form of cursor.execute
:
for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)
Let's use parametrized sql to give hackers one less entryway into our databases:
sqls =
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)
for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
To execute the parametrized sql, use the 2-argument form of cursor.execute
:
for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)
edited Jan 3 at 2:14
answered Jan 2 at 22:45
unutbuunutbu
561k10512111264
561k10512111264
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Theprint(sql, arg)
is there only to show you the values insqls
andargs
. In practice, to avoid SQL Injection, you would callcursor.execute(sql, arg)
exactly the way they have been defined -- withsql
containing those%s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.
– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use%s
placemarkers, butsqlite3
uses?
placemarkers.
– unutbu
Jan 2 at 23:21
Thank you. I replaced%s
with?
to work forpyodbc
andmssql
.
– Shantanu
Jan 3 at 2:02
add a comment |
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Theprint(sql, arg)
is there only to show you the values insqls
andargs
. In practice, to avoid SQL Injection, you would callcursor.execute(sql, arg)
exactly the way they have been defined -- withsql
containing those%s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.
– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use%s
placemarkers, butsqlite3
uses?
placemarkers.
– unutbu
Jan 2 at 23:21
Thank you. I replaced%s
with?
to work forpyodbc
andmssql
.
– Shantanu
Jan 3 at 2:02
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?
– Shantanu
Jan 2 at 23:02
The
print(sql, arg)
is there only to show you the values in sqls
and args
. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg)
exactly the way they have been defined -- with sql
containing those %s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.– unutbu
Jan 2 at 23:19
The
print(sql, arg)
is there only to show you the values in sqls
and args
. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg)
exactly the way they have been defined -- with sql
containing those %s
placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.– unutbu
Jan 2 at 23:19
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use
%s
placemarkers, but sqlite3
uses ?
placemarkers.– unutbu
Jan 2 at 23:21
Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use
%s
placemarkers, but sqlite3
uses ?
placemarkers.– unutbu
Jan 2 at 23:21
Thank you. I replaced
%s
with ?
to work for pyodbc
and mssql
.– Shantanu
Jan 3 at 2:02
Thank you. I replaced
%s
with ?
to work for pyodbc
and mssql
.– Shantanu
Jan 3 at 2:02
add a comment |
Figured out the solution. I just needed to use a lambda function alongwith groupby.
import pandas as pd
df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()
for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
add a comment |
Figured out the solution. I just needed to use a lambda function alongwith groupby.
import pandas as pd
df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()
for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
add a comment |
Figured out the solution. I just needed to use a lambda function alongwith groupby.
import pandas as pd
df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()
for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
Figured out the solution. I just needed to use a lambda function alongwith groupby.
import pandas as pd
df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})
df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()
for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
answered Jan 2 at 22:51
ShantanuShantanu
10310
10310
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54013971%2fpandas-groupby-and-sql-where-clause-parameters%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
What was wrong with the approaches you found from your research? Why do you think they're not best practice?
– roganjosh
Jan 2 at 22:28