Pandas GroupBy and SQL Where Clause parameters












0















I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?



import pandas as pd

df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})


From the above example, I need to get 3 different SQL queries that go as follows:



SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')









share|improve this question


















  • 1





    What was wrong with the approaches you found from your research? Why do you think they're not best practice?

    – roganjosh
    Jan 2 at 22:28
















0















I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?



import pandas as pd

df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})


From the above example, I need to get 3 different SQL queries that go as follows:



SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')









share|improve this question


















  • 1





    What was wrong with the approaches you found from your research? Why do you think they're not best practice?

    – roganjosh
    Jan 2 at 22:28














0












0








0








I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?



import pandas as pd

df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})


From the above example, I need to get 3 different SQL queries that go as follows:



SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')









share|improve this question














I want to get a SQL query string output that takes in multiple parameters in the WHERE Clause from a Pandas DataFrame column using groupby. What is the best way to do this?



import pandas as pd

df = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address': ['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})


From the above example, I need to get 3 different SQL queries that go as follows:



SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')






python pandas






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Jan 2 at 22:25









ShantanuShantanu

10310




10310








  • 1





    What was wrong with the approaches you found from your research? Why do you think they're not best practice?

    – roganjosh
    Jan 2 at 22:28














  • 1





    What was wrong with the approaches you found from your research? Why do you think they're not best practice?

    – roganjosh
    Jan 2 at 22:28








1




1





What was wrong with the approaches you found from your research? Why do you think they're not best practice?

– roganjosh
Jan 2 at 22:28





What was wrong with the approaches you found from your research? Why do you think they're not best practice?

– roganjosh
Jan 2 at 22:28












2 Answers
2






active

oldest

votes


















1














Let's use parametrized sql to give hackers one less entryway into our databases:



sqls = 
args =
for key, grp in df.groupby(['Contact Name', 'Email Address']):
arg = tuple(grp['Contract No'])
sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
sqls.append(sql)
args.append(arg)

for sql, arg in zip(sqls, args):
print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')


To execute the parametrized sql, use the 2-argument form of cursor.execute:



for sql, arg in zip(sqls, args):
cursor.execute(sql, arg)





share|improve this answer


























  • Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

    – Shantanu
    Jan 2 at 23:02











  • The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

    – unutbu
    Jan 2 at 23:19











  • Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

    – unutbu
    Jan 2 at 23:21











  • Thank you. I replaced %s with ? to work for pyodbc and mssql.

    – Shantanu
    Jan 3 at 2:02



















0














Figured out the solution. I just needed to use a lambda function alongwith groupby.



import pandas as pd

df1 = pd.DataFrame({
'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
'Contract No':['2851','2852','2853','2854','2855'],
})

df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()

for index, row in df2.iterrows():
print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')





share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54013971%2fpandas-groupby-and-sql-where-clause-parameters%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Let's use parametrized sql to give hackers one less entryway into our databases:



    sqls = 
    args =
    for key, grp in df.groupby(['Contact Name', 'Email Address']):
    arg = tuple(grp['Contract No'])
    sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
    sqls.append(sql)
    args.append(arg)

    for sql, arg in zip(sqls, args):
    print(sql, arg)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')


    To execute the parametrized sql, use the 2-argument form of cursor.execute:



    for sql, arg in zip(sqls, args):
    cursor.execute(sql, arg)





    share|improve this answer


























    • Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

      – Shantanu
      Jan 2 at 23:02











    • The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

      – unutbu
      Jan 2 at 23:19











    • Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

      – unutbu
      Jan 2 at 23:21











    • Thank you. I replaced %s with ? to work for pyodbc and mssql.

      – Shantanu
      Jan 3 at 2:02
















    1














    Let's use parametrized sql to give hackers one less entryway into our databases:



    sqls = 
    args =
    for key, grp in df.groupby(['Contact Name', 'Email Address']):
    arg = tuple(grp['Contract No'])
    sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
    sqls.append(sql)
    args.append(arg)

    for sql, arg in zip(sqls, args):
    print(sql, arg)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')


    To execute the parametrized sql, use the 2-argument form of cursor.execute:



    for sql, arg in zip(sqls, args):
    cursor.execute(sql, arg)





    share|improve this answer


























    • Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

      – Shantanu
      Jan 2 at 23:02











    • The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

      – unutbu
      Jan 2 at 23:19











    • Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

      – unutbu
      Jan 2 at 23:21











    • Thank you. I replaced %s with ? to work for pyodbc and mssql.

      – Shantanu
      Jan 3 at 2:02














    1












    1








    1







    Let's use parametrized sql to give hackers one less entryway into our databases:



    sqls = 
    args =
    for key, grp in df.groupby(['Contact Name', 'Email Address']):
    arg = tuple(grp['Contract No'])
    sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
    sqls.append(sql)
    args.append(arg)

    for sql, arg in zip(sqls, args):
    print(sql, arg)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')


    To execute the parametrized sql, use the 2-argument form of cursor.execute:



    for sql, arg in zip(sqls, args):
    cursor.execute(sql, arg)





    share|improve this answer















    Let's use parametrized sql to give hackers one less entryway into our databases:



    sqls = 
    args =
    for key, grp in df.groupby(['Contact Name', 'Email Address']):
    arg = tuple(grp['Contract No'])
    sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s']*len(arg)))
    sqls.append(sql)
    args.append(arg)

    for sql, arg in zip(sqls, args):
    print(sql, arg)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
    # SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')


    To execute the parametrized sql, use the 2-argument form of cursor.execute:



    for sql, arg in zip(sqls, args):
    cursor.execute(sql, arg)






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jan 3 at 2:14

























    answered Jan 2 at 22:45









    unutbuunutbu

    561k10512111264




    561k10512111264













    • Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

      – Shantanu
      Jan 2 at 23:02











    • The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

      – unutbu
      Jan 2 at 23:19











    • Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

      – unutbu
      Jan 2 at 23:21











    • Thank you. I replaced %s with ? to work for pyodbc and mssql.

      – Shantanu
      Jan 3 at 2:02



















    • Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

      – Shantanu
      Jan 2 at 23:02











    • The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

      – unutbu
      Jan 2 at 23:19











    • Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

      – unutbu
      Jan 2 at 23:21











    • Thank you. I replaced %s with ? to work for pyodbc and mssql.

      – Shantanu
      Jan 3 at 2:02

















    Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

    – Shantanu
    Jan 2 at 23:02





    Thank you @unutbu, I was looking for a solution that was foolproof against injection. Also, how do I eliminate the trailing comma and %s appearing in the print statements?

    – Shantanu
    Jan 2 at 23:02













    The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

    – unutbu
    Jan 2 at 23:19





    The print(sql, arg) is there only to show you the values in sqls and args. In practice, to avoid SQL Injection, you would call cursor.execute(sql, arg) exactly the way they have been defined -- with sql containing those %s placemarkers. The database adapter will insert the arguments into the sql with proper escaping for you.

    – unutbu
    Jan 2 at 23:19













    Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

    – unutbu
    Jan 2 at 23:21





    Note that the proper placemarker depends on which database adapter you are using. For example, MySQLdb and psycopg use %s placemarkers, but sqlite3 uses ? placemarkers.

    – unutbu
    Jan 2 at 23:21













    Thank you. I replaced %s with ? to work for pyodbc and mssql.

    – Shantanu
    Jan 3 at 2:02





    Thank you. I replaced %s with ? to work for pyodbc and mssql.

    – Shantanu
    Jan 3 at 2:02













    0














    Figured out the solution. I just needed to use a lambda function alongwith groupby.



    import pandas as pd

    df1 = pd.DataFrame({
    'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
    'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
    'Contract No':['2851','2852','2853','2854','2855'],
    })

    df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()

    for index, row in df2.iterrows():
    print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')





    share|improve this answer




























      0














      Figured out the solution. I just needed to use a lambda function alongwith groupby.



      import pandas as pd

      df1 = pd.DataFrame({
      'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
      'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
      'Contract No':['2851','2852','2853','2854','2855'],
      })

      df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()

      for index, row in df2.iterrows():
      print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')





      share|improve this answer


























        0












        0








        0







        Figured out the solution. I just needed to use a lambda function alongwith groupby.



        import pandas as pd

        df1 = pd.DataFrame({
        'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
        'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
        'Contract No':['2851','2852','2853','2854','2855'],
        })

        df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()

        for index, row in df2.iterrows():
        print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')





        share|improve this answer













        Figured out the solution. I just needed to use a lambda function alongwith groupby.



        import pandas as pd

        df1 = pd.DataFrame({
        'Contact Name':['John Doe','John Doe','Jane Doe','Jim Doe','Jim Doe'],
        'Email Address':['john.doe@gmail.com','john.doe@gmail.com','jane.doe@gmail.com','jim.doe@gmail.com','jim.doe@gmail.com'],
        'Contract No':['2851','2852','2853','2854','2855'],
        })

        df2 = df1.groupby(['Contact Name','Email Address'])['Contract No'].apply(lambda x: ','.join(''' + x + ''')).reset_index()

        for index, row in df2.iterrows():
        print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 2 at 22:51









        ShantanuShantanu

        10310




        10310






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54013971%2fpandas-groupby-and-sql-where-clause-parameters%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            Npm cannot find a required file even through it is in the searched directory

            in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith