crosstab in Pandas DataFrame












I created a DataFrame, df4:



      A1  A2  A3  A4
0   cccc  xx   6   5
1   aaaa  yy   8   0
2   aaaa  xx  15   0
3   bbbb  xx  21   4
4   bbbb  xx  26   0
5   cccc  yy  33   2
6   aaaa  xx  44   1
7   cccc  xx  48   2
8   aaaa  yy  58   0
9   cccc  yy  59   5
10  bbbb  yy  77   0
11  bbbb  yy  99   0
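
The post does not show how df4 was built, so here is a minimal sketch that reconstructs it. The dict literal is an assumption copied from the printed table above, not the asker's original code.

```python
import pandas as pd

# Assumed construction: values are copied from the printed table,
# since the original post does not show how df4 was created.
df4 = pd.DataFrame({
    'A1': ['cccc', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'cccc',
           'aaaa', 'cccc', 'aaaa', 'cccc', 'bbbb', 'bbbb'],
    'A2': ['xx', 'yy', 'xx', 'xx', 'xx', 'yy',
           'xx', 'xx', 'yy', 'yy', 'yy', 'yy'],
    'A3': [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    'A4': [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

print(df4)
```

Six of the twelve rows have A4 equal to 0; those are the rows the question is about.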


Then, using crosstab() with the command below, I created a new DataFrame:



df5 = pd.crosstab(df4['A1'], df4['A2'], margins=False, values=df4['A3'],
                  dropna=False, aggfunc='mean').reset_index().fillna(0)


This works properly and gives me the following output:



A2    A1    xx    yy
0   aaaa  29.5  33.0
1   bbbb  23.5  88.0
2   cccc  27.0  46.0
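
To make the step above reproducible, the following sketch runs the posted crosstab() command on a reconstructed frame. The df4 literal is an assumption taken from the printed input table; the crosstab() call itself is the one from the post.

```python
import pandas as pd

# Assumed reconstruction of df4 from the printed table in the question.
df4 = pd.DataFrame({
    'A1': ['cccc', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'cccc',
           'aaaa', 'cccc', 'aaaa', 'cccc', 'bbbb', 'bbbb'],
    'A2': ['xx', 'yy', 'xx', 'xx', 'xx', 'yy',
           'xx', 'xx', 'yy', 'yy', 'yy', 'yy'],
    'A3': [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    'A4': [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

# Mean of A3 for every (A1, A2) pair: A1 values become rows, A2 values columns.
df5 = pd.crosstab(df4['A1'], df4['A2'], margins=False, values=df4['A3'],
                  dropna=False, aggfunc='mean').reset_index().fillna(0)

print(df5)
```

Each cell is the mean of A3 over the rows sharing that A1 and A2, for example (15 + 44) / 2 = 29.5 for aaaa and xx.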


Now I want to store the mean values in the DataFrame df4.

How can I do it? In the rows of df4 where A4 contains 0, I want to replace that 0 with the corresponding mean from the crosstab(). I want output as follows:



     A1  A2  A3    A4
0  aaaa  xx  15  29.5
1  aaaa  xx  44   1.0
2  aaaa  yy   8  33.0
3  aaaa  yy  58  33.0
4  bbbb  xx  21   4.0
5  bbbb  xx  26  23.5
6  bbbb  yy  77  88.0
7  bbbb  yy  99  88.0
8  cccc  xx   6   5.0
9  cccc  xx  48   2.0









  • Can you create a minimal, complete, and verifiable example?
    – jezrael
    Nov 19 '18 at 12:55










  • How do you go from the input to the output? What's that new output calculating?
    – Franco Piccolo
    Nov 19 '18 at 14:07










  • In some rows A4 contains 0. I want to replace it with the mean value that I got from crosstab.
    – Anuprita
    Nov 19 '18 at 15:19
















python pandas pandas-groupby






edited Nov 19 '18 at 15:34 by jpp










asked Nov 19 '18 at 12:54 by Anuprita








1 Answer
mask + groupby + transform



Ignoring the unnecessary reordering and removal of some rows in your desired output, you can use mask with groupby:



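# Mean of A3 within each (A1, A2) group, aligned row by row with df4
group_mean = df4.groupby(['A1', 'A2'])['A3'].transform('mean')

# Where A4 is 0, take the group mean; otherwise keep the existing A4 value
df4['A4'] = df4['A4'].mask(df4['A4'] == 0, group_mean)

print(df4)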

      A1  A2  A3    A4
0   cccc  xx   6   5.0
1   aaaa  yy   8  33.0
2   aaaa  xx  15  29.5
3   bbbb  xx  21   4.0
4   bbbb  xx  26  23.5
5   cccc  yy  33   2.0
6   aaaa  xx  44   1.0
7   cccc  xx  48   2.0
8   aaaa  yy  58  33.0
9   cccc  yy  59   5.0
10  bbbb  yy  77  88.0
11  bbbb  yy  99  88.0
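
If you would rather reuse the crosstab() table itself than recompute the means with groupby, one possible variant is sketched below. This is not part of the answer above: the df4 literal is reconstructed from the question's printed table, and the names `means`, `long_means`, `merged` and `group_mean` are made up for illustration. It also assumes df4 has the default 0..n-1 index, so the merged result lines up with it row by row.

```python
import pandas as pd

# Assumed reconstruction of df4 from the printed table in the question.
df4 = pd.DataFrame({
    'A1': ['cccc', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'cccc',
           'aaaa', 'cccc', 'aaaa', 'cccc', 'bbbb', 'bbbb'],
    'A2': ['xx', 'yy', 'xx', 'xx', 'xx', 'yy',
           'xx', 'xx', 'yy', 'yy', 'yy', 'yy'],
    'A3': [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    'A4': [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

# Wide table of mean A3 per (A1, A2), as produced in the question.
means = pd.crosstab(df4['A1'], df4['A2'], values=df4['A3'], aggfunc='mean')

# Reshape to long form: one row per (A1, A2) pair with its mean.
long_means = means.reset_index().melt(
    id_vars='A1', var_name='A2', value_name='group_mean')

# A left merge keeps df4's row order and attaches the matching mean to each row.
merged = df4.merge(long_means, on=['A1', 'A2'], how='left')

# Replace zeros in A4 with the looked-up mean; other values are kept.
df4['A4'] = df4['A4'].mask(df4['A4'] == 0, merged['group_mean'])

print(df4)
```

On this data the result should match the groupby + transform output; the transform version is shorter and does not need the intermediate table.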





answered Nov 19 '18 at 15:31 by jpp





























