A pythonic and uFunc-y way to turn pandas column into “increasing” index?











up vote
6
down vote

favorite
2












Let's say I have a pandas df like so:



Index   A     B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5


What's a good fast way to add a column like so:



Index   A     B    Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2


I.e. adding an increasing index for each unique value?



I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










share|improve this question


























    up vote
    6
    down vote

    favorite
    2












    Let's say I have a pandas df like so:



    Index   A     B
    0 foo 3
    1 foo 2
    2 foo 5
    3 bar 3
    4 bar 4
    5 baz 5


    What's a good fast way to add a column like so:



    Index   A     B    Aidx
    0 foo 3 0
    1 foo 2 0
    2 foo 5 0
    3 bar 3 1
    4 bar 4 1
    5 baz 5 2


    I.e. adding an increasing index for each unique value?



    I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










    share|improve this question
























      up vote
      6
      down vote

      favorite
      2









      up vote
      6
      down vote

      favorite
      2






      2





      Let's say I have a pandas df like so:



      Index   A     B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index   A     B    Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










      share|improve this question













      Let's say I have a pandas df like so:



      Index   A     B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index   A     B    Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked 1 hour ago









      Lagerbaer

      2,6181124




      2,6181124
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          6
          down vote













          One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



          df['Aidx'] = df.groupby('A',sort=False).ngroup()
          >>> df
          Index A B Aidx
          0 0 foo 3 0
          1 1 foo 2 0
          2 2 foo 5 0
          3 3 bar 3 1
          4 4 bar 4 1
          5 5 baz 5 2





          share|improve this answer






























            up vote
            2
            down vote













            No need groupby using





            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]




            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])




            Method 3 cat.codes



            df.A.astype('category').cat.codes




            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64





            share|improve this answer






























              up vote
              2
              down vote













              One more method of doing so could be.



              df['C'] = i.ne(df.A.shift()).cumsum()-1
              df


              When we print df value it will be as follows.



                Index  A    B  C
              0 0 foo 3 0
              1 1 foo 2 0
              2 2 foo 5 0
              3 3 bar 3 1
              4 4 bar 4 1
              5 5 baz 5 2


              Explanation of solution: Let's break above solution into parts for understanding purposes.



              1st step: Compare df's A column by shifting its value down to itself as follows.



              i.ne(df.A.shift())


              Output we will get is:



              0     True
              1 False
              2 False
              3 True
              4 False
              5 True


              2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



              i.ne(df.A.shift()).cumsum()-1
              0 0
              1 0
              2 0
              3 1
              4 1
              5 2
              Name: A, dtype: int32


              3rd step: Save command's value into df['C'] which will create a new column named C in df.






              share|improve this answer



















              • 1




                Nice method ve++ for you
                – W-B
                36 mins ago










              • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                – RavinderSingh13
                34 mins ago













              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53772121%2fa-pythonic-and-ufunc-y-way-to-turn-pandas-column-into-increasing-index%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              6
              down vote













              One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



              df['Aidx'] = df.groupby('A',sort=False).ngroup()
              >>> df
              Index A B Aidx
              0 0 foo 3 0
              1 1 foo 2 0
              2 2 foo 5 0
              3 3 bar 3 1
              4 4 bar 4 1
              5 5 baz 5 2





              share|improve this answer



























                up vote
                6
                down vote













                One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                df['Aidx'] = df.groupby('A',sort=False).ngroup()
                >>> df
                Index A B Aidx
                0 0 foo 3 0
                1 1 foo 2 0
                2 2 foo 5 0
                3 3 bar 3 1
                4 4 bar 4 1
                5 5 baz 5 2





                share|improve this answer

























                  up vote
                  6
                  down vote










                  up vote
                  6
                  down vote









                  One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                  df['Aidx'] = df.groupby('A',sort=False).ngroup()
                  >>> df
                  Index A B Aidx
                  0 0 foo 3 0
                  1 1 foo 2 0
                  2 2 foo 5 0
                  3 3 bar 3 1
                  4 4 bar 4 1
                  5 5 baz 5 2





                  share|improve this answer














                  One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                  df['Aidx'] = df.groupby('A',sort=False).ngroup()
                  >>> df
                  Index A B Aidx
                  0 0 foo 3 0
                  1 1 foo 2 0
                  2 2 foo 5 0
                  3 3 bar 3 1
                  4 4 bar 4 1
                  5 5 baz 5 2






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 19 mins ago

























                  answered 1 hour ago









                  sacul

                  29.7k41640




                  29.7k41640
























                      up vote
                      2
                      down vote













                      No need groupby using





                      Method 1factorize



                      pd.factorize(df.A)[0]
                      array([0, 0, 0, 1, 1, 2], dtype=int64)
                      #df['Aidx']=pd.factorize(df.A)[0]




                      Method 2 sklearn



                      from sklearn import preprocessing
                      le = preprocessing.LabelEncoder()
                      le.fit(df.A)
                      LabelEncoder()
                      le.transform(df.A)
                      array([2, 2, 2, 0, 0, 1])




                      Method 3 cat.codes



                      df.A.astype('category').cat.codes




                      Method 4 map + unique



                      l=df.A.unique()
                      df.A.map(dict(zip(l,range(len(l)))))
                      0 0
                      1 0
                      2 0
                      3 1
                      4 1
                      5 2
                      Name: A, dtype: int64





                      share|improve this answer



























                        up vote
                        2
                        down vote













                        No need groupby using





                        Method 1factorize



                        pd.factorize(df.A)[0]
                        array([0, 0, 0, 1, 1, 2], dtype=int64)
                        #df['Aidx']=pd.factorize(df.A)[0]




                        Method 2 sklearn



                        from sklearn import preprocessing
                        le = preprocessing.LabelEncoder()
                        le.fit(df.A)
                        LabelEncoder()
                        le.transform(df.A)
                        array([2, 2, 2, 0, 0, 1])




                        Method 3 cat.codes



                        df.A.astype('category').cat.codes




                        Method 4 map + unique



                        l=df.A.unique()
                        df.A.map(dict(zip(l,range(len(l)))))
                        0 0
                        1 0
                        2 0
                        3 1
                        4 1
                        5 2
                        Name: A, dtype: int64





                        share|improve this answer

























                          up vote
                          2
                          down vote










                          up vote
                          2
                          down vote









                          No need groupby using





                          Method 1factorize



                          pd.factorize(df.A)[0]
                          array([0, 0, 0, 1, 1, 2], dtype=int64)
                          #df['Aidx']=pd.factorize(df.A)[0]




                          Method 2 sklearn



                          from sklearn import preprocessing
                          le = preprocessing.LabelEncoder()
                          le.fit(df.A)
                          LabelEncoder()
                          le.transform(df.A)
                          array([2, 2, 2, 0, 0, 1])




                          Method 3 cat.codes



                          df.A.astype('category').cat.codes




                          Method 4 map + unique



                          l=df.A.unique()
                          df.A.map(dict(zip(l,range(len(l)))))
                          0 0
                          1 0
                          2 0
                          3 1
                          4 1
                          5 2
                          Name: A, dtype: int64





                          share|improve this answer














                          No need groupby using





                          Method 1factorize



                          pd.factorize(df.A)[0]
                          array([0, 0, 0, 1, 1, 2], dtype=int64)
                          #df['Aidx']=pd.factorize(df.A)[0]




                          Method 2 sklearn



                          from sklearn import preprocessing
                          le = preprocessing.LabelEncoder()
                          le.fit(df.A)
                          LabelEncoder()
                          le.transform(df.A)
                          array([2, 2, 2, 0, 0, 1])




                          Method 3 cat.codes



                          df.A.astype('category').cat.codes




                          Method 4 map + unique



                          l=df.A.unique()
                          df.A.map(dict(zip(l,range(len(l)))))
                          0 0
                          1 0
                          2 0
                          3 1
                          4 1
                          5 2
                          Name: A, dtype: int64






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited 28 mins ago

























                          answered 38 mins ago









                          W-B

                          97.8k73162




                          97.8k73162






















                              up vote
                              2
                              down vote













                              One more method of doing so could be.



                              df['C'] = i.ne(df.A.shift()).cumsum()-1
                              df


                              When we print df value it will be as follows.



                                Index  A    B  C
                              0 0 foo 3 0
                              1 1 foo 2 0
                              2 2 foo 5 0
                              3 3 bar 3 1
                              4 4 bar 4 1
                              5 5 baz 5 2


                              Explanation of solution: Let's break above solution into parts for understanding purposes.



                              1st step: Compare df's A column by shifting its value down to itself as follows.



                              i.ne(df.A.shift())


                              Output we will get is:



                              0     True
                              1 False
                              2 False
                              3 True
                              4 False
                              5 True


                              2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                              i.ne(df.A.shift()).cumsum()-1
                              0 0
                              1 0
                              2 0
                              3 1
                              4 1
                              5 2
                              Name: A, dtype: int32


                              3rd step: Save command's value into df['C'] which will create a new column named C in df.






                              share|improve this answer



















                              • 1




                                Nice method ve++ for you
                                – W-B
                                36 mins ago










                              • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                                – RavinderSingh13
                                34 mins ago

















                              up vote
                              2
                              down vote













                              One more method of doing so could be.



                              df['C'] = i.ne(df.A.shift()).cumsum()-1
                              df


                              When we print df value it will be as follows.



                                Index  A    B  C
                              0 0 foo 3 0
                              1 1 foo 2 0
                              2 2 foo 5 0
                              3 3 bar 3 1
                              4 4 bar 4 1
                              5 5 baz 5 2


                              Explanation of solution: Let's break above solution into parts for understanding purposes.



                              1st step: Compare df's A column by shifting its value down to itself as follows.



                              i.ne(df.A.shift())


                              Output we will get is:



                              0     True
                              1 False
                              2 False
                              3 True
                              4 False
                              5 True


                              2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                              i.ne(df.A.shift()).cumsum()-1
                              0 0
                              1 0
                              2 0
                              3 1
                              4 1
                              5 2
                              Name: A, dtype: int32


                              3rd step: Save command's value into df['C'] which will create a new column named C in df.






                              share|improve this answer



















                              • 1




                                Nice method ve++ for you
                                – W-B
                                36 mins ago










                              • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                                – RavinderSingh13
                                34 mins ago















                              up vote
                              2
                              down vote










                              up vote
                              2
                              down vote









                              One more method of doing so could be.



                              df['C'] = i.ne(df.A.shift()).cumsum()-1
                              df


                              When we print df value it will be as follows.



                                Index  A    B  C
                              0 0 foo 3 0
                              1 1 foo 2 0
                              2 2 foo 5 0
                              3 3 bar 3 1
                              4 4 bar 4 1
                              5 5 baz 5 2


                              Explanation of solution: Let's break above solution into parts for understanding purposes.



                              1st step: Compare df's A column by shifting its value down to itself as follows.



                              i.ne(df.A.shift())


                              Output we will get is:



                              0     True
                              1 False
                              2 False
                              3 True
                              4 False
                              5 True


                              2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                              i.ne(df.A.shift()).cumsum()-1
                              0 0
                              1 0
                              2 0
                              3 1
                              4 1
                              5 2
                              Name: A, dtype: int32


                              3rd step: Save command's value into df['C'] which will create a new column named C in df.






                              share|improve this answer














                              One more method of doing so could be.



                              df['C'] = i.ne(df.A.shift()).cumsum()-1
                              df


                              When we print df value it will be as follows.



                                Index  A    B  C
                              0 0 foo 3 0
                              1 1 foo 2 0
                              2 2 foo 5 0
                              3 3 bar 3 1
                              4 4 bar 4 1
                              5 5 baz 5 2


                              Explanation of solution: Let's break above solution into parts for understanding purposes.



                              1st step: Compare df's A column by shifting its value down to itself as follows.



                              i.ne(df.A.shift())


                              Output we will get is:



                              0     True
                              1 False
                              2 False
                              3 True
                              4 False
                              5 True


                              2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                              i.ne(df.A.shift()).cumsum()-1
                              0 0
                              1 0
                              2 0
                              3 1
                              4 1
                              5 2
                              Name: A, dtype: int32


                              3rd step: Save command's value into df['C'] which will create a new column named C in df.







                              share|improve this answer














                              share|improve this answer



                              share|improve this answer








                              edited 23 mins ago

























                              answered 42 mins ago









                              RavinderSingh13

                              25k41437




                              25k41437








                              • 1




                                Nice method ve++ for you
                                – W-B
                                36 mins ago










                              • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                                – RavinderSingh13
                                34 mins ago
















                              • 1




                                Nice method ve++ for you
                                – W-B
                                36 mins ago










                              • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                                – RavinderSingh13
                                34 mins ago










                              1




                              1




                              Nice method ve++ for you
                              – W-B
                              36 mins ago




                              Nice method ve++ for you
                              – W-B
                              36 mins ago












                              @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                              – RavinderSingh13
                              34 mins ago






                              @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                              – RavinderSingh13
                              34 mins ago




















                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53772121%2fa-pythonic-and-ufunc-y-way-to-turn-pandas-column-into-increasing-index%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              flock() on closed filehandle LOCK_FILE at /usr/bin/apt-mirror

                              Mangá

                              Eduardo VII do Reino Unido