How to aggregate categorical data in R?












I have a data frame with two columns of categorical variables (Better, Similar, Worse). I would like to build a table that counts how many times each category appears in each of the two columns.
The data frame I am using is as follows:

      Category.x Category.y
    1     Better     Better
    2     Better     Better
    3    Similar    Similar
    4      Worse    Similar

I would like to come up with a table like this:

            Category.x Category.y
    Better           2          2
    Similar          1          2
    Worse            1          0

How would you go about it?

Tags: r, aggregate

asked 2 hours ago
Daniel
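For anyone who wants to run the suggestions that follow, here is a minimal sketch that rebuilds the example data. It assumes the columns are plain character vectors; the question does not say whether they are characters or factors. The names `df1`, `DT`, and `df` are the ones used in the comments and answers, and are assumed here to all refer to the same data.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE keeps the columns as character vectors
# (this is an assumption; the question does not state the column type).
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# The comments and answers refer to the data as df1, DT, or df.
DT <- df1
df <- df1

str(df1)
```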








  • Looks like you need table(df1)

    – akrun
    2 hours ago

  • Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?

    – Daniel
    2 hours ago

  • I would convert to factor with common levels lvls <- unique(unlist(df1)); df1 <- lapply(df1, factor, levels = lvls) and then do the table(df1)

    – akrun
    1 hour ago
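The common-levels idea from the last comment can be sketched as follows. Note that `table()` on the whole object cross-tabulates one column against the other, so the final step here, tabulating each column separately with `sapply()`, is an addition of this sketch and not something the comment itself proposes. The name `df1_fac` is hypothetical.

```r
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Common levels across both columns, as suggested in the comment.
lvls <- unique(unlist(df1))
df1_fac <- lapply(df1, factor, levels = lvls)

# table(df1_fac) would cross-tabulate Category.x against Category.y.
# Tabulating each column on its own (an addition, not from the comment)
# gives one column of counts per original column instead.
counts <- sapply(df1_fac, table)
counts
```

Because every column shares the same levels, a category that never occurs in a column (here Worse in Category.y) still gets a row with a count of zero.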














3 Answers
As mentioned in the comments, table() is the standard tool for this, for example:

    table(stack(DT))

             ind
    values    Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    table(value = unlist(DT), cat = names(DT)[col(DT)])

             cat
    value     Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    with(reshape(DT, direction = "long", varying = 1:2),
      table(value = Category, cat = time)
    )

             cat
    value     x y
      Better  2 2
      Similar 1 2
      Worse   1 0

answered 1 hour ago
Frank
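To see why the first variant works, it can help to look at the intermediate long-format result. The sketch below assumes character columns, as in a data frame built with `stringsAsFactors = FALSE`; the names `long`, `tab`, and `wide` are hypothetical, and the conversion to a plain data frame at the end is an optional extra that the answer does not include.

```r
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() puts all values in one column ("values") and records the
# originating column name in a second column ("ind").
long <- stack(DT)
long

# table() on a two-column data frame cross-tabulates those two columns,
# which here means value by source column.
tab <- table(long)
tab

# Optional: turn the counts into an ordinary data frame with the
# categories as row names.
wide <- as.data.frame.matrix(tab)
wide
```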

























    # For each column, count how often each distinct value occurs
    sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
    #         Category.x Category.y
    # Better           2          2
    # Similar          1          2
    # Worse            1          0

answered 2 hours ago
d.b

One dplyr and tidyr possibility could be:

    library(dplyr)
    library(tidyr)

    df %>%
      gather(var, val) %>%
      count(var, val) %>%
      spread(var, n, fill = 0)

      val     Category.x Category.y
      <chr>        <dbl>      <dbl>
    1 Better           2          2
    2 Similar          1          2
    3 Worse            1          0

First, it transforms the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows for each combination of "var" and "val". Finally, it spreads the counts back into the desired wide format.

Or with dplyr and reshape2 you can do:

    library(dplyr)
    library(reshape2)

    df %>%
      mutate(rowid = row_number()) %>%
      melt(., id.vars = "rowid") %>%
      count(variable, value) %>%
      dcast(value ~ variable, value.var = "n", fill = 0)

        value Category.x Category.y
    1  Better          2          2
    2 Similar          1          2
    3   Worse          1          0

edited 37 mins ago
answered 1 hour ago
tmfmnk

  • Is var = Category.x and val= c('Better', 'Similar', 'Worse')?

    – Daniel
    1 hour ago

  • Please see the updated post for commentary.

    – tmfmnk
    1 hour ago
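In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same three steps from the dplyr and tidyr answer (wide to long, count, long to wide) might look like the sketch below. This is not part of the original answer; it assumes dplyr and a recent tidyr are installed, and `res` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # wide to long: "var" holds the column name, "val" the category
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # one row per (var, val) combination, with the count in "n"
  count(var, val) %>%
  # long to wide: one column per original variable, 0 where a
  # category never occurs in that column
  pivot_wider(names_from = var, values_from = n, values_fill = list(n = 0L))

res
```

Here `var` takes the values "Category.x" and "Category.y", and `val` takes the categories themselves, which is the distinction the comment above asks about.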


















