Count data divided by year and by region in R











up vote
8
down vote

favorite












I have a very large (too big to open in Excel) biological dataset that looks something like this



    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
'C', 'C', 'C', 'A')
region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
3, 2, 2, 1, 1, 1)
df <- data.frame(year, species, region)

df
year species region
1 1990 A 1
2 1980 A 1
3 1985 B 1
4 1980 B 3
5 1990 B 2
6 1990 C 3
7 1980 C 3
8 1985 C 2
9 1985 A 1
10 1990 A 1
11 1980 A 3
12 1985 B 3
13 1980 B 3
14 1990 B 2
15 1990 C 2
16 1980 C 1
17 1985 C 1
18 1985 A 1
19 1990 A 1
20 1980 A 3
21 1985 B 3
22 1980 B 3
23 1990 B 2
24 1990 C 2
25 1980 C 1
26 1985 C 1
27 1985 A 1


What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



I'm looking to end up with a dataset that looks something along the lines of this,



      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
1 1 0 0 0 0 0 0 0 0 0
2 2 1 1 1 1 1 1 1 1 1
3 3 2 2 2 2 2 2 2 2 2


such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



Does anyone have any suggestions?










share|improve this question









New contributor




cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
























    up vote
    8
    down vote

    favorite












    I have a very large (too big to open in Excel) biological dataset that looks something like this



        year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
    1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
    1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
    'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
    'C', 'C', 'C', 'A')
    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
    3, 2, 2, 1, 1, 1)
    df <- data.frame(year, species, region)

    df
    year species region
    1 1990 A 1
    2 1980 A 1
    3 1985 B 1
    4 1980 B 3
    5 1990 B 2
    6 1990 C 3
    7 1980 C 3
    8 1985 C 2
    9 1985 A 1
    10 1990 A 1
    11 1980 A 3
    12 1985 B 3
    13 1980 B 3
    14 1990 B 2
    15 1990 C 2
    16 1980 C 1
    17 1985 C 1
    18 1985 A 1
    19 1990 A 1
    20 1980 A 3
    21 1985 B 3
    22 1980 B 3
    23 1990 B 2
    24 1990 C 2
    25 1980 C 1
    26 1985 C 1
    27 1985 A 1


    What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



    I'm looking to end up with a dataset that looks something along the lines of this,



          region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
    1 1 0 0 0 0 0 0 0 0 0
    2 2 1 1 1 1 1 1 1 1 1
    3 3 2 2 2 2 2 2 2 2 2


    such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



    Does anyone have any suggestions?










    share|improve this question









    New contributor




    cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.






















      up vote
      8
      down vote

      favorite









      up vote
      8
      down vote

      favorite











      I have a very large (too big to open in Excel) biological dataset that looks something like this



          year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
      1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
      1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
      species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
      'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
      'C', 'C', 'C', 'A')
      region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
      3, 2, 2, 1, 1, 1)
      df <- data.frame(year, species, region)

      df
      year species region
      1 1990 A 1
      2 1980 A 1
      3 1985 B 1
      4 1980 B 3
      5 1990 B 2
      6 1990 C 3
      7 1980 C 3
      8 1985 C 2
      9 1985 A 1
      10 1990 A 1
      11 1980 A 3
      12 1985 B 3
      13 1980 B 3
      14 1990 B 2
      15 1990 C 2
      16 1980 C 1
      17 1985 C 1
      18 1985 A 1
      19 1990 A 1
      20 1980 A 3
      21 1985 B 3
      22 1980 B 3
      23 1990 B 2
      24 1990 C 2
      25 1980 C 1
      26 1985 C 1
      27 1985 A 1


      What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



      I'm looking to end up with a dataset that looks something along the lines of this,



            region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
      1 1 0 0 0 0 0 0 0 0 0
      2 2 1 1 1 1 1 1 1 1 1
      3 3 2 2 2 2 2 2 2 2 2


      such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



      Does anyone have any suggestions?










      share|improve this question









      New contributor




      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      I have a very large (too big to open in Excel) biological dataset that looks something like this



          year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
      1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
      1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
      species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
      'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
      'C', 'C', 'C', 'A')
      region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
      3, 2, 2, 1, 1, 1)
      df <- data.frame(year, species, region)

      df
      year species region
      1 1990 A 1
      2 1980 A 1
      3 1985 B 1
      4 1980 B 3
      5 1990 B 2
      6 1990 C 3
      7 1980 C 3
      8 1985 C 2
      9 1985 A 1
      10 1990 A 1
      11 1980 A 3
      12 1985 B 3
      13 1980 B 3
      14 1990 B 2
      15 1990 C 2
      16 1980 C 1
      17 1985 C 1
      18 1985 A 1
      19 1990 A 1
      20 1980 A 3
      21 1985 B 3
      22 1980 B 3
      23 1990 B 2
      24 1990 C 2
      25 1980 C 1
      26 1985 C 1
      27 1985 A 1


      What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



      I'm looking to end up with a dataset that looks something along the lines of this,



            region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
      1 1 0 0 0 0 0 0 0 0 0
      2 2 1 1 1 1 1 1 1 1 1
      3 3 2 2 2 2 2 2 2 2 2


      such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



      Does anyone have any suggestions?







      r grouping tidyverse data-management






      share|improve this question









      New contributor




      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited 2 days ago









      m0nhawk

      14.8k83160




      14.8k83160






      New contributor




      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 2 days ago









      cb14

      434




      434




      New contributor




      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.
























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            2 days ago


















          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            2 days ago












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            2 days ago











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          cb14 is a new contributor. Be nice, and check out our Code of Conduct.










           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53356871%2fcount-data-divided-by-year-and-by-region-in-r%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            2 days ago















          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            2 days ago













          up vote
          10
          down vote



          accepted







          up vote
          10
          down vote



          accepted






          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer












          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 days ago









          wl1234

          183211




          183211








          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            2 days ago














          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            2 days ago








          1




          1




          also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
          – Shree
          2 days ago




          also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
          – Shree
          2 days ago












          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            2 days ago












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            2 days ago















          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            2 days ago












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            2 days ago













          up vote
          5
          down vote










          up vote
          5
          down vote









          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer














          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 2 days ago

























          answered 2 days ago









          www

          25.3k102239




          25.3k102239








          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            2 days ago












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            2 days ago














          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            2 days ago












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            2 days ago








          1




          1




          This is awesome, and I love the NA => 0 addition as well! Thank you!
          – cb14
          2 days ago






          This is awesome, and I love the NA => 0 addition as well! Thank you!
          – cb14
          2 days ago














          I didn't know about unite. I will use that instead of paste next time.
          – wl1234
          2 days ago




          I didn't know about unite. I will use that instead of paste next time.
          – wl1234
          2 days ago










          cb14 is a new contributor. Be nice, and check out our Code of Conduct.










           

          draft saved


          draft discarded


















          cb14 is a new contributor. Be nice, and check out our Code of Conduct.













          cb14 is a new contributor. Be nice, and check out our Code of Conduct.












          cb14 is a new contributor. Be nice, and check out our Code of Conduct.















           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53356871%2fcount-data-divided-by-year-and-by-region-in-r%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          flock() on closed filehandle LOCK_FILE at /usr/bin/apt-mirror

          Mangá

          Eduardo VII do Reino Unido