Count data divided by year and by region in R

up vote
8
down vote

favorite

I have a very large (too big to open in Excel) biological dataset that looks something like this

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 

              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,

              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)

    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A', 

                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B', 

                 'C', 'C', 'C', 'A')

    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3, 

                3, 2, 2, 1, 1, 1)

    df <- data.frame(year, species, region)



    df

    year species region

 1  1990       A      1

 2  1980       A      1

 3  1985       B      1

 4  1980       B      3

 5  1990       B      2

 6  1990       C      3

 7  1980       C      3

 8  1985       C      2

 9  1985       A      1

 10 1990       A      1

 11 1980       A      3

 12 1985       B      3

 13 1980       B      3

 14 1990       B      2

 15 1990       C      2

 16 1980       C      1

 17 1985       C      1

 18 1985       A      1

 19 1990       A      1

 20 1980       A      3

 21 1985       B      3

 22 1980       B      3

 23 1990       B      2

 24 1990       C      2

 25 1980       C      1

 26 1985       C      1

 27 1985       A      1

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something along the lines of this,

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990

 1      1      0      0      0      0      0      0      0      0      0

 2      2      1      1      1      1      1      1      1      1      1

 3      3      2      2      2      2      2      2      2      2      2

such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.

Does anyone have any suggestions?

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

add a comment |

up vote
8
down vote

favorite

I have a very large (too big to open in Excel) biological dataset that looks something like this

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 

              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,

              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)

    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A', 

                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B', 

                 'C', 'C', 'C', 'A')

    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3, 

                3, 2, 2, 1, 1, 1)

    df <- data.frame(year, species, region)



    df

    year species region

 1  1990       A      1

 2  1980       A      1

 3  1985       B      1

 4  1980       B      3

 5  1990       B      2

 6  1990       C      3

 7  1980       C      3

 8  1985       C      2

 9  1985       A      1

 10 1990       A      1

 11 1980       A      3

 12 1985       B      3

 13 1980       B      3

 14 1990       B      2

 15 1990       C      2

 16 1980       C      1

 17 1985       C      1

 18 1985       A      1

 19 1990       A      1

 20 1980       A      3

 21 1985       B      3

 22 1980       B      3

 23 1990       B      2

 24 1990       C      2

 25 1980       C      1

 26 1985       C      1

 27 1985       A      1

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something along the lines of this,

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990

 1      1      0      0      0      0      0      0      0      0      0

 2      2      1      1      1      1      1      1      1      1      1

 3      3      2      2      2      2      2      2      2      2      2

Does anyone have any suggestions?

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

add a comment |

up vote
8
down vote

favorite

I have a very large (too big to open in Excel) biological dataset that looks something like this

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 

              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,

              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)

    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A', 

                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B', 

                 'C', 'C', 'C', 'A')

    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3, 

                3, 2, 2, 1, 1, 1)

    df <- data.frame(year, species, region)



    df

    year species region

 1  1990       A      1

 2  1980       A      1

 3  1985       B      1

 4  1980       B      3

 5  1990       B      2

 6  1990       C      3

 7  1980       C      3

 8  1985       C      2

 9  1985       A      1

 10 1990       A      1

 11 1980       A      3

 12 1985       B      3

 13 1980       B      3

 14 1990       B      2

 15 1990       C      2

 16 1980       C      1

 17 1985       C      1

 18 1985       A      1

 19 1990       A      1

 20 1980       A      3

 21 1985       B      3

 22 1980       B      3

 23 1990       B      2

 24 1990       C      2

 25 1980       C      1

 26 1985       C      1

 27 1985       A      1

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something along the lines of this,

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990

 1      1      0      0      0      0      0      0      0      0      0

 2      2      1      1      1      1      1      1      1      1      1

 3      3      2      2      2      2      2      2      2      2      2

Does anyone have any suggestions?

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

I have a very large (too big to open in Excel) biological dataset that looks something like this

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 

              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,

              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)

    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A', 

                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B', 

                 'C', 'C', 'C', 'A')

    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3, 

                3, 2, 2, 1, 1, 1)

    df <- data.frame(year, species, region)



    df

    year species region

 1  1990       A      1

 2  1980       A      1

 3  1985       B      1

 4  1980       B      3

 5  1990       B      2

 6  1990       C      3

 7  1980       C      3

 8  1985       C      2

 9  1985       A      1

 10 1990       A      1

 11 1980       A      3

 12 1985       B      3

 13 1980       B      3

 14 1990       B      2

 15 1990       C      2

 16 1980       C      1

 17 1985       C      1

 18 1985       A      1

 19 1990       A      1

 20 1980       A      3

 21 1985       B      3

 22 1980       B      3

 23 1990       B      2

 24 1990       C      2

 25 1980       C      1

 26 1985       C      1

 27 1985       A      1

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something along the lines of this,

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990

 1      1      0      0      0      0      0      0      0      0      0

 2      2      1      1      1      1      1      1      1      1      1

 3      3      2      2      2      2      2      2      2      2      2

Does anyone have any suggestions?

r grouping tidyverse data-management

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

edited 2 days ago

m0nhawk

14.8k83160

edited 2 days ago

m0nhawk

14.8k83160

edited 2 days ago

m0nhawk

14.8k83160

asked 2 days ago

cb14

434

New contributor

asked 2 days ago

cb14

434

asked 2 days ago

cb14

434

New contributor

cb14 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

2 Answers
2

active

oldest

votes

up vote
10
down vote

accepted

Something like this?

library(dplyr)



df2 <- df %>% 

  mutate(sp_year = paste(species, year, sep = "_")) %>%

  group_by(region) %>% 

  count(sp_year) %>% 

  spread(sp_year,n)



df2

Which gives this:

# A tibble: 3 x 10

# Groups:   region [3]

  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>

1      1      1      3      3     NA      1     NA      2      2     NA

2      2     NA     NA     NA     NA     NA      3     NA      1      2

3      3      2     NA     NA      3      2     NA      1     NA      1

answered 2 days ago

wl1234

183211

1

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

add a comment |

up vote
5
down vote

Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.

library(tidyverse)



df2 <- df %>%

  unite(sp_year, species, year, sep = "_") %>%

  count(sp_year, region) %>%

  spread(sp_year, n, fill = 0)

df2

# # A tibble: 3 x 10

#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>

# 1      1      1      3      3      0      1      0      2      2      0

# 2      2      0      0      0      0      0      3      0      1      2

# 3      3      2      0      0      3      2      0      1      0      1

edited 2 days ago

answered 2 days ago

www

25.3k102239

1

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

cb14 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53356871%2fcount-data-divided-by-year-and-by-region-in-r%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
10
down vote

accepted

Something like this?

library(dplyr)



df2 <- df %>% 

  mutate(sp_year = paste(species, year, sep = "_")) %>%

  group_by(region) %>% 

  count(sp_year) %>% 

  spread(sp_year,n)



df2

Which gives this:

# A tibble: 3 x 10

# Groups:   region [3]

  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>

1      1      1      3      3     NA      1     NA      2      2     NA

2      2     NA     NA     NA     NA     NA      3     NA      1      2

3      3      2     NA     NA      3      2     NA      1     NA      1

answered 2 days ago

wl1234

183211

1

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

add a comment |

up vote
10
down vote

accepted

Something like this?

library(dplyr)



df2 <- df %>% 

  mutate(sp_year = paste(species, year, sep = "_")) %>%

  group_by(region) %>% 

  count(sp_year) %>% 

  spread(sp_year,n)



df2

Which gives this:

# A tibble: 3 x 10

# Groups:   region [3]

  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>

1      1      1      3      3     NA      1     NA      2      2     NA

2      2     NA     NA     NA     NA     NA      3     NA      1      2

3      3      2     NA     NA      3      2     NA      1     NA      1

answered 2 days ago

wl1234

183211

1

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

add a comment |

up vote
10
down vote

accepted

Something like this?

library(dplyr)



df2 <- df %>% 

  mutate(sp_year = paste(species, year, sep = "_")) %>%

  group_by(region) %>% 

  count(sp_year) %>% 

  spread(sp_year,n)



df2

Which gives this:

# A tibble: 3 x 10

# Groups:   region [3]

  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>

1      1      1      3      3     NA      1     NA      2      2     NA

2      2     NA     NA     NA     NA     NA      3     NA      1      2

3      3      2     NA     NA      3      2     NA      1     NA      1

answered 2 days ago

wl1234

183211

Something like this?

library(dplyr)



df2 <- df %>% 

  mutate(sp_year = paste(species, year, sep = "_")) %>%

  group_by(region) %>% 

  count(sp_year) %>% 

  spread(sp_year,n)



df2

Which gives this:

# A tibble: 3 x 10

# Groups:   region [3]

  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>

1      1      1      3      3     NA      1     NA      2      2     NA

2      2     NA     NA     NA     NA     NA      3     NA      1      2

3      3      2     NA     NA      3      2     NA      1     NA      1

answered 2 days ago

wl1234

183211

answered 2 days ago

wl1234

183211

answered 2 days ago

wl1234

183211

answered 2 days ago

wl1234

183211

1

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

add a comment |

1

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree
2 days ago

add a comment |

up vote
5
down vote

library(tidyverse)



df2 <- df %>%

  unite(sp_year, species, year, sep = "_") %>%

  count(sp_year, region) %>%

  spread(sp_year, n, fill = 0)

df2

# # A tibble: 3 x 10

#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>

# 1      1      1      3      3      0      1      0      2      2      0

# 2      2      0      0      0      0      0      3      0      1      2

# 3      3      2      0      0      3      2      0      1      0      1

edited 2 days ago

answered 2 days ago

www

25.3k102239

1

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

add a comment |

up vote
5
down vote

library(tidyverse)



df2 <- df %>%

  unite(sp_year, species, year, sep = "_") %>%

  count(sp_year, region) %>%

  spread(sp_year, n, fill = 0)

df2

# # A tibble: 3 x 10

#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>

# 1      1      1      3      3      0      1      0      2      2      0

# 2      2      0      0      0      0      0      3      0      1      2

# 3      3      2      0      0      3      2      0      1      0      1

edited 2 days ago

answered 2 days ago

www

25.3k102239

1

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

add a comment |

up vote
5
down vote

library(tidyverse)



df2 <- df %>%

  unite(sp_year, species, year, sep = "_") %>%

  count(sp_year, region) %>%

  spread(sp_year, n, fill = 0)

df2

# # A tibble: 3 x 10

#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>

# 1      1      1      3      3      0      1      0      2      2      0

# 2      2      0      0      0      0      0      3      0      1      2

# 3      3      2      0      0      3      2      0      1      0      1

edited 2 days ago

answered 2 days ago

www

25.3k102239

library(tidyverse)



df2 <- df %>%

  unite(sp_year, species, year, sep = "_") %>%

  count(sp_year, region) %>%

  spread(sp_year, n, fill = 0)

df2

# # A tibble: 3 x 10

#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990

#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>

# 1      1      1      3      3      0      1      0      2      2      0

# 2      2      0      0      0      0      0      3      0      1      2

# 3      3      2      0      0      3      2      0      1      0      1

edited 2 days ago

answered 2 days ago

www

25.3k102239

edited 2 days ago

answered 2 days ago

www

25.3k102239

answered 2 days ago

www

25.3k102239

answered 2 days ago

www

25.3k102239

1

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

add a comment |

1

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14
2 days ago

I didn't know about unite. I will use that instead of paste next time.
– wl1234
2 days ago

add a comment |

cb14 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

cb14 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtyktl