How to aggregate categorical data in R?
I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns.
The dataframe I am using is as follows:
  Category.x Category.y
1     Better     Better
2     Better     Better
3    Similar    Similar
4      Worse    Similar
I would like to come up with a table like this:
        Category.x Category.y
Better           2          2
Similar          1          2
Worse            1          0
How would you go about it?
r aggregate
asked 2 hours ago by Daniel
Comments:
Looks like you need table(df1)
– akrun, 2 hours ago
Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?
– Daniel, 2 hours ago
I would convert to factor with common levels:
    lvls <- unique(unlist(df1)); df1 <- lapply(df1, factor, levels = lvls)
and then do the table(df1)
– akrun, 1 hour ago
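One way to read the factor suggestion in the comments, offered as a sketch rather than as akrun's exact code: once both columns share the same levels, tabulating each column separately (instead of cross-tabulating the pair) gives the 3x2 layout asked for. The construction of df1 and the use of sapply(df1, table) are assumptions added for illustration.

```r
# Assumed reconstruction of the example data from the question.
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Give both columns the same set of levels so that every category
# shows up in every per-column count, even when its count is zero.
lvls <- unique(unlist(df1, use.names = FALSE))
df1[] <- lapply(df1, factor, levels = lvls)

# Tabulate each column on its own; sapply() binds the per-column counts
# into a matrix with one row per level and one column per variable.
counts <- sapply(df1, table)
print(counts)
```

Assigning with df1[] keeps df1 a data frame; a bare lapply() result would be a plain list.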
3 Answers
As mentioned in the comments, table is standard for this, like

    table(stack(DT))

             ind
    values    Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    table(value = unlist(DT), cat = names(DT)[col(DT)])

             cat
    value     Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    with(reshape(DT, direction = "long", varying = 1:2),
      table(value = Category, cat = time)
    )

             cat
    value     x y
      Better  2 2
      Similar 1 2
      Worse   1 0
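To make the first variant less opaque, here is a small sketch of the intermediate long table that stack() produces before table() counts it. The construction of DT as a plain data frame with character columns is an assumption; stack() only picks up vector columns, so factor columns would need as.character() first.

```r
# Assumed reconstruction of the example data, with character (not factor) columns.
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() turns the wide data into two columns:
#   values = the category, ind = the name of the column it came from.
long <- stack(DT)
print(long)

# table() on the two-column data frame cross-tabulates values against ind.
tab <- table(long)
print(tab)
```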
answered 1 hour ago by Frank
    sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
    #         Category.x Category.y
    # Better           2          2
    # Similar          1          2
    # Worse            1          0
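The nested call above can be unpacked as follows. This is a sketch under the assumption that df1 holds character columns; the names lvls and counts, and the switch of the inner loop to vapply(), are illustrative choices and not part of the original answer.

```r
# Assumed reconstruction of the example data from the question.
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# All distinct categories across both columns, computed once.
lvls <- unique(unlist(df1, use.names = FALSE))

# Outer loop: one pass per column. Inner loop: for each category,
# count how many entries of that column equal it.
counts <- sapply(df1, function(col) {
  vapply(lvls, function(lv) sum(col == lv), integer(1))
})
print(counts)
```

vapply() names its result after the character vector lvls, which is where the row names of the matrix come from.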
answered 2 hours ago by d.b
One dplyr and tidyr possibility could be:

    library(dplyr)
    library(tidyr)

    df %>%
      gather(var, val) %>%
      count(var, val) %>%
      spread(var, n, fill = 0)

      val     Category.x Category.y
      <chr>        <dbl>      <dbl>
    1 Better           2          2
    2 Similar          1          2
    3 Worse            1          0
It first transforms the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows for each combination of "var" and "val". Finally, it spreads the counts back into wide format, one column per variable, filling missing combinations with 0.
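The same three steps can be traced in base R without any packages. This is only an illustrative sketch: the data frame construction and the names long, counts, and wide are assumptions, and table() already reports zero counts, so no separate fill step is needed here.

```r
# Assumed reconstruction of the example data from the question.
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Step 1: wide to long, one row per (variable name, value) pair.
long <- data.frame(
  var = rep(names(df), each = nrow(df)),
  val = unlist(df, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2: count each (val, var) combination, including combinations
# that never occur (those get n = 0).
counts <- as.data.frame(table(val = long$val, var = long$var),
                        responseName = "n", stringsAsFactors = FALSE)

# Step 3: spread the counts back to wide, one column per variable.
wide <- xtabs(n ~ val + var, data = counts)
print(wide)
```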
Or with dplyr and reshape2 you can do:

    library(dplyr)
    library(reshape2)

    df %>%
      mutate(rowid = row_number()) %>%
      melt(., id.vars = "rowid") %>%
      count(variable, value) %>%
      dcast(value ~ variable, value.var = "n", fill = 0)

        value Category.x Category.y
    1  Better          2          2
    2 Similar          1          2
    3   Worse          1          0
answered 1 hour ago by tmfmnk, edited 37 mins ago
Comments:
Is var = Category.x and val = c('Better', 'Similar', 'Worse')?
– Daniel, 1 hour ago
Please see the updated post for commentary.
– tmfmnk, 1 hour ago