Count appearances of a value until it changes to another value
up vote
7
down vote
favorite
I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
python pandas count frequency
add a comment |
up vote
7
down vote
favorite
I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
python pandas count frequency
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago
add a comment |
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
python pandas count frequency
I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
python pandas count frequency
python pandas count frequency
edited 2 hours ago
Alex Riley
75.5k20155159
75.5k20155159
asked 6 hours ago
Mischa
666
666
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago
add a comment |
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago
add a comment |
4 Answers
4
active
oldest
votes
up vote
10
down vote
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Last for remove first level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare original column by shift
ed with not equal ne
and then add cumsum
for helper Series
:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
@Mischa - Then add.rename
likedf['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
add a comment |
up vote
5
down vote
You can keep track of where the changes in df['values']
occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
And groupby
the changes and also df['values']
(to keep them as index) computing the size
of each group
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
up vote
4
down vote
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values 9 10 12 23
key
1 0 2 0 0
2 0 0 0 2
3 3 0 0 0
4 0 4 0 0
5 0 0 1 0
Slightly modify the result above
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Base on python
groupby
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
add a comment |
up vote
4
down vote
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
It's a generator
def f(x):
count = 1
for this, that in zip(x, x[1:]):
if this == that:
count += 1
else:
yield count, this
count = 1
yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
10
down vote
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Last for remove first level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare original column by shift
ed with not equal ne
and then add cumsum
for helper Series
:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
@Mischa - Then add.rename
likedf['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
add a comment |
up vote
10
down vote
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Last for remove first level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare original column by shift
ed with not equal ne
and then add cumsum
for helper Series
:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
@Mischa - Then add.rename
likedf['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
add a comment |
up vote
10
down vote
up vote
10
down vote
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Last for remove first level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare original column by shift
ed with not equal ne
and then add cumsum
for helper Series
:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Last for remove first level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare original column by shift
ed with not equal ne
and then add cumsum
for helper Series
:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
edited 6 hours ago
answered 6 hours ago
jezrael
311k21246322
311k21246322
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
@Mischa - Then add.rename
likedf['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
add a comment |
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
@Mischa - Then add.rename
likedf['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
6 hours ago
1
1
@Mischa - Then add
.rename
like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
@Mischa - Then add
.rename
like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
6 hours ago
add a comment |
up vote
5
down vote
You can keep track of where the changes in df['values']
occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
And groupby
the changes and also df['values']
(to keep them as index) computing the size
of each group
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
up vote
5
down vote
You can keep track of where the changes in df['values']
occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
And groupby
the changes and also df['values']
(to keep them as index) computing the size
of each group
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
up vote
5
down vote
up vote
5
down vote
You can keep track of where the changes in df['values']
occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
And groupby
the changes and also df['values']
(to keep them as index) computing the size
of each group
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
You can keep track of where the changes in df['values']
occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
And groupby
the changes and also df['values']
(to keep them as index) computing the size
of each group
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
edited 6 hours ago
answered 6 hours ago
nixon
1,14716
1,14716
add a comment |
add a comment |
up vote
4
down vote
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values 9 10 12 23
key
1 0 2 0 0
2 0 0 0 2
3 3 0 0 0
4 0 4 0 0
5 0 0 1 0
Slightly modify the result above
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Base on python
groupby
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
add a comment |
up vote
4
down vote
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values 9 10 12 23
key
1 0 2 0 0
2 0 0 0 2
3 3 0 0 0
4 0 4 0 0
5 0 0 1 0
Slightly modify the result above
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Base on python
groupby
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
add a comment |
up vote
4
down vote
up vote
4
down vote
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values 9 10 12 23
key
1 0 2 0 0
2 0 0 0 2
3 3 0 0 0
4 0 4 0 0
5 0 0 1 0
Slightly modify the result above
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Base on python
groupby
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values 9 10 12 23
key
1 0 2 0 0
2 0 0 0 2
3 3 0 0 0
4 0 4 0 0
5 0 0 1 0
Slightly modify the result above
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Base on python
groupby
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
edited 6 hours ago
answered 6 hours ago
W-B
94.5k72857
94.5k72857
add a comment |
add a comment |
up vote
4
down vote
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
It's a generator
def f(x):
count = 1
for this, that in zip(x, x[1:]):
if this == that:
count += 1
else:
yield count, this
count = 1
yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
up vote
4
down vote
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
It's a generator
def f(x):
count = 1
for this, that in zip(x, x[1:]):
if this == that:
count += 1
else:
yield count, this
count = 1
yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
add a comment |
up vote
4
down vote
up vote
4
down vote
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
It's a generator
def f(x):
count = 1
for this, that in zip(x, x[1:]):
if this == that:
count += 1
else:
yield count, this
count = 1
yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
It's a generator
def f(x):
count = 1
for this, that in zip(x, x[1:]):
if this == that:
count += 1
else:
yield count, this
count = 1
yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
edited 5 hours ago
answered 6 hours ago
piRSquared
150k21135277
150k21135277
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53542668%2fcount-appearances-of-a-value-until-it-changes-to-another-value%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
51 mins ago