Count appearances of a value until it changes to another value
I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10    6
9     3
23    2
12    1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
Tags: python, pandas, count, frequency
You might want to have a look at "run-length encoding", since that's basically what you want to be doing. – Buhb
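The run-length encoding idea from the comment can be sketched in plain Python, without pandas; each run of equal consecutive values collapses to a (value, run_length) pair:

```python
from itertools import groupby

def rle(seq):
    # groupby yields one (value, group-iterator) pair per run of equal values;
    # exhausting the group iterator gives the run length.
    return [(value, sum(1 for _ in group)) for value, group in groupby(seq)]

data = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]
print(rle(data))  # [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```

The pandas answers below reach the same result while keeping everything as a Series.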
asked by Mischa, edited by Alex Riley
4 Answers
Answer by jezrael (score 10):
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print(df)
values  values
1       10        2
2       23        2
3       9         3
4       10        4
5       12        1
Name: values, dtype: int64
Finally, remove the first index level:
df = df.reset_index(level=0, drop=True)
print(df)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64
Explanation:
Compare the original column with its shifted version using ne (not equal), then take the cumsum to build a helper Series that labels each run:
a = df['values'].shift()
b = df['values'].ne(a)
c = b.cumsum()
print(pd.concat([df['values'], a, b, c],
                keys=('orig', 'shifted', 'not_equal', 'cumsum'), axis=1))
    orig  shifted  not_equal  cumsum
0     10      NaN       True       1
1     10     10.0      False       1
2     23     10.0       True       2
3     23     23.0      False       2
4      9     23.0       True       3
5      9      9.0      False       3
6      9      9.0      False       3
7     10      9.0       True       4
8     10     10.0      False       4
9     10     10.0      False       4
10    10     10.0      False       4
11    12     10.0       True       5
I got an error: Duplicated level name: "values", assigned to level 1, is already used for level 0. – Mischa
@Mischa - Then add .rename, like df['values'].ne(df['values'].shift()).cumsum().rename('val1') – jezrael
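Putting the approach and the .rename fix from the comments together, a minimal end-to-end sketch (the helper name 'val1' is arbitrary; any name that differs from the column name avoids the duplicated-level error):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12],
                  columns=['values'])

# Label each run of equal consecutive values; rename the helper Series so
# its name does not collide with the 'values' column inside value_counts.
runs = df['values'].ne(df['values'].shift()).cumsum().rename('val1')

# Count within each run, then drop the run-label level of the index.
out = df.groupby(runs)['values'].value_counts().reset_index(level=0, drop=True)
print(out)
```

Each run contributes exactly one index entry, so repeated values (the two runs of 10) stay separate.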
Answer by nixon (score 5):
You can keep track of where the changes in df['values'] occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     4
8     4
9     4
10    4
11    5
Then group by changes together with df['values'] (keeping the values as the index) and compute the size of each group:
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64
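One aside on this variant: diff() only works for numeric data, whereas the ne(shift()) comparison used in the other answers works for any dtype. A small sketch with strings, assuming the same run-labelling idea:

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'b', 'b', 'a'])

# s.diff() would raise for strings; direct comparison with the shifted
# Series labels the runs regardless of dtype: 1 1 2 2 2 3
changes = s.ne(s.shift()).cumsum()

# For each run keep its value and its length.
out = s.groupby(changes).agg(['first', 'size'])
print(list(zip(out['first'], out['size'])))  # [('a', 2), ('b', 3), ('a', 1)]
```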
Answer by W-B (score 4):
Using crosstab:
df['key'] = df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'], df['values'])
Out[353]: 
values  9   10  12  23
key                   
1        0   2   0   0
2        0   0   0   2
3        3   0   0   0
4        0   4   0   0
5        0   0   1   0
Slightly modifying the result above:
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]: 
key  values
1    10        2
2    23        2
3    9         3
4    10        4
5    12        1
dtype: int64
Or, based on Python's itertools.groupby:
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
Answer by piRSquared (score 4):
Using itertools.groupby:
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10    2
23    2
9     3
10    4
12    1
dtype: int64
Or with a generator:
def f(x):
  count = 1
  for this, that in zip(x, x[1:]):
    if this == that:
      count += 1
    else:
      yield count, this
      count = 1
  yield count, [*x][-1]
pd.Series(*zip(*f(df['values'])))
10    2
23    2
9     3
10    4
12    1
dtype: int64
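The *zip(*...) trick in both snippets above unpacks the (count, value) pairs into two parallel tuples, which pd.Series takes positionally as (data, index). A more explicit equivalent sketch:

```python
import pandas as pd
from itertools import groupby

data = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]

# One (run_length, run_value) pair per run of equal consecutive values.
pairs = [(len(list(v)), k) for k, v in groupby(data)]

# zip(*pairs) transposes the pairs into counts and keys.
counts, keys = zip(*pairs)
s = pd.Series(counts, index=keys)  # run lengths, indexed by run value
print(s)
```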