Count string occurrences in pandas raw data row























I have a CSV file as follows:



name,age
something
tom,20


And when I put it into a dataframe it looks like:



df = pd.read_csv('file', header=None)

0 1
1 name age
2 something NaN
3 tom 20


How would I get the number of commas in each raw row of the file? For example, the result should look like:



# in pseudocode (raw_value stands for the unparsed text of each row)
df['_count_separators'] = df.raw_value.count(',')

0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1

































  • do you also want to count the commas if they're in the column value?
    – Omkar Sabade
    2 hours ago










  • @OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
    – David L
    2 hours ago
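
The distinction raised in the comments above, separators versus commas that are part of a value, can be illustrated with the standard library's csv module. This is a minimal sketch, not the asker's method: the sample text (including the quoted "Smith, John" field) is made up for illustration, and it assumes that csv.reader's default dialect (comma delimiter, double-quote quoting) splits this input the same way pandas would.

```python
import csv
import io

# Made-up sample: the second row has a comma inside a quoted value.
raw = 'name,note\n"Smith, John",ok\nsomething\n'

# Separators the CSV parser actually honours: one fewer than the field count.
rows = list(csv.reader(io.StringIO(raw)))
separators = [len(fields) - 1 for fields in rows]

# Every literal comma in the raw line, quoted or not.
literal_commas = [line.count(",") for line in raw.splitlines()]

print(separators)      # [1, 1, 0]
print(literal_commas)  # [1, 2, 0]
```

For rows without quoted commas the two counts agree, so either reading of the question gives the same answer on the sample file.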

















python python-3.x pandas csv dataframe














asked 2 hours ago by Henry H
edited 1 hour ago by coldspeed


























4 Answers
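Read the file a second time with a separator that does not occur in it, so that each raw line lands in a single cell, then count the commas in that cell:

import pandas as pd

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None, sep='|')  # '|' is not in the file, so each line becomes one cell in column 0

df2[0].str.findall(',').str.len()  # header=None gives integer column labels, so use 0 rather than '0'
0    1
1    0
2    1
3    5
Name: 0, dtype: int64

df['_count_separators'] = df2[0].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse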
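
A self-contained sketch of the same idea, using io.StringIO in place of a file so it can be run as is. It swaps in str.count for findall(...).str.len(), and it passes names=... to the comma parse because, as far as I can tell, pandas' default parser rejects a row (here the last one) that has more fields than the first row. The variable names raw, lines, counts and width are illustrative assumptions.

```python
import io
import pandas as pd

raw = "name,age\nsomething\ntom,20\nsomething,,,,,somethingelse\n"

# Read each physical line into a single cell by choosing a separator
# that is assumed never to occur in the data.
lines = pd.read_csv(io.StringIO(raw), header=None, sep="|")[0]
counts = lines.str.count(",")

# Tell the comma parse how many columns to expect: the widest row
# has (max comma count + 1) fields.
width = int(counts.max()) + 1
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(width)))

# Both reads skip the same lines, so the default integer indexes line up.
df["_count_separators"] = counts

print(df["_count_separators"].tolist())  # [1, 0, 1, 5]
```

This relies on both reads producing the same row index, which holds here because neither read drops or reorders lines.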
















answered 2 hours ago by W-B
























Very simply: read your data as a single-column Series, then split on the comma and concatenate the result with the separator count.

import io
import pandas as pd

# s = pd.read_csv(io.StringIO(text), sep='|', header=None).squeeze('columns')
s = pd.read_csv('/path/to/file.csv', sep='|', header=None).squeeze('columns')  # read_csv's squeeze=True was removed in pandas 2.0
df = pd.concat([
    s.str.split(',', expand=True),         # the parsed fields
    s.str.count(',').rename('_count_sep')  # commas per raw line
], axis=1)

df
           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1
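
One thing to keep in mind with str.count: its argument is treated as a regular expression, so a comma is safe, but a separator such as . needs escaping. A small sketch under that assumption; count_literal is a hypothetical helper, not part of pandas, and the sample strings are made up.

```python
import re
import pandas as pd

s = pd.Series(["a.b.c", "no dots", "x|y"])


def count_literal(series, sep):
    """Count literal occurrences of sep in each element (hypothetical helper)."""
    return series.str.count(re.escape(sep))


print(count_literal(s, ".").tolist())  # [2, 0, 0]
print(count_literal(s, "|").tolist())  # [0, 0, 1]

# Unescaped, "." is the regex wildcard and matches every character.
print(s.str.count(".").tolist())       # [5, 7, 3]
```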
















answered 2 hours ago by coldspeed












  • We are on the same road:-) cheers
    – W-B
    2 hours ago

  • @W-B yup did not see until I posted... great minds.. huh? ;)
    – coldspeed
    2 hours ago

  • I read your mind hahahaha:-)
    – W-B
    2 hours ago

  • But learn new strcount:-) thanks man
    – W-B
    2 hours ago

  • Your answers stopped me from thinking otherwise
    – Dark
    1 hour ago


















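Try the code below. df.count(axis='columns') counts the non-null cells in each row, which is one more than the number of separators as long as no field is empty, so subtract 1:

import pandas as pd

df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns') - 1
print(df)
output:
           0    1  _count_separators
0       name  age                  1
1  something  NaN                  0
2        tom   20                  1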
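
Counting non-null cells only matches the number of separators when no field is empty. The sketch below, on made-up three-column data fed through io.StringIO, compares the non-null count minus one with a literal comma count of each raw line; the two differ for the row tom,,paris.

```python
import io
import pandas as pd

# Made-up sample with an empty middle field in the last row.
raw = "name,age,city\nsomething\ntom,,paris\n"

df = pd.read_csv(io.StringIO(raw), header=None)

# Non-null cells per row, minus one.
non_null_minus_one = df.count(axis="columns") - 1

# Literal commas in each raw line.
raw_commas = pd.Series(raw.splitlines()).str.count(",")

print(non_null_minus_one.tolist())  # [2, 0, 1]
print(raw_commas.tolist())          # [2, 0, 2]
```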
















answered 2 hours ago by Anjaneyulu Batta






















One line of code, giving the total number of separators in the file when there are at most two columns: len(df) - df[1].isna().sum()
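
A quick check of what this expression yields on the question's sample, read through io.StringIO so the snippet runs on its own. It produces a single total for the whole file; per_row is a hypothetical per-row variant built from the same column and is still limited to two columns.

```python
import io
import pandas as pd

raw = "name,age\nsomething\ntom,20\n"
df = pd.read_csv(io.StringIO(raw), header=None)

# Rows whose second field is present each contain exactly one separator.
total = len(df) - df[1].isna().sum()

# Hypothetical per-row variant: 1 where the second field exists, else 0.
per_row = df[1].notna().astype(int)

print(int(total))        # 2
print(per_row.tolist())  # [1, 0, 1]
```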

















answered 2 hours ago by Quang Hoang












  • Ohk if the nan itself is a part of the dataset then? like something,,,something?
    – Dark
    2 hours ago

  • i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
    – Quang Hoang
    2 hours ago

  • This assumes there are only two columns...?
    – coldspeed
    2 hours ago



















