Identifying an invisible character in a plain text file












I am working with a plain text file that has invisible characters that I do not recognise. How can I identify them?



In Atom, they show as blanks when I toggle to show invisible characters. They do not show as a common space (the one that Atom shows as a small centered dot).



In BBEdit, it shows as a centered dot that looks slightly thicker than the common space. Replacing non-ASCII characters (with 'zap gremlins') does not replace it.



I can copy the character into a regular expression, and the query will find the character. It is not recognised as a whitespace character by \s.



I will copy the character here (between the arrows), but I have no idea if it actually shows up! -> <-



(wow, pasting an unknown invisible character felt absurdly awkward...)










Tags: regex, characters

asked Feb 12 at 11:40 by Matthijs
edited Feb 12 at 14:55 by Blackwood






















          1 Answer














          Using a hex editor should reveal the hex codes you could then look up or search for.



          If you wanted to stick with a (bash?) terminal, you could put the whole file through hexdump / hd, or maybe grep an offending line and just pipe it to hd so you're only looking at one line, similar to:



          grep "unique line text" file | hd


          Or get only the Nth line (replace N with the line number) with
          sed 'Nq;d' file
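
As a rough sketch of the single-line approach, the snippet below builds a throwaway sample file (hypothetical content, with a non-breaking space hidden in line 2) and dumps only that line. It assumes `od` as a stand-in for `hd`, since `hd` is not installed on every system; the `tr` call only tidies the spacing of the dump.

```shell
# Hypothetical sample file: line 2 hides a non-breaking space (bytes c2 a0 in UTF-8).
sample=$(mktemp)
printf 'plain line\nbad\302\240line\n' > "$sample"

# '2q;d' deletes line 1 and quits on line 2, which prints that line only.
# od -An -tx1 shows each byte as two hex digits, without the offset column.
dump=$(sed '2q;d' "$sample" | od -An -tx1 | tr -s ' \n' ' ')
echo "$dump"

rm -f "$sample"
```

Any byte pair in the dump that is not plain ASCII (here `c2 a0`) can then be looked up as a UTF-8 sequence.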



          There's also the regular expression character class for all printable characters:




          ‘[:print:]’
          Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.



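          Searching for characters outside that class might be useful. Note that grep -v "[[:print:]]" only lists lines that contain no printable character at all; to list lines that contain at least one non-printable character, negate the class inside the bracket expression instead (running it under LC_ALL=C should make non-ASCII bytes count as non-printable too):
          grep "[^[:print:]]" file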
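
A minimal sketch of such a search, assuming GNU grep and a hypothetical sample file. The class `[:space:]` is added to the negated set so that tabs are not reported, and `LC_ALL=C` is an assumption made here so that grep works byte by byte.

```shell
# Hypothetical sample file: only line 2 contains a non-ASCII character.
sample=$(mktemp)
printf 'plain line\nbad\302\240line\n' > "$sample"

# Under LC_ALL=C every byte outside printable ASCII (and outside ordinary
# whitespace) matches the negated bracket expression.
hits=$(LC_ALL=C grep -c '[^[:print:][:space:]]' "$sample")
echo "lines with suspicious bytes: $hits"

# -n prefixes each offending line with its line number.
LC_ALL=C grep -n '[^[:print:][:space:]]' "$sample"

rm -f "$sample"
```

The reported line numbers can then be fed to a hex dump to see the actual bytes.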



          Or if you can copy it successfully, you could just paste it into a hex editor, or an echo " " | hd command...
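
To illustrate the paste-and-dump idea, the sketch below compares an ordinary space with a stand-in for the pasted character. The `printf` octal escapes are an assumption used here to produce a non-breaking space without having to paste one, and `od` again replaces `hd` for portability.

```shell
# An ordinary space is the single byte 20.
plain=$(printf ' ' | od -An -tx1 | tr -d ' \n')

# Stand-in for the pasted character: the two UTF-8 bytes of U+00A0.
pasted=$(printf '\302\240' | od -An -tx1 | tr -d ' \n')

echo "plain space: $plain"
echo "pasted char: $pasted"
```

If the two dumps differ, the pasted character is not a regular space, and its bytes can be searched for directly.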






          answered Feb 12 at 11:53 by Xen2050 (edited Feb 12 at 11:59)








            Thanks! Apparently it was a non-breaking space (c2 a0). In my regex, \u00A0 selects the character.

            – Matthijs
            Feb 12 at 13:10
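
Once the character is known to be a non-breaking space, a shell counterpart to the `\u00A0` regex could look like the sketch below. It assumes UTF-8 input and builds the two bytes with `printf` rather than relying on `\x` escapes, which not every sed supports; the sample text is hypothetical.

```shell
# The two UTF-8 bytes of U+00A0, built portably with octal escapes.
nbsp=$(printf '\302\240')

# Replace every non-breaking space with a regular space.
# LC_ALL=C makes sed compare raw bytes, independent of the current locale.
fixed=$(printf 'bad%sline\n' "$nbsp" | LC_ALL=C sed "s/$nbsp/ /g")
echo "$fixed"
```

The same `$nbsp` variable can be passed to grep to list the affected lines before changing anything.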













