Identifying an invisible character in plain text file

I am working with a plain text file that has invisible characters that I do not recognise. How can I identify them?

In Atom, they show as blanks when I toggle to show invisible characters. They do not show as a common space (the one that Atom shows as a small centered dot).

In BBEdit, it shows as a centered dot that looks slightly thicker than the common space. Replacing non-ASCII characters (with 'zap gremlins') does not replace it.

I can copy the character into a regular expression, and the query will find the character. It is not recognised as a whitespace character with \s.

I will copy the character here (between the arrows), but I have no idea if it actually shows up! -> <-

(wow, pasting an unknown invisible character felt absurdly awkward...)

asked Feb 12 at 11:40 by Matthijs
edited Feb 12 at 14:55 by Blackwood

Tags: regex, characters

1 Answer

Using a hex editor should reveal the hex codes you could then look up or search for.

If you wanted to stick with a (bash?) terminal, you could put the whole file through hexdump / hd, or maybe grep an offending line and just pipe it to hd so you're only looking at one line, similar to:

grep "unique line text" file | hd

Or get only the Nth line (replace N with the line number) with
sed 'Nq;d' file

There's also the regular expression character class for all printable characters:

‘[:print:]’

 Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

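Searching for the inverse of those might be useful. Note that -v inverts the match per line, so grep -v "[[:print:]]" only prints lines that contain no printable character at all; negating the class itself finds lines with a non-printable character:
grep "[^[:print:]]" file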
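One caveat with the character-class approach: what counts as printable depends on the locale, and in a UTF-8 locale a character such as a no-break space may well be classed as printable and so go unnoticed. A rough sketch that sidesteps this by forcing the C locale, so that any byte outside ASCII is flagged, could look like this. The function name find_suspect_lines and the demo file are made up for illustration, and the -a flag is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: print (with line numbers) every line of a file that
# contains a byte outside printable ASCII and ordinary whitespace.
# LC_ALL=C makes grep classify single bytes, so both bytes of a UTF-8
# no-break space (c2 a0) are treated as non-printable.
# -a asks grep to treat the file as text even if it looks binary.
find_suspect_lines() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Demo on a throwaway file: only line 2 contains the bytes c2 a0.
demo_file=$(mktemp)
printf 'plain line\nhas\302\240nbsp\n' > "$demo_file"
find_suspect_lines "$demo_file"
rm -f "$demo_file"
```

Only the second line of the demo file should be reported. Tabs and other ordinary whitespace are deliberately excluded so they do not drown out the real suspects.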

Or if you can copy it successfully, you could just paste it into a hex editor, or an echo " " | hd command...
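To give an idea of what to look for in the dump, here is a small sketch. It assumes the hidden character is a UTF-8 encoded no-break space (bytes c2 a0) and uses od, which POSIX specifies, in case hd is not installed. The function name dump_bytes is invented for this example.

```shell
# Hypothetical helper: print every byte read from stdin as two hex digits.
dump_bytes() {
    od -An -tx1
}

# \302\240 is the octal spelling of the bytes c2 a0 (assumed here to be a
# UTF-8 no-break space); printf emits them between "a" and "b".
printf 'a\302\240b\n' | dump_bytes
```

The dump should list the bytes 61, c2, a0, 62 and 0a in that order (the exact spacing differs between od implementations). Anything that is not plain ASCII shows up as a byte of 80 or above, and that byte sequence is what you can look up or search for.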

answered Feb 12 at 11:53 by Xen2050
edited Feb 12 at 11:59

Thanks! Apparently it was a non-breaking space (c2 a0). In my regex \u00A0 selects the character.
– Matthijs, Feb 12 at 13:10
