If an empty line is touching another, remove it, otherwise leave it

Regarding whether this is a duplicate: there are similarly worded questions, such as https://unix.stackexchange.com/questions/76061/can-sed-remove-double-newline-characters and https://stackoverflow.com/questions/27510462/how-can-i-remove-double-line-breaks-with-sed. On the first, more popular one, the original question is arguably the same as mine, but its accepted and most upvoted answer removes all empty lines, not just those occurring "when there are 2 or more together" as the question asked. Some comments complain that this answer and others behave that way, but no answer is given that leaves a single empty line alone. Some other answers turn duplicate empty lines into a single empty line (squeezing) rather than removing them entirely.

I'm looking for a scriptable way to remove back-to-back empty lines while leaving single empty lines in place.

I'm looking to automatically clean up .srt (subtitle) files. The format requires newlines between subtitle sections (each section being what to display at a particular time). Usually, if 2 lines are to be displayed at once, the subtitle author just writes the 2 lines one after the other. Some authors use another style, placing 2 empty lines between the lines to be displayed. On my device, this has the effect of displaying only the first line, presumably rendering the second line off the TV.

So, I'd like to change this:



1
00:00:01,800 --> 00:00:03,802
First line is here


Second line is here

2
...


Into this:



1
00:00:01,800 --> 00:00:03,802
First line is here
Second line is here

2
...


Not that it probably needs to be handled differently, but the file format requires an empty line at the bottom of the file, and that line must be left there.

I'd like this to work by first removing trailing whitespace and then removing only those empty lines that touch another empty line. I don't want it anchored to the rest of the .srt format, such as how many lines appear between numbered sections. (I've considered removing all empty lines and adding newlines back at lines containing only numerical characters, but I'm hoping for something more generic that ignores the actual .srt format.)

Also, if for some reason a .srt has more than 2 lines of text, I'd like it left that way.



So, perhaps something along the lines of:



cat some.srt | sed 's/[ \t]*$//' | SOMETHING_ELSE


I'd prefer a bash, sed, or awk solution over a perl one. If I understand right, this will be easier to implement in awk than in sed, since it is a multi-line problem.
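As a rough illustration of the approach described above (strip trailing whitespace, then drop every empty line that touches another), here is a minimal awk sketch. The function name drop_blank_runs is made up for this example, and keeping one empty line at the very end of the file, even when the file ends in several, is an assumption based on the .srt requirement mentioned above.

```shell
# Hypothetical helper (not from the original post): reads stdin, writes stdout.
drop_blank_runs() {
    awk '
        { sub(/[ \t]+$/, "") }          # strip trailing spaces and tabs
        $0 == "" { blanks++; next }     # count a run of empty lines, print nothing yet
        {
            if (blanks == 1) print ""   # a lone empty line is kept
            blanks = 0                  # runs of two or more have been dropped
            print
        }
        END { if (blanks > 0) print "" }  # assumption: keep one empty line at end of file
    '
}
```

It could stand in for both the sed step and SOMETHING_ELSE in the pipeline above, for example as drop_blank_runs < some.srt > cleaned.srt, where cleaned.srt is only a placeholder name.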











  • If I understood right, this sed script would work: sed -r ':a;N;${:b;s/\n[[:blank:]]+\n/\n\n/;tb;s/\n{3,}/\n/g;s/\n+$/\n/};ba'

    – Paulo
    Jan 24 at 13:59
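The one-liner in the comment above is easier to follow when spread over several lines. The sketch below is one reading of it, assuming GNU sed (it relies on \n inside the regex and on GNU's behaviour of printing the pattern space when N runs out of input) and assuming a newline escape is meant wherever the script matches line breaks. The function name is hypothetical.

```shell
# Hypothetical wrapper around the one-liner; assumes GNU sed.
drop_blank_runs_sed() {
    sed -E '
        # Slurp the whole input into the pattern space.
        :a
        N
        ${
            # Turn whitespace-only lines into truly empty ones.
            :b
            s/\n[[:blank:]]+\n/\n\n/
            tb
            # Three or more newlines in a row mean two or more empty lines: drop them all.
            s/\n{3,}/\n/g
            # Collapse trailing empty lines into a single one.
            s/\n+$/\n/
        }
        ba
    '
}
```

Because it holds the entire file in memory, this is probably only sensible for small inputs such as subtitle files.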

















bash sed awk







edited Jan 23 at 2:50
user1902689

asked Jan 23 at 0:08
user1902689


1 Answer

If the rest of the adjacent lines in your files are unique, and it's just the adjacent blank lines you want to remove, you could just use uniq:




uniq - report or omit repeated lines



Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or
standard output).



With no options, matching lines are merged to the first occurrence.




Running your example file through it returns:



$ uniq testfile
1
00:00:01,800 --> 00:00:03,802
First line is here

Second line is here

2
...


PS. Your example does not do what the subject seems to request: it deletes all the blank lines between First & Second rather than leaving a single empty line.

Interestingly, using uniq -u (only print unique lines) on your example file gives the results in your example output (it deletes the two blank lines, leaving none between First & Second):



$ uniq -u testfile
1
00:00:01,800 --> 00:00:03,802
First line is here
Second line is here

2
...






  • You're absolutely right about my title. Looking at it again, I know what I meant by my title, but it's ambiguous at best. By "Remove multiple back to back empty lines, leave single empty lines", I meant: "for multiple back to back empty lines, remove all of them; for single empty lines not back to back with any others, leave them be." I'll edit the title.

    – user1902689
    Jan 23 at 2:41













  • I'm hoping for a solution that does what could be described, in a multiline regex, as replacing \n+ with nothing, which leaves a single \n in place. But, absent such a solution, uniq -u should probably work. Unless 2 identical subtitle lines are displayed at once, like two characters saying the same thing, it should work pretty well.

    – user1902689
    Jan 23 at 2:46













  • It's always nice when a coreutils program will almost solve the problem on its own, without needing any regex or scripting. You could check a file for duplicate lines first, looking for anything that's not blank, with uniq's options -d, --repeated ("only print duplicate lines, one for each group") or -D ("print all duplicate lines").

    – Xen2050
    Jan 23 at 3:26
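The check suggested in the last comment might look something like the sketch below. The function name is hypothetical; it relies only on uniq -d (print one copy of each group of repeated adjacent lines) and grep -q.

```shell
# Hypothetical helper: succeeds (exit status 0) when the only adjacent
# duplicate lines in the file are blank, so uniq -u would not drop any text.
only_blank_duplicates() {
    # uniq -d prints one copy of each group of repeated adjacent lines;
    # grep -q succeeds if any of those copies contains a non-whitespace character.
    ! uniq -d -- "$1" | grep -q '[^[:space:]]'
}
```

It could guard the uniq -u call, for example: only_blank_duplicates some.srt && uniq -u some.srt > cleaned.srt, where cleaned.srt is only a placeholder name.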











edited Jan 23 at 0:20

answered Jan 23 at 0:15
Xen2050
