Get the strings from file1 that have no match in file2

I have two files - file1 & file2.

file1 contains (only words):

ABC
YUI
GHJ
I8O

file2 contains many paragraphs:

dfghjo ABC kll
njjgg bla bla GHJ
njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi

I am using the command below to get the lines of file2 that contain a word from file1:

grep -Ff file1 file2

(This prints the lines of file2 in which words from file1 are found.)

I also need the words from file1 that are not found in file2.

Can anyone help me get this output:

YUI
I8O

I am looking for a one-liner (using grep, awk, or sed), as I am running it through pssh and can't use while or for loops.

edited Feb 19 at 9:23 by terdon♦

asked Feb 19 at 9:20 by Sin15

bash text-processing grep sed awk

2 Answers

Here's one way in awk:

$ awk 'NR==FNR{a[$1]++; next}{for(i in a){if($0 ~ i){found[i]++}}}END{for(i in a){if(!found[i]){print i}}}' file1 file2
YUI
I8O

Or, a bit more legibly:

$ awk 'NR==FNR{
        a[$1]++;
        next
       }
       {
        for(i in a){
            if($0 ~ i){
                found[i]++
            }
        }
       }
       END{
        for(i in a){
            if(!found[i]){
                print i
            }
        }
       }' file1 file2
YUI
I8O

Explanation

NR==FNR : NR is the number of lines read so far across all input files, and FNR is the line number within the current file. When processing multiple files, the two are equal only while reading the first file. So this is an easy way of saying "do this for the 1st file only".

a[$1]++; next : while reading the first file, save each word (the first and only field) as a key of the array a and then skip to the next line. The next also ensures that the rest of the command is not run for the first file.

for(i in a){ if($0 ~ i){ found[i]++ } }: for each of the words from the first file (the keys of array a), check whether the current line matches that word. If it does, save the word in the found array. This is run for each line of the second input file. Note that $0 ~ i treats the word as a regular expression and matches it anywhere in the line, not only as a whole word.

END{ }: do this after you've processed all input files.

for(i in a){ if(!found[i]){ print i } }: for each of the words in a, if the word is not also in the found array, print that word. awk does not guarantee the order in which for(i in a) visits the keys, so the words may be printed in a different order than in file1.
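If the words might contain regex metacharacters, or if you want the output in file1 order, here is a minimal sketch of a variant. It is not the command from this answer: the function name missing_words and the arrays seen and order are made up for illustration, and it swaps the regex match for awk's index() so each word is compared as a literal substring. It assumes file1 is not empty, since NR==FNR would otherwise also be true for file2.

```shell
#!/bin/sh
# Hypothetical helper: print the words of file "$1" (one word per line)
# that do not occur as a literal substring in any line of file "$2".
missing_words() {
    awk '
        NR == FNR {                     # first file: remember each word once, in order
            if ($1 != "" && !($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            next
        }
        {                               # second file: mark every word found in this line
            for (k = 1; k <= n; k++)
                if (index($0, order[k])) found[order[k]] = 1
        }
        END {                           # report the unmarked words in file1 order
            for (k = 1; k <= n; k++)
                if (!(order[k] in found)) print order[k]
        }
    ' "$1" "$2"
}
```

With the sample files from the question, missing_words file1 file2 should print YUI and then I8O. The whole awk program can still be squeezed onto one line for pssh, because it contains no shell loops.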

Alternatively, you can use some of the core Linux utilities:

$ grep -hoP '\w+' file1 file2 | sort | uniq -u | xargs -I{} grep {} file1
I8O
YUI

Explanation

$ grep -hoP '\w+' file1 file2
ABC
YUI
GHJ
I8O
dfghjo
ABC
kll
njjgg
bla
bla
GHJ
njhjckhv
chasjvackvh
ihbjhi
hbhibb
jh
jbiibi

This will print all the words found in each file. The -h flag suppresses the file names that grep would otherwise print when given several files, the -o flag means "only print the matching portion of the line", and -P enables Perl Compatible Regular Expressions (PCRE), which let us use \w to mean "any word character" (so letters, numbers, _).

$ grep -hoP '\w+' file1 file2 | sort | uniq -u
chasjvackvh
dfghjo
hbhibb
I8O
ihbjhi
jbiibi
jh
kll
njhjckhv
njjgg
YUI

Now we pass the output of the previous command through sort and uniq -u to keep only the words that occur exactly once in the combined list: these are words that are present in only one of the two files (and only once in that file).

$ grep -hoP '\w+' file1 file2 | sort | uniq -u | xargs -I{} grep {} file1
I8O
YUI

Finally, we feed this list of unique words to xargs and have it grep each of them in file1. Only those unique words that are present in file1 will be returned, and unique words present in file1 are therefore not present in file2.
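The three stages can also be written out as a small function, shown here as a sketch under a few assumptions rather than a replacement for the command above. The name missing_words_pipeline is made up. The POSIX class [[:alnum:]_]+ with -E stands in for \w+ so that -P is not needed, and the last grep gets -Fx so that a word only counts when it is a whole line of file1. It also assumes each word is listed only once in file1, because uniq -u would otherwise drop it.

```shell
#!/bin/sh
# Hypothetical helper: words of file "$1" (one per line) that never
# appear as a whole word in file "$2".
missing_words_pipeline() {
    grep -hoE '[[:alnum:]_]+' "$1" "$2" |   # stage 1: every word of both files, one per line
        sort |                              # group identical words together
        uniq -u |                           # stage 2: keep words that occur exactly once overall
        xargs -I{} grep -Fx -- {} "$1"      # stage 3: keep those that are a line of file1
}
```

On the sample files, missing_words_pipeline file1 file2 should print I8O and then YUI, in sort order. The exit status is usually non-zero, because xargs reports a failure whenever one of the greps finds nothing (every unique word that comes from file2), so don't rely on the status, only on the output.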

edited Feb 19 at 9:42

answered Feb 19 at 9:32 by terdon♦

Super, it's working. Thanks a lot @terdon
– Sin15 Feb 19 at 9:35

@Sin15 you're welcome. Have a look at the updated answer, I added a much shorter and simpler version as well. Also, if this answer solved your issue, please take a moment to accept it by clicking on the checkmark on the left. That will mark the question as answered and is the way that thanks are conveyed on the Stack Exchange sites. Feel free to wait for more answers and accept another one; just remember to eventually accept one.
– terdon♦ Feb 19 at 9:43

Try this command:

grep -oFf file1 file2 | grep -vFf - file1

The first grep uses file1 as its list of patterns and, because of -o, prints only the parts of file2's lines that match one of them. It gives you:

ABC
GHJ

The second grep reads that output as its pattern list (-f -) and, because of -v, prints the lines of file1 that do not match any of those patterns, so you get:

YUI
I8O

Tested on Red Hat Enterprise Linux ES release 4 (Nahant Update 3)
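One case to watch with this pipeline is a word that is a substring of another word in file1 (say AB and ABC): if only AB occurs in file2, the plain -v also hides ABC. A hedged sketch of a variant follows. The function name missing_words_grep is made up, and the added -x (match whole lines only) and sort -u (drop repeated matches) are my additions, not part of the command above. It assumes file1 has exactly one word per line with no trailing spaces.

```shell
#!/bin/sh
# Hypothetical helper: lines of file "$1" that do not occur as a
# fixed string anywhere in file "$2".
missing_words_grep() {
    grep -oFf "$1" "$2" |       # the words of file1 that do occur in file2
        sort -u |               # each found word once
        grep -vxFf - "$1"       # lines of file1 that are not exactly one of them
}
```

On the question's sample files, missing_words_grep file1 file2 should print YUI and then I8O, in file1 order. With a file1 of AB, ABC and YUI and a file2 that contains only AB, it prints ABC and YUI, whereas without -x only YUI would be printed.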

edited Feb 19 at 12:24

answered Feb 19 at 11:31 by Lety
