AWK to print columns based on the columns summation



























I have a file of numeric values laid out as a matrix. I have written an awk script that prints the header and then, for each column, adds 1 to that column's 'sum' whenever a value in the column is less than 5 and greater than 0. At the end, it prints the sum of each column. This part works fine:



awk '
BEGIN {FS=OFS=" "}
NR==1 {print}
NR>1 {for (i=1;i<=NF;i++) if ($i < 5 && $i > 0) a[i]+=1}
END {for (i=1;i<=NF;i++) printf "%s%s", a[i]+0, (i<NF ? OFS : ORS)
}' snp_fake2.txt > tmp.txt


My goal is to print the entire column if that column's sum is greater than some value THRESHOLD. I have tried adding an if statement inside the second for loop to check whether the column's sum, a[i], is > THRESHOLD, and then printing the column:



awk '
BEGIN {FS=OFS=" "}
NR==1 {print}
NR>1 {for (i=1;i<=NF;i++) if ($i < 5 && $i > 0) a[i]+=1}
END {for (i=1;i<=NF;i++) if (a[i] > THRESHOLD) printf $i
}' snp_fake2.txt > tmp.txt


But when I run this, the script does not output the entire column, only a single number. How can I print the entire column instead of just a single value?






























  • (1) Your use of the word “sum” is misleading. You are dealing with the count of values that meet criteria. (2) I believe that I sort-of understand what you want, but it would help if you would show an example of input and the output that you want to get from it. Please do not respond in comments; edit your question to make it clearer and more complete.

    – Scott
    Feb 8 at 0:11


























edited Feb 8 at 6:47
Thor

asked Feb 7 at 22:23
ben stear




















2 Answers
AWK processes the file one line at a time and does not keep earlier lines unless you store them yourself. The END rule executes after the last line has been processed. At that point AWK cannot print all the entries in column $i, because it only knows a single value for $i: the one from the last line.
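A small demonstration of this, assuming a POSIX-style awk such as gawk or mawk, where the fields of the last record are still available in the END rule. The three-line input is made up for illustration:

```shell
# Hypothetical 3x3 input; the END rule only sees the fields of the last line.
last=$(printf '%s\n' '1 4 7' '2 5 8' '3 6 9' | awk 'END { print $1 }')

# Shows the first field of the final line only, not the whole first column.
printf '%s\n' "$last"
```

This is why `printf $i` inside END produces a single number per selected column rather than the column itself.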



Your goal therefore needs two passes over the file (unless you store the whole file in memory): one to calculate each column's sum, and a second to print the entire column for the columns that qualify. To do so, you could write a shell script that calls awk to calculate the sums, and then calls awk (or something else) to print the columns.
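As a rough sketch of the two-pass idea, here is a variation that uses a single awk invocation and names the same file twice instead of calling awk two times from a script. The sample data, the temporary file, the `count` array and the `THRESHOLD` value of 2 are all assumptions made for illustration:

```shell
# Hypothetical sample data: a header line followed by a 3x3 matrix.
sample=$(mktemp)
printf '%s\n' 'A B C' '1 4 7' '2 5 8' '3 6 9' > "$sample"

# The file is listed twice, so awk reads it twice.
# NR == FNR is true only while the first copy is being read.
filtered=$(awk -v THRESHOLD=2 '
NR == FNR {                        # pass 1: count qualifying values per column
    if (FNR > 1)                   # skip the header line
        for (i = 1; i <= NF; i++)
            if ($i > 0 && $i < 5) count[i]++
    next
}
{                                  # pass 2: keep only the selected columns
    line = ""
    for (i = 1; i <= NF; i++)
        if (count[i] > THRESHOLD)
            line = line (line == "" ? "" : OFS) $i
    print line
}' "$sample" "$sample")

printf '%s\n' "$filtered"
rm -f "$sample"
```

With this input only the first column has more than two values between 0 and 5, so only that column (including its header) is kept. Reading the file twice keeps memory use small, at the cost of needing a real file rather than a pipe.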

















answered Feb 7 at 22:46
ddffnn













    • I see. Thanks for clarifying things.

      – ben stear
      Feb 8 at 17:04
































If I understood correctly, one way is to use a two-dimensional array. This works with GNU awk, because arrays of arrays are a gawk extension.



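printf '1 4 7\n2 5 8\n3 6 9\n' | awk '
{
    for (i = 1; i <= NF; i++) {
        field[i][NR] = $i                  # remember every cell (gawk array of arrays)
        if ($i < 5 && $i > 0)
            a[i] += 1                      # count values in (0, 5) per column
    }
}
END {
    for (i = 1; i <= NF; i++)              # visit columns in order
        if (a[i] > 2)
            for (j = 1; j <= NR; j++)      # visit rows in order
                print field[i][j]
}'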
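If gawk is not available, the same single-pass idea can probably be expressed with the classic `array[row, column]` subscripts that any POSIX awk supports. The following is only a sketch: the header row, the `cell` and `count` names and the `THRESHOLD` value of 0 are assumptions chosen for illustration, and it prints the selected columns side by side rather than one value per line:

```shell
# Hypothetical input with a header row, piped straight into awk.
selected=$(printf '%s\n' 'A B C' '1 4 7' '2 5 8' '3 6 9' | awk -v THRESHOLD=0 '
{
    for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i                       # store every cell, keyed by row and column
        if (NR > 1 && $i > 0 && $i < 5)
            count[i]++                         # per-column count, header excluded
    }
    if (NF > ncols) ncols = NF                 # remember the widest row
}
END {
    for (r = 1; r <= NR; r++) {                # rows in their original order
        line = ""
        for (i = 1; i <= ncols; i++)
            if (count[i] > THRESHOLD)
                line = line (line == "" ? "" : OFS) cell[r, i]
        print line
    }
}')

printf '%s\n' "$selected"
```

Here the first two columns each contain at least one value between 0 and 5, so both are printed. Like the gawk version, this holds the whole file in memory, which may matter for very large matrices.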
















answered Feb 8 at 5:11
Paulo





























