Trim NA values from left and right of Rcpp::NumericVector











up vote
2
down vote

favorite












I receive data with NA values. I would like to trim off the leading and trailing NA values, but to leave the interior NA values in place.



I've written the below in Rcpp, but it is very much a hand rolled solution -- and i'm alive to the fact that i'm going to make mistakes doing things this way.



What is the idiomatic cpp / Rcpp (sugar?) way of doing something like this?



#include<Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(Rcpp::NumericVector X) {

// find first and last non-NA value

double * F = X.begin();
double * pBegin = X.begin();
double * pEnd = X.end();
double * L = X.end() - 1; // note you must decrement by one as .end() is AFTER

// look for the values
while( F < pEnd && NumericVector::is_na(*F)) F++;
while( L > pBegin && NumericVector::is_na(*L)) L--;


// create trimmed X vector
Rcpp::NumericVector Xtrim(L - F + 1);

for( int i = 0; i < Xtrim.size(); i++) {
Xtrim[i] = *(F + i);
}

return Xtrim;
}


/*** R

x <- c(NA, NA, NA, 1:4, NA, 6:8, NA, 10, NA, NA, NA)
trimNA(x)
*/









share|improve this question
























  • I edited title to better reflect the intention. Please rollback if my edit is wrong.
    – Incomputable
    yesterday















up vote
2
down vote

favorite












I receive data with NA values. I would like to trim off the leading and trailing NA values, but to leave the interior NA values in place.



I've written the below in Rcpp, but it is very much a hand rolled solution -- and i'm alive to the fact that i'm going to make mistakes doing things this way.



What is the idiomatic cpp / Rcpp (sugar?) way of doing something like this?



#include<Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(Rcpp::NumericVector X) {

// find first and last non-NA value

double * F = X.begin();
double * pBegin = X.begin();
double * pEnd = X.end();
double * L = X.end() - 1; // note you must decrement by one as .end() is AFTER

// look for the values
while( F < pEnd && NumericVector::is_na(*F)) F++;
while( L > pBegin && NumericVector::is_na(*L)) L--;


// create trimmed X vector
Rcpp::NumericVector Xtrim(L - F + 1);

for( int i = 0; i < Xtrim.size(); i++) {
Xtrim[i] = *(F + i);
}

return Xtrim;
}


/*** R

x <- c(NA, NA, NA, 1:4, NA, 6:8, NA, 10, NA, NA, NA)
trimNA(x)
*/









share|improve this question
























  • I edited title to better reflect the intention. Please rollback if my edit is wrong.
    – Incomputable
    yesterday













up vote
2
down vote

favorite









up vote
2
down vote

favorite











I receive data with NA values. I would like to trim off the leading and trailing NA values, but to leave the interior NA values in place.



I've written the below in Rcpp, but it is very much a hand rolled solution -- and i'm alive to the fact that i'm going to make mistakes doing things this way.



What is the idiomatic cpp / Rcpp (sugar?) way of doing something like this?



#include<Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(Rcpp::NumericVector X) {

// find first and last non-NA value

double * F = X.begin();
double * pBegin = X.begin();
double * pEnd = X.end();
double * L = X.end() - 1; // note you must decrement by one as .end() is AFTER

// look for the values
while( F < pEnd && NumericVector::is_na(*F)) F++;
while( L > pBegin && NumericVector::is_na(*L)) L--;


// create trimmed X vector
Rcpp::NumericVector Xtrim(L - F + 1);

for( int i = 0; i < Xtrim.size(); i++) {
Xtrim[i] = *(F + i);
}

return Xtrim;
}


/*** R

x <- c(NA, NA, NA, 1:4, NA, 6:8, NA, 10, NA, NA, NA)
trimNA(x)
*/









share|improve this question















I receive data with NA values. I would like to trim off the leading and trailing NA values, but to leave the interior NA values in place.



I've written the below in Rcpp, but it is very much a hand rolled solution -- and i'm alive to the fact that i'm going to make mistakes doing things this way.



What is the idiomatic cpp / Rcpp (sugar?) way of doing something like this?



#include<Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(Rcpp::NumericVector X) {

// find first and last non-NA value

double * F = X.begin();
double * pBegin = X.begin();
double * pEnd = X.end();
double * L = X.end() - 1; // note you must decrement by one as .end() is AFTER

// look for the values
while( F < pEnd && NumericVector::is_na(*F)) F++;
while( L > pBegin && NumericVector::is_na(*L)) L--;


// create trimmed X vector
Rcpp::NumericVector Xtrim(L - F + 1);

for( int i = 0; i < Xtrim.size(); i++) {
Xtrim[i] = *(F + i);
}

return Xtrim;
}


/*** R

x <- c(NA, NA, NA, 1:4, NA, 6:8, NA, 10, NA, NA, NA)
trimNA(x)
*/






c++ rcpp






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited yesterday









Incomputable

6,42621453




6,42621453










asked yesterday









ricardo

180210




180210












  • I edited title to better reflect the intention. Please rollback if my edit is wrong.
    – Incomputable
    yesterday


















  • I edited title to better reflect the intention. Please rollback if my edit is wrong.
    – Incomputable
    yesterday
















I edited title to better reflect the intention. Please rollback if my edit is wrong.
– Incomputable
yesterday




I edited title to better reflect the intention. Please rollback if my edit is wrong.
– Incomputable
yesterday










2 Answers
2






active

oldest

votes

















up vote
2
down vote



accepted










I don't know Rcpp well enough to be sure, but we should be able to reduce copying in one of two ways:




  • Accept the argument as a const reference, so that we don't copy in to the function, or

  • Modify our copy of the argument in-place, then return it as result (rather than copying again to Xtrim).


Both of these eliminate a vector copy - in general, the first is to be preferred, as the input vector is bigger than the output vector. If we must copy something, prefer the smallest copy.





The former option is simplest, if we're allowed to change the signature:



// The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
#include <RcppCommon.h>
#include <Rcpp/RObject.h>
#include <Rcpp/Dimension.h>

#include <Rcpp/Vector.h>

#include <algorithm>
#include <iterator>

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(const Rcpp::NumericVector& x)
{
// remove trailing NA values
auto rbegin = std::make_reverse_iterator(x.end());
auto rend = std::make_reverse_iterator(x.begin());
auto last_real_value = std::find_if_not(rbegin, rend,
&Rcpp::NumericVector::is_na).base();

// remove leading NA values
auto first_real_value = std::find_if_not(x.begin(), last_real_value,
&Rcpp::NumericVector::is_na);

return {first_real_value, last_real_value};
}


Since the Rcpp vector doesn't provide standard-collection-like rbegin()/rend() methods, I had to manually create reverse iterators. That wouldn't be necessary with a standard vector.





For the latter option, we could use the vector's erase() method to remove the rightmost values, and then the leftmost ones (removing the leftmost first would work, but could require moving more data, making it less efficient).



Here's how that might work (untested):



// The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
#include <RcppCommon.h>
#include <Rcpp/RObject.h>
#include <Rcpp/Dimension.h>

#include <Rcpp/Vector.h>

#include <algorithm>
#include <iterator>

// [[Rcpp::export]]
Rcpp::NumericVector trimNA(Rcpp::NumericVector x)
{
// remove trailing NA values
auto rbegin = std::make_reverse_iterator(x.end());
auto rend = std::make_reverse_iterator(x.begin());
auto last_real_value = std::find_if_not(rbegin, rend,
&Rcpp::NumericVector::is_na).base();
x.erase(last_real_value, x.end());

// remove leading NA values
auto first_real_value = std::find_if_not(x.begin(), x.end(),
&Rcpp::NumericVector::is_na);
x.erase(x.begin(), first_real_value);

return x;
}




Other review items



The code doesn't work correctly if every value is an NA - we should stop when we reach F rather than pBegin here:



while( L > F && NumericVector::is_na(*L)) L--;


Use the standard indexing operator:



    Xtrim[i] = F[i];


(But it's better to use std::copy() instead of the loop).



Use actual iterators, rather than assuming you'll get bare pointers.



Don't include all of <Rcpp.h>, as that makes compilation really slow. Unfortunately, Rcpp's headers are broken because they don't include all they need - see my examples for workaround (that's still much faster to compile than the original).






share|improve this answer























  • I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
    – ricardo
    yesterday


















up vote
2
down vote













Code Review



Code is good, except the naming. There are some pieces which could be improved



Use auto for iterators, as exact iterator type is not important, but it's properties are. Iterator properties are usually inherited from container it came from.



Use better constructor, there seems to be one which accepts pair of iterators, and in this case, pair of pointers denoting contiguous range.



Better (?) approach



The problem seems to be somewhat specific case of trimming in general. Note that trimming from the right and trimming from the left are symmetric, and std::reverse_iterator "normalizes" trimming from right (great improvements thanks to @Toby).



template <typename BidirIterator,
typename Predicate>
auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
auto left_edge = std::find_if_not(first, last, predicate);
auto right_edge = std::find_if_not(std::reverse_iterator(last),
std::reverse_iterator(left_edge),
predicate).base();

return std::pair{left_edge, right_edge};
}


When reversing a range, first becomes last, and vice versa, that is why std::reverse_iterator arguments were swapped. There is also C++17 feature in play (template class argument deduction).





Full code:



#include <utility>
#include <iterator>
#include <algorithm>

template <typename BidirIterator,
typename Predicate>
auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
auto left_edge = std::find_if_not(first, last, predicate);
auto right_edge = std::find_if_not(std::reverse_iterator(last),
std::reverse_iterator(left_edge),
predicate).base();

return std::pair{left_edge, right_edge};
}

#include <vector>
#include <stdexcept>

int main() {
std::vector<int> initial_values{2, 2, 3, 4, 5, 6};
auto predicate = (int x) { return x % 2 == 0; };
auto [new_first, new_last] = trim(initial_values.begin(),
initial_values.end(),
predicate);
std::vector<int> resulting_values(new_first, new_last);
std::vector<int> correct_result{3, 4, 5};
if (resulting_values != correct_result) {
throw std::logic_error("incorrect trimming occured");
}
}


Demo on Wandbox.





To make trimming a little bit easier to use, one might write something like this:



template <typename Container, typename Predicate>
Container trim_copy(Container&& container, Predicate predicate) {
auto [new_first, new_last] = trim(container.begin(), container.end(),
predicate);
return Container(new_first, new_last);
}


and pass a lambda calling the NumericVector::is_na:



auto predicate = (auto x) { return NumericVector::is_na(x); };


or may be pointer to the function directly.






share|improve this answer























    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207955%2ftrim-na-values-from-left-and-right-of-rcppnumericvector%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote



    accepted










    I don't know Rcpp well enough to be sure, but we should be able to reduce copying in one of two ways:




    • Accept the argument as a const reference, so that we don't copy in to the function, or

    • Modify our copy of the argument in-place, then return it as result (rather than copying again to Xtrim).


    Both of these eliminate a vector copy - in general, the first is to be preferred, as the input vector is bigger than the output vector. If we must copy something, prefer the smallest copy.





    The former option is simplest, if we're allowed to change the signature:



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(const Rcpp::NumericVector& x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), last_real_value,
    &Rcpp::NumericVector::is_na);

    return {first_real_value, last_real_value};
    }


    Since the Rcpp vector doesn't provide standard-collection-like rbegin()/rend() methods, I had to manually create reverse iterators. That wouldn't be necessary with a standard vector.





    For the latter option, we could use the vector's erase() method to remove the rightmost values, and then the leftmost ones (removing the leftmost first would work, but could require moving more data, making it less efficient).



    Here's how that might work (untested):



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(Rcpp::NumericVector x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();
    x.erase(last_real_value, x.end());

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), x.end(),
    &Rcpp::NumericVector::is_na);
    x.erase(x.begin(), first_real_value);

    return x;
    }




    Other review items



    The code doesn't work correctly if every value is an NA - we should stop when we reach F rather than pBegin here:



    while( L > F && NumericVector::is_na(*L)) L--;


    Use the standard indexing operator:



        Xtrim[i] = F[i];


    (But it's better to use std::copy() instead of the loop).



    Use actual iterators, rather than assuming you'll get bare pointers.



    Don't include all of <Rcpp.h>, as that makes compilation really slow. Unfortunately, Rcpp's headers are broken because they don't include all they need - see my examples for workaround (that's still much faster to compile than the original).






    share|improve this answer























    • I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
      – ricardo
      yesterday















    up vote
    2
    down vote



    accepted










    I don't know Rcpp well enough to be sure, but we should be able to reduce copying in one of two ways:




    • Accept the argument as a const reference, so that we don't copy in to the function, or

    • Modify our copy of the argument in-place, then return it as result (rather than copying again to Xtrim).


    Both of these eliminate a vector copy - in general, the first is to be preferred, as the input vector is bigger than the output vector. If we must copy something, prefer the smallest copy.





    The former option is simplest, if we're allowed to change the signature:



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(const Rcpp::NumericVector& x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), last_real_value,
    &Rcpp::NumericVector::is_na);

    return {first_real_value, last_real_value};
    }


    Since the Rcpp vector doesn't provide standard-collection-like rbegin()/rend() methods, I had to manually create reverse iterators. That wouldn't be necessary with a standard vector.





    For the latter option, we could use the vector's erase() method to remove the rightmost values, and then the leftmost ones (removing the leftmost first would work, but could require moving more data, making it less efficient).



    Here's how that might work (untested):



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(Rcpp::NumericVector x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();
    x.erase(last_real_value, x.end());

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), x.end(),
    &Rcpp::NumericVector::is_na);
    x.erase(x.begin(), first_real_value);

    return x;
    }




    Other review items



    The code doesn't work correctly if every value is an NA - we should stop when we reach F rather than pBegin here:



    while( L > F && NumericVector::is_na(*L)) L--;


    Use the standard indexing operator:



        Xtrim[i] = F[i];


    (But it's better to use std::copy() instead of the loop).



    Use actual iterators, rather than assuming you'll get bare pointers.



    Don't include all of <Rcpp.h>, as that makes compilation really slow. Unfortunately, Rcpp's headers are broken because they don't include all they need - see my examples for workaround (that's still much faster to compile than the original).






    share|improve this answer























    • I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
      – ricardo
      yesterday













    up vote
    2
    down vote



    accepted







    up vote
    2
    down vote



    accepted






    I don't know Rcpp well enough to be sure, but we should be able to reduce copying in one of two ways:




    • Accept the argument as a const reference, so that we don't copy in to the function, or

    • Modify our copy of the argument in-place, then return it as result (rather than copying again to Xtrim).


    Both of these eliminate a vector copy - in general, the first is to be preferred, as the input vector is bigger than the output vector. If we must copy something, prefer the smallest copy.





    The former option is simplest, if we're allowed to change the signature:



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(const Rcpp::NumericVector& x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), last_real_value,
    &Rcpp::NumericVector::is_na);

    return {first_real_value, last_real_value};
    }


    Since the Rcpp vector doesn't provide standard-collection-like rbegin()/rend() methods, I had to manually create reverse iterators. That wouldn't be necessary with a standard vector.





    For the latter option, we could use the vector's erase() method to remove the rightmost values, and then the leftmost ones (removing the leftmost first would work, but could require moving more data, making it less efficient).



    Here's how that might work (untested):



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(Rcpp::NumericVector x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();
    x.erase(last_real_value, x.end());

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), x.end(),
    &Rcpp::NumericVector::is_na);
    x.erase(x.begin(), first_real_value);

    return x;
    }




    Other review items



    The code doesn't work correctly if every value is an NA - we should stop when we reach F rather than pBegin here:



    while( L > F && NumericVector::is_na(*L)) L--;


    Use the standard indexing operator:



        Xtrim[i] = F[i];


    (But it's better to use std::copy() instead of the loop).



    Use actual iterators, rather than assuming you'll get bare pointers.



    Don't include all of <Rcpp.h>, as that makes compilation really slow. Unfortunately, Rcpp's headers are broken because they don't include all they need - see my examples for workaround (that's still much faster to compile than the original).






    share|improve this answer














    I don't know Rcpp well enough to be sure, but we should be able to reduce copying in one of two ways:




    • Accept the argument as a const reference, so that we don't copy in to the function, or

    • Modify our copy of the argument in-place, then return it as result (rather than copying again to Xtrim).


    Both of these eliminate a vector copy - in general, the first is to be preferred, as the input vector is bigger than the output vector. If we must copy something, prefer the smallest copy.





    The former option is simplest, if we're allowed to change the signature:



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(const Rcpp::NumericVector& x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), last_real_value,
    &Rcpp::NumericVector::is_na);

    return {first_real_value, last_real_value};
    }


    Since the Rcpp vector doesn't provide standard-collection-like rbegin()/rend() methods, I had to manually create reverse iterators. That wouldn't be necessary with a standard vector.





    For the latter option, we could use the vector's erase() method to remove the rightmost values, and then the leftmost ones (removing the leftmost first would work, but could require moving more data, making it less efficient).



    Here's how that might work (untested):



    // The Rcpp Vector header is incomplete; it needs these dependencies.  :-(
    #include <RcppCommon.h>
    #include <Rcpp/RObject.h>
    #include <Rcpp/Dimension.h>

    #include <Rcpp/Vector.h>

    #include <algorithm>
    #include <iterator>

    // [[Rcpp::export]]
    Rcpp::NumericVector trimNA(Rcpp::NumericVector x)
    {
    // remove trailing NA values
    auto rbegin = std::make_reverse_iterator(x.end());
    auto rend = std::make_reverse_iterator(x.begin());
    auto last_real_value = std::find_if_not(rbegin, rend,
    &Rcpp::NumericVector::is_na).base();
    x.erase(last_real_value, x.end());

    // remove leading NA values
    auto first_real_value = std::find_if_not(x.begin(), x.end(),
    &Rcpp::NumericVector::is_na);
    x.erase(x.begin(), first_real_value);

    return x;
    }




    Other review items



    The code doesn't work correctly if every value is an NA - we should stop when we reach F rather than pBegin here:



    while( L > F && NumericVector::is_na(*L)) L--;


    Use the standard indexing operator:



        Xtrim[i] = F[i];


    (But it's better to use std::copy() instead of the loop).



    Use actual iterators, rather than assuming you'll get bare pointers.



    Don't include all of <Rcpp.h>, as that makes compilation really slow. Unfortunately, Rcpp's headers are broken because they don't include all they need - see my examples for workaround (that's still much faster to compile than the original).







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited yesterday

























    answered yesterday









    Toby Speight

    22k536108




    22k536108












    • I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
      – ricardo
      yesterday


















    • I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
      – ricardo
      yesterday
















    I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
    – ricardo
    yesterday




    I'm very new to c++, could you demonstrate how you'd use erase and explain the benefits?
    – ricardo
    yesterday












    up vote
    2
    down vote













    Code Review



    Code is good, except the naming. There are some pieces which could be improved



    Use auto for iterators, as exact iterator type is not important, but it's properties are. Iterator properties are usually inherited from container it came from.



    Use better constructor, there seems to be one which accepts pair of iterators, and in this case, pair of pointers denoting contiguous range.



    Better (?) approach



    The problem seems to be somewhat specific case of trimming in general. Note that trimming from the right and trimming from the left are symmetric, and std::reverse_iterator "normalizes" trimming from right (great improvements thanks to @Toby).



    template <typename BidirIterator,
    typename Predicate>
    auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
    auto left_edge = std::find_if_not(first, last, predicate);
    auto right_edge = std::find_if_not(std::reverse_iterator(last),
    std::reverse_iterator(left_edge),
    predicate).base();

    return std::pair{left_edge, right_edge};
    }


    When reversing a range, first becomes last, and vice versa, that is why std::reverse_iterator arguments were swapped. There is also C++17 feature in play (template class argument deduction).





    Full code:



    #include <utility>
    #include <iterator>
    #include <algorithm>

    template <typename BidirIterator,
    typename Predicate>
    auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
    auto left_edge = std::find_if_not(first, last, predicate);
    auto right_edge = std::find_if_not(std::reverse_iterator(last),
    std::reverse_iterator(left_edge),
    predicate).base();

    return std::pair{left_edge, right_edge};
    }

    #include <vector>
    #include <stdexcept>

    int main() {
    std::vector<int> initial_values{2, 2, 3, 4, 5, 6};
    auto predicate = (int x) { return x % 2 == 0; };
    auto [new_first, new_last] = trim(initial_values.begin(),
    initial_values.end(),
    predicate);
    std::vector<int> resulting_values(new_first, new_last);
    std::vector<int> correct_result{3, 4, 5};
    if (resulting_values != correct_result) {
    throw std::logic_error("incorrect trimming occured");
    }
    }


    Demo on Wandbox.





    To make trimming a little bit easier to use, one might write something like this:



    template <typename Container, typename Predicate>
    Container trim_copy(Container&& container, Predicate predicate) {
    auto [new_first, new_last] = trim(container.begin(), container.end(),
    predicate);
    return Container(new_first, new_last);
    }


    and pass a lambda calling the NumericVector::is_na:



    auto predicate = (auto x) { return NumericVector::is_na(x); };


    or may be pointer to the function directly.






    share|improve this answer



























      up vote
      2
      down vote













      Code Review



      Code is good, except the naming. There are some pieces which could be improved



      Use auto for iterators, as exact iterator type is not important, but it's properties are. Iterator properties are usually inherited from container it came from.



      Use better constructor, there seems to be one which accepts pair of iterators, and in this case, pair of pointers denoting contiguous range.



      Better (?) approach



      The problem seems to be somewhat specific case of trimming in general. Note that trimming from the right and trimming from the left are symmetric, and std::reverse_iterator "normalizes" trimming from right (great improvements thanks to @Toby).



      template <typename BidirIterator,
      typename Predicate>
      auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
      auto left_edge = std::find_if_not(first, last, predicate);
      auto right_edge = std::find_if_not(std::reverse_iterator(last),
      std::reverse_iterator(left_edge),
      predicate).base();

      return std::pair{left_edge, right_edge};
      }


      When reversing a range, first becomes last, and vice versa, that is why std::reverse_iterator arguments were swapped. There is also C++17 feature in play (template class argument deduction).





      Full code:



      #include <utility>
      #include <iterator>
      #include <algorithm>

      template <typename BidirIterator,
      typename Predicate>
      auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
      auto left_edge = std::find_if_not(first, last, predicate);
      auto right_edge = std::find_if_not(std::reverse_iterator(last),
      std::reverse_iterator(left_edge),
      predicate).base();

      return std::pair{left_edge, right_edge};
      }

      #include <vector>
      #include <stdexcept>

      int main() {
      std::vector<int> initial_values{2, 2, 3, 4, 5, 6};
      auto predicate = (int x) { return x % 2 == 0; };
      auto [new_first, new_last] = trim(initial_values.begin(),
      initial_values.end(),
      predicate);
      std::vector<int> resulting_values(new_first, new_last);
      std::vector<int> correct_result{3, 4, 5};
      if (resulting_values != correct_result) {
      throw std::logic_error("incorrect trimming occured");
      }
      }


      Demo on Wandbox.





      To make trimming a little bit easier to use, one might write something like this:



      template <typename Container, typename Predicate>
      Container trim_copy(Container&& container, Predicate predicate) {
      auto [new_first, new_last] = trim(container.begin(), container.end(),
      predicate);
      return Container(new_first, new_last);
      }


      and pass a lambda calling the NumericVector::is_na:



      auto predicate = (auto x) { return NumericVector::is_na(x); };


      or may be pointer to the function directly.






      share|improve this answer

























        up vote
        2
        down vote










        up vote
        2
        down vote









        Code Review



        Code is good, except the naming. There are some pieces which could be improved



        Use auto for iterators, as exact iterator type is not important, but it's properties are. Iterator properties are usually inherited from container it came from.



        Use better constructor, there seems to be one which accepts pair of iterators, and in this case, pair of pointers denoting contiguous range.



        Better (?) approach



        The problem seems to be somewhat specific case of trimming in general. Note that trimming from the right and trimming from the left are symmetric, and std::reverse_iterator "normalizes" trimming from right (great improvements thanks to @Toby).



        template <typename BidirIterator,
        typename Predicate>
        auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
        auto left_edge = std::find_if_not(first, last, predicate);
        auto right_edge = std::find_if_not(std::reverse_iterator(last),
        std::reverse_iterator(left_edge),
        predicate).base();

        return std::pair{left_edge, right_edge};
        }


        When reversing a range, first becomes last, and vice versa, that is why std::reverse_iterator arguments were swapped. There is also C++17 feature in play (template class argument deduction).





        Full code:



        #include <utility>
        #include <iterator>
        #include <algorithm>

        template <typename BidirIterator,
        typename Predicate>
        auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
        auto left_edge = std::find_if_not(first, last, predicate);
        auto right_edge = std::find_if_not(std::reverse_iterator(last),
        std::reverse_iterator(left_edge),
        predicate).base();

        return std::pair{left_edge, right_edge};
        }

        #include <vector>
        #include <stdexcept>

        int main() {
        std::vector<int> initial_values{2, 2, 3, 4, 5, 6};
        auto predicate = (int x) { return x % 2 == 0; };
        auto [new_first, new_last] = trim(initial_values.begin(),
        initial_values.end(),
        predicate);
        std::vector<int> resulting_values(new_first, new_last);
        std::vector<int> correct_result{3, 4, 5};
        if (resulting_values != correct_result) {
        throw std::logic_error("incorrect trimming occured");
        }
        }


        Demo on Wandbox.





        To make trimming a little bit easier to use, one might write something like this:



        template <typename Container, typename Predicate>
        Container trim_copy(Container&& container, Predicate predicate) {
        auto [new_first, new_last] = trim(container.begin(), container.end(),
        predicate);
        return Container(new_first, new_last);
        }


        and pass a lambda calling the NumericVector::is_na:



        auto predicate = (auto x) { return NumericVector::is_na(x); };


        or may be pointer to the function directly.






        share|improve this answer














        Code Review



        Code is good, except the naming. There are some pieces which could be improved



        Use auto for iterators, as exact iterator type is not important, but it's properties are. Iterator properties are usually inherited from container it came from.



        Use better constructor, there seems to be one which accepts pair of iterators, and in this case, pair of pointers denoting contiguous range.



        Better (?) approach



        The problem seems to be somewhat specific case of trimming in general. Note that trimming from the right and trimming from the left are symmetric, and std::reverse_iterator "normalizes" trimming from right (great improvements thanks to @Toby).



        template <typename BidirIterator,
        typename Predicate>
        auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
        auto left_edge = std::find_if_not(first, last, predicate);
        auto right_edge = std::find_if_not(std::reverse_iterator(last),
        std::reverse_iterator(left_edge),
        predicate).base();

        return std::pair{left_edge, right_edge};
        }


        When reversing a range, first becomes last, and vice versa, that is why std::reverse_iterator arguments were swapped. There is also C++17 feature in play (template class argument deduction).





        Full code:



        #include <utility>
        #include <iterator>
        #include <algorithm>

        template <typename BidirIterator,
        typename Predicate>
        auto trim(BidirIterator first, BidirIterator last, Predicate predicate) {
        auto left_edge = std::find_if_not(first, last, predicate);
        auto right_edge = std::find_if_not(std::reverse_iterator(last),
        std::reverse_iterator(left_edge),
        predicate).base();

        return std::pair{left_edge, right_edge};
        }

        #include <vector>
        #include <stdexcept>

        int main() {
        std::vector<int> initial_values{2, 2, 3, 4, 5, 6};
        auto predicate = (int x) { return x % 2 == 0; };
        auto [new_first, new_last] = trim(initial_values.begin(),
        initial_values.end(),
        predicate);
        std::vector<int> resulting_values(new_first, new_last);
        std::vector<int> correct_result{3, 4, 5};
        if (resulting_values != correct_result) {
        throw std::logic_error("incorrect trimming occured");
        }
        }


        Demo on Wandbox.





        To make trimming a little bit easier to use, one might write something like this:



        template <typename Container, typename Predicate>
        Container trim_copy(Container&& container, Predicate predicate) {
        auto [new_first, new_last] = trim(container.begin(), container.end(),
        predicate);
        return Container(new_first, new_last);
        }


        and pass a lambda calling the NumericVector::is_na:



        auto predicate = (auto x) { return NumericVector::is_na(x); };


        or may be pointer to the function directly.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited yesterday









        Toby Speight

        22k536108




        22k536108










        answered yesterday









        Incomputable

        6,42621453




        6,42621453






























             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207955%2ftrim-na-values-from-left-and-right-of-rcppnumericvector%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Mouse cursor on multiple screens with different PPI

            Agildo Ribeiro

            Sometime when accessing a menu: “Ubuntu 16.04 has experienced an internal error”