Most efficient way to find an entry in a C++ vector
up vote
7
down vote
favorite
I'm trying to construct an output table containing 80 rows of table status, that could be EMPTY
or USED
as below
+-------+-------+
| TABLE | STATE |
+-------+-------+
| 00 | USED |
| 01 | EMPTY |
| 02 | EMPTY |
..
..
| 79 | EMPTY |
+-------+-------+
I have a vector m_availTableNums
which contains a list of available table numbers. In the below example, I'm putting 20 random table numbers into the above vector such that all the remaining 60 would be empty. My logic below works fine.
Is there a scope for improvement here on the find logic?
#include <iostream>
#include <iomanip>
#include <cmath>
#include <string.h>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
int main(int argc, const char * argv) {
std::vector<uint8_t> m_availTableNums;
char tmpStr[50];
uint8_t tableNum;
uint8_t randomCount;
srand(time(NULL));
for( tableNum=0; tableNum < 20; tableNum++ )
{
randomCount = ( rand() % 80 );
m_availTableNums.push_back( randomCount );
}
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "| TABLE | STATE |");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tableNum = 0;
for( tableNum=0; tableNum < 80 ; tableNum++ )
{
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
}
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
return 0;
}
c++ performance c++11 vectors
add a comment |
up vote
7
down vote
favorite
I'm trying to construct an output table containing 80 rows of table status, that could be EMPTY
or USED
as below
+-------+-------+
| TABLE | STATE |
+-------+-------+
| 00 | USED |
| 01 | EMPTY |
| 02 | EMPTY |
..
..
| 79 | EMPTY |
+-------+-------+
I have a vector m_availTableNums
which contains a list of available table numbers. In the below example, I'm putting 20 random table numbers into the above vector such that all the remaining 60 would be empty. My logic below works fine.
Is there a scope for improvement here on the find logic?
#include <iostream>
#include <iomanip>
#include <cmath>
#include <string.h>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
int main(int argc, const char * argv) {
std::vector<uint8_t> m_availTableNums;
char tmpStr[50];
uint8_t tableNum;
uint8_t randomCount;
srand(time(NULL));
for( tableNum=0; tableNum < 20; tableNum++ )
{
randomCount = ( rand() % 80 );
m_availTableNums.push_back( randomCount );
}
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "| TABLE | STATE |");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tableNum = 0;
for( tableNum=0; tableNum < 80 ; tableNum++ )
{
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
}
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
return 0;
}
c++ performance c++11 vectors
1
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, astd::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having calledreserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.
– Damon
Dec 5 at 19:59
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating astd::vector
even on a no-cache machine.
– Damon
Dec 6 at 11:52
In fact,g++
has supported 8-bit AVR processors for some years. I haveg++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.
– Edward
Dec 6 at 13:09
add a comment |
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I'm trying to construct an output table containing 80 rows of table status, that could be EMPTY
or USED
as below
+-------+-------+
| TABLE | STATE |
+-------+-------+
| 00 | USED |
| 01 | EMPTY |
| 02 | EMPTY |
..
..
| 79 | EMPTY |
+-------+-------+
I have a vector m_availTableNums
which contains a list of available table numbers. In the below example, I'm putting 20 random table numbers into the above vector such that all the remaining 60 would be empty. My logic below works fine.
Is there a scope for improvement here on the find logic?
#include <iostream>
#include <iomanip>
#include <cmath>
#include <string.h>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
int main(int argc, const char * argv) {
std::vector<uint8_t> m_availTableNums;
char tmpStr[50];
uint8_t tableNum;
uint8_t randomCount;
srand(time(NULL));
for( tableNum=0; tableNum < 20; tableNum++ )
{
randomCount = ( rand() % 80 );
m_availTableNums.push_back( randomCount );
}
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "| TABLE | STATE |");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tableNum = 0;
for( tableNum=0; tableNum < 80 ; tableNum++ )
{
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
}
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
return 0;
}
c++ performance c++11 vectors
I'm trying to construct an output table containing 80 rows of table status, that could be EMPTY
or USED
as below
+-------+-------+
| TABLE | STATE |
+-------+-------+
| 00 | USED |
| 01 | EMPTY |
| 02 | EMPTY |
..
..
| 79 | EMPTY |
+-------+-------+
I have a vector m_availTableNums
which contains a list of available table numbers. In the below example, I'm putting 20 random table numbers into the above vector such that all the remaining 60 would be empty. My logic below works fine.
Is there a scope for improvement here on the find logic?
#include <iostream>
#include <iomanip>
#include <cmath>
#include <string.h>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
int main(int argc, const char * argv) {
std::vector<uint8_t> m_availTableNums;
char tmpStr[50];
uint8_t tableNum;
uint8_t randomCount;
srand(time(NULL));
for( tableNum=0; tableNum < 20; tableNum++ )
{
randomCount = ( rand() % 80 );
m_availTableNums.push_back( randomCount );
}
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "| TABLE | STATE |");
printf("%sn", tmpStr);
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
tableNum = 0;
for( tableNum=0; tableNum < 80 ; tableNum++ )
{
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
}
tmpStr[0]='';
sprintf(tmpStr, "+-------+-------+");
printf("%sn", tmpStr);
return 0;
}
c++ performance c++11 vectors
c++ performance c++11 vectors
edited Dec 5 at 19:25
200_success
128k15149412
128k15149412
asked Dec 5 at 9:23
Inian
14318
14318
1
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, astd::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having calledreserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.
– Damon
Dec 5 at 19:59
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating astd::vector
even on a no-cache machine.
– Damon
Dec 6 at 11:52
In fact,g++
has supported 8-bit AVR processors for some years. I haveg++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.
– Edward
Dec 6 at 13:09
add a comment |
1
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, astd::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having calledreserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.
– Damon
Dec 5 at 19:59
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating astd::vector
even on a no-cache machine.
– Damon
Dec 6 at 11:52
In fact,g++
has supported 8-bit AVR processors for some years. I haveg++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.
– Edward
Dec 6 at 13:09
1
1
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, a
std::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having called reserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.– Damon
Dec 5 at 19:59
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, a
std::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having called reserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.– Damon
Dec 5 at 19:59
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating a
std::vector
even on a no-cache machine.– Damon
Dec 6 at 11:52
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating a
std::vector
even on a no-cache machine.– Damon
Dec 6 at 11:52
In fact,
g++
has supported 8-bit AVR processors for some years. I have g++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.– Edward
Dec 6 at 13:09
In fact,
g++
has supported 8-bit AVR processors for some years. I have g++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.– Edward
Dec 6 at 13:09
add a comment |
4 Answers
4
active
oldest
votes
up vote
19
down vote
accepted
Header files
It's strange that this code uses the C header <string.h>
but the C++ versions of <cmath>
, <ctime>
and <cstdlib>
. I recommend sticking to the C++ headers except on the rare occasions that you need to compile the same code with a C compiler. In this case, I don't see anything using <cstring>
, so we can probably just drop that, along with <iomanip>
, <iostream>
and <cmath>
. And we need to add some missing includes: <cstdint>
and <cstdio>
.
Avoid using namespace std
The standard namespace is not one that's designed for wholesale import like this. Unexpected name collisions when you add another header or move to a newer C++ could even cause changes to the meaning of your program.
Use the appropriate signature for main()
Since we're ignoring the command-line arguments, we can use a main()
that takes no parameters:
int main()
Remove pointless temporary string
Instead of formatting into tmpStr
and immediately printing its contents to standard output, we can eliminate that variable by formatting directly to standard output (using the same format string). For example, instead of:
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "| TABLE | STATE |");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
we could simply write:
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
And instead of
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
std::sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
std::sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
we would have:
if (std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()) {
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
Reduce duplication
Most of these statements are common:
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
The only bit that's different is the EMPTY
or USED
string. So let's decide that first:
const char *status =
std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()
? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", tableNum, status);
Prefer nullptr
value to NULL
macro
The C++ null pointer has strong type, whereas NULL
or 0
can be interpreted as integer.
Reduce scope of variables
randomCount
doesn't need to be valid outside the first for
loop, and we don't need to use the same tableNum
for both loops. Also, we could follow convention and use a short name for a short-lived loop index; i
is the usual choice:
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % 80;
m_availTableNums.push_back(randomCount);
}
for (std::uint8_t i = 0; i < 80; ++i) {
Avoid magic numbers
What's special about 80
? Could we need a different range? Let's give the constant a name, and then we can be sure that the loop matches this range:
constexpr std::uint8_t LIMIT = 80;
...
std::uint8_t randomCount = rand() % LIMIT;
...
for (std::uint8_t i = 0; i < LIMIT; ++i) {
A departure from specification
The description says
I'm putting 20 random table numbers into the above vector such that, all the remaining 60 would be empty.
That's not exactly what's happening, as we're sampling with replacement from the values 0..79. There's nothing to prevent duplicates being added (it's actually quite unlikely that there will be exactly 60 empty values).
Reduce the algorithmic complexity
Each time through the loop, we use std::find()
to see whether we have any matching elements. This is a linear search, so it examines elements in turn until it finds a match. Since it only finds a match one-quarter of the time, the other three-quarters will examine every element in the list, and the time it takes will be proportional to the list length - we say it scales as O(n), where n is the size of the vector. The complete loop therefore scales as O(mn), where m is the value of LIMIT
.
We can reduce the complexity to O(m + n) if we use some extra storage to store the values in a way that makes them easy to test. For example, we could populate a vector that's indexed by the values from m_availTableNums
:
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
If the range were much larger, we might use an (unordered) set instead of vector<bool>
. We might also choose vector<char>
instead of vector<bool>
for better speed at a cost of more space.
Simplified code
Here's my version, keeping to the spirit of the original (creating a list of indices, rather than changing to storing in the form we want to use them):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
int main()
{
constexpr std::uint8_t LIMIT = 80;
std::vector<std::uint8_t> m_availTableNums;
std::srand(std::time(nullptr));
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % LIMIT;
m_availTableNums.push_back(randomCount);
}
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
std::puts("+-------+-------+");
}
Excellent points!++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?
– Inian
Dec 5 at 10:03
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf
– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace thesprintf
intotmpStr
with aprintf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.
– Toby Speight
Dec 5 at 15:49
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
|
show 8 more comments
up vote
8
down vote
I would seriously consider whether std::vector is the right data structure. Do you need the available table numbers to be in the order you got them, or can they be sorted? Do you need to allow duplicate table numbers? (I.e. if I add 42 twice, do I have one entry for 42 or two?)
If the data can be sorted and de-duplicated, then the most efficient way to find an entry is to store the data in a std::set. Storing the data takes a little longer (O(log n) for the set vs. amortized O(1) for the vector), but finding the data makes up for it (O(log n) for the set vs. O(n) for the vector). If you use a set, make sure to use set's find function (e.g. mySet.find(tableNum)
rather than std::find(mySet.begin(),mySet.end(),tableNum)
), or there's no benefit.
On a more general code-style note, since you're using C++ data structures anyway, you should probably use std::string
and std::cout
instead of char *
/char
and printf
.
add a comment |
up vote
3
down vote
A std::set
or std::bitset
might be the better containers to use for this, but assuming we're going to stick with std::vector
, here's some improvements on this.
See Toby Speight's answer for some of the improvements I used but did not cover.
Avoid using unsigned numbers
Don't use unsigned int
or uint8_t
. Counterintuitively, it's generally best to use signed int
types even when a value should never be negative. Operations on mixed signed types come with a lot of problems. It's an unfortunate mistake the standard went with unsigned numbers for container sizes. Unsigned numbers are best kept to bitwise operations.
Store whether an index is used at that index
Instead of storing a list of indexes that have been used, let's just store a bool
at each possible index. This uses more space, but then no extra searching has to be done, just a single loop. (Notice I've changed m_availTableNums
to availTableNums
. The prefix m_
is conventionally only used for class members).
std::vector<bool> availTableNums (80);
Putting the 80 here will initialize the vector with 80 false values. Now when we fill the table, we choose random indices to insert true
which indicates a value of used.
std::srand(std::time(nullptr));
for(int i=0; i < 20; i++)
{
availTableNums[std::rand() % availTableNums.size()] = true;
}
Using modern RNG facilities
std::rand
has no guarantees of the "randomness" of the numbers it generates. It's fine if you want to run up a quick test, but in production code we could modify the previous example to make it a little more robust.
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
Here we use std::size_t
, even though it's an unsigned type, as it is better to not have to cast from one type to another if we could avoid it. The random number generator will automatically pick from the correct range so we no longer need the modulus operator.
Use C++ string streams
There is no need for a temporary string, and C++ streams are often a better choice than the C printf family of functions. We can write the header of the table as:
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
The extra <<
s are left for style but not necessary. Now for the printing inside the loop, use std::setw(2)
to make sure the printed index is within 2 columns. The value of this resets if string is sent to std::cout
so it will not effect the rest of the line.
The use of std::endl
will cause the output to always be flushed from the buffer and will have a small performance penalty. However, I am usually quite generous with them unless it is shown they have a sizeable impact on performance. It can make code easier to debug if output is consistently flushed and you know exactly where something went wrong.
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
We again choose an unsigned index as comparing two unsigned types is better than comparing mixed signs or casting one to the other. The conditional expression ?:
allows us to concisely choose between the two strings depending on whether the index is true or false.
Putting it all together with only the relavent headers, we get:
#include <ctime>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main()
{
std::vector<bool> availTableNums (80);
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
std::cout << "+-------+-------+" << std::endl;
return 0;
}
+1Avoid using unsigned numbers
: Really good advice. Hard to get across especially withstd::size_t
being unsigned but you addressed that point well.
– Martin York
Dec 6 at 0:20
1
Avoid advising usingstd::endl
. Do he really need to pay for a flush? Furthermore, in yourTABLE | STATE
you can safely keep the 1rst<<
and remove others.
– Calak
Dec 6 at 0:51
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra<<
in as a style choice but I could probably note that.
– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
add a comment |
up vote
2
down vote
The above are good advice. Do you need exactly N values true? If so, the above loop might be better as (pieces left out for simplicity):
const std::size_t N = 80;
const std::size_t K = 20;
std::unordered_set<std::size_t> exists;
// Fill exists with K random values between [0 .. N], inclusive.
std::default_random_engine rng{std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, N};
while (exists.size() < K) {
exists.insert(uniform_dist(rng));
}
// Print each value in order, in a table.
for(auto i = uniform_dist.min(); i <= uniform_dist.max(); ++i) {
std::cout << "| " << std::setw(2) << i << " | "
<< ((exists.count(i) == 0) ? "EMPTY" : "USED ")
<< " |" << std::endl;
}
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209055%2fmost-efficient-way-to-find-an-entry-in-a-c-vector%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
19
down vote
accepted
Header files
It's strange that this code uses the C header <string.h>
but the C++ versions of <cmath>
, <ctime>
and <cstdlib>
. I recommend sticking to the C++ headers except on the rare occasions that you need to compile the same code with a C compiler. In this case, I don't see anything using <cstring>
, so we can probably just drop that, along with <iomanip>
, <iostream>
and <cmath>
. And we need to add some missing includes: <cstdint>
and <cstdio>
.
Avoid using namespace std
The standard namespace is not one that's designed for wholesale import like this. Unexpected name collisions when you add another header or move to a newer C++ could even cause changes to the meaning of your program.
Use the appropriate signature for main()
Since we're ignoring the command-line arguments, we can use a main()
that takes no parameters:
int main()
Remove pointless temporary string
Instead of formatting into tmpStr
and immediately printing its contents to standard output, we can eliminate that variable by formatting directly to standard output (using the same format string). For example, instead of:
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "| TABLE | STATE |");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
we could simply write:
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
And instead of
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
std::sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
std::sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
we would have:
if (std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()) {
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
Reduce duplication
Most of these statements are common:
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
The only bit that's different is the EMPTY
or USED
string. So let's decide that first:
const char *status =
std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()
? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", tableNum, status);
Prefer nullptr
value to NULL
macro
The C++ null pointer has strong type, whereas NULL
or 0
can be interpreted as integer.
Reduce scope of variables
randomCount
doesn't need to be valid outside the first for
loop, and we don't need to use the same tableNum
for both loops. Also, we could follow convention and use a short name for a short-lived loop index; i
is the usual choice:
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % 80;
m_availTableNums.push_back(randomCount);
}
for (std::uint8_t i = 0; i < 80; ++i) {
Avoid magic numbers
What's special about 80
? Could we need a different range? Let's give the constant a name, and then we can be sure that the loop matches this range:
constexpr std::uint8_t LIMIT = 80;
...
std::uint8_t randomCount = rand() % LIMIT;
...
for (std::uint8_t i = 0; i < LIMIT; ++i) {
A departure from specification
The description says
I'm putting 20 random table numbers into the above vector such that, all the remaining 60 would be empty.
That's not exactly what's happening, as we're sampling with replacement from the values 0..79. There's nothing to prevent duplicates being added (it's actually quite unlikely that there will be exactly 60 empty values).
Reduce the algorithmic complexity
Each time through the loop, we use std::find()
to see whether we have any matching elements. This is a linear search, so it examines elements in turn until it finds a match. Since it only finds a match one-quarter of the time, the other three-quarters will examine every element in the list, and the time it takes will be proportional to the list length - we say it scales as O(n), where n is the size of the vector. The complete loop therefore scales as O(mn), where m is the value of LIMIT
.
We can reduce the complexity to O(m + n) if we use some extra storage to store the values in a way that makes them easy to test. For example, we could populate a vector that's indexed by the values from m_availTableNums
:
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
If the range were much larger, we might use an (unordered) set instead of vector<bool>
. We might also choose vector<char>
instead of vector<bool>
for better speed at a cost of more space.
Simplified code
Here's my version, keeping to the spirit of the original (creating a list of indices, rather than changing to storing in the form we want to use them):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
int main()
{
constexpr std::uint8_t LIMIT = 80;
std::vector<std::uint8_t> m_availTableNums;
std::srand(std::time(nullptr));
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % LIMIT;
m_availTableNums.push_back(randomCount);
}
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
std::puts("+-------+-------+");
}
Excellent points!++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?
– Inian
Dec 5 at 10:03
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf
– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace thesprintf
intotmpStr
with aprintf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.
– Toby Speight
Dec 5 at 15:49
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
|
show 8 more comments
up vote
19
down vote
accepted
Header files
It's strange that this code uses the C header <string.h>
but the C++ versions of <cmath>
, <ctime>
and <cstdlib>
. I recommend sticking to the C++ headers except on the rare occasions that you need to compile the same code with a C compiler. In this case, I don't see anything using <cstring>
, so we can probably just drop that, along with <iomanip>
, <iostream>
and <cmath>
. And we need to add some missing includes: <cstdint>
and <cstdio>
.
Avoid using namespace std
The standard namespace is not one that's designed for wholesale import like this. Unexpected name collisions when you add another header or move to a newer C++ could even cause changes to the meaning of your program.
Use the appropriate signature for main()
Since we're ignoring the command-line arguments, we can use a main()
that takes no parameters:
int main()
Remove pointless temporary string
Instead of formatting into tmpStr
and immediately printing its contents to standard output, we can eliminate that variable by formatting directly to standard output (using the same format string). For example, instead of:
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "| TABLE | STATE |");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
we could simply write:
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
And instead of
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
std::sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
std::sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
we would have:
if (std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()) {
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
Reduce duplication
Most of these statements are common:
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
The only bit that's different is the EMPTY
or USED
string. So let's decide that first:
const char *status =
std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()
? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", tableNum, status);
Prefer nullptr
value to NULL
macro
The C++ null pointer has strong type, whereas NULL
or 0
can be interpreted as integer.
Reduce scope of variables
randomCount
doesn't need to be valid outside the first for
loop, and we don't need to use the same tableNum
for both loops. Also, we could follow convention and use a short name for a short-lived loop index; i
is the usual choice:
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % 80;
m_availTableNums.push_back(randomCount);
}
for (std::uint8_t i = 0; i < 80; ++i) {
Avoid magic numbers
What's special about 80
? Could we need a different range? Let's give the constant a name, and then we can be sure that the loop matches this range:
constexpr std::uint8_t LIMIT = 80;
...
std::uint8_t randomCount = rand() % LIMIT;
...
for (std::uint8_t i = 0; i < LIMIT; ++i) {
A departure from specification
The description says
I'm putting 20 random table numbers into the above vector such that, all the remaining 60 would be empty.
That's not exactly what's happening, as we're sampling with replacement from the values 0..79. There's nothing to prevent duplicates being added (it's actually quite unlikely that there will be exactly 60 empty values).
Reduce the algorithmic complexity
Each time through the loop, we use std::find()
to see whether we have any matching elements. This is a linear search, so it examines elements in turn until it finds a match. Since it only finds a match one-quarter of the time, the other three-quarters will examine every element in the list, and the time it takes will be proportional to the list length - we say it scales as O(n), where n is the size of the vector. The complete loop therefore scales as O(mn), where m is the value of LIMIT
.
We can reduce the complexity to O(m + n) if we use some extra storage to store the values in a way that makes them easy to test. For example, we could populate a vector that's indexed by the values from m_availTableNums
:
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
If the range were much larger, we might use an (unordered) set instead of vector<bool>
. We might also choose vector<char>
instead of vector<bool>
for better speed at a cost of more space.
Simplified code
Here's my version, keeping to the spirit of the original (creating a list of indices, rather than changing to storing in the form we want to use them):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
int main()
{
constexpr std::uint8_t LIMIT = 80;
std::vector<std::uint8_t> m_availTableNums;
std::srand(std::time(nullptr));
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % LIMIT;
m_availTableNums.push_back(randomCount);
}
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
std::puts("+-------+-------+");
}
Excellent points!++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?
– Inian
Dec 5 at 10:03
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf
– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace thesprintf
intotmpStr
with aprintf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.
– Toby Speight
Dec 5 at 15:49
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
|
show 8 more comments
up vote
19
down vote
accepted
up vote
19
down vote
accepted
Header files
It's strange that this code uses the C header <string.h>
but the C++ versions of <cmath>
, <ctime>
and <cstdlib>
. I recommend sticking to the C++ headers except on the rare occasions that you need to compile the same code with a C compiler. In this case, I don't see anything using <cstring>
, so we can probably just drop that, along with <iomanip>
, <iostream>
and <cmath>
. And we need to add some missing includes: <cstdint>
and <cstdio>
.
Avoid using namespace std
The standard namespace is not one that's designed for wholesale import like this. Unexpected name collisions when you add another header or move to a newer C++ could even cause changes to the meaning of your program.
Use the appropriate signature for main()
Since we're ignoring the command-line arguments, we can use a main()
that takes no parameters:
int main()
Remove pointless temporary string
Instead of formatting into tmpStr
and immediately printing its contents to standard output, we can eliminate that variable by formatting directly to standard output (using the same format string). For example, instead of:
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "| TABLE | STATE |");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
we could simply write:
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
And instead of
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
std::sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
std::sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
we would have:
if (std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()) {
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
Reduce duplication
Most of these statements are common:
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
The only bit that's different is the EMPTY
or USED
string. So let's decide that first:
const char *status =
std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()
? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", tableNum, status);
Prefer nullptr
value to NULL
macro
The C++ null pointer has strong type, whereas NULL
or 0
can be interpreted as integer.
Reduce scope of variables
randomCount
doesn't need to be valid outside the first for
loop, and we don't need to use the same tableNum
for both loops. Also, we could follow convention and use a short name for a short-lived loop index; i
is the usual choice:
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % 80;
m_availTableNums.push_back(randomCount);
}
for (std::uint8_t i = 0; i < 80; ++i) {
Avoid magic numbers
What's special about 80
? Could we need a different range? Let's give the constant a name, and then we can be sure that the loop matches this range:
constexpr std::uint8_t LIMIT = 80;
...
std::uint8_t randomCount = rand() % LIMIT;
...
for (std::uint8_t i = 0; i < LIMIT; ++i) {
A departure from specification
The description says
I'm putting 20 random table numbers into the above vector such that, all the remaining 60 would be empty.
That's not exactly what's happening, as we're sampling with replacement from the values 0..79. There's nothing to prevent duplicates being added (it's actually quite unlikely that there will be exactly 60 empty values).
Reduce the algorithmic complexity
Each time through the loop, we use std::find()
to see whether we have any matching elements. This is a linear search, so it examines elements in turn until it finds a match. Since it only finds a match one-quarter of the time, the other three-quarters will examine every element in the list, and the time it takes will be proportional to the list length - we say it scales as O(n), where n is the size of the vector. The complete loop therefore scales as O(mn), where m is the value of LIMIT
.
We can reduce the complexity to O(m + n) if we use some extra storage to store the values in a way that makes them easy to test. For example, we could populate a vector that's indexed by the values from m_availTableNums
:
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
If the range were much larger, we might use an (unordered) set instead of vector<bool>
. We might also choose vector<char>
instead of vector<bool>
for better speed at a cost of more space.
Simplified code
Here's my version, keeping to the spirit of the original (creating a list of indices, rather than changing to storing in the form we want to use them):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
int main()
{
constexpr std::uint8_t LIMIT = 80;
std::vector<std::uint8_t> m_availTableNums;
std::srand(std::time(nullptr));
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % LIMIT;
m_availTableNums.push_back(randomCount);
}
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
std::puts("+-------+-------+");
}
Header files
It's strange that this code uses the C header <string.h>
but the C++ versions of <cmath>
, <ctime>
and <cstdlib>
. I recommend sticking to the C++ headers except on the rare occasions that you need to compile the same code with a C compiler. In this case, I don't see anything using <cstring>
, so we can probably just drop that, along with <iomanip>
, <iostream>
and <cmath>
. And we need to add some missing includes: <cstdint>
and <cstdio>
.
Avoid using namespace std
The standard namespace is not one that's designed for wholesale import like this. Unexpected name collisions when you add another header or move to a newer C++ could even cause changes to the meaning of your program.
Use the appropriate signature for main()
Since we're ignoring the command-line arguments, we can use a main()
that takes no parameters:
int main()
Remove pointless temporary string
Instead of formatting into tmpStr
and immediately printing its contents to standard output, we can eliminate that variable by formatting directly to standard output (using the same format string). For example, instead of:
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "| TABLE | STATE |");
std::printf("%sn", tmpStr);
tmpStr[0]='';
std::sprintf(tmpStr, "+-------+-------+");
std::printf("%sn", tmpStr);
we could simply write:
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
And instead of
tmpStr[0]='';
if ( std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end() )
{
std::sprintf(tmpStr, "| %02d | EMPTY |", tableNum );
} else {
std::sprintf(tmpStr, "| %02d | USED |", tableNum );
}
printf("%sn", tmpStr);
we would have:
if (std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()) {
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
Reduce duplication
Most of these statements are common:
std::printf("| %02d | EMPTY |n", tableNum);
} else {
std::printf("| %02d | USED |n", tableNum);
}
The only bit that's different is the EMPTY
or USED
string. So let's decide that first:
const char *status =
std::find(m_availTableNums.begin(), m_availTableNums.end(), tableNum) != m_availTableNums.end()
? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", tableNum, status);
Prefer nullptr
value to NULL
macro
The C++ null pointer has strong type, whereas NULL
or 0
can be interpreted as integer.
Reduce scope of variables
randomCount
doesn't need to be valid outside the first for
loop, and we don't need to use the same tableNum
for both loops. Also, we could follow convention and use a short name for a short-lived loop index; i
is the usual choice:
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % 80;
m_availTableNums.push_back(randomCount);
}
for (std::uint8_t i = 0; i < 80; ++i) {
Avoid magic numbers
What's special about 80
? Could we need a different range? Let's give the constant a name, and then we can be sure that the loop matches this range:
constexpr std::uint8_t LIMIT = 80;
...
std::uint8_t randomCount = rand() % LIMIT;
...
for (std::uint8_t i = 0; i < LIMIT; ++i) {
A departure from specification
The description says
I'm putting 20 random table numbers into the above vector such that, all the remaining 60 would be empty.
That's not exactly what's happening, as we're sampling with replacement from the values 0..79. There's nothing to prevent duplicates being added (it's actually quite unlikely that there will be exactly 60 empty values).
Reduce the algorithmic complexity
Each time through the loop, we use std::find()
to see whether we have any matching elements. This is a linear search, so it examines elements in turn until it finds a match. Since it only finds a match one-quarter of the time, the other three-quarters will examine every element in the list, and the time it takes will be proportional to the list length - we say it scales as O(n), where n is the size of the vector. The complete loop therefore scales as O(mn), where m is the value of LIMIT
.
We can reduce the complexity to O(m + n) if we use some extra storage to store the values in a way that makes them easy to test. For example, we could populate a vector that's indexed by the values from m_availTableNums
:
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
If the range were much larger, we might use an (unordered) set instead of vector<bool>
. We might also choose vector<char>
instead of vector<bool>
for better speed at a cost of more space.
Simplified code
Here's my version, keeping to the spirit of the original (creating a list of indices, rather than changing to storing in the form we want to use them):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
int main()
{
constexpr std::uint8_t LIMIT = 80;
std::vector<std::uint8_t> m_availTableNums;
std::srand(std::time(nullptr));
for (std::uint8_t i = 0; i < 20; ++i) {
std::uint8_t randomCount = rand() % LIMIT;
m_availTableNums.push_back(randomCount);
}
std::puts("+-------+-------+n"
"| TABLE | STATE |n"
"+-------+-------+");
auto by_val = std::vector<bool>(LIMIT, false);
for (auto value: m_availTableNums)
by_val[value] = true;
for (std::uint8_t i = 0; i < LIMIT; ++i) {
const char *status = by_val[i] ? "EMPTY" : "USED";
std::printf("| %02d | %-5s |n", i, status);
}
std::puts("+-------+-------+");
}
edited Dec 6 at 8:57
answered Dec 5 at 9:52
Toby Speight
23.3k639111
23.3k639111
Excellent points!++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?
– Inian
Dec 5 at 10:03
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf
– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace thesprintf
intotmpStr
with aprintf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.
– Toby Speight
Dec 5 at 15:49
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
|
show 8 more comments
Excellent points!++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?
– Inian
Dec 5 at 10:03
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf
– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace thesprintf
intotmpStr
with aprintf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.
– Toby Speight
Dec 5 at 15:49
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
Excellent points!
++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?– Inian
Dec 5 at 10:03
Excellent points!
++
, I do have to let know that I'm not an expert yet in C++, just in the beginner stages. These points are really useful. One question before I accept this answer. Your opinion on dave's comment above to use sort and remove elements at head?– Inian
Dec 5 at 10:03
1
1
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
That's a possibility - in complexity terms, that's similar to my suggestion. To some extent, the best choice may depend on whether modifying the values is acceptable. If you go with the sorting, don't forget that you may need to remove multiple elements from the head of the vector when there are duplicate values in it (oh, and don't actually remove - that's expensive; just advance over them and leave them in place).
– Toby Speight
Dec 5 at 10:27
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf– hanshenrik
Dec 5 at 15:41
The only thing tmpStr is ever used for is to immediately pass it to printf() with a %s format. So why not just printf() directly?
- first make sure the string doesn't contain any characters with a special meaning for printf– hanshenrik
Dec 5 at 15:41
@hanshenrik, perhaps I didn't word that very well. I meant we should replace the
sprintf
into tmpStr
with a printf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.– Toby Speight
Dec 5 at 15:49
@hanshenrik, perhaps I didn't word that very well. I meant we should replace the
sprintf
into tmpStr
with a printf
directly to standard output - same (or nearly same) format string. I'll see if I can edit to prevent misunderstandings.– Toby Speight
Dec 5 at 15:49
1
1
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
@Graham, have you compared the generated code using iterators and using indices? Decent optimising compilers will transform indexing operations into pointer operations quite routinely.
– Toby Speight
Dec 6 at 13:50
|
show 8 more comments
up vote
8
down vote
I would seriously consider whether std::vector is the right data structure. Do you need the available table numbers to be in the order you got them, or can they be sorted? Do you need to allow duplicate table numbers? (I.e. if I add 42 twice, do I have one entry for 42 or two?)
If the data can be sorted and de-duplicated, then the most efficient way to find an entry is to store the data in a std::set. Storing the data takes a little longer (O(log n) for the set vs. amortized O(1) for the vector), but finding the data makes up for it (O(log n) for the set vs. O(n) for the vector). If you use a set, make sure to use set's find function (e.g. mySet.find(tableNum)
rather than std::find(mySet.begin(),mySet.end(),tableNum)
), or there's no benefit.
On a more general code-style note, since you're using C++ data structures anyway, you should probably use std::string
and std::cout
instead of char *
/char
and printf
.
add a comment |
up vote
8
down vote
I would seriously consider whether std::vector is the right data structure. Do you need the available table numbers to be in the order you got them, or can they be sorted? Do you need to allow duplicate table numbers? (I.e. if I add 42 twice, do I have one entry for 42 or two?)
If the data can be sorted and de-duplicated, then the most efficient way to find an entry is to store the data in a std::set. Storing the data takes a little longer (O(log n) for the set vs. amortized O(1) for the vector), but finding the data makes up for it (O(log n) for the set vs. O(n) for the vector). If you use a set, make sure to use set's find function (e.g. mySet.find(tableNum)
rather than std::find(mySet.begin(),mySet.end(),tableNum)
), or there's no benefit.
On a more general code-style note, since you're using C++ data structures anyway, you should probably use std::string
and std::cout
instead of char *
/char
and printf
.
add a comment |
up vote
8
down vote
up vote
8
down vote
I would seriously consider whether std::vector is the right data structure. Do you need the available table numbers to be in the order you got them, or can they be sorted? Do you need to allow duplicate table numbers? (I.e. if I add 42 twice, do I have one entry for 42 or two?)
If the data can be sorted and de-duplicated, then the most efficient way to find an entry is to store the data in a std::set. Storing the data takes a little longer (O(log n) for the set vs. amortized O(1) for the vector), but finding the data makes up for it (O(log n) for the set vs. O(n) for the vector). If you use a set, make sure to use set's find function (e.g. mySet.find(tableNum)
rather than std::find(mySet.begin(),mySet.end(),tableNum)
), or there's no benefit.
On a more general code-style note, since you're using C++ data structures anyway, you should probably use std::string
and std::cout
instead of char *
/char
and printf
.
I would seriously consider whether std::vector is the right data structure. Do you need the available table numbers to be in the order you got them, or can they be sorted? Do you need to allow duplicate table numbers? (I.e. if I add 42 twice, do I have one entry for 42 or two?)
If the data can be sorted and de-duplicated, then the most efficient way to find an entry is to store the data in a std::set. Storing the data takes a little longer (O(log n) for the set vs. amortized O(1) for the vector), but finding the data makes up for it (O(log n) for the set vs. O(n) for the vector). If you use a set, make sure to use set's find function (e.g. mySet.find(tableNum)
rather than std::find(mySet.begin(),mySet.end(),tableNum)
), or there's no benefit.
On a more general code-style note, since you're using C++ data structures anyway, you should probably use std::string
and std::cout
instead of char *
/char
and printf
.
edited Dec 5 at 15:59
Toby Speight
23.3k639111
23.3k639111
answered Dec 5 at 15:24
user186363
812
812
add a comment |
add a comment |
up vote
3
down vote
A std::set
or std::bitset
might be the better containers to use for this, but assuming we're going to stick with std::vector
, here's some improvements on this.
See Toby Speight's answer for some of the improvements I used but did not cover.
Avoid using unsigned numbers
Don't use unsigned int
or uint8_t
. Counterintuitively, it's generally best to use signed int
types even when a value should never be negative. Operations on mixed signed types come with a lot of problems. It's an unfortunate mistake the standard went with unsigned numbers for container sizes. Unsigned numbers are best kept to bitwise operations.
Store whether an index is used at that index
Instead of storing a list of indexes that have been used, let's just store a bool
at each possible index. This uses more space, but then no extra searching has to be done, just a single loop. (Notice I've changed m_availTableNums
to availTableNums
. The prefix m_
is conventionally only used for class members).
std::vector<bool> availTableNums (80);
Putting the 80 here will initialize the vector with 80 false values. Now when we fill the table, we choose random indices to insert true
which indicates a value of used.
std::srand(std::time(nullptr));
for(int i=0; i < 20; i++)
{
availTableNums[std::rand() % availTableNums.size()] = true;
}
Using modern RNG facilities
std::rand
has no guarantees of the "randomness" of the numbers it generates. It's fine if you want to run up a quick test, but in production code we could modify the previous example to make it a little more robust.
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
Here we use std::size_t
, even though it's an unsigned type, as it is better to not have to cast from one type to another if we could avoid it. The random number generator will automatically pick from the correct range so we no longer need the modulus operator.
Use C++ string streams
There is no need for a temporary string, and C++ streams are often a better choice than the C printf family of functions. We can write the header of the table as:
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
The extra <<
s are left for style but not necessary. Now for the printing inside the loop, use std::setw(2)
to make sure the printed index is within 2 columns. The value of this resets if string is sent to std::cout
so it will not effect the rest of the line.
The use of std::endl
will cause the output to always be flushed from the buffer and will have a small performance penalty. However, I am usually quite generous with them unless it is shown they have a sizeable impact on performance. It can make code easier to debug if output is consistently flushed and you know exactly where something went wrong.
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
We again choose an unsigned index as comparing two unsigned types is better than comparing mixed signs or casting one to the other. The conditional expression ?:
allows us to concisely choose between the two strings depending on whether the index is true or false.
Putting it all together with only the relavent headers, we get:
#include <ctime>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main()
{
std::vector<bool> availTableNums (80);
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
std::cout << "+-------+-------+" << std::endl;
return 0;
}
+1Avoid using unsigned numbers
: Really good advice. Hard to get across especially withstd::size_t
being unsigned but you addressed that point well.
– Martin York
Dec 6 at 0:20
1
Avoid advising usingstd::endl
. Do he really need to pay for a flush? Furthermore, in yourTABLE | STATE
you can safely keep the 1rst<<
and remove others.
– Calak
Dec 6 at 0:51
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra<<
in as a style choice but I could probably note that.
– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
add a comment |
up vote
3
down vote
A std::set
or std::bitset
might be the better containers to use for this, but assuming we're going to stick with std::vector
, here's some improvements on this.
See Toby Speight's answer for some of the improvements I used but did not cover.
Avoid using unsigned numbers
Don't use unsigned int
or uint8_t
. Counterintuitively, it's generally best to use signed int
types even when a value should never be negative. Operations on mixed signed types come with a lot of problems. It's an unfortunate mistake the standard went with unsigned numbers for container sizes. Unsigned numbers are best kept to bitwise operations.
Store whether an index is used at that index
Instead of storing a list of indexes that have been used, let's just store a bool
at each possible index. This uses more space, but then no extra searching has to be done, just a single loop. (Notice I've changed m_availTableNums
to availTableNums
. The prefix m_
is conventionally only used for class members).
std::vector<bool> availTableNums (80);
Putting the 80 here will initialize the vector with 80 false values. Now when we fill the table, we choose random indices to insert true
which indicates a value of used.
std::srand(std::time(nullptr));
for(int i=0; i < 20; i++)
{
availTableNums[std::rand() % availTableNums.size()] = true;
}
Using modern RNG facilities
std::rand
has no guarantees of the "randomness" of the numbers it generates. It's fine if you want to run up a quick test, but in production code we could modify the previous example to make it a little more robust.
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
Here we use std::size_t
, even though it's an unsigned type, as it is better to not have to cast from one type to another if we could avoid it. The random number generator will automatically pick from the correct range so we no longer need the modulus operator.
Use C++ string streams
There is no need for a temporary string, and C++ streams are often a better choice than the C printf family of functions. We can write the header of the table as:
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
The extra <<
s are left for style but not necessary. Now for the printing inside the loop, use std::setw(2)
to make sure the printed index is within 2 columns. The value of this resets if string is sent to std::cout
so it will not effect the rest of the line.
The use of std::endl
will cause the output to always be flushed from the buffer and will have a small performance penalty. However, I am usually quite generous with them unless it is shown they have a sizeable impact on performance. It can make code easier to debug if output is consistently flushed and you know exactly where something went wrong.
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
We again choose an unsigned index as comparing two unsigned types is better than comparing mixed signs or casting one to the other. The conditional expression ?:
allows us to concisely choose between the two strings depending on whether the index is true or false.
Putting it all together with only the relavent headers, we get:
#include <ctime>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main()
{
std::vector<bool> availTableNums (80);
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
std::cout << "+-------+-------+" << std::endl;
return 0;
}
+1Avoid using unsigned numbers
: Really good advice. Hard to get across especially withstd::size_t
being unsigned but you addressed that point well.
– Martin York
Dec 6 at 0:20
1
Avoid advising usingstd::endl
. Do he really need to pay for a flush? Furthermore, in yourTABLE | STATE
you can safely keep the 1rst<<
and remove others.
– Calak
Dec 6 at 0:51
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra<<
in as a style choice but I could probably note that.
– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
add a comment |
up vote
3
down vote
up vote
3
down vote
A std::set
or std::bitset
might be the better containers to use for this, but assuming we're going to stick with std::vector
, here's some improvements on this.
See Toby Speight's answer for some of the improvements I used but did not cover.
Avoid using unsigned numbers
Don't use unsigned int
or uint8_t
. Counterintuitively, it's generally best to use signed int
types even when a value should never be negative. Operations on mixed signed types come with a lot of problems. It's an unfortunate mistake the standard went with unsigned numbers for container sizes. Unsigned numbers are best kept to bitwise operations.
Store whether an index is used at that index
Instead of storing a list of indexes that have been used, let's just store a bool
at each possible index. This uses more space, but then no extra searching has to be done, just a single loop. (Notice I've changed m_availTableNums
to availTableNums
. The prefix m_
is conventionally only used for class members).
std::vector<bool> availTableNums (80);
Putting the 80 here will initialize the vector with 80 false values. Now when we fill the table, we choose random indices to insert true
which indicates a value of used.
std::srand(std::time(nullptr));
for(int i=0; i < 20; i++)
{
availTableNums[std::rand() % availTableNums.size()] = true;
}
Using modern RNG facilities
std::rand
has no guarantees of the "randomness" of the numbers it generates. It's fine if you want to run up a quick test, but in production code we could modify the previous example to make it a little more robust.
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
Here we use std::size_t
, even though it's an unsigned type, as it is better to not have to cast from one type to another if we could avoid it. The random number generator will automatically pick from the correct range so we no longer need the modulus operator.
Use C++ string streams
There is no need for a temporary string, and C++ streams are often a better choice than the C printf family of functions. We can write the header of the table as:
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
The extra <<
s are left for style but not necessary. Now for the printing inside the loop, use std::setw(2)
to make sure the printed index is within 2 columns. The value of this resets if string is sent to std::cout
so it will not effect the rest of the line.
The use of std::endl
will cause the output to always be flushed from the buffer and will have a small performance penalty. However, I am usually quite generous with them unless it is shown they have a sizeable impact on performance. It can make code easier to debug if output is consistently flushed and you know exactly where something went wrong.
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
We again choose an unsigned index as comparing two unsigned types is better than comparing mixed signs or casting one to the other. The conditional expression ?:
allows us to concisely choose between the two strings depending on whether the index is true or false.
Putting it all together with only the relavent headers, we get:
#include <ctime>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main()
{
std::vector<bool> availTableNums (80);
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
std::cout << "+-------+-------+" << std::endl;
return 0;
}
A std::set
or std::bitset
might be the better containers to use for this, but assuming we're going to stick with std::vector
, here's some improvements on this.
See Toby Speight's answer for some of the improvements I used but did not cover.
Avoid using unsigned numbers
Don't use unsigned int
or uint8_t
. Counterintuitively, it's generally best to use signed int
types even when a value should never be negative. Operations on mixed signed types come with a lot of problems. It's an unfortunate mistake the standard went with unsigned numbers for container sizes. Unsigned numbers are best kept to bitwise operations.
Store whether an index is used at that index
Instead of storing a list of indexes that have been used, let's just store a bool
at each possible index. This uses more space, but then no extra searching has to be done, just a single loop. (Notice I've changed m_availTableNums
to availTableNums
. The prefix m_
is conventionally only used for class members).
std::vector<bool> availTableNums (80);
Putting the 80 here will initialize the vector with 80 false values. Now when we fill the table, we choose random indices to insert true
which indicates a value of used.
std::srand(std::time(nullptr));
for(int i=0; i < 20; i++)
{
availTableNums[std::rand() % availTableNums.size()] = true;
}
Using modern RNG facilities
std::rand
has no guarantees of the "randomness" of the numbers it generates. It's fine if you want to run up a quick test, but in production code we could modify the previous example to make it a little more robust.
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
Here we use std::size_t
, even though it's an unsigned type, as it is better to not have to cast from one type to another if we could avoid it. The random number generator will automatically pick from the correct range so we no longer need the modulus operator.
Use C++ string streams
There is no need for a temporary string, and C++ streams are often a better choice than the C printf family of functions. We can write the header of the table as:
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
The extra <<
s are left for style but not necessary. Now for the printing inside the loop, use std::setw(2)
to make sure the printed index is within 2 columns. The value of this resets if string is sent to std::cout
so it will not effect the rest of the line.
The use of std::endl
will cause the output to always be flushed from the buffer and will have a small performance penalty. However, I am usually quite generous with them unless it is shown they have a sizeable impact on performance. It can make code easier to debug if output is consistently flushed and you know exactly where something went wrong.
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
We again choose an unsigned index as comparing two unsigned types is better than comparing mixed signs or casting one to the other. The conditional expression ?:
allows us to concisely choose between the two strings depending on whether the index is true or false.
Putting it all together with only the relavent headers, we get:
#include <ctime>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main()
{
std::vector<bool> availTableNums (80);
std::default_random_engine engine {std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, availTableNums.size()};
for(int i=0; i < 20; i++)
{
availTableNums[uniform_dist(e1)] = true;
}
std::cout << "+-------+-------+n"
<< "| TABLE | STATE |n"
<< "+-------+-------+" << std::endl;
for(auto i=0u; i < availTableNums.size(); i++)
{
std::cout << "| " << std::setw(2) << i << " | "
<< (availTableNums[i] ? "USED " : "EMPTY")
<< " |" << std::endl;
}
std::cout << "+-------+-------+" << std::endl;
return 0;
}
edited Dec 6 at 1:31
answered Dec 5 at 23:04
TheLoneMilkMan
1312
1312
+1Avoid using unsigned numbers
: Really good advice. Hard to get across especially withstd::size_t
being unsigned but you addressed that point well.
– Martin York
Dec 6 at 0:20
1
Avoid advising usingstd::endl
. Do he really need to pay for a flush? Furthermore, in yourTABLE | STATE
you can safely keep the 1rst<<
and remove others.
– Calak
Dec 6 at 0:51
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra<<
in as a style choice but I could probably note that.
– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
add a comment |
+1Avoid using unsigned numbers
: Really good advice. Hard to get across especially withstd::size_t
being unsigned but you addressed that point well.
– Martin York
Dec 6 at 0:20
1
Avoid advising usingstd::endl
. Do he really need to pay for a flush? Furthermore, in yourTABLE | STATE
you can safely keep the 1rst<<
and remove others.
– Calak
Dec 6 at 0:51
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra<<
in as a style choice but I could probably note that.
– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
+1
Avoid using unsigned numbers
: Really good advice. Hard to get across especially with std::size_t
being unsigned but you addressed that point well.– Martin York
Dec 6 at 0:20
+1
Avoid using unsigned numbers
: Really good advice. Hard to get across especially with std::size_t
being unsigned but you addressed that point well.– Martin York
Dec 6 at 0:20
1
1
Avoid advising using
std::endl
. Do he really need to pay for a flush? Furthermore, in your TABLE | STATE
you can safely keep the 1rst <<
and remove others.– Calak
Dec 6 at 0:51
Avoid advising using
std::endl
. Do he really need to pay for a flush? Furthermore, in your TABLE | STATE
you can safely keep the 1rst <<
and remove others.– Calak
Dec 6 at 0:51
2
2
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
I'd consider the unsigned numbers a preference thing. There are some very strong reasons to use unsigned numbers. To call their use a "mistake" in the STL suggests you haven't thought those through.
– Cort Ammon
Dec 6 at 1:02
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra
<<
in as a style choice but I could probably note that.– TheLoneMilkMan
Dec 6 at 1:29
@Calak - I put in a note about the performance penalty, but haven't been convinced it's always that bad. I left the extra
<<
in as a style choice but I could probably note that.– TheLoneMilkMan
Dec 6 at 1:29
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
@CortAmmon - Well I'm basing that off of the opinion of Stroustrup and others involved in the STL design. I'm aware there's not a consensus but the issue isn't usually brought up enough.
– TheLoneMilkMan
Dec 6 at 1:42
add a comment |
up vote
2
down vote
The above are good advice. Do you need exactly N values true? If so, the above loop might be better as (pieces left out for simplicity):
const std::size_t N = 80;
const std::size_t K = 20;
std::unordered_set<std::size_t> exists;
// Fill exists with K random values between [0 .. N], inclusive.
std::default_random_engine rng{std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, N};
while (exists.size() < K) {
exists.insert(uniform_dist(rng));
}
// Print each value in order, in a table.
for(auto i = uniform_dist.min(); i <= uniform_dist.max(); ++i) {
std::cout << "| " << std::setw(2) << i << " | "
<< ((exists.count(i) == 0) ? "EMPTY" : "USED ")
<< " |" << std::endl;
}
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
add a comment |
up vote
2
down vote
The above are good advice. Do you need exactly N values true? If so, the above loop might be better as (pieces left out for simplicity):
const std::size_t N = 80;
const std::size_t K = 20;
std::unordered_set<std::size_t> exists;
// Fill exists with K random values between [0 .. N], inclusive.
std::default_random_engine rng{std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, N};
while (exists.size() < K) {
exists.insert(uniform_dist(rng));
}
// Print each value in order, in a table.
for(auto i = uniform_dist.min(); i <= uniform_dist.max(); ++i) {
std::cout << "| " << std::setw(2) << i << " | "
<< ((exists.count(i) == 0) ? "EMPTY" : "USED ")
<< " |" << std::endl;
}
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
add a comment |
up vote
2
down vote
up vote
2
down vote
The above are good advice. Do you need exactly N values true? If so, the above loop might be better as (pieces left out for simplicity):
const std::size_t N = 80;
const std::size_t K = 20;
std::unordered_set<std::size_t> exists;
// Fill exists with K random values between [0 .. N], inclusive.
std::default_random_engine rng{std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, N};
while (exists.size() < K) {
exists.insert(uniform_dist(rng));
}
// Print each value in order, in a table.
for(auto i = uniform_dist.min(); i <= uniform_dist.max(); ++i) {
std::cout << "| " << std::setw(2) << i << " | "
<< ((exists.count(i) == 0) ? "EMPTY" : "USED ")
<< " |" << std::endl;
}
The above are good advice. Do you need exactly N values true? If so, the above loop might be better as (pieces left out for simplicity):
const std::size_t N = 80;
const std::size_t K = 20;
std::unordered_set<std::size_t> exists;
// Fill exists with K random values between [0 .. N], inclusive.
std::default_random_engine rng{std::time()};
std::uniform_int_distribution<std::size_t> uniform_dist {0, N};
while (exists.size() < K) {
exists.insert(uniform_dist(rng));
}
// Print each value in order, in a table.
for(auto i = uniform_dist.min(); i <= uniform_dist.max(); ++i) {
std::cout << "| " << std::setw(2) << i << " | "
<< ((exists.count(i) == 0) ? "EMPTY" : "USED ")
<< " |" << std::endl;
}
answered Dec 8 at 3:16
Laramie
211
211
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
add a comment |
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
(Nice catch, with the exception of picking the symbolic name for the total number in your question about the requirements instead of that for the "true sub-set".)
– greybeard
Dec 8 at 7:56
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209055%2fmost-efficient-way-to-find-an-entry-in-a-c-vector%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Is that hardcoded 80 a coincidence (i.e. could be other numbers, too) or by design? Because if that's by design and if you really only want to know taken-or-not-taken, a
std::bitset<80>
will do, too. This will not only be idiomatic, but also faster than any other solution. A bitset of that size fits in two quadwords, or one cache line, respectively. No runtime allocations (while inserting without having calledreserve()
like you do). Access is constant, single assembly instruction, it doesn't get easier or faster than that.– Damon
Dec 5 at 19:59
@Damon It’s important to remember that not every machine is an x86 computer; many embedded systems for example use 8- or 16-bit processors and have no cache.
– Edward
Dec 6 at 10:27
@Edward: C++11 pretty much rules out 8- or 16-bit embedded systems, but yes in general you are right. Though a bit set worth 5 16-bit words will still be an order of magnitude more efficient to access than iterating a
std::vector
even on a no-cache machine.– Damon
Dec 6 at 11:52
In fact,
g++
has supported 8-bit AVR processors for some years. I haveg++ 7.2.0
for AVR on my main computer right now and it supports C++17 on that platform.– Edward
Dec 6 at 13:09