What does the C++ compiler do to ensure that different but adjacent memory locations are safe to be used on...
up vote
29
down vote
favorite
Lets say I have a struct:
struct Foo {
char a; // read and written to by thread 1 only
char b; // read and written to by thread 2 only
};
Now from what I understand, the C++ standard guarantees the safety of the above when two threads operate on the two different memory locations.
I would think though that, since char a and char b, fall within the same cache line, that the compiler has to do extra syncing.
What exactly happens here?
c++ multithreading thread-safety
|
show 1 more comment
up vote
29
down vote
favorite
Lets say I have a struct:
struct Foo {
char a; // read and written to by thread 1 only
char b; // read and written to by thread 2 only
};
Now from what I understand, the C++ standard guarantees the safety of the above when two threads operate on the two different memory locations.
I would think though that, since char a and char b, fall within the same cache line, that the compiler has to do extra syncing.
What exactly happens here?
c++ multithreading thread-safety
10
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
6
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
4
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
1
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
1
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago
|
show 1 more comment
up vote
29
down vote
favorite
up vote
29
down vote
favorite
Lets say I have a struct:
struct Foo {
char a; // read and written to by thread 1 only
char b; // read and written to by thread 2 only
};
Now from what I understand, the C++ standard guarantees the safety of the above when two threads operate on the two different memory locations.
I would think though that, since char a and char b, fall within the same cache line, that the compiler has to do extra syncing.
What exactly happens here?
c++ multithreading thread-safety
Lets say I have a struct:
struct Foo {
char a; // read and written to by thread 1 only
char b; // read and written to by thread 2 only
};
Now from what I understand, the C++ standard guarantees the safety of the above when two threads operate on the two different memory locations.
I would think though that, since char a and char b, fall within the same cache line, that the compiler has to do extra syncing.
What exactly happens here?
c++ multithreading thread-safety
c++ multithreading thread-safety
asked yesterday
Nathan Doromal
1,45111420
1,45111420
10
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
6
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
4
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
1
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
1
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago
|
show 1 more comment
10
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
6
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
4
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
1
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
1
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago
10
10
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
6
6
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
4
4
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
1
1
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
1
1
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago
|
show 1 more comment
2 Answers
2
active
oldest
votes
up vote
27
down vote
accepted
This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from
char a[2];
// or
char a, b;
In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.
However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?
– ArtB
yesterday
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
add a comment |
up vote
17
down vote
As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:
std::array<std::uint8_t, 8u> c;
void f()
{
c[0] ^= 0xfa;
c[3] ^= 0x10;
c[6] ^= 0x8b;
c[7] ^= 0x92;
}
Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):
load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]
This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c
at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.
For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6]
and c[7]
can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.
(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
27
down vote
accepted
This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from
char a[2];
// or
char a, b;
In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.
However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?
– ArtB
yesterday
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
add a comment |
up vote
27
down vote
accepted
This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from
char a[2];
// or
char a, b;
In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.
However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?
– ArtB
yesterday
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
add a comment |
up vote
27
down vote
accepted
up vote
27
down vote
accepted
This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from
char a[2];
// or
char a, b;
In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.
However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.
This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from
char a[2];
// or
char a, b;
In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.
However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.
edited yesterday
answered yesterday
SergeyA
40k53781
40k53781
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?
– ArtB
yesterday
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
add a comment |
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?
– ArtB
yesterday
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
1
1
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?– ArtB
yesterday
care should be taken to prevent this from happening when possible.
How would you suggest one going about doing that?– ArtB
yesterday
3
3
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
@ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
– SergeyA
yesterday
1
1
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
– davidbak
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
@ArtB: C++17 provides interference sizes to help guide such design.
– Davis Herring
yesterday
add a comment |
up vote
17
down vote
As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:
std::array<std::uint8_t, 8u> c;
void f()
{
c[0] ^= 0xfa;
c[3] ^= 0x10;
c[6] ^= 0x8b;
c[7] ^= 0x92;
}
Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):
load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]
This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c
at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.
For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6]
and c[7]
can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.
(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
add a comment |
up vote
17
down vote
As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:
std::array<std::uint8_t, 8u> c;
void f()
{
c[0] ^= 0xfa;
c[3] ^= 0x10;
c[6] ^= 0x8b;
c[7] ^= 0x92;
}
Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):
load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]
This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c
at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.
For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6]
and c[7]
can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.
(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
add a comment |
up vote
17
down vote
up vote
17
down vote
As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:
std::array<std::uint8_t, 8u> c;
void f()
{
c[0] ^= 0xfa;
c[3] ^= 0x10;
c[6] ^= 0x8b;
c[7] ^= 0x92;
}
Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):
load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]
This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c
at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.
For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6]
and c[7]
can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.
(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:
std::array<std::uint8_t, 8u> c;
void f()
{
c[0] ^= 0xfa;
c[3] ^= 0x10;
c[6] ^= 0x8b;
c[7] ^= 0x92;
}
Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):
load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]
This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c
at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.
For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6]
and c[7]
can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.
(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
answered yesterday
Arne Vogel
3,73011125
3,73011125
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53376806%2fwhat-does-the-c-compiler-do-to-ensure-that-different-but-adjacent-memory-locat%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
10
On a lot of platforms (for example, x86), it doesn't have to do anything. It just works (it means that the HW does the necessary extra stuff).
– geza
yesterday
6
Yes. But the exact hit could vary on different generations/vendors of CPU. Do a search on "false sharing".
– geza
yesterday
4
This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing
– NathanOliver
yesterday
1
I think the only CPUs that C++ has actually been implemented on that the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes answer to a related question for details: stackoverflow.com/a/46818162/3826372 As far as know there's no compiler implementations that have been updated to support the C++11 memory model on these long obsolete Alpha CPUs.
– Ross Ridge
yesterday
1
@RossRidge - well, there's the even more obsolete TMS9900 that has the same issue (only with 8-bit values, as it assumes 16-bit alignment) -- there's a port of gcc 3.4 to the architecture, but I don't know whether any particular C++ variant is supported. I'm also not aware of any extant dual-processor TMS9900 machines that would ever actually see this issue. The TMS9900 has instructions that operate on single bytes, but the bus implementation always fetches both bytes in a 16-bit word and rewrites the unchanged one.
– Jules
17 hours ago