Do you need a mutex to protect an int?
Recently I ran into few pieces of code here and there that assumed that int is an atomic type. I.e. when you modify value of the variable from two or more different threads at the same time, all of the changes you’ve made to the value will remain intact.
But really, can you modify variables of basic types (integers, floats, etc), from two or more threads, at the same time, without screwing their value?
Until now I’ve been very precautious with questions of this kind. I’ve spend enormous amount of time solving synchronization problems. So when I have a variable that being modified or accessed from two or more threads I always put some mutex or semaphore (spinlock in kernel) around it – no questions asked.
Still, I decided to check things out, perhaps one more time. I wanted to see what happens with a variable if two or more threads modifying it. So, I’ve written a short program that demonstrates what happens when you do something like this.
Here’s the code. See my comments below to understand what it does and how it does it.
#include <stdio.h> #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <sched.h> #include <linux/unistd.h> #include <sys/syscall.h> #include <errno.h> #define INC_TO 1000000 // one million... int global_int = 0; pid_t gettid( void ) { return syscall( __NR_gettid ); } void *thread_routine( void *arg ) { int i; int proc_num = (int)(long)arg; cpu_set_t set; CPU_ZERO( &set ); CPU_SET( proc_num, &set ); if (sched_setaffinity( gettid(), sizeof( cpu_set_t ), &set )) { perror( "sched_setaffinity" ); return NULL; } for (i = 0; i < INC_TO; i++) { global_int++; } return NULL; } int main() { int procs = 0; int i; pthread_t *thrs; // Getting number of CPUs procs = (int)sysconf( _SC_NPROCESSORS_ONLN ); if (procs < 0) { perror( "sysconf" ); return -1; } thrs = malloc( sizeof( pthread_t ) * procs ); if (thrs == NULL) { perror( "malloc" ); return -1; } printf( "Starting %d threads...\n", procs ); for (i = 0; i < procs; i++) { if (pthread_create( &thrs[i], NULL, thread_routine, (void *)(long)i )) { perror( "pthread_create" ); procs = i; break; } } for (i = 0; i < procs; i++) pthread_join( thrs[i], NULL ); free( thrs ); printf( "After doing all the math, global_int value is: %d\n", global_int ); printf( "Expected value is: %d\n", INC_TO * procs ); return 0; }
You can download the code and a Makefile here.
The setup is simple. Number of threads is as number of CPUs in your machine (this includes cores and even hyperthreaded semi-cores). Each thread affined to certain core with sched_setaffinity(). I.e. scheduler configured to run certain thread on certain core, to make sure that all cores access that variable. Each thread increases a global integer named global_int one million times.
In the meantime, main thread waits for the worker threads to do their job. Once they’re done, it prints the value of global_int.
And the result is:
Starting 4 threads... After doing all the math, global_int value is: 1908090 Expected value is: 4000000
I guess numbers speak for themselves.
In case you still have some doubts, consider this. When I compile the code with -O2 compiler option (enabling optimization), the value of the global_int is 4000000, as you may have expected. So, it is not really black and white. Also, things may be different on different computers.
However, the bottom line is that you want to be on the safe side. This code is a little simple proof that in some cases, simultaneous access to certain basic variable can cause its value to become ambiguous. So, to be on the safe side, protect your shared variables, no matter if it is a complex data structure or simple integers.
If you have further questions, comment here or send me an email to alex@alexonlinux.com.
PS: I think this issue deserves full scale article. What do you think?
Hello Alexander, I was looking around your site,
the articles I read were very useful.
I should mention that the above program can work
without mutexes or locks.
For such quantities usually
the “inc” or “add” instruction is used ,
that all the recent processors has.
So if the “global_int++” is replaced with
__asm__ __volatile__(“inc %0″ :”=m” (global_int) :”m” (global_int));
it should work. Sorry for the gcc__IDIOT__assembly, I never learned
it’s syntax completely…
@Stavros
Actually, no, it will not work. This is exactly what I’ve been trying to say in this article. Try running it on multi-core system and you will see it for yourself – replace global_int++ with pieces of assembly below and see the result.
This is the version that should produce correct result.
__asm__ __volatile__(“lock; incl %0;” : “=m”(global_int) : );
This is the one you’ve suggested (fixed assembly :-)). It will not work.
__asm__ __volatile__(“incl %0;” : “=m”(global_int) : );
BTW, I figured out why it produced correct result when compiled with -O2. It appears that with -O2 gcc gets rid of entire loop and simply adds INC_TO to global_int – it doesn’t do 1,000,000 increments, but adds 1,000,000 to the variable
There are two separate issues here. Read or write access to an int *is* atomic (on an x86 – if you are dealing with different sized processors, or different sized data, things can get more complicated). Reads and writes that are bigger than the processor’s size, such as accessing a 64-bit long long on a 32-bit x86, are *not* atomic (though accessing a 64-bit double *might* be atomic).
However, incrementing an int consists of three operations – read the old value, increment it, and write the new value. Each of these individually is atomic, but the combination is not. Even if a single assembly instruction is used, the three parts are still broken up in the processor, and by the bus logic.
But you can still use unlocked ints for atomic access, if you are careful about exactly what you do with them. For example, replace the for() loop in thread_routine with this code:
while (*((volatile int *) &global_int) != proc_num) ;
global_int = proc_num + 1;
(The “volatile” hack is to limit the compiler’s optimisation – check the generated assembly, especially with -O2, to see why it’s needed.)
Here you are only requiring atomic reads and writes, and the code will work properly.
David, I am a little confused with some of the things in your comment.
Here you say that accessing int is an atomic operation.
Here you say that accessing int is not an atomic operation.
I don’t understand the code. I didn’t try it, but I think I understand the volatile trick. Yet the rest of it is quiet confusing. What is proc_num? Why there’s a semicolon at the end of while loop?
for (i = 0; i < INC_TO; i++)
{
// procs = sysconf( _SC_NPROCESSORS_ONLN );
// #define INC_TO 100000 ( 5 zeros )
while( (*(volatile int*)(&global_int)) % procs != proc_num );
global_int ++;
}
The semicolon at the end of the while loop is because the while loop’s body is empty – it’s a busy waiting loop. The “volatile” access typecast is necessary otherwise the compiler’s optimiser would simply read “global_int” once, compare it to “proc_num”, and then either skip the loop or jump forever – “volatile” forces the compiler to re-read “global_int” for each iteration. Finally, “proc_num” is the processor number, taken from the argument given to thread_route() – it’s in your code.
I’d highly recommend you look at the generated assembly code for examples such as these, also using different optimisations (-O0, -O1 and -O2). It would give an insight into exactly what’s happening, and among other things show the necessity of “volatile”. There is a general rule that if code does not function identically (except for time and code size) with and without optimisation, then the code is wrong.
I can see that the semicolon at the end of the loop is because its a busy loop. I am wondering what it is needed for? I mean, if you replace the for loop in the program with while loop as you’ve suggested, then the program will never leave the loop. It will spin infinitely, waiting for global_int to become proc_num. This will never happen, because while threads will wait in a busy loop, the value of global_int will remain unchanged. Unless I’m missing something, I’d say that the code you’ve suggested is broken.
As for volatile, I assume you mean the fact I didn’t know why the program was working with -O2. Yet later I found that with -O2 gcc optimizes entire loop out, leaving simple: global_int += INC_TO;
I suppose you wanted to say that the program behaves differently when compiled with and without optimization. You’re right of course and I will fix as you suggests.
It’s certainly possible that I’ve made a mistake in the code, as I haven’t tried it at all. The idea, however, is that the threads will busy-wait until “global_int” is equal to their “proc_num” argument, passed at the thread creation. Since “global_int” is initialised to 0, the first thread (with a “proc_num” of 0) will break out of its loop, and set “global_int” to 1. This will then free the next thread’s loop, and so on. In practice, you’d probably find the threads starting and exiting before the next thread has even been fully created, making it a bit boring – you could create the threads in reverse order to give them all a chance to run.
Regarding optimisation, I was just making the point that optimisation does not affect the functionality of a program unless the code is incorrect. This is probably no surprise to you, but it does come as a surprise to many people dealing with these issues. In support lists for embedded development tools, I regularly see people complaining that “the compiler is broken” because their code works with no optimisation (or when using an inferior compiler), but fails when optimised. “Volatile” is one of the tools a programmer has for making sure the program does what he wants it to do, rather than just hoping the compiler will read his mind!
But “volatile” and it’s connection with optimisation is straying a bit from your topic here. It’s perhaps worth an article of its own one day, if you are interested in such low-level topics.
Alexander Sandler,
This is a good article that gives a good understanding of a common problem with simple C++ types manipulated in threads. Good job !
I was wondering what __sync_fetch_and_add() does if the “int global_int” data is unaligned ? I guess it is better to force alignement with a compilation option. However I didn’t test anything yet. You do not mention any specific compilation option about this.
Nicolas.
Thank you David for clearing up the biggest misunderstanding in this post that increment is a multiline operation. the read and write individually are atomic
Thank you all for the insights and comments.
@David Brown: Do you have any specific reference that you would recommend? I’d like to learn more about those memory barriers which you’ve commented before. Would you mind sharing how did you get your background on that?
Hi Alexander,
according to your knowledge, are there any elementary data types
(e.g. integers, etc.) for which assignment is guaranteed to be
atomic on all architectures? I tried to have a look, e.g. to the
ISO C standard, but have found nothing so far.
Thanks
-Angelo
There are no data types that are guaranteed to be atomic on all architectures. However, you can get close if you limit yourself to some architectures. In particular, any 32-bit (or more) cpu will be able to do a simple read or write to an aligned 32-bit int atomically (and ints will typically be aligned unless you specifically avoid alignment). And most devices (but not all) are able to do atomic writes to an 8-bit byte.
Don’t bother trying to look in the standards for this sort of thing – you won’t find anything relevant.
@Angelo Borsotti
See David Brown‘s reply
@David Brown
Thanks for the reply
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4] Do you need mutex to protect an ‘int’? [5] C++ Concurrency in Action [6] http://www.newsmth.net/bbscon.php?bid=335&id=236629 […]
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4] Do you need mutex to protect an ‘int’? [5] C++ Concurrency in Action [6] Multithreaded simple data type access and atomic […]
[…] 今天看到一系列有趣的文章:1, 2, 3 […]
[…] this article I would like to continue subject I started in my previous two posts (post 1 and post2). Question I am trying to answer is what is the most efficient, yet safe way of accessing […]
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4]Do you need mutex to protect an ‘int’? [5]C++ Concurrency in Action [6]Multithreaded simple data type access and atomic variables [6] […]
I didnt understsand how the value of global_int is not same as expected.. please be brief ..
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4]Do you need mutex to protect an ‘int’? [5]C++ Concurrency in Action [6]Multithreaded simple data type access and atomic variables [6] […]
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4] Do you need mutex to protect an ‘int’? [5] C++ Concurrency in Action [6] Multithreaded simple data type access and atomic […]
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4] Do you need mutex to protect an ‘int’? [5] C++ Concurrency in Action [6] Multithreaded simple data type access and […]
[…] [4] Do you need mutex to protect an ‘int’? […]
[…] is a follow up on my previous post about need to protect access to even simplest variables when working in multi-threaded environment. […]
[…] [3] 关于内存对齐、bit field等 –《Linux C编程一站式学习》 [4] Do you need mutex to protect an ‘int’? [5] C++ Concurrency in Action [6] Multithreaded simple data type access and atomic variables […]