Race Condition

Here is a simple program, racing.c.

It has one global variable, cnt, which starts at 0. Two threads each increment this variable 10000000 times. What is its value after both threads finish? Will the result be 20000000?

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NITERS 10000000

void *count (void *arg);

volatile unsigned int cnt = 0;

int main () {

    pthread_t tid1, tid2;
    pthread_create (&tid1, NULL, count, NULL);
    pthread_create (&tid2, NULL, count, NULL);

    pthread_join (tid1, NULL);
    pthread_join (tid2, NULL);
    printf ("cnt:%u\n", cnt);   /* cnt is unsigned, so print it with %u */
    exit (0);

}

void *count (void *arg) {

    volatile int i = 0;

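    /* Both threads run this loop at the same time on the shared cnt. */
    for (; i < NITERS; i++) {
        cnt++;   /* unsynchronized update of a shared variable */
    }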

    return NULL;
}

Let's try this program now:

The result is not 20000000, and it typically changes from one run to the next. Why do we get these results? How can we avoid this situation?
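
One way to look at it: cnt++ is not a single indivisible step. It reads cnt, adds 1, and writes the value back, so two threads can read the same old value and one of the increments is lost. Declaring cnt as volatile does not change this. The sketch below shows one possible fix, assuming a POSIX mutex is acceptable here: each increment is wrapped in a lock so only one thread updates cnt at a time. The names cnt_lock, count_locked, and run_locked_count are hypothetical and not part of racing.c.

```c
#include <assert.h>
#include <pthread.h>

#define NITERS 10000000

/* Shared counter, now protected by cnt_lock (hypothetical name). */
static unsigned int cnt = 0;
static pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: same loop as count(), but each increment holds the lock. */
static void *count_locked (void *arg) {
    (void) arg;   /* unused */

    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock (&cnt_lock);
        cnt++;    /* read-modify-write now happens one thread at a time */
        pthread_mutex_unlock (&cnt_lock);
    }

    return NULL;
}

/* Starts two counting threads, waits for both, returns the final count. */
unsigned int run_locked_count (void) {
    pthread_t tid1, tid2;
    int rc;

    cnt = 0;

    rc = pthread_create (&tid1, NULL, count_locked, NULL);
    assert (rc == 0);
    rc = pthread_create (&tid2, NULL, count_locked, NULL);
    assert (rc == 0);
    (void) rc;    /* avoid an unused warning when asserts are disabled */

    pthread_join (tid1, NULL);
    pthread_join (tid2, NULL);

    return cnt;
}
```

Calling run_locked_count () from main and printing the return value with %u should give 2 * NITERS on every run. The cost is a lock and unlock per increment, so this version runs noticeably slower than the racy one. Compile with the -pthread option.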