πŸ‡―πŸ‡΅ ζ—₯本θͺž | πŸ‡ΊπŸ‡Έ English
Advertisement

C pthreads Intro

POSIX threads for concurrent programming: create / join, data races, mutex, deadlock.

Threads vs processes

A process is a running program with its own memory. A thread is a flow of execution inside a process.
POSIX threads: the standard API on Linux/macOS. Include <pthread.h> and compile with -pthread.

pthread_create / join

Minimal example: two threads print different messages.
#include <stdio.h>
#include <pthread.h>

// Thread entry point: both arg and return are void*
void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int a = 1, b = 2;

    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);

    pthread_join(t1, NULL);  // wait for t1
    pthread_join(t2, NULL);

    printf("main done\n");
    return 0;
}
$ gcc -pthread hello_pt.c -o hello_pt $ ./hello_pt hello from thread 1 hello from thread 2 main done # order may vary
Always join: without it, main can exit before threads finish, and you leak resources. (Unless you deliberately detach the thread.)

Receiving a return value

void *compute(void *arg) {
    int x = *(int *)arg;
    int *result = malloc(sizeof(int));
    *result = x * x;
    return result;
}

// in main:
void *ret;
pthread_join(t, &ret);
printf("result = %d\n", *(int *)ret);
free(ret);

Data races (demo)

If multiple threads modify the same variable without synchronization, the result is unpredictable. This is a data race.
#include <stdio.h>
#include <pthread.h>

#define N 4
#define LOOPS 1000000

long counter = 0;         // shared

void *worker(void *arg) {
    for (int i = 0; i < LOOPS; i++) {
        counter++;              // NOT atomic!
    }
    return NULL;
}

int main(void) {
    pthread_t th[N];
    for (int i = 0; i < N; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < N; i++)
        pthread_join(th[i], NULL);

    printf("expected: %d\n", N * LOOPS);
    printf("actual:   %ld\n", counter);
    return 0;
}
$ ./race expected: 4000000 actual: 2841739 # different every run
Why it breaks: counter++ looks atomic but is really "load, add, store" at the CPU level. Two threads doing it concurrently lose updates.

Fix with mutex

A mutex is a lock only one thread can hold at a time. Wrap critical sections with lock/unlock.
#include <stdio.h>
#include <pthread.h>

#define N 4
#define LOOPS 1000000

long counter = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&mtx);
        counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

int main(void) {
    pthread_t th[N];
    for (int i = 0; i < N; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < N; i++)
        pthread_join(th[i], NULL);

    printf("counter = %ld\n", counter);  // always 4000000
    pthread_mutex_destroy(&mtx);
    return 0;
}
Granularity: smaller critical sections β†’ better parallelism, but too small and lock overhead dominates. Measure to decide.

For simple counters, atomics are faster

#include <stdatomic.h>

atomic_long counter = 0;

// in worker:
atomic_fetch_add(&counter, 1);   // safe, lock-free
<stdatomic.h> is C11. Use atomics for simple counters; use mutex to protect complex invariants.

Avoiding deadlock

Two threads locking two mutexes in opposite orders can wait for each other forever.
// BAD: different lock orders per thread
// Thread A: lock(m1) β†’ lock(m2)
// Thread B: lock(m2) β†’ lock(m1)
// β†’ both hold one and wait for the other

Strategies

  1. Unify the order β€” assign numbers to mutexes and always lock lowest first. No cycle is possible.
  2. trylock β€” pthread_mutex_trylock fails instead of waiting. If you can't grab the second lock, release the first and retry.
  3. Use fewer locks β€” one well-scoped lock is usually simpler than several fine-grained ones.
Tools: valgrind --tool=helgrind ./app detects data races and potential deadlocks.

Challenges

Challenge 1: Parallel sum
Sum a 1,000,000-element array with 4 threads, 250,000 elements each. Each thread accumulates into a local variable first, then adds once under a mutex (to minimize contention).
Challenge 2: Producer / consumer
Implement a 10-slot ring buffer where one thread pushes and another pops. Use pthread_cond_t to wait when empty/full.
Challenge 3: atomic vs mutex
Implement the counter with both mutex and atomic. Time them with clock_gettime for 1, 2, 4, 8 threads and plot.
Challenge 4: Cause and detect a deadlock
Write a program with two mutexes locked in opposite orders. Run under helgrind to see the warning, then fix by unifying order.