πŸ‡―πŸ‡΅ ζ—₯本θͺž | πŸ‡ΊπŸ‡Έ English

C pthreads Intro

POSIX threads for concurrent programming: create / join, data races, mutex, deadlock.

Threads vs processes

A process is a running program with its own memory. A thread is a flow of execution within a process.
POSIX threads: the standard threading API on Linux and macOS. Include <pthread.h> and compile with -pthread.

pthread_create / join

A minimal example: two threads each print a different message.
#include <stdio.h>
#include <pthread.h>

// Thread entry point: both arg and return are void*
void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int a = 1, b = 2;

    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);

    pthread_join(t1, NULL);  // wait for t1
    pthread_join(t2, NULL);

    printf("main done\n");
    return 0;
}
$ gcc -pthread hello_pt.c -o hello_pt $ ./hello_pt hello from thread 1 hello from thread 2 main done # order may vary
Always join: without it, main can exit before the threads finish and you'll leak resources. (Unless you deliberately detach the thread.)

Receiving a return value

void *compute(void *arg) {
    int x = *(int *)arg;
    int *result = malloc(sizeof(int));
    *result = x * x;
    return result;
}

// in main:
void *ret;
pthread_join(t, &ret);
printf("result = %d\n", *(int *)ret);
free(ret);

Data races (demo)

When multiple threads modify the same variable without synchronization, the result becomes unpredictable. That's a data race.
#include <stdio.h>
#include <pthread.h>

#define N 4
#define LOOPS 1000000

long counter = 0;         // shared

void *worker(void *arg) {
    for (int i = 0; i < LOOPS; i++) {
        counter++;              // NOT atomic!
    }
    return NULL;
}

int main(void) {
    pthread_t th[N];
    for (int i = 0; i < N; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < N; i++)
        pthread_join(th[i], NULL);

    printf("expected: %d\n", N * LOOPS);
    printf("actual:   %ld\n", counter);
    return 0;
}
$ ./race expected: 4000000 actual: 2841739 # different every run
Why it breaks: counter++ looks atomic, but at the CPU level it's really three steps β€” load, add, store. Two threads doing it concurrently will lose updates.

Fix with mutex

A mutex is a lock only one thread can hold at a time. Wrap critical sections with lock and unlock.
#include <stdio.h>
#include <pthread.h>

#define N 4
#define LOOPS 1000000

long counter = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&mtx);
        counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

int main(void) {
    pthread_t th[N];
    for (int i = 0; i < N; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < N; i++)
        pthread_join(th[i], NULL);

    printf("counter = %ld\n", counter);  // always 4000000
    pthread_mutex_destroy(&mtx);
    return 0;
}
Granularity: smaller critical sections mean better parallelism, but if they're too small, lock overhead dominates. Measure to decide.

For simple counters, atomics are faster

#include <stdatomic.h>

atomic_long counter = 0;

// in worker:
atomic_fetch_add(&counter, 1);   // safe, lock-free
<stdatomic.h> is C11. Use atomics for simple counters, and mutexes for protecting more complex invariants.

Avoiding deadlock

Two threads locking two mutexes in opposite orders can end up waiting on each other forever.
// BAD: different lock orders per thread
// Thread A: lock(m1) β†’ lock(m2)
// Thread B: lock(m2) β†’ lock(m1)
// β†’ both hold one and wait for the other

Strategies

  1. Fix the order β€” give each mutex a number and always lock the lowest one first. No cycle can form.
  2. trylock β€” pthread_mutex_trylock fails instead of waiting. If you can't grab the second lock, release the first and retry.
  3. Use fewer locks β€” one well-scoped lock is usually simpler than several fine-grained ones.
Tools: valgrind --tool=helgrind ./app detects data races and potential deadlocks.

Challenges

Challenge 1: Parallel sum
Sum a 1,000,000-element array using 4 threads of 250,000 elements each. Each thread accumulates into a local variable first, then adds once under a mutex to minimize contention.
Challenge 2: Producer / consumer
Implement a 10-slot ring buffer where one thread pushes and another pops. Use pthread_cond_t to wait when the buffer is empty or full.
Challenge 3: atomic vs mutex
Implement the counter with both a mutex and an atomic. Time each with clock_gettime for 1, 2, 4, and 8 threads, then plot the results.
Challenge 4: Cause and detect a deadlock
Write a program that locks two mutexes in opposite orders. Run it under helgrind to see the warning, then fix it by locking in a consistent order.

Review Quiz

Check your understanding of this lesson!

Q1. Which function creates a new POSIX thread?

pthread_create
thread_start
fork_thread

pthread_create(&tid, NULL, func, arg) launches a new thread that starts executing at func.

Q2. What do you use for mutual exclusion when multiple threads access shared data?

pthread_mutex_t (a mutex)
pthread_sema_t
pthread_lock

Code wrapped in pthread_mutex_lock / unlock can only be executed by one thread at a time.

Q3. Which function waits for a created thread to finish?

pthread_join
pthread_wait
pthread_end

pthread_join(tid, &retval) blocks until that thread finishes. Skip it, and its resources won't be reclaimed.