Below is a compact, field-tested roadmap for building a multitasking scheduler on a microcontroller—starting simple (cooperative) and moving to a true preemptive scheduler (Cortex-M example). You can lift these patterns into STM32, nRF52, RP2040 (M0+/M4/M33), or adapt to AVR/RISC-V easily.
1) Cooperative (super-loop) scheduler — the 1-hour win
Idea: Each task runs a little, then calls yield() or returns. No context switch; simplest and very stable for small systems.
Core pieces
- Task Control Block (TCB): state + next-run time
- Tick source: SysTick/Timer ISR increments sys_time_ms
- Dispatcher: pick the next ready task (round-robin or priority)
typedef void (*task_fn)(void *);
typedef struct {
task_fn fn;
void *arg;
uint32_t next_run_ms; // when this task may run again
uint8_t prio; // 0 = highest
uint8_t ready; // set by events/ISRs
} tcb_t;
volatile uint32_t sys_time_ms;
void SysTick_Handler(void){ sys_time_ms++; } // 1 ms tick
void run_scheduler(tcb_t *tasks, int n){
for(;;){
for(int p=0; p<32; ++p){ // simple priority scan
for(int i=0;i<n;i++){
if(tasks[i].prio==p &&
tasks[i].ready &&
(int32_t)(sys_time_ms - tasks[i].next_run_ms) >= 0){
tasks[i].ready = 0;
tasks[i].fn(tasks[i].arg); // cooperative run
}
}
}
}
}
// Helpers
static inline void task_delay_until(tcb_t *t, uint32_t delta_ms){
t->next_run_ms = sys_time_ms + delta_ms;
t->ready = 1; // allow reschedule when time reached
}
When to use: Tight RAM/flash, simple timing, low risk.
Limits: One misbehaving task can block others; no preemption.
2) Preemptive scheduler (fixed-priority, round-robin) — Cortex-M pattern
Idea: A timer (SysTick) triggers scheduling; PendSV performs the context switch. Each task has its own stack; the CPU can preempt tasks.
2.1 Data structures
typedef struct tcb {
uint32_t *sp; // process stack pointer (PSP)
uint8_t prio; // 0 = highest
uint8_t state; // READY, BLOCKED, ...
uint32_t delay; // ticks to wake
struct tcb *next; // for ready lists (round-robin)
} tcb_t;
#define MAX_PRIO 32
static tcb_t *ready_head[MAX_PRIO];
static uint32_t ready_bitmap; // bit p set => prio p has ready tasks
static tcb_t *current;
volatile uint32_t tick;
O(1) priority pick:
static inline int highest_ready_prio(void){
return 31 - __CLZ(ready_bitmap); // GCC builtin on Cortex-M3+
}
static tcb_t* pick_next(void){
int p = highest_ready_prio();
tcb_t *t = ready_head[p];
if(!t) return NULL;
ready_head[p] = t->next; // round-robin rotate
// put back to tail if still READY:
if(ready_head[p]){ /* splice t at tail */ }
return t;
}
2.2 Creating a task (prepare its initial stack)
On Cortex-M, exception entry/return auto-saves R0–R3,R12,LR,PC,xPSR. We place a fake frame so “return” jumps into the task function.
extern void task_exit_trap(void); // if a task returns
uint32_t *stack_init(uint32_t *top, void (*entry)(void*), void *arg){
*(--top) = 0x01000000; // xPSR (Thumb)
*(--top) = (uint32_t)entry; // PC
*(--top) = (uint32_t)task_exit_trap;// LR
*(--top) = 0; // R12
*(--top) = 0; // R3
*(--top) = 0; // R2
*(--top) = 0; // R1
*(--top) = (uint32_t)arg; // R0
// Space for R4-R11 saved/restored by PendSV:
for(int i=0;i<8;i++) *(--top)=0;
return top;
}
2.3 Tick + wakeups + time slicing
void SysTick_Handler(void){
tick++;
// decrement delays; if zero => READY (set bitmap, enqueue)
// optional: time-slice current task -> pend a context switch
SCB->ICSR = SCB_ICSR_PENDSVSET_Msk; // request switch
}
2.4 Triggering a switch (yield)
static inline void yield(void){
SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;
}
2.5 Context switch in PendSV (Cortex-M, naked handler)
Saves R4–R11 of current task to its PSP, picks next, restores R4–R11, updates PSP, and exits.
__attribute__((naked)) void PendSV_Handler(void){
__asm volatile(
"MRS r0, psp n" // r0 = PSP
"CBZ r0, 1f n" // first switch? skip save
"STMDB r0!, {r4-r11} n" // save callee-saved regs
"LDR r1, =current n"
"LDR r1, [r1] n"
"STR r0, [r1] n" // current->sp = PSP
"1: n"
"PUSH {lr} n"
"BL pick_next n" // r0 = next tcb*
"LDR r1, =current n"
"STR r0, [r1] n"
"LDR r0, [r0] n" // r0 = next->sp
"LDMIA r0!, {r4-r11} n" // restore callee-saved
"MSR psp, r0 n"
"POP {lr} n"
"BX lr n"
);
}
If your core has FPU, either disable lazy stacking or extend save/restore to S16–S31 when the FPCA bit is set.
2.6 Start the first task
Set PendSV to lowest priority; SysTick a bit higher. Load first PSP and “exception-return” to thread mode using PSP.
static void start_first_task(void){
// Priority setup
NVIC_SetPriority(PendSV_IRQn, 0xFF);
NVIC_SetPriority(SysTick_IRQn, 0x80);
current = pick_next(); // choose initial task
__set_PSP((uint32_t)current->sp); // load its PSP
__set_CONTROL(0x02); // Thread mode uses PSP
__ISB();
// Simulate exception return to pop the prepared frame:
__asm volatile(
"LDR r0, =0xFFFFFFFD n" // return to Thread, use PSP
"BX r0 n"
);
}
3) Scheduling policy options
- Fixed-priority preemptive (common for MCUs): Highest ready priority runs; optional round-robin within same priority by rotating list on tick.
- Rate-monotonic (RM): Assign priority by period; great for periodic control loops.
- Earliest-Deadline-First (EDF): Highest urgency first; needs a min-heap of deadlines.
- Tickless idle: If no earlier wakeups, stop SysTick and sleep until the next event (save power).
4) Synchronization & IPC (must-haves)
- Events / semaphores: unblock tasks from ISRs via give_from_isr(); PendSV after ISR if a higher-prio task unblocked.
- Mutex with priority inheritance: avoid priority inversion (important if a low-prio task guards shared I²C/SPI).
- Queues/Ring buffers: single-producer/single-consumer works lock-free; multi-producer => protect with IRQ mask or mutex.
Critical section (Cortex-M):
static inline uint32_t irq_save(void){
uint32_t primask = __get_PRIMASK();
__disable_irq();
__DMB();
return primask;
}
static inline void irq_restore(uint32_t pm){
__DMB();
__set_PRIMASK(pm);
}
5) Sizing & safety checks
- Per-task stack: profile with a watermark (fill with 0xA5, check high-water mark after stress).
- Tick rate: 1 kHz is fine for most; go lower if you use tickless idle.
- Interrupt latency: Keep ISRs short; defer work to tasks.
- Watchdog: kick it from a supervisor task that verifies “I ran the last cycle of all critical tasks.”
6) Bring-up checklist
- SysTick fires and tick increases
- PendSV priority is lowest
- First task stack frame built correctly (xPSR=0x01000000, PC=entry, LR=exit trap)
- Context switch saves/restores R4–R11 and PSP
- Yield & time-slice cause visible task alternation
- Delay/sleep unblocks tasks at the right tick
- Optional: FPU context save verified (if M4F/M33F)