The Goroutine Scheduling Model in Go

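Processes, Threads, and Coroutines

In an operating system, a process is the basic unit of resource allocation and a thread is the basic unit of CPU scheduling, while a coroutine is a lightweight thread that lives in user space.

A coroutine scheduler lets multiple tasks run concurrently on a single thread. It manages each coroutine's lifecycle and execution order, usually with either cooperative scheduling (a coroutine runs until it yields) or preemptive scheduling (the scheduler can interrupt it).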

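The GPM Scheduling Model

In the GPM model, G, P, and M stand for Goroutine, Processor, and Machine. Goroutines are queued on Processors, and each Processor is bound to a Machine (an OS thread) that actually runs them; this mapping lets a small number of threads execute a large number of goroutines efficiently.

Because goroutines are cheap and the runtime switches between them in user space (at blocking points and, since Go 1.14, also by asynchronous preemption), Go programs can easily handle tens of thousands of concurrent tasks. The design of the GPM model is what underpins Go's high-performance concurrent programming.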

(Figure credit: TonyBai)

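Underlying Data Structures

The corresponding struct definitions from Go 1.23 are shown below, with some fields and methods omitted; the full source code (src/runtime/runtime2.go) is on GitHub.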

type g struct {
    stack       stack   // offset known to runtime/cgo
    stackguard0 uintptr // offset known to liblink
    stackguard1 uintptr // offset known to liblink

    _panic *_panic // innermost panic - offset known to liblink
    _defer *_defer // innermost defer
    m      *m      // current m; offset known to arm liblink
    sched  gobuf

    goid uint64

    // other fields omitted ...
}

type m struct {
    g0   *g // Goroutine with scheduling stack
    curg *g // current running Goroutine

    p     puintptr // attached p for executing go code (nil if not executing go code)
    nextp puintptr
    oldp  puintptr // the p that was attached before executing a syscall

    // other fields omitted ...
}

type p struct {
    m muintptr // back-link to associated m (nil if idle)

    // Queue of runnable Goroutines. Accessed without lock.
    runqhead uint32
    runqtail uint32
    runq     [256]guintptr
    runnext  guintptr

    // other fields omitted ...
}

type schedt struct {
    midle        muintptr // idle m's waiting for work
    nmidle       int32    // number of idle m's waiting for work
    nmidlelocked int32    // number of locked m's waiting for work
    mnext        int64    // number of m's that have been created and next M ID
    maxmcount    int32    // maximum number of m's allowed (or die)
    nmsys        int32    // number of system m's not counted for deadlock
    nmfreed      int64    // cumulative number of freed m's

    // Global runnable queue.
    runq     gQueue
    runqsize int32

    // other fields omitted ...
}
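The runqhead, runqtail, runq, and runnext fields of p form a fixed-size ring buffer plus a one-slot fast path. The following is a simplified, single-threaded model of how the runtime's runqput and runqget use them, assuming no concurrent access; the real functions use atomic loads/stores and guintptr, which are omitted here, and the types G and P below are hypothetical stand-ins for runtime.g and runtime.p.

```go
package main

import "fmt"

// Hypothetical stand-ins for runtime.g and runtime.p (no atomics, one thread).
type G struct{ id int }

type P struct {
	runqhead uint32
	runqtail uint32
	runq     [256]*G
	runnext  *G
}

// global stands in for the scheduler's global run queue (sched.runq).
var global []*G

// runqput: with next=true, g takes the runnext slot and any previous occupant
// is kicked into the ring. If the ring is full, half of it plus the G being
// added is moved to the global queue (what runqputslow does).
func (p *P) runqput(g *G, next bool) {
	if next {
		old := p.runnext
		p.runnext = g
		if old == nil {
			return
		}
		g = old
	}
	h, t := p.runqhead, p.runqtail
	if t-h < uint32(len(p.runq)) {
		p.runq[t%uint32(len(p.runq))] = g
		p.runqtail = t + 1
		return
	}
	n := (t - h) / 2
	for i := uint32(0); i < n; i++ {
		global = append(global, p.runq[(h+i)%uint32(len(p.runq))])
	}
	p.runqhead = h + n
	global = append(global, g)
}

// runqget prefers runnext, then pops from the head of the ring.
func (p *P) runqget() *G {
	if g := p.runnext; g != nil {
		p.runnext = nil
		return g
	}
	if p.runqhead == p.runqtail {
		return nil
	}
	g := p.runq[p.runqhead%uint32(len(p.runq))]
	p.runqhead++
	return g
}

// size is the number of Gs in the ring (runnext not included).
func (p *P) size() int { return int(p.runqtail - p.runqhead) }

func main() {
	var p P
	for i := 1; i <= 257; i++ {
		p.runqput(&G{id: i}, false)
	}
	fmt.Println(p.size(), len(global)) // 128 129
	p.runqput(&G{id: 1000}, true)
	fmt.Println(p.runqget().id) // 1000
}
```

When the 257th G arrives, the ring is full, so the oldest 128 Gs plus the new one (129 in total) move to the global queue, and a single P never holds more than 256 queued goroutines plus runnext.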

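G (Goroutine)

A goroutine is Go's lightweight thread. It has the following characteristics:

Lightweight: creating and destroying a goroutine is very cheap; a new goroutine starts with a stack of only a few KB (the minimum is 2 KB), which grows and shrinks on demand.
Scheduled in user space: goroutines are managed by the runtime scheduler, which switches to another goroutine when the current one blocks (on a channel, lock, I/O, and so on) or voluntarily yields the CPU.
Preemptible: before Go 1.14, preemption could happen only at function calls, so a tight loop without calls could monopolize its thread. Since Go 1.14 the runtime also preempts goroutines asynchronously, so a long-running goroutine can be interrupted even if it never yields.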

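P (Processor)

P represents a logical processor and is the core component of the Go scheduler. Each P has a local run queue (the 256-slot ring buffer runq plus a single runnext slot) holding goroutines waiting to run.

The main responsibilities of P are:

Managing goroutines: P owns the goroutines in its local run queue and hands them to an M for execution.
Scheduling goroutines: through the scheduler, P assigns goroutines to M and switches to another goroutine when the current one blocks or yields the CPU.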

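M (Machine)

M represents a machine, that is, the OS thread that actually executes goroutines. Each M has its own stack and register state.

M holds two particularly important fields:

g0: a special goroutine that owns the scheduling stack; it is deeply involved in the runtime's scheduling work and in tasks such as goroutine creation and memory allocation.
curg: the goroutine currently running on this thread.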

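Because P is what connects M to G, M also stores P-related fields:

p: the processor currently attached for running Go code (nil when not executing Go code).
nextp: a processor stashed for the M to attach when it starts or wakes up.
oldp: the processor that was attached before entering a system call, which M tries to reacquire when the call returns.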
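How p and oldp cooperate around a system call can be sketched with a toy model. This is a hypothetical simplification: in the real runtime, entersyscall stores the P in oldp, sysmon's retake decides whether to hand the P off via handoffp, and exitsyscall first tries to reacquire oldp; P status transitions, locking, and spinning Ms are all left out, and the names below are illustrative.

```go
package main

import "fmt"

// Hypothetical toy model of how M's p and oldp fields are used around a
// blocking system call.
type P struct {
	owner *M // the M currently holding this P (nil if free)
}

type M struct {
	id      int
	p, oldp *P
}

// enterSyscall detaches the P from the M but remembers it in oldp.
func (m *M) enterSyscall() {
	m.oldp = m.p
	m.p.owner = nil
	m.p = nil
}

// handoff gives a P left behind by a blocked M to an idle M
// (in the runtime, sysmon decides when to do this).
func handoff(p *P, idle *M) {
	p.owner = idle
	idle.p = p
}

// exitSyscall tries to reacquire oldp and reports whether it succeeded.
func (m *M) exitSyscall() bool {
	old := m.oldp
	m.oldp = nil
	if old != nil && old.owner == nil {
		old.owner = m
		m.p = old
		return true
	}
	// The real runtime would now look for any idle P, or else put the
	// goroutine on the global queue and park this M.
	return false
}

func main() {
	p := &P{}
	m1 := &M{id: 1, p: p}
	p.owner = m1

	// Short syscall: nobody took the P, so m1 gets it back.
	m1.enterSyscall()
	fmt.Println(m1.exitSyscall()) // true

	// Long syscall: meanwhile the P is handed off to the idle m2.
	m2 := &M{id: 2}
	m1.enterSyscall()
	handoff(p, m2)
	fmt.Println(m1.exitSyscall(), p.owner.id) // false 2
}
```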

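At most GOMAXPROCS threads can run Go code at the same time, because there are exactly GOMAXPROCS Ps and an M must hold a P to execute Go code. By default GOMAXPROCS is set to the number of logical CPUs: on a four-core machine, four Ps are created and at most four threads run Go code in parallel, each thread represented by a runtime.m struct. Threads blocked in system calls do not hold a P, so the total number of Ms can exceed GOMAXPROCS (up to a default limit of 10,000). Matching the number of running threads to the number of cores lets each thread stay on its own CPU, which minimizes OS thread context switches and keeps system overhead low.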

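How It Works

The operation of the GPM model can be summarized in the following steps:

When the go keyword creates a new G, the scheduler puts it on the current P's local run queue (in the runnext slot, pushing any previous occupant into the queue). If the local queue is full, half of its goroutines, plus the one being added, are moved to the global queue.
M takes a runnable G from its P's local queue. If the local queue is empty, it tries to grab a batch of Gs from the global queue; if the global queue is also empty, it steals roughly half of the runnable Gs from another P's local queue.
M executes the G it was given, switching context as needed. When the G blocks on a channel, lock, or similar, it is parked and M simply picks the next G; when the G enters a blocking system call, M releases its P so that the P can be handed off to another idle (or newly started) thread.
P supplies the next G from its local run queue to M, and the loop (schedule → execute → goexit → schedule) continues; when no runnable G can be found anywhere, the M goes to sleep until new work arrives.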

                            +-------------------- sysmon ---------------//------+
                            |                                                   |
                            |                                                   |
               +---+      +---+-------+                   +--------+          +---+---+
go func() ---> | G | ---> | P | local | <=== balance ===> | global | <--//--- | P | M |
               +---+      +---+-------+                   +--------+          +---+---+
                            |                                 |                 |
                            |      +---+                      |                 |
                            +----> | M | <--- findrunnable ---+--- steal <--//--+
                                   +---+
                                     |
                                   mstart
                                     |
              +--- execute <----- schedule
              |                      |
              |                      |
              +--> G.fn --> goexit --+

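Key Design Strategies

Work Stealing: when a P's local run queue is empty, its M tries to steal about half of the goroutines from another P's local queue, which keeps all CPU cores busy.
Hand Off: when an M blocks in a system call or for another reason, the scheduler hands its P (together with that P's local run queue) to another idle M, so the remaining goroutines are not held up by the blocked thread.
Preemptive Scheduling: when a goroutine runs for too long (the sysmon thread looks for goroutines that have run for more than 10 ms), the scheduler forces a switch to another goroutine so that no single goroutine monopolizes the CPU.
Global Queue and Load Balancing: the global queue complements the local queues and ensures that every goroutine eventually runs, because a P checks it every 61 scheduling ticks even when its own queue is busy. Together with work stealing, this balances load across Ps so that each P's local queue keeps a reasonable amount of work.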