Graceful shutdown: the benefits and why you need it

Dương Nguyễn Hoàng Luân included in Golang DevOps

2025-08-28

Introduction

Deploying a new version, scaling pods down, or stopping a service for maintenance: all of these run into the problem of shutting the system down. If we kill the process immediately, in-flight requests are cut off, data may be left half-written, queued jobs are abandoned, and connections leak.

Graceful shutdown is a technique for stopping a service in an orderly way: stop accepting new work, finish the work in progress, close resources, and report a clear status to the surrounding infrastructure.

Put simply, graceful shutdown is how you close a coffee shop: stop seating new customers, make the last cup, wash the cups, turn off the machines, and only then lock the door.

Why does graceful shutdown matter?

When you kill a running process abruptly, this is what happens:

In-flight requests are cut off → clients get 502/503, data ends up inconsistent
DB/Redis connections are not closed cleanly → connections linger on the server side and connection limits can be exhausted
Message queue work is abandoned → messages are lost or delivered twice
Kubernetes keeps routing to the pod until its endpoint removal has propagated → traffic still reaches a pod that is shutting down

In a real project, I saw graceful shutdown eliminate hundreds of 502 errors on every deploy, errors that occurred only because the old system stopped processes with kill -9.

Implementation in Go

Basic signal handling

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // No Handler is set, so the server uses http.DefaultServeMux.
    server := &http.Server{Addr: ":8086"}

    // Start the server in a goroutine so main can wait for signals.
    go func() {
        // ListenAndServe returns http.ErrServerClosed once Shutdown is called;
        // any other error is a real failure.
        if err := server.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
            log.Fatalf("Server error: %v", err)
        }
    }()

    // Wait for SIGTERM or SIGINT.
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    <-quit

    log.Println("Shutting down server...")

    // Give in-flight requests up to 30 seconds to finish.
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    if err := server.Shutdown(ctx); err != nil {
        log.Fatalf("Server forced to shutdown: %v", err)
    }

    log.Println("Server exited gracefully")
}
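To see what server.Shutdown buys you, here is a minimal sketch that sends a request to a deliberately slow handler and calls Shutdown while that request is still running. The names drainDemo and /slow, the loopback listener, and the 200 ms delay are all made up for illustration; they are not part of the example above.

```go
package main

import (
    "context"
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

// drainDemo (hypothetical helper) starts a server with a slow handler, sends
// one request, calls Shutdown while the request is in flight, and returns the
// body the client received.
func drainDemo() (string, error) {
    started := make(chan struct{})
    mux := http.NewServeMux()
    mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
        close(started)                     // signal that the request is in flight
        time.Sleep(200 * time.Millisecond) // simulate work
        fmt.Fprint(w, "finished")
    })

    // Port 0 lets the OS pick a free port on the loopback interface.
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        return "", err
    }
    srv := &http.Server{Handler: mux}
    go srv.Serve(ln) // returns http.ErrServerClosed after Shutdown

    type result struct {
        body string
        err  error
    }
    done := make(chan result, 1)
    go func() {
        resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
        if err != nil {
            done <- result{err: err}
            return
        }
        defer resp.Body.Close()
        b, err := io.ReadAll(resp.Body)
        done <- result{body: string(b), err: err}
    }()

    // Wait until the handler is running (or the request failed early).
    select {
    case <-started:
    case r := <-done:
        return r.body, r.err
    }

    // Shutdown closes the listener, then waits for active connections
    // to finish, up to the context deadline.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        return "", err
    }

    r := <-done
    return r.body, r.err
}

func main() {
    body, err := drainDemo()
    fmt.Println(body, err)
}
```

The client should still receive the complete response even though Shutdown was called mid-request; with kill -9 the connection would simply be dropped.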

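Cleaning up multiple resources

In a real project you rarely have just an HTTP server. There are also a DB pool, Redis, message queues, background workers... All of them need to be closed in the right order:

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// CleanupManager runs shutdown handlers in the order they were registered.
type CleanupManager struct {
    handlers []func(ctx context.Context) error
}

// NewCleanupManager returns an empty manager.
func NewCleanupManager() *CleanupManager {
    return &CleanupManager{}
}

func (c *CleanupManager) Add(fn func(ctx context.Context) error) {
    c.handlers = append(c.handlers, fn)
}

// Run calls every handler in turn; all handlers share one overall timeout.
// A failing handler is logged and the remaining handlers still run.
func (c *CleanupManager) Run(timeout time.Duration) {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    for _, h := range c.handlers {
        if err := h(ctx); err != nil {
            log.Printf("Cleanup error: %v", err)
        }
    }
}

func main() {
    // dbPool, redisClient and worker are assumed to be created elsewhere
    // during application startup; they are not defined in this snippet.
    server := &http.Server{Addr: ":8086"}
    cleanup := NewCleanupManager()

    // Stop taking requests first, then stop the workers, and only then close
    // the connections (assuming the workers still use the DB and Redis).
    cleanup.Add(func(ctx context.Context) error {
        return server.Shutdown(ctx)
    })
    cleanup.Add(func(ctx context.Context) error {
        return worker.Stop(ctx)
    })
    cleanup.Add(func(ctx context.Context) error {
        return dbPool.Close()
    })
    cleanup.Add(func(ctx context.Context) error {
        return redisClient.Close()
    })

    go func() {
        if err := server.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
            log.Fatalf("Server error: %v", err)
        }
    }()

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    <-quit

    log.Println("Shutting down...")
    cleanup.Run(30 * time.Second)
    log.Println("Done.")
}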
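The example calls worker.Stop(ctx) without showing the worker itself. One possible shape for it is sketched below, assuming a worker that pulls jobs from an in-memory channel. Worker, NewWorker, Enqueue and stopDemo are hypothetical names for illustration, not the author's implementation.

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// Worker is a hypothetical background worker that handles jobs one at a time.
type Worker struct {
    jobs   chan int
    quit   chan struct{}
    done   chan struct{}
    handle func(job int)
}

// NewWorker starts the processing loop in its own goroutine.
func NewWorker(handle func(job int)) *Worker {
    w := &Worker{
        jobs:   make(chan int, 16),
        quit:   make(chan struct{}),
        done:   make(chan struct{}),
        handle: handle,
    }
    go w.loop()
    return w
}

// Enqueue hands a job to the worker.
func (w *Worker) Enqueue(job int) { w.jobs <- job }

func (w *Worker) loop() {
    defer close(w.done) // tells Stop that the loop has exited
    for {
        select {
        case <-w.quit:
            return // stop picking up new jobs
        case job := <-w.jobs:
            w.handle(job) // a job that has started always runs to completion
        }
    }
}

// Stop asks the loop to exit after the current job and waits until it has,
// or until ctx expires. It must be called at most once.
func (w *Worker) Stop(ctx context.Context) error {
    close(w.quit)
    select {
    case <-w.done:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// stopDemo stops the worker while one job is in flight and reports how many
// jobs completed.
func stopDemo() (int, error) {
    started := make(chan struct{})
    processed := 0
    w := NewWorker(func(job int) {
        close(started)
        time.Sleep(100 * time.Millisecond) // simulate work
        processed++
    })
    w.Enqueue(1)
    <-started // the job is now running

    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()
    err := w.Stop(ctx)
    return processed, err
}

func main() {
    n, err := stopDemo()
    fmt.Println(n, err)
}
```

Jobs still sitting in the channel when Stop is called are not processed in this sketch. With a real message broker they would typically stay unacknowledged and be redelivered, which is why handlers should be idempotent.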

Tích hợp với Kubernetes

In a Kubernetes environment, graceful shutdown has to be coordinated with several components:

Liveness & Readiness probes

livenessProbe:
  httpGet:
    path: /healthz
    port: 8086
  initialDelaySeconds: 10
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8086
  initialDelaySeconds: 5
  periodSeconds: 5

PreStop hook

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]

The preStop hook runs before the container receives SIGTERM, so the sleep gives load balancers time to stop sending traffic to the pod. This avoids the case where the pod has already stopped serving but traffic is still being routed to it.

TerminationGracePeriodSeconds

terminationGracePeriodSeconds: 60

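This must leave enough time for graceful shutdown plus cleanup. The countdown starts when termination begins, so it also covers the preStop hook. If the pod has not stopped within this period, Kubernetes force-kills it with SIGKILL.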
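As a worked example of the timing budget, assume the grace period has to cover the preStop hook (10 seconds in the hook shown earlier) followed by the 30-second shutdown timeout from the Go examples. The helper name graceMarginSeconds is made up for this sketch.

```go
package main

import "fmt"

// graceMarginSeconds returns how many seconds of the grace period remain after
// the preStop hook and the application's shutdown timeout have run back to
// back. A negative result means SIGKILL can arrive before shutdown finishes.
func graceMarginSeconds(gracePeriod, preStop, shutdownTimeout int) int {
    return gracePeriod - (preStop + shutdownTimeout)
}

func main() {
    // Values from this article: 60 s grace period, 10 s preStop sleep,
    // 30 s shutdown timeout.
    fmt.Println(graceMarginSeconds(60, 10, 30)) // 20
}
```

With these numbers there are 20 seconds to spare. Dropping the grace period to the Kubernetes default of 30 seconds would give a margin of -10, so the pod could be killed while it is still draining.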

Graceful shutdown checklist

Capture SIGTERM/SIGINT
Stop accepting new requests (server.Shutdown)
Wait for in-flight requests to complete (with a timeout)
Close DB connections
Close Redis/cache connections
Flush logs
Exit with code 0
Kubernetes: set a sufficient terminationGracePeriodSeconds
Kubernetes: add a PreStop hook if needed

“How you shut a system down determines UX and data integrity more than how you start it.”

Have you ever hit an incident caused by a missing graceful shutdown? Leave a comment below so we can learn together! 🛑