A simple and customizable load test tool in Go

We plan to load test our product before public Beta, with two goals in mind.

  1. Find bottlenecks, so we can plan a roadmap of performance improvements and prepare for on-call.
  2. Understand how much workload we can support with fixed resources. This shapes the pricing strategy and determines the number of Beta partners we can onboard without burning runway.

Because generating meaningful load for our system is complicated, we cannot use general-purpose tools like k6. So I implemented a simple, customized load test tool that lets Subject Matter Experts write load on their own. Thanks to Go's strong support for concurrency, the final version of the code turned out to be quite clean. We'll go through the design and code and share what we learned. We hope you enjoy it.

Design

  • Load Test CLI: A main.go that starts a single load runner.
  • Load Runner: Configures and schedules (round robin) multiple task runners.
  • Task Runner: Configures and runs a single task.
  • Task: An interface that defines three methods: Setup, Run and Cleanup. Task runner calls Run periodically, in cycles.

Here is an example config that runs 3 task runners concurrently.

lifetimeInSeconds: 70 # how long the load runner runs.
concurrentRunners: 3 # how many task runners to run concurrently.
taskRunners: # definitions of each task runner.
  - taskRunner:
    runner:
      lifetimeInSeconds: 30 # how long the task runner runs.
      cycleIntervalInSeconds: 5 # how long the task runner waits before the next cycle.
      jitterInSeconds: 2 # jitter to add on top of cycleIntervalInSeconds.
    task: # definition of a task
      type: "foo" # type of a task. Think of it as the name of a template.
      message: "Hello" # parameters specific to the given type of task.
  - taskRunner:
    runner:
      lifetimeInSeconds: 20
      cycleIntervalInSeconds: 2
      jitterInSeconds: 1
    task:
      type: "foo"
      message: "World"

Given the example config, the load test starts with 3 task runners: two running the "Hello" task and one running the "World" task. There are two "Hello"s because the load runner takes task definitions in round robin, so the third runner wraps around to the first definition.
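
The round-robin assignment described above can be sketched in a few lines. This is only an illustration, not the tool's actual provider: the roundRobin type and the assign helper are hypothetical names, and the task definitions are reduced to their message strings.

```go
package main

import "fmt"

// roundRobin hands out items from a fixed, non-empty list in order,
// wrapping around at the end. Hypothetical stand-in for the real provider.
type roundRobin struct {
	items []string
	next  int
}

// Next returns the next item and advances the cursor.
func (r *roundRobin) Next() string {
	item := r.items[r.next%len(r.items)]
	r.next++
	return item
}

// assign returns the items picked for n consecutive runners.
func assign(items []string, n int) []string {
	rr := &roundRobin{items: items}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, rr.Next())
	}
	return out
}

func main() {
	// Two task definitions and three concurrent runners, as in the example config.
	fmt.Println(assign([]string{"Hello", "World"}, 3)) // prints [Hello World Hello]
}
```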

Code

The complete source code is at pengfyi_resource/tree/main/simple-loadtest. In this section we cover important code decisions.

The Task interface. tasks/task.go

package tasks

import "context"

// Task describes a task to be run by a TaskRunner.
// Each task has three stages, moving in one direction.
//  1. Setup: Get the task ready.
//  2. Run: TaskRunner calls Run periodically.
//  3. Cleanup: Operations to clean up resources if needed.
//
// Each method must return when the context is canceled.
type Task interface {
  Setup(context.Context) error

  Run(context.Context) error

  Cleanup(context.Context) error
}

The TaskRunner struct. runners/task_runner.go

// TaskRunner runs a task's Run() periodically. After each run, the runner waits for
// some time, determined by an interval with jitter. The runner can also register an
// onStop() to run right before the runner stops.
type TaskRunner struct {
  id     uuid.UUID
  logger *zap.Logger

  lifetime      time.Duration
  cycleInterval time.Duration
  jitter        time.Duration

  // callback invoked when the task runner has stopped.
  onStop func()

  startOnce sync.Once
}

// Start runs the task until ctx is canceled or the runner's lifetime elapses.
func (r *TaskRunner) Start(ctx context.Context, task tasks.Task) {
  l := r.logger.With(zap.String("runner-id", r.id.String()))

  periodicTicker := time.NewTicker(r.cycleInterval)
  stop := func() {
    periodicTicker.Stop()
    l.Info("stopping runner, cleanup")
    // The run context is already done at this point, so cleanup gets a fresh one.
    cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    err := task.Cleanup(cleanupCtx)
    if err != nil && !errors.Is(err, context.Canceled) {
      l.Error("task cleanup error", zap.Error(err))
    }
    if r.onStop != nil {
      r.onStop()
    }
  }

  r.startOnce.Do(func() {
    ctx, cancel := context.WithTimeout(ctx, r.lifetime)
    defer cancel()
    if err := task.Setup(ctx); err != nil {
      periodicTicker.Stop()
      l.Error("cannot setup task", zap.Error(err))
      return
    }
    l.Info("task setup success. going to run in cycles")
    for {
      select {
      case <-ctx.Done():
        stop()
        return
      case <-periodicTicker.C:
        l.Info("run task in a new cycle")
        err := task.Run(ctx)
        if err != nil && !errors.Is(err, context.Canceled) {
          l.Error("task run error", zap.Error(err))
        }
        // rand.Int63n panics on a non-positive argument, so skip jitter when it is unset.
        next := r.cycleInterval
        if r.jitter > 0 {
          next += time.Duration(rand.Int63n(int64(r.jitter)))
        }
        periodicTicker.Reset(next)
      }
    }
  })
}

The LoadRunner struct. runners/load_runner.go

// LoadRunner runs multiple tasks to generate load. It maintains a certain population
// of TaskRunners.
type LoadRunner struct {
  id uuid.UUID

  logger *zap.Logger

  lifetime time.Duration

  // Number of targeted active runners.
  targetedRunnerCnt int

  // Cancel func for each task runner. Right now it is not used, because
  // the load runner cancels its own context, which is the parent of each task
  // runner's context. The cancel func is here to support canceling a specific
  // task runner before the load runner's context is canceled.
  runnerCancels map[uuid.UUID]func()

  runnerProvider TaskRunnerProvider

  stoppedRunnerCh chan uuid.UUID

  wg        sync.WaitGroup
  startOnce sync.Once
}

// Start maintains a population of active task runners.
func (r *LoadRunner) Start(ctx context.Context) {
  ctx, cancel := context.WithTimeout(ctx, r.lifetime)
  defer cancel()
  addRunner := func() {
    newRunner, newTask := r.runnerProvider.NextRunner(), r.runnerProvider.NextTask()
    runner, err := newRunner()
    if err != nil {
      r.logger.Error("cannot create task runner", zap.Error(err))
      return
    }
    task, err := newTask()
    if err != nil {
      r.logger.Error("cannot create task", zap.Error(err))
      return
    }
    r.logger.Info("created runner and task", zap.String("runner-id", runner.id.String()))

    runner.WithOptions(WithOnStop(func() {
      // During shutdown nobody may be receiving on stoppedRunnerCh. Giving up
      // once ctx is done keeps the runner from blocking r.wg.Wait() forever.
      select {
      case r.stoppedRunnerCh <- runner.id:
      case <-ctx.Done():
      }
    }))
    r.wg.Add(1)
    runnerCtx, runnerCancel := context.WithCancel(ctx)
    go func() {
      defer r.wg.Done()
      runner.Start(runnerCtx, task)
    }()
    r.runnerCancels[runner.id] = runnerCancel
  }

  r.startOnce.Do(func() {
    for len(r.runnerCancels) < r.targetedRunnerCnt {
      addRunner()
    }
    for {
      select {
      case <-ctx.Done():
        r.logger.Info("stopping load runner. wait for task runners to stop")
        r.wg.Wait()
        return
      case runnerID := <-r.stoppedRunnerCh:
        r.logger.Info("task runner stopped", zap.String("runner-id", runnerID.String()))
        delete(r.runnerCancels, runnerID)
        r.logger.Info("going to add a new runner")
        addRunner()
      }
    }
  })
}

The core of the load test is TaskRunner::Start(ctx) and LoadRunner::Start(ctx). As you can see, both are short, about 50 lines each. The key idea is to keep the load runner and task runner dumb. Runners should know only when and how to run, not what the run does. In code, when we need to start a new runner, we call addRunner, which calls NextRunner and NextTask. These return factory functions supplied by code that knows "what the run does". The load runner only knows "when and how to run".

  // Condensed from LoadRunner.Start; error handling, logging and shutdown are omitted.
  addRunner := func() {
    newRunner, newTask := r.runnerProvider.NextRunner(), r.runnerProvider.NextTask()
    runner, _ := newRunner()
    task, _ := newTask()
    runnerCtx, runnerCancel := context.WithCancel(ctx)
    r.runnerCancels[runner.id] = runnerCancel
    go runner.Start(runnerCtx, task)
  }

  r.startOnce.Do(func() {
    for len(r.runnerCancels) < r.targetedRunnerCnt {
      addRunner() // When to run. Case 1. On startup, populate K runners.
    }
    for {
      select {
      case runnerID := <-r.stoppedRunnerCh:
        delete(r.runnerCancels, runnerID)
        addRunner() // When to run. Case 2. When a runner stops, populate a new runner.
      }
    }
  })

Learnings

The code went through a few iterations. Here are some learnings along the way.

  1. context.Context is a modern, idiomatic and clean approach to many kinds of concurrency control. In the first version of the code, I used time.After and operated on channels explicitly to implement the Done Pattern. Since Go added the context package to the standard library in 1.7, I think many concurrency patterns are better expressed with Context. Learn more at Go Concurrency Patterns: Context.

  2. Object Oriented Programming (OOP) paralyzed my analysis at the beginning. How many classes? What attributes should each class have? Where to add Interface and Inheritance? You get the idea. The antidote is to focus on behaviors. Focusing on functions instead of attributes and state will likely give you an easier time. Object-Oriented Programming is Bad is a good watch, whether you like OOP or not.