Priority Map: A hash map with access of the minimum value
I implemented a job scheduler at work recently. Each job has a unique ID and an expiration
time. The job ID is immutable while the expiration time may change. The job scheduler
schedules the job of the earliest expiration time. Go comes with
heap.Interface
and map
. My first
implementation uses a hashmap and a slice that implements the heap.Interface. The hashmap
maps the Job and the index of the Job in the heap array. It works but I
need to operate two data structures, the heap array and map in application code.
It makes sense to encapsulate the heap and hashmap into one data structure with reasonable APIs.
So I implemented the PriorityMap
.
What is the PriorityMap?
PriorityMap is a combination of a hash map and a binary heap. It has following APIs.
Get(key) Value
: Returns the value associated with the key.Set(key, value)
: Inserts a key-value pair if the key does not exist. Otherwise, updates the value.Delete(key)
: Deletes the key-value.Top() (K, V, bool)
: Returns the key-value pair of the smallest value. Returns false if the set is empty.Pop() (K, V, bool)
: Removes and returns the key-value pair of the smallest value. returns the number of key-value pairs.Size() int
: Returns the number of key-value pairs.Map() map[K]*Element[K, V]
: Returns the underlying map. It is here to provide an efficient way of iterating over all key-value pairs. Most of the time, you don't need it.
PriorityMap is useful for workloads that requires:
- A Key-Value store that supports Get, Set, and Delete by key.
- A Heap that supports access the key-value pair of the smallest value.
For example,
- Job Scheduler that schedules job with the highest priority.
- Implementing greedy algorithms like the Dijkstra's algorithm and Prim's algorithm.
Comparison with similar data structures
There are three similar data structures:
- TreeMap: key-value pairs are sorted by key.
- SortedSet: key-values are sorted by value.
- PriorityQueue: access the smallest element out of queue.
Both TreeMap and SortedSet keep all key-value pairs sorted all the time. TreeMap keeps order using a balanced binary tree, and SortedSet uses a skip list. PriorityMap, however, does not sort all key-value pairs. Only the top of Heap is guaranteed an order, the minimum, among all pairs. PriorityQueue does not store key-value pairs, so it does not supports access by key. It does not support update an element's priority either.
Implementation and Example
PriorityMap is implemented at priority_map.go See the following two examples for a quick start.
1// Example of simple Key Value type.
2import "github.com/pengubco/algorithms/priority_map"
3
4func main() {
5 pm := priority_map.NewPriorityMap[int, int](func(a, b int) bool {
6 return a < b
7 })
8 pm.Set(1, 10)
9 pm.Set(2, 10)
10 pm.Set(3, 30)
11 fmt.Printf("size: %d\n", pm.Size()) // "size: 3"
12 if v, ok := pm.Get(1); ok {
13 fmt.Printf("key: 1, value: %d\n", v) // "key: 1, value: 10"
14 }
15 if k, v, ok := pm.Top(); ok {
16 fmt.Printf("key: %d, value: %d\n", k, v) // "key: 1, value: 10" or "key: 2, value: 10"
17 }
18}
1// Example of advanced Key Value type.
2import "github.com/pengubco/algorithms/priority_map"
3
4type Job struct {
5 id int
6 expiration time.Time
7 name string
8}
9
10func main() {
11 pm := priority_map.NewPriorityMap[int, *Job](func(v1, v2 *Job) bool {
12 return v1.expiration.Before(v2.expiration)
13 })
14 now := time.Now()
15 jobs := []Job{
16 {1, now.Add(-1 * time.Minute), "job 1"},
17 {2, now.Add(-2 * time.Minute), "job 2"},
18 {3, now.Add(-3 * time.Minute), "job 3"},
19 }
20 for i, _ := range jobs {
21 pm.Set(jobs[i].id, &jobs[i])
22 }
23 id, job, _ := pm.Top()
24 fmt.Printf("job with the smallest expiration. id %d, name %s\n", id, job.name)
25 job, _ = pm.Get(2)
26 fmt.Printf("job 2's expiration is %v\n", job.expiration)
27 fmt.Println("taking jobs one by one, in the order of expiration time")
28 for pm.Size() > 0 {
29 if id, job, ok := pm.Pop(); ok {
30 fmt.Printf("job id %d, name %s, expiration %v\n", id, job.name, job.expiration)
31 }
32 }
33}
Performance
We benchmark Add, Update, Delete, Pop on a priority map of 1M key-value pairs. The following is a result on my Mac M1 Max. You can run the benchmark with.
1go test -bench BenchmarkPriorityMap -benchmem -benchtime 10s
1goos: darwin
2goarch: arm64
3pkg: github.com/pengubco/algorithms/priority_map
4BenchmarkPriorityMap_Add_1M-10 66 175442495 ns/op 149403499 B/op 1023415 allocs/op
5BenchmarkPriorityMap_Update_1M-10 100 124093262 ns/op 80035 B/op 0 allocs/op
6BenchmarkPriorityMap_Del_1M-10 75 170370006 ns/op 0 B/op 0 allocs/op
7BenchmarkPriorityMap_Pop_1M-10 19 604293805 ns/op 0 B/op 0 allocs/op
8PASS
9ok github.com/pengubco/algorithms/priority_map 94.674s
No surprise that Pop
is most expensive because the heap may need to go from root to a leaf
to maintain the heap structure. It takes 604ms (~0.6 second) to pop 1M key-value pairs. I think
this is fast enough for normal production use.
Correctness
We run 1M operations on PriorityMap and a SortedSet in Redis, see redis-compare/main.go. After each operation, we compare the size and the smallest key-value pair from PriorityMap with corresponding values from Redis. This gives us confidence that PriorityMap is correct.
PriorityMap is a fast and easy-to-use data structure. I hope you enjoy it. Cut me a ticket if you see any issue.