June 30, 2015

Go: Slice search vs map lookup

Posted in Software at 07:09 by graham

tl;dr Use a map.

Computer science 101 tells us that maps are constant time access, O(1). But what is the constant? The map has to compute the hash, find the right bucket (array access), the right item within the bucket (another array access), and potentially do that multiple times as we walk a chain of buckets (if we overflowed the original bucket). At what point is it faster to iterate through an array comparing each item until we find the one we want?

Motivated by this comparison in C++ I decided to compare Go’s built-in map and slice types. The code is in a gist, a simple set of benchmark tests.

I tried two cases:

  • first a traditional key-value setup comparing map[string]string with []*Item{string,string}. The break-even point here is just five items. Under that the slice is faster, above it is slower.
  • second a set of integers, comparing map[int]struct{} with []int. The break-even point is ten items.

These results are similar to the C++ results. They mean we won’t be doing clever hacks in the name of performance. I call that good news. Use the obvious data structure and it will also be the right choice for performance.

The gory details

Here’s the numbers I get on my laptop running the gist. They are only relevant as relative values:

size map str slice str map int slice int

I tested with Go’s array (the [10]int things that we almost never use) but the results were similar to slice, so I didn’t include them. The map timings were very variable over the runs, up to 2x, so don’t read too much into that table. In the 1 or 2 item case the map occasionally beat the slice.

In the first, key-value case (map str vs slice str) I expect it’s the string comparison that costs us. Comparing strings is expensive. If we wanted to speed up the string comparison we would have to … hash the string!

Go’s map hashes strings with CPU AES instructions, if your processor has them, which it probably does. cat /proc/cpuinfo | egrep 'aes|sse2|sse4_1' will tell you. I assume using those instructions makes the hash function fast. Go also stores eight items per bucket, and has special-case access methods for string (mapaccess1_faststr) and int cases.

Even with such a highly optimized map, I don’t understand why the int (set) case is slower than the map at only 10 items. Here’s the relevant assembly for BenchmarkSetIntSlice. Each loop is two integers comparisons and a memory fetch. 10 items means up to 10 memory loads and 20 integer comparison, that’s all:

   if theArr[j] == q {
             # bounds check. rax is `j`, rdi is len(theArr)
  45b0b1:    cmp    %rdi,%rax
  45b0b4:    jae    45b104 <_access.BenchmarkSetIntSlice+0x144>
             # Fetch `theArr[j]`
  45b0b6:    lea    (%r9,%rax,8),%rbx
  45b0ba:    mov    (%rbx),%rbx
             # Compare the fetched value with `q` (rsi)
  45b0bd:    cmp    %rsi,%rbx
  45b0c0:    jne    45b0c9 <_access.BenchmarkSetIntSlice+0x109>
   found = true
             # We found it! rdx is `found`
  45b0c2:    mov    $0x1,%rdx

The map version is a little more involved, but only runs once. I can only assume that it’s the memory access (lea, mov), repeated ten times, that costs us. If you know different please let me know in the comments!


  1. c_harms said,

    June 23, 2017 at 16:07

    For mac OS, that is. Forgot to write that in my last comment.

  2. c_harms said,

    June 23, 2017 at 16:06

    In case anybody wondered: You can find out if your CPU supports AES instructions, use

    sysctl -a | grep machdep.cpu.features | egrep 'AES|SSE|SSE2|SSE3|SSE4'

  3. Shel said,

    February 25, 2016 at 12:09

    I’ve modified the gist to include a sorted search comparison. I sort a single time, but do include it in the timing. I’ve also got a run that takes the sort time out, so the binary search can be seen by itself.

    The binary search is slower if there are <=10 items, then straight linear search. Anything over 10 items and it is a better choice, if you can sort once. It still does not compete with a map’s hash lookup for anything over ~25 items.


  4. Brent Rowland said,

    June 30, 2015 at 23:12

    I appreciate seeing these kinds of numbers, even when they return approximately the expected results. When using integer keys and a relatively static data set, a binary search can be quite fast and considerably more memory efficient.

    For example, looking up glyph indices by code point in a TrueType font (https://github.com/rowland/leadtype/blob/master/ttf/cmap_table.go#L180) I’m seeing times in the neighborhood of 40 ns (https://github.com/rowland/leadtype/blob/master/ttf/cmap_table_test.go#L52) in a sparse dataset of a few thousand items.

  5. graham said,

    June 30, 2015 at 18:30

    @Jay Interesting question. It would definitely help, I don’t know if it would help enough to be faster than the map. We’d need to maintain the order on insert, so it would have to be a slice that changes rarely and is accessed a lot, to be worth it.

  6. Jay W. said,

    June 30, 2015 at 18:25

    This is an interesting post, though I’m wondering if binary search instead of a linear scan would change things.

Leave a Comment

Note: Your comment will only appear on the site once I approve it manually. This can take a day or two. Thanks for taking the time to comment.