Almost Always Use std::vector

Common wisdom says to use a linked list when you expect many insertions and deletions. List insertion is $\mathcal{O}(1)$ — just allocate a node and update a couple of pointers. Vector insertion is $\mathcal{O}(n)$ — you must shift everything past the insertion point.

But there’s a catch: you have to find the insertion point first. And that search makes linked lists slower in practice.

I watched Bjarne Stroustrup’s talk “Why You Should Avoid Linked Lists” and tried to reproduce his benchmarks.

The experiment

Start with a sorted container of $N$ elements. Insert 1,000 randomly-chosen elements while keeping the container sorted. Measure total time.

For std::list: no random access, so we step through linearly:

// Linear scan: find the first element not less than insertElement.
auto it = c.begin();
const auto kEnd = c.end();
while (it != kEnd && *it < insertElement) {
  ++it;
}
c.insert(it, insertElement);  // list insertion itself is O(1)

For std::vector: we can use std::lower_bound for a binary search, since vectors have random-access iterators.
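A sketch of the vector path, using a hypothetical `sortedInsert` helper name (the function name and element type are my choices, not from the talk):

```cpp
#include <algorithm>
#include <vector>

// O(log n) binary search for the insertion point, then an O(n)
// insert that shifts the tail right -- a sequential write.
void sortedInsert(std::vector<int>& c, int insertElement) {
  auto it = std::lower_bound(c.begin(), c.end(), insertElement);
  c.insert(it, insertElement);
}
```

Note that std::lower_bound also compiles for std::list iterators, but it degrades to $\mathcal{O}(n)$ steps there, which is why the linear scan above is the honest comparison for the list.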

Results

Vectors outperform lists by roughly an order of magnitude across all sizes tested (clang 3.6 and gcc 4.8 gave similar results).

There is a noticeable slowdown for vectors right where the container outgrows the L3 cache: beyond that size, cache misses increase. Interestingly, the list shows no such inflection point: because list nodes are scattered in memory, cache misses are equally likely at every size.

Why vectors win

Bjarne Stroustrup’s three principles explain it:

1. Don’t store data unnecessarily. A linked list stores at least one forward pointer per element, and two for a doubly-linked list. If you’re storing doubles in a doubly-linked list, you’re using roughly 3× as much memory as the data itself on a typical 64-bit system.

2. Keep data compact. More data per cache line means fewer cache misses. The vector’s contiguous layout lets the CPU hold more elements in L1/L2 cache simultaneously.

3. Access memory predictably. Even though vector insertion is $\mathcal{O}(n)$ — shifting elements forward — that’s a sequential memory write. CPUs are extremely good at this: they can prefetch ahead and pipeline the writes. The list’s $\mathcal{O}(n)$ traversal jumps around in memory, defeating the prefetcher.
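The arithmetic behind principle 1 can be sanity-checked with a small struct. This is a sketch of the layout only, assuming a typical 64-bit platform; a real std::list node also carries per-allocation bookkeeping on top of this.

```cpp
#include <cstdio>

// Shape of a doubly-linked list node holding a double:
// 8 bytes of payload plus two 8-byte pointers = 24 bytes per element,
// versus 8 bytes per element in a std::vector<double>.
struct ListNode {
  double value;
  ListNode* prev;
  ListNode* next;
};

void printSizes() {
  std::printf("vector element: %zu bytes\n", sizeof(double));
  std::printf("list node:      %zu bytes\n", sizeof(ListNode));
}
```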

The $\mathcal{O}(n)$ shift that sounds expensive is actually a call to memmove — one of the fastest operations a CPU can do.
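To make the shift concrete, here is a sketch with a hypothetical `insertAt` helper (my own name, not a standard API) that does by hand what vector::insert effectively does for trivially copyable element types: one memmove to open a gap, then a single store.

```cpp
#include <cstring>

// Open a gap at index pos in an array holding n valid elements
// (with capacity for at least one more), shifting the tail right
// with a single sequential memmove, then place the new value.
void insertAt(double* data, int n, int pos, double value) {
  std::memmove(data + pos + 1, data + pos, (n - pos) * sizeof(double));
  data[pos] = value;
}
```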

The takeaway

Default to std::vector. Use a different container only when you’ve benchmarked and confirmed that the vector is genuinely the bottleneck. This is not an absolute rule — but it is the right default.