Make `find_route`'s `dist` map elements fit in 128 bytes
We'd previously aggressively cached elements in the
`PathBuildingHop` struct (and its sub-structs), which resulted in a
rather bloated size. This implied cache misses as we read from and
write to multiple cache lines during processing of a single
channel.
Here, we reduce caching in `DirectedChannelInfo`, fitting the
`(NodeId, PathBuildingHop)` tuple in exactly 128 bytes. While this
should fit in a single cache line, it sadly does not generally lie
in only two lines, as glibc returns large buffers from `malloc`
which are very well aligned, plus 16 bytes (for its own allocation
tracking). Thus, we try to avoid reading from the last 16 bytes of
a `PathBuildingHop`, but luckily that isn't super hard.
Note that here we make accessing
`DirectedChannelInfo::effective_capacity` somewhat slower, but
that's okay as its only ever done once per `DirectedChannelInfo`
anyway.
While our routing benchmarks are quite noisy, this appears to
result in between a 5% and 15% performance improvement in the
probabilistic scoring benchmarks.