`AtomicCounter` was slightly race-y on 32-bit platforms because it
increments the high `AtomicUsize` independently from the low
`AtomicUsize`, leading to a potential race where another thread
could observe the low increment but not the high increment and see
a value of 0 twice.
This isn't a big deal because (a) most platforms are 64-bit these
days, (b) 32-bit platforms aren't super likely to have their
counter overflow 32 bits anyway, and (c) the two writes are
back-to-back so having another thread read during that window is
very unlikely.
However, we can also optimize the counter somewhat by using the
`target_has_atomic = "64"` cfg flag, which we do here, allowing us
to use `AtomicU64` even on 32-bit platforms where 64-bit atomics
are available.
This changes some test behavior slightly, which requires
adaptation.