Gursharan Singh [Fri, 8 Dec 2023 19:28:19 +0000 (11:28 -0800)]
Stop cleaning monitor updates on new block connect
Previously, we used to cleanup monitor updates at both consolidation
threshold and new block connects. With this change we will only
cleanup when our consolidation criteria is met. Also, we remove
monitor read from cleanup logic, in case of update consolidation.
Note: In case of channel-closing monitor update, we still need to
read the old monitor before persisting the new one in order to
determine the cleanup range.
Matt Corallo [Wed, 29 Nov 2023 00:33:16 +0000 (00:33 +0000)]
Drop half-life-based bucket decay in `update_history_buckets`
Because we decay the bucket information in the background, there's
not much reason to try to decay them immediately prior to updating,
and in removing that we can also clean up a good bit of dead code,
which we do here.
Matt Corallo [Wed, 29 Nov 2023 00:31:00 +0000 (00:31 +0000)]
Make scorer decay + persistence more frequent
There's some edge cases in our scoring when the information really
should be decayed but hasn't yet been prior to an update. Rather
than try to fix them exactly, we instead decay the scorer a bit
more often, which largely solves them but also gives us a bit more
accurate bounds on our channels, allowing us to reuse channels at
a similar amount to what just failed immediately, but at a
substantial penalty.
Matt Corallo [Thu, 12 Oct 2023 18:23:51 +0000 (18:23 +0000)]
Drop warning about mixing `no-std` and `std` `ProbabilisticScorer`s
Now that the serialization format of `no-std` and `std`
`ProbabilisticScorer`s both just use `Duration` since UNIX epoch
and don't care about time except when decaying, we don't need to
warn users to not mix the scorers across `no-std` and `std` flags.
Matt Corallo [Mon, 9 Oct 2023 01:52:20 +0000 (01:52 +0000)]
Drop now-unused `T: Time` bound on `ProbabilisticScorer`
Now that we don't access time via the `Time` trait in
`ProbabilisticScorer`, we can finally drop the `Time` bound
entirely, removing the `ProbabilisticScorerUsingTime` and type
alias indirection and replacing it with a simple struct.
Matt Corallo [Mon, 9 Oct 2023 01:44:33 +0000 (01:44 +0000)]
Use `Duration` based time info in scoring rather than `Time`
In the coming commits, the `T: Time` bound on `ProbabilisticScorer`
will be removed. In order to enable that, we need to switch over to
using the `ScoreUpdate`-provided current time (as a `Duration`
since the unix epoch), making the `T` bound entirely unused.
Matt Corallo [Mon, 9 Oct 2023 01:15:18 +0000 (01:15 +0000)]
Pipe `Duration`-based time information through scoring pipeline
In the coming commits, the `T: Time` bound on `ProbabilisticScorer`
will be removed. In order to enable that, we need to pass the
current time (as a `Duration` since the unix epoch) through the
score updating pipeline, allowing us to keep the
`*last_updated_time` fields up-to-date as we go.
Matt Corallo [Wed, 29 Nov 2023 03:07:54 +0000 (03:07 +0000)]
Update history bucket last_update time immediately on update
Now that we aren't decaying during scoring, when we set the
last_updated time in the history bucket logic doesn't matter, so
we should just update it when we've just updated the history
buckets.
Matt Corallo [Mon, 9 Oct 2023 01:11:10 +0000 (01:11 +0000)]
Stop decaying liquidity information during bounds-based scoring
Because scoring is an incredibly performance-sensitive operation,
doing liquidity information decay (and especially fetching the
current time!) during scoring isn't really a great idea. Now that
we decay liquidity information in the background, we don't have any
reason to decay during scoring, and we ultimately remove it
entirely here.
Matt Corallo [Mon, 9 Oct 2023 02:14:21 +0000 (02:14 +0000)]
Stop decaying historical liquidity information during scoring
Because scoring is an incredibly performance-sensitive operation,
doing liquidity information decay (and especially fetching the
current time!) during scoring isn't really a great idea. Now that
we decay liquidity information in the background, we don't have any
reason to decay during scoring, and we remove the historical bucket
liquidity decaying here.
Matt Corallo [Mon, 2 Oct 2023 20:07:21 +0000 (20:07 +0000)]
Impl decaying in `ProbabilisticScorer::decay_liquidity_certainty`
This implements decaying in the `ProbabilisticScorer`'s
`ScoreLookup::decay_liquidity_certainty` implementation, using
floats for accuracy since we're no longer particularly
time-sensitive. Further, it (finally) removes score entries which
have decayed to zero.
Matt Corallo [Mon, 2 Oct 2023 19:44:36 +0000 (19:44 +0000)]
Track historical liquidity update time separately from the bounds
In the next commit, we'll start to use the new
`ScoreUpdate::decay_liquidity_certainty` to decay our bounds in the
background. This will result in the `last_updated` field getting
updated regularly on decay, rather than only on update. While this
isn't an issue for the regular liquidity bounds, it poses a problem
for the historical liquidity buckets, which are decayed on a
separate (and by default much longer) timer. If we didn't move to
tracking their decays separately, we'd never let the `last_updated`
field get old enough for the historical buckets to decay at all.
Instead, here we introduce a new `Duration` in the
`ChannelLiquidity` which tracks the last time the historical
liquidity buckets were last updated. We initialize it to a copy of
`last_updated` on deserialization if it is missing.
Matt Corallo [Mon, 2 Oct 2023 19:14:26 +0000 (19:14 +0000)]
Add a scoring decay method to the `ScoreUpdate` trait
Rather than relying on fetching the current time during
routefinding, here we introduce a new trait method to `ScoreUpdate`
to do so. This largely mirrors what we do with the `NetworkGraph`,
and allows us to take on much more expensive operations (floating
point exponentiation) in our decaying.
Matt Corallo [Wed, 13 Dec 2023 22:55:32 +0000 (22:55 +0000)]
cfg-gate async signing logic
We are intending to release without having completed our async
signing logic, which sadly means we need to cfg-gate it to ensure
we restore the previous state of panicking on signer errors, rather
than putting us in a stuck state with no way to recover.
Matt Corallo [Wed, 1 Nov 2023 01:16:12 +0000 (01:16 +0000)]
Depend on `libm` in `no-std` for `powf`(64)
In the next commits we'll need `f64`'s `powf`, which is only
available in `std`. For `no-std`, here we depend on `libm` (a
`rust-lang` org project), which we can use for `powf`.
Add redundant blinded HTLC failure check for posterity.
Although this new check is unreachable right now, it helps prevent potential
future errors where we incorrectly fail blinded HTLCs with an unblinded error.
If a blinded HTLC errors when added to a Channel, such as if the recipient has
already sent a shutdown message, they should malformed-fail backwards with
error code INVALID_ONION_BLINDING and a zeroed out onion hash per BOLT 4.
Fix blinded recipient fail on receive reqs violation
If a blinded HTLC does not satisfy the receiver's requirements, e.g. bad CLTV
or amount, they should malformed-fail backwards with error code
INVALID_ONION_BLINDING and a zeroed out onion hash per BOLt 4.
Fix blinded recipient fail on onion decode failure
If a recipient behind a multihop blinded path fails to decode their onion
payload, they should fail backwards with error code INVALID_ONION_BLINDING and
a zeroed out onion hash per BOLT 4.
And use it in the multihop blinded path receive failure test. Will be used in
the next commit to test receiving an invalid blinded final onion payload.
We can't use the existing get_route test util here because blinded payments
rely on the sender adding a random shadow CLTV offset to the final hop; without
this the payment will be failed with cltv-expiry-too-soon.
If a blinded recipient to a multihop blinded path needs to fail back a
malformed HTLC, they should use error code INVALID_ONION_BLINDING and a zeroed
out onion hash per BOLT 4.
Support failing blinded non-intro HTLCs after RAA processing.
If an HTLC fails after its RAA is processed, it is failed back with
ChannelManager::fail_htlc_backwards_internal. This method will now correctly
inform the channel that this HTLC is blinded and to construct an
update_malformed message accordingly.
Adapt Channel::fail_htlc for failing with malformed OR update_fail_htlc.
Useful for failing blinded payments back with malformed, and will also be
useful in the future when we move onion decoding into
process_pending_htlc_forwards, after which Channel::fail_htlc will be used for
all malformed htlcs.
Set up Channel::fail_htlc to be able to return update_malformed
Currently it returns only update_fail, but we'll want it to be able to return
update_malformed as well in upcoming commits. We'll use this for correctly
failing blinded received HTLCs backwards with malformed and
invalid_onion_blinding error per BOLT 4.
Channel: add holding cell HTLC variant for blinded HTLCs.
For context, blinded HTLCs where we are not the intro node must always be
failed back with malformed and invalid_onion_blinding error per BOLT 4.
Prior to supporting blinded payments, the only way for an update_malformed to
be returned from Channel was if an onion was actually found to be malformed
during initial update_add processing. This meant that any malformed HTLCs would
never live in the holding cell but instead would be returned directly upon
initial RAA processing.
Now, we need to be able to store these HTLCs in the holding cell because the
HTLC failure necessitating an update_malformed may come long after the RAA is
initially processed, and we may not be a state to send the update_malformed
message at that time.
Therefore, add a new holding cell HTLC variant for blinded non-intro node
HTLCs, which will signal to Channel to fail with malformed and the correct
error code.
Set HTLCPreviousHopData::blinded for blinded received HTLCs.
Will be used in the next commit(s) to let us know to fail blinded received
HTLCs backwards with the malformed and invalid_onion_blinding error per BOLT 4.
Jeffrey Czyz [Tue, 12 Dec 2023 14:46:10 +0000 (08:46 -0600)]
Relax OnionMessenger::peer_disconnected assertion
When a peer is connected, OnionMessenger tracks it only if it supports
onion messages. On disconnect, we debug_assert that the peer was in a
state ConnectedPeer, failing when it is in the PendingConnection state.
However, we were mistakenly asserting for peers that we were not
tracking (i.e., that don't support onion messages). Relax the check to
not fail on the latter.
Wilmer Paulino [Fri, 3 Nov 2023 19:43:06 +0000 (12:43 -0700)]
Refactor commitment broadcast to always go through OnchainTxHandler
Currently, our holder commitment broadcast only goes through the
`OnchainTxHandler` for anchor outputs channels because we can actually
bump the commitment transaction fees with it. For non-anchor outputs
channels, we would just broadcast once directly via the
`ChannelForceClosed` monitor update, without going through the
`OnchainTxHandler`.
As we add support for async signing, we need to be tolerable to signing
failures. A signing failure of our holder commitment will currently
panic, but once the panic is removed, we must be able to retry signing
once the signer is available. We can easily achieve this via the
existing `OnchainTxHandler::rebroadcast_pending_claims`, but this
requires that we first queue our holder commitment as a claim. This
commit ensures we do so everywhere we need to broadcast a holder
commitment transaction, regardless of the channel type.
Wilmer Paulino [Tue, 31 Oct 2023 08:12:58 +0000 (01:12 -0700)]
Cancel previous commitment claims on newly confirmed commitment
Once a commitment transaction is broadcast/confirms, we may need to
claim some of the HTLCs in it. These claims are sent as requests to the
`OnchainTxHandler`, which will bump their feerate as they remain
unconfirmed. When said commitment transaction becomes unconfirmed
though, and another commitment confirms instead, i.e., a reorg happens,
the `OnchainTxHandler` doesn't have any insight into whether these
claims are still valid or not, so it continues attempting to claim the
HTLCs from the previous commitment (now unconfirmed) forever, along with
the HTLCs from the newly confirmed commitment.
Improving block conenction logging and filtered txids
Implement the Display trait for Outpoint and utilize it in the codebase for monitoring outpoints.
Additionally, add log tracing for best_block_update and confirmed transactions.
solves #2348
Matt Corallo [Fri, 8 Dec 2023 23:05:37 +0000 (23:05 +0000)]
Add CI test that `#[cfg]` tags are from a defined set
Rust is fairly relaxed in checking the validity of arguments
passed to #[cfg]. While it should probably be more strict when
checking features, it cannot be strict when checking loose cfg
tags, because those can be anything and are simply passed to rustc
via unconstrained arguments.
Thus, we do it for rustc manually, but scanning all our source and
checking that all our cfg tags match a known cfg tag.
Error if onion payloads exceed max length on packet construction.
Ensure that if we call construct_onion_packet and friends where payloads are
too large for the allotted packet length, we'll fail to construct. Previously,
senders would happily construct invalid packets by array-shifting the final
node's HMAC out of the packet when adding an intermediate onion layer, causing
the receiver to error with "final payload provided for us as an intermediate
node."
Fix debug panic in onion utils on large custom TLVs or metadata.
We previously assumed that the final node's payload would be ~93 bytes, and had
code to ensure that the filler encoded after that payload is not all 0s. Now
with custom TLVs and metadata supported, the final node's payload may take up
the entire onion packet, so we can't assume that there are 64 bytes of filler
to check.
Matt Corallo [Wed, 6 Dec 2023 05:29:28 +0000 (05:29 +0000)]
Pre-calculate heap element scores (retaining RouteGraphNode size)
`RouteGraphNode` currently recalculates scores in its `Ord`
implementation, wasting time while sorting the main Dijkstra's
heap.
Further, some time ago, when implementing the `htlc_maximum_msat`
amount reduction while walking the graph, we added
`PathBuildingHop::was_processed`, looking up the source node in
`dist` each time we pop'ed an element off of the binary heap.
As a result, we now have a reference to our `PathBuildingHop` when
processing a best-node's channels, leading to several fields in
`RouteGraphNode` being entirely redundant.
Here we drop those fields, but add a pre-calculated score field,
as well as force a suboptimal `RouteGraphNode` layout, retaining
its existing 64 byte size.
Without the suboptimal layout, performance is very mixed, but with
it performance is mostly improved, by around 10% in most tests.
Matt Corallo [Wed, 6 Dec 2023 05:02:07 +0000 (05:02 +0000)]
Reorder `PathBuildingHop` fields somewhat
Given `PathBuildingHop` is now an even multiple of cache lines, we
can pick which fields "fall off" the cache line we have visible
when dealing with hops, which we do here.
Matt Corallo [Wed, 6 Dec 2023 06:02:37 +0000 (06:02 +0000)]
Make `find_route`'s `dist` map elements fit in 128 bytes
We'd previously aggressively cached elements in the
`PathBuildingHop` struct (and its sub-structs), which resulted in a
rather bloated size. This implied cache misses as we read from and
write to multiple cache lines during processing of a single
channel.
Here, we reduce caching in `DirectedChannelInfo`, fitting the
`(NodeId, PathBuildingHop)` tuple in exactly 128 bytes. While this
should fit in a single cache line, it sadly does not generally lie
in only two lines, as glibc returns large buffers from `malloc`
which are very well aligned, plus 16 bytes (for its own allocation
tracking). Thus, we try to avoid reading from the last 16 bytes of
a `PathBuildingHop`, but luckily that isn't super hard.
Note that here we make accessing
`DirectedChannelInfo::effective_capacity` somewhat slower, but
that's okay as its only ever done once per `DirectedChannelInfo`
anyway.
While our routing benchmarks are quite noisy, this appears to
result in between a 5% and 15% performance improvement in the
probabilistic scoring benchmarks.
Matt Corallo [Wed, 6 Dec 2023 03:54:28 +0000 (03:54 +0000)]
Make `CandidateRouteHop::PrivateHop::target_node_id` a reference
This avoids bloating `CandidateRouteHop` with a full 33-byte
node_id (and avoids repeated public key serialization when we do
multiple pathfinding passes).
Matt Corallo [Wed, 6 Dec 2023 17:47:00 +0000 (17:47 +0000)]
Simplify and make scoring calls in `TestRouter` more complete
`TestRouter` tries to make scoring calls that mimic what an actual
router would do, but the changes in f0ecc3ec73dcdb9303b1bd5ac687a36
failed to make scoring calls for private hints or if we take a
public hop for the last hop.
This fixes those regressions, though no tests currently depend on
this behavior.
Matt Corallo [Wed, 6 Dec 2023 01:17:48 +0000 (01:17 +0000)]
`#[inline]` `CandidateRouteHop` accessors
These are used in the performance-critical routing and scoring
operations, which may happen outside of our crate. Thus, we really
need to allow downstream crates to inline these accessors into
their code, which we do here.
Matt Corallo [Wed, 6 Dec 2023 01:13:33 +0000 (01:13 +0000)]
Privatise `CandidateRouteHop::short_channel_id` as its a footgun
Short channel "ID"s are not globally unique when they come from a
BOLT 11 route hint or a first hop (which can be an outbound SCID
alias). In those cases, its rather confusing that we have a
`short_channel_id` method which mixes them all together, and even
more confusing that we have a `CandidateHopId` which is not, in
fact returning a unique identifier.
In our routing logic this is mostly fine - the cost of a collision
isn't super high and we should still do just fine finding a route,
however the same can't be true for downstream users, as they may or
may not rely on the apparent guarantees.
Thus, here, we privatise the SCID and id accessors.
Matt Corallo [Wed, 6 Dec 2023 17:48:51 +0000 (17:48 +0000)]
Fix and re-enable the `remembers_historical_failures` test
f0ecc3ec73dcdb9303b1bd5ac687a361decce2dd introduced a regression in
the `remembers_historical_failures` test, and disabled it by simply
removing the `#[test]` annotation. This fixes the test and marks it
as a test again.
Elias Rohrer [Fri, 8 Dec 2023 11:29:38 +0000 (12:29 +0100)]
Manually download `bitcoind`/`electrs` for CI tests
Previously, we used the auto-download feature of the
`electrsd`/`bitcoind` crates. While convenient, they unnecessarily
introduced a lot of dependecies (`zip`, `zstd`, `time`, etc.) to our
test environment which needed pinning for the MSRV of 1.63.
Here, we introduce a new `no_download` config flag to the
`lightning-transaction-sync` crate allowing us to disable this
auto-download feature in CI, where we now opt to download the
corresponding binaries manually. We keep the default-auto-download as a
convenience feature for running tests locally though.
Jeffrey Czyz [Fri, 8 Dec 2023 04:44:58 +0000 (22:44 -0600)]
Return correct SendSuccess in OnionMessenger
When enqueuing a message for a node already awaiting a connection,
BufferedAwaitingConnection should be returned when a node is not yet
connected as a peer. However, it was only returned when the first
message was enqueued. Any messages enqueued after but before a
connection was established incorrectly returned Buffered.
Matt Corallo [Wed, 29 Nov 2023 06:02:46 +0000 (06:02 +0000)]
Immediately error in `close_channel_internal` if there is no chan
Previously, unfunded channels would be stored outside of
`PeerState::channel_by_id`, and thus if there is no channel when
we look in `PeerState::channel_by_id`, `close_channel_internal`
called `force_close_channel_with_peer` to hunt for unfunded
channels.
However, that is no longer the case, so the call is redundant, and
we can simply return an error instead.
Matt Corallo [Wed, 29 Nov 2023 18:11:30 +0000 (18:11 +0000)]
Move pre-funded-channel immediate shutdown logic to the right place
Because a `Funded` `Channel` cannot possibly be pre-funding, the
logic in `ChannelManager::close_channel_internal` to handle
pre-funding channels is in the wrong place.
Rather than being handled inside the `Funded` branch, it should be
in an `else` following it, handling either of the two
`ChannelPhases` outside of `Funded`.
Sadly, because of a previous control flow management `loop {}`, the
existing code will infinite loop, which is fixed here.