Matt Corallo [Tue, 19 Apr 2022 21:46:44 +0000 (21:46 +0000)]
Do additional pre-flight checks before claiming a payment
As additional sanity checks, before claiming a payment, we check
that we have the full amount available in `claimable_htlcs` that
the payment should be for. Concretely, this prevents one
somewhat-absurd edge case where a user may receive an MPP payment,
wait many *blocks* before claiming it, allowing us to fail the
pending HTLCs and the sender to retry some subset of the payment
before we go to claim. More generally, this is just good
belt-and-suspenders against any edge cases we may have missed.
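As a rough illustration of the kind of check this adds (names here are illustrative and simplified; the real `claim_funds` logic differs in detail):

```rust
// Hypothetical, simplified stand-in for the per-part HTLC record; the real
// `ClaimableHTLC` carries considerably more state.
struct ClaimableHTLC {
    value: u64,      // msat carried by this HTLC part
    total_msat: u64, // total amount the sender intends to deliver across parts
}

// Refuse to claim unless the parts we currently hold cover the full amount.
fn preflight_amount_ok(htlcs: &[ClaimableHTLC]) -> bool {
    let expected_msat = match htlcs.first() {
        Some(htlc) => htlc.total_msat,
        None => return false,
    };
    let claimable_msat: u64 = htlcs.iter().map(|htlc| htlc.value).sum();
    claimable_msat >= expected_msat
}
```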
Matt Corallo [Wed, 4 May 2022 18:12:09 +0000 (18:12 +0000)]
Provide a redundant `Event::PaymentClaimed` on restart if needed
If we crashed during a payment claim and then detected a partial
claim on restart, we should ensure the user is aware that the
payment has been claimed. We do so here by using the new
partial-claim detection logic to create a `PaymentClaimed` event.
Matt Corallo [Mon, 18 Apr 2022 15:42:11 +0000 (15:42 +0000)]
Ensure all HTLCs for a claimed payment are claimed on startup
While the HTLC-claim process happens across all MPP parts under one
lock, this doesn't imply that they are claimed fully atomically on
disk. Ultimately, an application can crash after persisting one
`ChannelMonitorUpdate` out of multiple monitor updates needed for
the full claim.
Previously, this would leave us in a very bad state - because of
the all-channels-available check in `claim_funds` we'd refuse to
claim the payment again on restart (even though the
`PaymentReceived` event will be passed to the user again), and we'd
end up having partially claimed the payment!
The fix for the consistency part of this issue is pretty
straightforward - just check for this condition on startup and
complete the claim across all channels/`ChannelMonitor`s if we
detect it.
This still leaves us in a confused state from the perspective of
the user, however - we've actually claimed a payment but when they
call `claim_funds` we return `false` indicating it could not be
claimed.
Matt Corallo [Wed, 4 May 2022 17:49:09 +0000 (17:49 +0000)]
Store an `events::PaymentPurpose` with each claimable payment
In fc77c57c3c6e165d26cb5c1f5d1afee0ecd02589 we stopped using the
`FinalOnionHopData` in `OnionPayload::Invoice` directly and intend
to remove it eventually. However, in the next few commits we need
access to the payment secret when claiming a payment, as we create
a new `PaymentPurpose` during the claim process for a new event.
To get access to a `PaymentPurpose` without access to the
`FinalOnionHopData`, we change the storage of `claimable_htlcs` to
explicitly store a single `PaymentPurpose` with each set of
claimable HTLCs.
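A simplified sketch of the new shape of the storage, with stand-in types for illustration only:

```rust
use std::collections::HashMap;

// Simplified stand-ins for LDK's internal types, for illustration only.
struct PaymentHash([u8; 32]);
struct ClaimableHTLC { value_msat: u64 }
enum PaymentPurpose {
    InvoicePayment { payment_preimage: Option<[u8; 32]>, payment_secret: [u8; 32] },
    SpontaneousPayment([u8; 32]),
}

// Before: HashMap<PaymentHash, Vec<ClaimableHTLC>>.
// After: the purpose travels with each HTLC set, so a claim-time event can be
// built without ever reading `FinalOnionHopData` again.
type ClaimableHtlcs = HashMap<PaymentHash, (PaymentPurpose, Vec<ClaimableHTLC>)>;
```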
Matt Corallo [Wed, 4 May 2022 16:29:29 +0000 (16:29 +0000)]
Enable removal of `OnionPayload::Invoice::_legacy_hop_data` later
In fc77c57c3c6e165d26cb5c1f5d1afee0ecd02589 we stopped using the
`FinalOnionHopData` in `OnionPayload::Invoice` directly and renamed
it `_legacy_hop_data` with the intent of removing it in a few
versions. However, we continue to check that it was included in the
serialized data, meaning we would not be able to remove it without
breaking ability to serialize full `ChannelManager`s.
This fixes that by making `_legacy_hop_data` an `Option`, which we
happily handle if it's `None`.
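Roughly, the shape of the change (simplified types, not LDK's actual definitions):

```rust
// Simplified sketch of the shape of the change; real LDK types differ.
struct FinalOnionHopData {
    payment_secret: [u8; 32],
    total_msat: u64,
}

enum OnionPayload {
    Invoice {
        // Retained only for backwards serialization compatibility; readers
        // now accept `None`, so the field can be removed entirely later.
        _legacy_hop_data: Option<FinalOnionHopData>,
    },
    Spontaneous([u8; 32]),
}
```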
This update also includes a minor refactor: the return type of
`pending_monitor_events` has been changed to a `Vec` of tuples that
include an `OutPoint`, associating each `Vec` of `MonitorEvent`s with
its funding outpoint.
We've also renamed `source/sink_channel_id` to `prev/next_channel_id`
for clarity.
As the `counterparty_node_id` is now required to be passed back to the
`ChannelManager` to accept or reject an inbound channel request, the
documentation is updated to reflect that.
Matt Corallo [Mon, 9 May 2022 23:03:50 +0000 (23:03 +0000)]
Pull secp256k1 contexts from per-peer to per-PeerManager
Instead of including a `Secp256k1` context per
`PeerChannelEncryptor`, which is relatively expensive memory-wise
and nontrivial CPU-wise to construct, we should keep one for all
peers and simply reuse it.
This is relatively trivial, so we do it in this commit, and take the
opportunity to randomize the new per-`PeerManager` context.
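A minimal sketch of the approach, assuming illustrative struct and field names rather than the actual `PeerManager` internals:

```rust
use bitcoin::secp256k1::{All, Secp256k1};

// Illustrative struct and field names; only the general approach mirrors the
// commit: one randomized context per manager, shared by every peer.
struct ManagerCtx {
    secp_ctx: Secp256k1<All>,
}

impl ManagerCtx {
    fn new(ephemeral_seed: &[u8; 32]) -> Self {
        let mut secp_ctx = Secp256k1::new();
        // Randomize once at construction to blind secret-dependent operations.
        secp_ctx.seeded_randomize(ephemeral_seed);
        Self { secp_ctx }
    }
}
```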
Matt Corallo [Wed, 6 Oct 2021 16:58:56 +0000 (16:58 +0000)]
Create a simple `FairRwLock` to avoid readers starving writers
Because we handle messages (which can take some time, persisting
things to disk or validating cryptographic signatures) with the
top-level read lock, but require the top-level write lock to
connect new peers or handle disconnection, we are particularly
sensitive to writer starvation issues.
Rust's libstd RwLock does not provide any fairness guarantees,
using whatever the OS provides as-is. On Linux, pthreads defaults
to starving writers, which Rust's RwLock exposes to us (without
any configurability).
Here we work around that issue by blocking readers if there are
pending writers, optimizing for readable code over
perfectly-optimized blocking.
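A minimal sketch of that workaround (the in-tree `FairRwLock` may differ in detail): readers that observe pending writers first queue behind the write lock, so they cannot indefinitely starve a writer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};

pub struct FairRwLock<T> {
    lock: RwLock<T>,
    waiting_writers: AtomicUsize,
}

impl<T> FairRwLock<T> {
    pub fn new(t: T) -> Self {
        Self { lock: RwLock::new(t), waiting_writers: AtomicUsize::new(0) }
    }

    pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
        self.waiting_writers.fetch_add(1, Ordering::AcqRel);
        let res = self.lock.write();
        self.waiting_writers.fetch_sub(1, Ordering::AcqRel);
        res
    }

    pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
        if self.waiting_writers.load(Ordering::Acquire) != 0 {
            // Briefly take (and drop) the write lock so we wait behind any
            // pending writers instead of jumping ahead of them.
            let _guard = self.lock.write();
        }
        self.lock.read()
    }
}
```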
Matt Corallo [Sun, 3 Oct 2021 21:44:52 +0000 (21:44 +0000)]
Wake reader future when we fail to flush socket buffer
This avoids any extra calls to `read_event` after a write fails to
flush the write buffer fully, as is required by the PeerManager
API (though it isn't critical).
Matt Corallo [Wed, 6 Oct 2021 04:45:07 +0000 (04:45 +0000)]
Limit blocked PeerManager::process_events waiters to two
Only one instance of PeerManager::process_events can run at a time,
and each run always finishes all available work before returning.
Thus, having several threads blocked on the process_events lock
doesn't accomplish anything but blocking more threads.
Here we limit the number of blocked calls on process_events to two
- one processing events and one blocked at the top which will
process all available events after the first completes.
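A rough sketch of the two-waiter limit, using hypothetical field names (`process_lock`, `waiter_parked`) rather than the real `PeerManager` fields:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

struct EventProcessor {
    process_lock: Mutex<()>,
    waiter_parked: AtomicBool,
}

impl EventProcessor {
    fn process_events(&self) {
        let _guard = match self.process_lock.try_lock() {
            Ok(guard) => guard,
            Err(_) => {
                // Someone is already processing. If a second caller is already
                // parked behind them it will handle our events too, so bail.
                if self.waiter_parked.swap(true, Ordering::AcqRel) {
                    return;
                }
                let guard = self.process_lock.lock().unwrap();
                self.waiter_parked.store(false, Ordering::Release);
                guard
            }
        };
        // ... drain and handle every pending event while holding the lock ...
    }
}
```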
Matt Corallo [Sat, 25 Sep 2021 22:24:23 +0000 (22:24 +0000)]
Avoid taking the peers write lock during event processing
Because the peers write lock "blocks the world", and happens after
each read event, always taking the write lock has pretty severe
impacts on parallelism. Instead, here, we only take the global
write lock if we have to disconnect a peer.
Matt Corallo [Wed, 6 Oct 2021 04:29:19 +0000 (04:29 +0000)]
[net-tokio] Call PeerManager::process_events without blocking reads
Unlike very ancient versions of lightning-net-tokio, this does not
rely on a single global process_events future, but instead has one
per connection. This could still cause significant contention, so
we'll ensure only two process_events calls can exist at once in
the next few commits.
Matt Corallo [Wed, 6 Oct 2021 06:58:15 +0000 (06:58 +0000)]
Process messages with only the top-level read lock held
Users are required to only ever call `read_event` serially
per-peer, thus we actually don't need any locks while we're
processing messages - we can only be processing messages in one
thread per-peer.
That said, we do need to ensure that another thread doesn't
disconnect the peer we're processing messages for, as that could
result in a peer_disconnected call while we're processing a
message for the same peer - somewhat nonsensical.
This significantly improves parallelism especially during gossip
processing as it avoids waiting on the entire set of individual
peer locks to forward a gossip message while several other threads
are validating gossip messages with their individual peer locks
held.
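A rough sketch of the resulting lock layout, with illustrative names and key type:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

// The global map is only read-locked to look a peer up and process its
// messages; the write lock is reserved for connects and disconnects.
struct Peer { /* noise state, pending outbound buffer, sync state, ... */ }

struct PeerHolder {
    peers: RwLock<HashMap<u64, Mutex<Peer>>>,
}
```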
Matt Corallo [Fri, 30 Jul 2021 18:03:28 +0000 (18:03 +0000)]
Process messages from peers in parallel in `PeerManager`.
This adds the required locking to process messages from different
peers simultaneously in `PeerManager`. Note that channel messages
are still processed under a global lock in `ChannelManager`, and
most work is still processed under a global lock in gossip message
handling, but parallelizing message deserialization and message
decryption is somewhat helpful.
Add test that ensures that channels are closed with
`ClosureReason::DisconnectedPeer` if the peer disconnects before the
funding transaction has been broadcast.
Set `ChannelUpdate` `htlc_maximum_msat` using the peer's value
Use the `counterparty_max_htlc_value_in_flight_msat` value, and not the
`holder_max_htlc_value_in_flight_msat` value when creating the
`htlc_maximum_msat` value for `ChannelUpdate` messages.
BOLT 7 specifies that the field MUST be less than or equal to
`max_htlc_value_in_flight_msat` received from the peer, which we
currently are not guaranteed to adhere to by using the holder value.
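Illustratively (function and parameter names here are hypothetical):

```rust
// The value we should advertise in our `ChannelUpdate`: BOLT 7 requires
// `htlc_maximum_msat` to be at most the `max_htlc_value_in_flight_msat` we
// *received* from the peer, so use the counterparty's limit, not our own.
fn channel_update_htlc_maximum_msat(
    counterparty_max_htlc_value_in_flight_msat: u64,
    channel_value_msat: u64,
) -> u64 {
    counterparty_max_htlc_value_in_flight_msat.min(channel_value_msat)
}
```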
Add a config field
`ChannelHandshakeConfig::max_inbound_htlc_value_in_flight_percent_of_channel`
which sets the percentage of the channel value we cap the total value of
outstanding inbound HTLCs to.
This field can be set to a value between 1 and 100, expressed as a
whole-number percentage of the channel value.
Note that:
* If configured to a value other than the default of 10, any new
channels created with the non-default value will cause versions of LDK
prior to 0.0.104 to refuse to read the `ChannelManager`.
* This caps the total value for inbound HTLCs in-flight only, and
there's currently no way to configure the cap for the total value of
outbound HTLCs in-flight.
* The requirements for your node to stay online to ensure the safety of
HTLC-encumbered funds differ from those for non-HTLC-encumbered funds,
making this an important knob for limiting exposure to loss from being
offline for too long. See
`ChannelHandshakeConfig::our_to_self_delay` and
`ChannelConfig::cltv_expiry_delta` for more information.
Default value: 10.
Minimum value: 1, any values less than 1 will be treated as 1 instead.
Maximum value: 100, any values larger than 100 will be treated as 100
instead.
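For example, a node could opt into a 25% cap roughly as follows (treat this as a sketch; the surrounding `UserConfig` field names have varied across LDK releases):

```rust
use lightning::util::config::ChannelHandshakeConfig;

fn main() {
    // Cap the total inbound in-flight HTLC value at 25% of the channel value.
    let handshake_config = ChannelHandshakeConfig {
        max_inbound_htlc_value_in_flight_percent_of_channel: 25,
        ..Default::default()
    };
    // This struct is then placed in the node's `UserConfig` (the name of the
    // field holding it has varied across LDK releases) and passed to
    // `ChannelManager::new` at startup.
    let _ = handshake_config;
}
```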
Matt Corallo [Mon, 2 May 2022 20:45:17 +0000 (20:45 +0000)]
Reject outbound channels if the total reserve is larger than funding
In 2826af75a5761859dedcddc870de0753ae4ecde4 we fixed a fuzz crash
in which the total reserve values in a channel were greater than
the funding amount, checked when an incoming channel is accepted.
This, however, did not fix the same issue for outbound channels,
where a peer can accept a channel with a nonsense reserve value in
the `accept_channel` message. The `full_stack_target` fuzzer
eventually found its way into the same issue, which this resolves.
Thanks (again) to Chaincode Labs for providing the fuzzing
resources which found this bug!
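A minimal sketch of the check, with illustrative parameter names (the real validation lives in `Channel` and covers more fields):

```rust
// Returns false for channels whose combined reserves cannot fit in the
// funding amount, which would otherwise cause integer underflow later when
// computing HTLC limits.
fn reserves_fit_in_funding(
    funding_satoshis: u64,
    holder_selected_channel_reserve_satoshis: u64,
    counterparty_selected_channel_reserve_satoshis: u64,
) -> bool {
    holder_selected_channel_reserve_satoshis
        .checked_add(counterparty_selected_channel_reserve_satoshis)
        .map_or(false, |total_reserve| total_reserve <= funding_satoshis)
}
```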
Matt Corallo [Thu, 28 Apr 2022 17:10:04 +0000 (10:10 -0700)]
Avoid storing a full FinalOnionHopData in OnionPayload::Invoice
We only use it to check the amount when processing MPP parts, but
store the full object (including new payment metadata) in it.
Because we now store the amount in the parent structure, there is
no need for it at all in the `OnionPayload`. Sadly, for
serialization compatibility, we need it to continue to exist, at
least temporarily, but we can avoid populating the new fields in
that case.
Matt Corallo [Tue, 21 Dec 2021 22:10:43 +0000 (22:10 +0000)]
Store total payment amount in ClaimableHTLC explicitly
...instead of accessing it via the `OnionPayload::Invoice` form.
This may be useful if we add MPP keysend support, but is directly
useful to allow us to drop `FinalOnionHopData` from `OnionPayload`.
Matt Corallo [Mon, 2 May 2022 02:51:50 +0000 (02:51 +0000)]
Force-close channels on reorg only if the funding is unconfirmed
Currently, if a channel's funding is locked in and then later
reorg'd back to half of the channel's minimum-depth we will
immediately force-close the channel. However, this can happen at
the fork-point while processing a reorg, and generally reorgs do
not reduce the block height at all, making this a rather useless
endeavor.
Ideally we'd never auto-force-close channels at all due to a reorg,
instead simply marking them as inactive until the funding
transaction is re-confirmed (or allowing the user to attempt to
force-close or force-closing once we're confident we have
completed reorg processing if we're at risk of losing funds
already received in the channel).
Sadly, we currently do not support changing a channel's SCID and
updating our SCID maps, so we cannot yet remove the automated
force-close logic. Still, there is no reason to do it until a
funding transaction has been removed from the chain.
This implements that change - only force-closing once a channel's
funding transaction has been reorg'd out (still potentially at a
reorg's fork point). This continues to imply a 1-confirmation
channel will always be force-closed after a reorg of the funding
transaction, and will imply a similar behavior with 0-conf
channels.
Matt Corallo [Thu, 28 Apr 2022 19:46:13 +0000 (19:46 +0000)]
Reject channels if the total reserves are larger than the funding
The `full_stack_target` fuzzer managed to find a subtraction
underflow in the new `Channel::get_htlc_maximum` function where we
subtract both sides' reserve values from the channel funding. Such
a channel is obviously completely useless, so we should reject it
during opening instead of integer-underflowing later.
Thanks to Chaincode Labs for providing the fuzzing resources which
found this bug!
Matt Corallo [Mon, 18 Apr 2022 02:10:44 +0000 (02:10 +0000)]
Do not force-close channels when we cannot communicate with peers
In general, we should never be automatically force-closing our
users' channels unless there is some immediate risk of funds loss
(ie because of some HTLC(s) which are timing out soon). In any
other case, we should trust the user to be able to figure out what
is going on and close their channels manually instead of trying to
be overly clever and automate closures if we think the channel is
useless.
In this case, even if a peer has some required feature that does
not allow us to communicate with them, there is a strong
possibility that some LDK upgrade may allow us to in the future. In
the mean time, there is no reason to go on-chain unless the user
needs funds immediately. In such a case, the user should already
have logic to force-close channels with peers which are not
available for any reason.
Matt Corallo [Mon, 25 Apr 2022 22:51:02 +0000 (22:51 +0000)]
Add test coverage for failure of inconsistent MPP parts
When we receive multiple HTLCs which claim to be a part of the same
MPP but which are inconsistent for some reason, we should fail the
inconsistent HTLCs but keep those received before the first
inconsistency.
This works, but it turns out there was no test coverage, so we add
some here.