Jeffrey Czyz [Thu, 5 Jan 2023 17:50:24 +0000 (11:50 -0600)]
Fix scorer panic when available capacity is zero
ProbabilisticScorer takes a ChannelUsage when computing a penalty for a
channel. The formula for calculating the liquidity penalty reduces the
maximum capacity by the amount of in-flight HTLCs (available capacity)
and adds one to prevent division by zero.
However, since the available capacity is passed to
DirectedChannelLiquidity as the capacity, other penalty formulas may use
the available (i.e., reduced) capacity inadvertently. In practice, this
has two ramifications for the historical liquidity penalty computation:
1. The bucket formula doesn't have a consistent denominator for a given
channel.
2. The bucket formula may divide by zero when the in-flight HTLC amount
equals or exceeds the effective capacity.
Fixing this involves only using the available capacity when appropriate.
Matt Corallo [Sun, 26 Feb 2023 20:22:28 +0000 (20:22 +0000)]
Make sure individual mutexes are constructed on different lines
Our lockdep logic (on Windows) identifies a mutex based on which
line it was constructed on. Thus, if we have two mutexes
constructed on the same line it will generate false positives.
Matt Corallo [Wed, 22 Feb 2023 22:54:38 +0000 (22:54 +0000)]
Disallow taking two instances of the same mutex at the same time
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus,
its generally unsafe to do if we can avoid it.
To discourage doing this, here we default to panicing on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.
This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
Matt Corallo [Thu, 2 Feb 2023 22:38:54 +0000 (22:38 +0000)]
Refuse recursive read locks in lockorder testing
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
Matt Corallo [Wed, 22 Feb 2023 22:10:46 +0000 (22:10 +0000)]
Don't `per_peer_state` read locks recursively in monitor updating
When handling a `ChannelMonitor` update via the new
`handle_new_monitor_update` macro, we always call the macro with
the `per_peer_state` read lock held and have the macro drop the
per-peer state lock. Then, when handling the resulting updates, we
may take the `per_peer_state` read lock again in another function.
In a coming commit, recursive read locks will be disallowed, so we
have to drop the `per_peer_state` read lock before calling
additional functions in `handle_new_monitor_update`, which we do
here.
Matt Corallo [Fri, 3 Feb 2023 00:46:50 +0000 (00:46 +0000)]
Expect callers to hold read locks before `channel_monitor_updated`
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Sadly, a user ended up hitting this deadlock in production in the
form of a call to `get_and_clear_pending_msg_events` which holds
the `ChannelManager::total_consistency_lock` before calling
`process_pending_monitor_events` and eventually
`channel_monitor_updated`, which tries to take the same read lock
again.
Luckily, the fix is trivial, simply remove the redundand read lock
in `channel_monitor_updated`.
Matt Corallo [Fri, 3 Feb 2023 00:33:27 +0000 (00:33 +0000)]
Hold the `total_consistency_lock` while in `outbound_payment` fns
We previously avoided holding the `total_consistency_lock` while
doing crypto operations to build onions. However, now that we've
abstracted out the outbound payment logic into a utility module,
ensuring the state is consistent at all times is now abstracted
away from code authors and reviewers, making it likely to break.
Further, because we now call `send_payment_along_path` both with,
and without, the `total_consistency_lock`, and because recursive
read locks may deadlock, it would now be quite difficult to figure
out which paths through `outbound_payment` need the lock and which
don't.
While it may slow writes somewhat, it's not really worth trying to
figure out this mess, instead we just hold the
`total_consistency_lock` before going into `outbound_payment`
functions.
Matt Corallo [Mon, 6 Feb 2023 22:12:09 +0000 (22:12 +0000)]
Remove the `final_cltv_expiry_delta` in `RouteParameters` entirely
fbc08477e8dcdd8f3f2ada8ca77388b6185febe2 purported to "move" the
`final_cltv_expiry_delta` field to `PaymentParamters` from
`RouteParameters`. However, for naive backwards-compatibility
reasons it left the existing on in place and only added a new,
redundant field in `PaymentParameters`.
It turns out there's really no reason for this - if we take a more
critical eye towards backwards compatibility we can figure out the
correct value in every `PaymentParameters` while deserializing.
We do this here - making `PaymentParameters` a `ReadableArgs`
taking a "default" `cltv_expiry_delta` when it goes to read. This
allows existing `RouteParameters` objects to pass the read
`final_cltv_expiry_delta` field in to be used if the new field
wasn't present.
Matt Corallo [Mon, 6 Feb 2023 21:56:39 +0000 (21:56 +0000)]
Support `ReadableArgs` types across in the TLV struct serialization
This adds `required` support for trait-wrapped reading (e.g. for
objects read via `ReadableArgs`) as well as support for the
trait-wrapped reading syntax across the TLV struct/enum
serialization macros.
Matt Corallo [Mon, 6 Feb 2023 21:43:10 +0000 (21:43 +0000)]
Require a non-0 number of non-empty paths when deserializing routes
When we read a `Route` (or a list of `RouteHop`s), we should never
have zero paths or zero `RouteHop`s in a path. As such, its fine to
simply reject these at deserialization-time. Technically this could
lead to something which we can generate not round-trip'ing
serialization, but that seems okay here.
Matt Corallo [Mon, 27 Feb 2023 17:32:07 +0000 (17:32 +0000)]
Move `lightning-transaction-sync` out of the main workspace
Because `lightning-transaction-sync` does not have an MSRV (and
because its dev-dependencies are huge), we can't build it by
default when devs run `cargo test`, so it is moved out of the
top-level workspace.
Fix upgradable_required fields to actually be required in lower level macros
When using lower level macros such as read_tlv_stream, upgradable_required
fields have been treated as regular options. This is incorrect, they should
either be upgradable_options or treated as required fields.
This field was previous useful in manual retries for users to know when all
paths of a payment have failed and it is safe to retry. Now that we support
automatic retries in ChannelManager and no longer support manual retries, the
field is no longer useful.
For backwards compat, we now always write false for this field. If we didn't do
this, previous versions would default this field's value to true, which can be
problematic because some clients have relied on the field to indicate when a
full payment retry is safe.
Jeffrey Czyz [Thu, 16 Feb 2023 02:17:18 +0000 (20:17 -0600)]
Fix amount overflow in Invoice building
An overflow can occur when multiplying the offer amount by the requested
quantity when no amount is given in the request. Return an error instead
of overflowing.
Jeffrey Czyz [Wed, 15 Feb 2023 22:43:41 +0000 (16:43 -0600)]
Fix amount overflow in Offer parsing and building
An overflow can occur when multiplying the offer amount by the requested
quantity when checking if the given amount is enough. Return an error
instead of overflowing.
Jeffrey Czyz [Thu, 9 Feb 2023 17:09:23 +0000 (11:09 -0600)]
Fuzz test for bech32 decoding
Fuzz testing bech32 decoding along with deserializing the underlying
message can result in overly exhaustive searches. Instead, the message
deserializations are now fuzzed separately. Add fuzzing for bech32
decoding.
Jeffrey Czyz [Thu, 9 Feb 2023 16:59:11 +0000 (10:59 -0600)]
Expose Bech32Encode trait for fuzzing
In order to fuzz test Bech32Encode parsing independent of the underlying
message deserialization, the trait needs to be exposed. Conditionally
expose it only for fuzzing.
Jeffrey Czyz [Fri, 20 Jan 2023 22:30:45 +0000 (16:30 -0600)]
Fuzz test for parsing Invoice
An invoice is serialized as a TLV stream and encoded as bytes. Add a
fuzz test that parses the TLV stream and deserializes the underlying
Invoice. Then compare the original bytes with those obtained by
re-serializing the Invoice.
Jeffrey Czyz [Fri, 20 Jan 2023 19:34:34 +0000 (13:34 -0600)]
Fuzz test for parsing InvoiceRequest
An invoice request is serialized as a TLV stream and encoded as bytes.
Add a fuzz test that parses the TLV stream and deserializes the
underlying InvoiceRequest. Then compare the original bytes with those
obtained by re-serializing the InvoiceRequest.
Matt Corallo [Thu, 9 Feb 2023 19:20:22 +0000 (19:20 +0000)]
Remove genesis block hash from public API
Forcing users to pass a genesis block hash has ended up being
error-prone largely due to byte-swapping questions for bindings
users. Further, our API is currently inconsistent - in
`ChannelManager` we take a `Bitcoin::Network` but in `NetworkGraph`
we take the genesis block hash.
Luckily `NetworkGraph` is the only remaining place where we require
users pass the genesis block hash, so swapping it for a `Network`
is a simple change.
Prior to this, we returned PaymentSendFailure from auto retry send payment
methods. This implied that we might return a PartialFailure from them, which
has never been the case. So it makes sense to rework the errors to be a better
fit for the methods.
We're taking error handling in a totally different direction now to make it
more asynchronous, see send_payment_internal for more information.
Matt Corallo [Sun, 19 Feb 2023 00:13:51 +0000 (00:13 +0000)]
Don't generate a `ChannelMonitorUpdate` for closed chans on shutdown
The `Channel::get_shutdown` docs are very clear - if the channel
jumps to `Shutdown` as a result of not being funded when we go to
initiate shutdown we should not generate a `ChannelMonitorUpdate`
as there's no need to bother with the shutdown script - we're
force-closing anyway.
However, this wasn't actually implemented, potentially causing a
spurious monitor update for no reason.
Matt Corallo [Sat, 3 Dec 2022 03:15:04 +0000 (03:15 +0000)]
Use the new monitor persistence flow for `funding_created` handling
Building on the previous commits, this finishes our transition to
doing all message-sending in the monitor update completion
pipeline, unifying our immediate- and async- `ChannelMonitor`
update and persistence flows.
Matt Corallo [Mon, 6 Feb 2023 23:03:38 +0000 (23:03 +0000)]
Use new monitor persistence flow in funding_signed handling
In the previous commit, we moved all our `ChannelMonitorUpdate`
pipelines to use a new async path via the
`handle_new_monitor_update` macro. This avoids having two message
sending pathways and simply sends messages in the "monitor update
completed" flow, which is shared between sync and async monitor
updates.
Here we reuse the new macro for handling `funding_signed` messages
when doing an initial `ChannelMonitor` persistence. This provides
a similar benefit, simplifying the code a trivial amount, but
importantly allows us to fully remove the original
`handle_monitor_update_res` macro.
Matt Corallo [Wed, 11 Jan 2023 21:37:57 +0000 (21:37 +0000)]
Always process `ChannelMonitorUpdate`s asynchronously
We currently have two codepaths on most channel update functions -
most methods return a set of messages to send a peer iff the
`ChannelMonitorUpdate` succeeds, but if it does not we push the
messages back into the `Channel` and then pull them back out when
the `ChannelMonitorUpdate` completes and send them then. This adds
a substantial amount of complexity in very critical codepaths.
Instead, here we swap all our channel update codepaths to
immediately set the channel-update-required flag and only return a
`ChannelMonitorUpdate` to the `ChannelManager`. Internally in the
`Channel` we store a queue of `ChannelMonitorUpdate`s, which will
become critical in future work to surface pending
`ChannelMonitorUpdate`s to users at startup so they can complete.
This leaves some redundant work in `Channel` to be cleaned up
later. Specifically, we still generate the messages which we will
now ignore and regenerate later.
This commit updates the `ChannelMonitorUpdate` pipeline across all
the places we generate them.
Matt Corallo [Sat, 3 Dec 2022 05:38:24 +0000 (05:38 +0000)]
Move TODO from `handle_monitor_update_res` into `Channel`
The TODO mentioned in `handle_monitor_update_res` about how we
might forget about HTLCs in case of permanent monitor update
failure still applies in spite of all our changes. If a channel is
drop'd in general, monitor-pending updates may be lost if the
monitor update failed to persist.
This was always the case, and is ultimately the general form of the
the specific TODO, so we simply leave comments there
Matt Corallo [Fri, 27 Jan 2023 06:14:18 +0000 (06:14 +0000)]
Handle `MonitorUpdateCompletionAction`s after monitor update sync
In a previous PR, we added a `MonitorUpdateCompletionAction` enum
which described actions to take after a `ChannelMonitorUpdate`
persistence completes. At the time, it was only used to execute
actions in-line, however in the next commit we'll start (correctly)
leaving the existing actions until after monitor updates complete.
Matt Corallo [Thu, 26 Jan 2023 04:47:25 +0000 (04:47 +0000)]
Limit the number of pending un-funded inbound channel
Because we store some (not large, but not zero) state per-peer,
it's useful to limit the number of peers we have connected, at
least with some buffer.
Much more importantly, each channel has a relatively large cost,
especially around the `ChannelMonitor`s we have to build for each.
Thus, here, we limit the number of channels per-peer which aren't
(yet) on-chain, as well as limit the number of (inbound) peers
which don't have a (funded-on-chain) channel.
Matt Corallo [Tue, 21 Feb 2023 19:10:43 +0000 (19:10 +0000)]
Remove the `peer_disconnected` `no_connection_possible` flag
Long ago, we used the `no_connection_possible` to signal that a
peer has some unknown feature set or some other condition prevents
us from ever connecting to the given peer. In that case we'd
automatically force-close all channels with the given peer. This
was somewhat surprising to users so we removed the automatic
force-close, leaving the flag serving no LDK-internal purpose.
Distilling the concept of "can we connect to this peer again in the
future" to a simple flag turns out to be ripe with edge cases, so
users actually using the flag to force-close channels would likely
cause surprising behavior.
Thus, there's really not a lot of reason to keep the flag,
especially given its untested and likely to be broken in subtle
ways anyway.
Matt Corallo [Wed, 15 Feb 2023 01:23:20 +0000 (01:23 +0000)]
Correct `funding_transaction_generated` err msg and fix fuzz check
This fixes new errors in `full_stack_target` pointed out by
Chaincode's generous fuzzing infrastructure. Specifically, there's
no reason to check the error message in the
`funding_transaction_generated` return value - it can only return
a failure if the channel has closed since the funding transaction
was generated (which is fine) or if the signer refuses to sign
(which can't happen in fuzzing).
Matt Corallo [Wed, 15 Feb 2023 01:20:38 +0000 (01:20 +0000)]
Correct the "is peer live" checks in `PeerManager`
In general, we should be checking if a `Peer` has `their_features`
set as the "is this peer connected and have they finished the
handshake" flag as it indicates an `Init` message was received.
While none of these appear to be reachable bugs, there were a
number of places where we checked other flags for this purpose,
which may lead to sending messages before `Init` in the future.
Here we clean these cases up to always use the correct check (via
the new util method).
Matt Corallo [Wed, 15 Feb 2023 01:13:57 +0000 (01:13 +0000)]
Fix (and DRY) the conditionals before calling `peer_disconnected`
If we have a peer that sends a non-`Init` first message, we'll call
`peer_disconnected` without ever having called `peer_connected`
(which has to wait until we have an `Init` message). This is a
violation of our API guarantees, though should generally not be an
issue.
Because this bug was repeated in a few places, we also take this
opportunity to DRY up the logic which checks the peer state before
calling `peer_disconnected`.
Found by the new `ChannelManager` assertions and the
`full_stack_target` fuzzer.
Matt Corallo [Sat, 3 Dec 2022 04:25:37 +0000 (04:25 +0000)]
Add a new monitor update result handling macro
Over the next few commits, this macro will replace the
`handle_monitor_update_res` macro. It takes a different approach -
instead of receiving the message(s) that need to be re-sent after
the monitor update completes and pushing them back into the
channel, we'll not get the messages from the channel at all until
we're ready for them.
This will unify our message sending into only actually fetching +
sending messages in the common monitor-update-completed code,
rather than both there *and* in the functions that call `Channel`
when new messages are originated.
Matt Corallo [Mon, 19 Dec 2022 20:41:42 +0000 (20:41 +0000)]
Add storage for `ChannelMonitorUpdate`s in `Channel`s
In order to support fully async `ChannelMonitor` updating, we need
to ensure that we can replay `ChannelMonitorUpdate`s if we shut
down after persisting a `ChannelManager` but without completing a
`ChannelMonitorUpdate` persistence. In order to support that we
(obviously) have to store the `ChannelMonitorUpdate`s in the
`ChannelManager`, which we do here inside the `Channel`.
We do so now because in the coming commits we will start using the
async persistence flow for all updates, and while we won't yet
support fully async monitor updating it's nice to get some of the
foundational structures in place now.
Matt Corallo [Wed, 30 Nov 2022 18:49:44 +0000 (18:49 +0000)]
Track actions to execute after a `ChannelMonitor` is updated
When a `ChannelMonitor` update completes, we may need to take some
further action, such as exposing an `Event` to the user initiating
another `ChannelMonitor` update, etc. This commit adds the basic
structure to track such actions and serialize them as required.
Note that while this does introduce a new map which is written as
an even value which users cannot opt out of, the map is only filled
in when users use the asynchronous `ChannelMonitor` updates. As
these are still considered beta, breaking downgrades for such users
is considered acceptable here.
Matt Corallo [Thu, 1 Dec 2022 00:25:32 +0000 (00:25 +0000)]
Add an infallible no-sign version of send_commitment_no_status_check
In the coming commits we'll move to async `ChannelMonitorUpdate`
application, which means we'll want to generate a
`ChannelMonitorUpdate` (including a new counterparty commitment
transaction) before we actually send it to our counterparty. To do
that today we'd have to actually sign the commitment transaction
by calling the signer, then drop it, apply the
`ChannelMonitorUpdate`, then re-sign the commitment transaction to
send it to our peer.
In this commit we instead split `send_commitment_no_status_check`
and `send_commitment_no_state_update` into `build_` and `send_`
variants, allowing us to generate new counterparty commitment
transactions without actually signing, then build them for sending,
with signatures, later.
Matt Corallo [Fri, 3 Feb 2023 23:05:58 +0000 (23:05 +0000)]
Fix (and test) threaded payment retries
The new in-`ChannelManager` retries logic does retries as two
separate steps, under two separate locks - first it calculates
the amount that needs to be retried, then it actually sends it.
Because the first step doesn't udpate the amount, a second thread
may come along and calculate the same amount and end up retrying
duplicatively.
Because we generally shouldn't ever be processing retries at the
same time, the fix is trivial - simply take a lock at the top of
the retry loop and hold it until we're done.
Matt Corallo [Tue, 7 Feb 2023 19:46:08 +0000 (19:46 +0000)]
Test if a given mutex is locked by the current thread in tests
In anticipation of the next commit(s) adding threaded tests, we
need to ensure our lockorder checks work fine with multiple
threads. Sadly, currently we have tests in the form
`assert!(mutex.try_lock().is_ok())` to assert that a given mutex is
not locked by the caller to a function.
The fix is rather simple given we already track mutexes locked by a
thread in our `debug_sync` logic - simply replace the check with a
new extension trait which (for test builds) checks the locked state
by only looking at what was locked by the current thread.
We're no longer supporting manual retries since
ChannelManager::send_payment_with_retry can be parameterized by a retry
strategy
This commit also updates all docs related to retry_payment and abandon_payment.
Since these docs frequently overlap with changes in preceding commits where we
start abandoning payments on behalf of the user, all the docs are updated in
one go.
Jeffrey Czyz [Tue, 31 Jan 2023 16:44:19 +0000 (10:44 -0600)]
Re-write CustomMessageHandler documentation
Documentation for CustomMessageHandler wasn't clear how it is related to
PeerManager and contained some grammatical and factual errors. Re-write
the docs and link to the lightning_custom_message crate.