Matt Corallo [Tue, 4 Apr 2023 21:45:37 +0000 (21:45 +0000)]
Reapply pending `ChannelMonitorUpdate`s on startup
If a `ChannelMonitorUpdate` was created and given to the user but
left uncompleted when the `ChannelManager` is persisted prior to a
restart, the user likely lost the `ChannelMonitorUpdate`(s). Thus,
we need to replay them for the user, which we do here using the
new `BackgroundEvent::MonitorUpdateRegeneratedOnStartup` variant.
Matt Corallo [Thu, 6 Apr 2023 19:56:01 +0000 (19:56 +0000)]
Process background events when taking the total_consistency_lock
When we generate a `ChannelMonitorUpdate` during `ChannelManager`
deserialization, we must ensure that it gets processed before any
other `ChannelMonitorUpdate`s. The obvious hook for this is when
taking the `total_consistency_lock`, which makes it unlikely we'll
regress by forgetting this.
Here we add that call in the `PersistenceNotifierGuard`, with a
test-only atomic bool to test that this criterion is met.
`BackgroundEvent` was used to store `ChannelMonitorUpdate`s which
result in a channel force-close, avoiding relying on
`ChannelMonitor`s having been loaded while `ChannelManager`
block-connection methods are called during startup.
In the coming commit(s) we'll also generate non-channel-closing
`ChannelMonitorUpdate`s during startup, which will need to be
replayed prior to any other `ChannelMonitorUpdate`s generated from
normal operation.
In the next commit we'll handle that by handling `BackgroundEvent`s
immediately after locking the `total_consistency_lock`.
Matt Corallo [Wed, 10 May 2023 17:15:29 +0000 (17:15 +0000)]
Make `AChannelManager` trait slightly more generic and always on
Rather than letting `AChannelManager` be bounded by all traits
being `Sized` we make them explicitly `?Sized`. We also make the
trait no longer test-only as it will be used in a coming commit.
Matt Corallo [Wed, 10 May 2023 00:45:08 +0000 (00:45 +0000)]
Return the counterparty node_id as a part of a force-shutdown res
In the coming commits we'll need the counterparty node_id when
handling a background monitor update as we may need to resume
normal channel operation as a result. Thus, we go ahead and pipe it
through from the shutdown end, as it makes the codepaths
consistent.
Sadly, the monitor-originated shutdown case doesn't allow for a
required counterparty node_id as some versions of LDK didn't have
it present in the ChannelMonitor.
Matt Corallo [Tue, 30 May 2023 18:11:47 +0000 (18:11 +0000)]
Fix `held_by_thread` in `no-std` to return instead of panicking
Our `no-std` locks simply panic if a lock cannot be taken as there
should be no lock contention in a single-threaded environment.
However, the `held_by_thread` debug methods were delegating to the
lock methods which resulted in a panic when asserting that a lock
*is* held by the current thread.
Instead, they are updated here to call the relevant `RefCell`
testing methods.
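A minimal sketch of the idea, assuming a simplified `RefCell`-backed lock. The `NoStdMutex` type and the `bool` return value are hypothetical simplifications for illustration, not the actual LDK types:

```rust
use core::cell::{RefCell, RefMut};

/// Hypothetical single-threaded stand-in for a mutex, backed by a `RefCell`.
pub struct NoStdMutex<T> {
    inner: RefCell<T>,
}

impl<T> NoStdMutex<T> {
    pub fn new(value: T) -> Self {
        Self { inner: RefCell::new(value) }
    }

    /// Taking the lock panics if it is already held, since a single-threaded
    /// environment should never see contention.
    pub fn lock(&self) -> RefMut<'_, T> {
        self.inner.borrow_mut()
    }

    /// Debug helper: reports whether the lock is currently held, without
    /// panicking. `try_borrow_mut` fails only while a borrow is outstanding.
    pub fn held_by_thread(&self) -> bool {
        self.inner.try_borrow_mut().is_err()
    }
}
```

Because `held_by_thread` goes through `try_borrow_mut` rather than `lock`, asserting that the lock *is* held no longer panics.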
This makes it much clearer at sites generating such events that they
will be lost on restart, to reduce risk of bugs creeping in due to
lost monitor updates.
In d4810087c1 we added logic to apply `ChannelMonitorUpdate`s which
were a part of a channel closure async via a background queue to
address some startup issues. When we did that we persisted those
updates to ensure we replayed them when starting next time.
However, there was no reason to - if we persisted and then
restarted even without those monitor updates we'd find a monitor
without a channel, which we'd tell to broadcast the latest
commitment transaction to force-close.
Since adding that logic, we've used the same background queue for
several purposes.
Elias Rohrer [Fri, 5 May 2023 09:26:50 +0000 (11:26 +0200)]
Make `lightning-transaction-sync` compat notice a bit more explicit
As `lightning-transaction-sync` was introduced with 0.0.114 and depended
on prior changes in the same release cycle we deemed it reasonable to
omit the implicitly limited backwards compatibility.
It however turns out this might be confusing to users copy/pasting the
codebase. Here we therefore spell out the implicit dependency on 0.0.114
and above.
Fix onion messages of size BIG_PACKET_HOP_DATA_LEN
This was previously broken and would result in an invalid HMAC error, because
we had a hardcoded assumption that OM hop data would always be of size 1300.
Duncan Dean [Thu, 20 Oct 2022 20:56:37 +0000 (22:56 +0200)]
Add message structs required for dual-funded channels
This is the first of a set of PRs to enable the experimental dual-funded
channels feature using interactive transaction construction. This allows
both the channel initiator and channel acceptor to contribute funds
towards the channel.
Matt Corallo [Fri, 5 May 2023 03:33:54 +0000 (03:33 +0000)]
Document when `PaymentPathSuccessful::payment_hash` is filled in.
The `payment_hash` field in `PaymentPathSuccessful` is always
`Some` as long as the pending payment tracker has a `payment_hash`,
which is true for all `Pending` payments as well as all `Fulfilled`
payments starting with the commit which added
`PaymentPathSuccessful` - 3b5c370b404e2f5a8f3c35093b97406f149a9340c177c05252574083d68df0da.
Matt Corallo [Fri, 5 May 2023 00:13:25 +0000 (00:13 +0000)]
Mention lnd's SCB feature in the corresponding error message
It's a bit confusing when we see only "Peer sent a garbage
channel_reestablish" when a peer uses lnd's SCB feature to ask us
to broadcast the latest state. This updates the error message to be
a bit clearer.
Wilmer Paulino [Thu, 4 May 2023 22:16:17 +0000 (15:16 -0700)]
Prevent ChannelForceClosed monitor update error after detecting spend
If we detected a spend for a channel onchain prior to handling its
`ChannelForceClosed` monitor update, we'd log a concerning error
message and return an error unnecessarily. The channel has already been
closed, so handling the `ChannelForceClosed` monitor update at this
point should be a no-op.
Groundwork for refactoring PaymentParams::Hints to ::Payee
Minor changes in preparation for supporting route blinding in
PaymentParameters. In the next commit, we'll be moving more
unblinded-payee-specific fields from the top level parameters into the clear
enum variant.
`<E as serde::de::Error>::custom()` accepts any `T: Display`, not just
`String`. Therefore it accepts `Arguments<'_>` too so we can use
`format_args!()` instead of `format!()`.
See https://github.com/lightningdevkit/rust-lightning/pull/2187#discussion_r1168781355
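A small illustration of why this works, using only the standard library. `DemoError` and `invalid_length` are hypothetical stand-ins that mirror the `T: Display` bound of `serde::de::Error::custom()`; they are not part of serde:

```rust
use std::fmt::Display;

/// Hypothetical error type standing in for a deserializer error.
#[derive(Debug, PartialEq)]
pub struct DemoError(String);

impl DemoError {
    /// Mirrors the shape of `custom()`: any `T: Display` is accepted.
    pub fn custom<T: Display>(msg: T) -> Self {
        DemoError(msg.to_string())
    }
}

pub fn invalid_length(len: usize) -> DemoError {
    // `format_args!` yields `fmt::Arguments`, which implements `Display`,
    // so the call site does not build an intermediate `String` as
    // `format!` would.
    DemoError::custom(format_args!("invalid length {}", len))
}
```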
Matt Corallo [Sat, 29 Apr 2023 18:45:59 +0000 (18:45 +0000)]
Expose a trait impl'd for all `PeerManager` for use as a bound
A while back, in tests, we added an `AChannelManager` trait, which
is implemented for all `ChannelManager`s, and can be used as a
bound when we need a `ChannelManager`, rather than having to
duplicate all the bounds of `ChannelManager` everywhere.
Here we do the same thing for `PeerManager`, but make it public and
use it to clean up `lightning-net-tokio` and
`lightning-background-processor`.
We should likely do the same for `AChannelManager`, but that's left
as a followup.
Matt Corallo [Fri, 17 Mar 2023 04:55:30 +0000 (04:55 +0000)]
Store + process pending `ChannelMonitorUpdate`s in `Channel`
The previous commits set up the ability for us to hold
`ChannelMonitorUpdate`s which are pending until we're ready to pass
them to users and have them be applied. However, if the
`ChannelManager` is persisted while we're waiting to give the user
a `ChannelMonitorUpdate` we'll be confused on restart - seeing our
latest `ChannelMonitor` state as stale compared to our
`ChannelManager` - a critical error.
Luckily the solution is trivial: we simply need to store the
pending `ChannelMonitorUpdate` state and load it with the
`ChannelManager` data, allowing stale monitors on load as long as
we have the missing pending updates between where we are and the
latest `ChannelMonitor` state.
Matt Corallo [Thu, 16 Mar 2023 03:33:20 +0000 (03:33 +0000)]
Handle `EventCompletionAction`s after events complete
This adds handling of the new `EventCompletionAction`s after
`Event`s are handled, letting `ChannelMonitorUpdate`s which were
blocked fly after a relevant `Event`.
Matt Corallo [Fri, 28 Apr 2023 04:24:25 +0000 (04:24 +0000)]
Track an `EventCompletionAction` for after an `Event` is processed
This will allow us to block `ChannelMonitorUpdate`s on `Event`
processing in the next commit.
Note that this gets dangerously close to breaking forwards
compatibility - if we have an `Event` with an
`EventCompletionAction` tied to it, we persist a new, even, TLV in
the `ChannelManager`. Hopefully this should be uncommon, as it
implies an `Event` was delayed until after a full round-trip to a
peer.
Matt Corallo [Wed, 15 Mar 2023 23:16:06 +0000 (23:16 +0000)]
Allow holding `ChannelMonitorUpdate`s until later, completing one
In the coming commits, we need to delay `ChannelMonitorUpdate`s
until future actions (specifically `Event` handling). However,
because we should only notify users once of a given
`ChannelMonitorUpdate` and they must be provided in-order, we need
to track which ones have or have not been given to users and, once
updating resumes, fly the ones that haven't already made it to
users.
To do this we simply add a `bool` in the `ChannelMonitorUpdate` set
stored in the `Channel` which indicates if an update flew and
decline to provide new updates back to the `ChannelManager` if any
updates have their flown bit unset.
Further, because we'll now be releasing `ChannelMonitorUpdate`s
which were already stored in the pending list, we now need to
support getting a `Completed` result for a monitor which isn't the
only pending monitor (or even out of order), thus we also rewrite
the way monitor updates are marked completed.
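The bookkeeping described above can be sketched roughly as follows. This is a simplified model, not the actual `Channel` code: `PendingUpdates`, its methods, the `hold` parameter, and the use of bare `u64` ids are all hypothetical:

```rust
/// Hypothetical, simplified pending-update entry.
pub struct PendingUpdate {
    pub update_id: u64,
    /// Set once the update has been handed to the user.
    pub flown: bool,
}

#[derive(Default)]
pub struct PendingUpdates {
    queue: Vec<PendingUpdate>,
}

impl PendingUpdates {
    /// Queues a new update. It is returned for immediate release only if it
    /// is not being held and no earlier update is still unflown; otherwise
    /// it waits so that updates reach the user in order.
    pub fn push(&mut self, update_id: u64, hold: bool) -> Option<u64> {
        let blocked = hold || self.queue.iter().any(|u| !u.flown);
        self.queue.push(PendingUpdate { update_id, flown: !blocked });
        if blocked { None } else { Some(update_id) }
    }

    /// Releases the oldest held update, if any, marking it as flown.
    pub fn release_next(&mut self) -> Option<u64> {
        let next = self.queue.iter_mut().find(|u| !u.flown)?;
        next.flown = true;
        Some(next.update_id)
    }

    /// Marks an update completed; it need not be the oldest pending one.
    pub fn complete(&mut self, update_id: u64) {
        self.queue.retain(|u| u.update_id != update_id);
    }
}
```

Once updating resumes, calling `release_next` repeatedly flies the held updates oldest-first, and each is handed out exactly once.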
Duncan Dean [Mon, 1 May 2023 20:52:30 +0000 (22:52 +0200)]
Remove `OptionalField` and move `shutdown_scriptpubkey` into TLV stream
As pointed out in https://github.com/lightning/bolts/pull/754/commits/6656b70,
we can move the `shutdown_scriptpubkey` field into the TLV streams of
`OpenChannel` and `AcceptChannel` without affecting the resulting encoding.
We use `WithoutLength` encoding here to ensure that we do not encode a
length prefix along with `Script` as is normally the case.
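To illustrate the difference, here is a sketch of the two encodings. It assumes the usual length-prefixed form uses a two-byte big-endian length; the function names are hypothetical and this is not the LDK serialization code:

```rust
/// Writes `script` preceded by a two-byte big-endian length
/// (assumed prefix format; scripts longer than `u16::MAX` are not handled).
pub fn write_with_length(script: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(script.len() as u16).to_be_bytes());
    out.extend_from_slice(script);
}

/// Writes only the raw script bytes. Inside a TLV record the record's own
/// length field already delimits the value, so a second prefix is redundant.
pub fn write_without_length(script: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(script);
}
```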
Duncan Dean [Mon, 1 May 2023 20:23:20 +0000 (22:23 +0200)]
Make `DataLossProtect` fields required and remove wrappers
The fields provided by `DataLossProtect` have been mandatory since
https://github.com/lightning/bolts/pull/754/commits/6656b70, regardless
of whether `option_dataloss_protect` or `option_remote_key` feature bits
are set.
We move the fields out of `DataLossProtect` to make encoding definitions
more succinct with `impl_writeable_msg!` and to reduce boilerplate.
This paves the way for completely removing `OptionalField` in subsequent
commits.
Matt Corallo [Sat, 29 Apr 2023 17:58:15 +0000 (17:58 +0000)]
Move the `CustomMessageHandler` into the `MessageHandler` struct
`PeerManager` takes a `MessageHandler` struct which contains all
the known message handlers for it to pass messages to. It then,
separately, takes a `CustomMessageHandler`. This makes no sense; we
should simply include the `CustomMessageHandler` in the
`MessageHandler` struct for consistency.
Matt Corallo [Fri, 28 Apr 2023 16:19:04 +0000 (16:19 +0000)]
Fix overflow in `awaiting_pong_timer...` with too many peers
If we have more than
127 / `MAX_BUFFER_DRAIN_TICK_INTERVALS_PER_PEER` (31) peers,
`awaiting_pong_timer_tick_intervals` can overflow before we hit
the limit. This isn't super harmful: we'll still disconnect peers
as long as they don't send *any* messages between two pings, but it
does cause us to not disconnect peers which are extremely slow in
responding to messages, e.g. because they are overloaded.
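The arithmetic can be checked with a short sketch. It assumes the counter is an `i8` (suggested by the 127 bound) and that `MAX_BUFFER_DRAIN_TICK_INTERVALS_PER_PEER` is 4 (implied by 127 / 4 = 31); the helper functions are hypothetical:

```rust
use std::convert::TryFrom;

// Assumed value: "127 / ... (31)" in the message above implies 4.
const MAX_BUFFER_DRAIN_TICK_INTERVALS_PER_PEER: i8 = 4;

/// Largest peer count for which the per-peer limit still fits in an `i8`.
pub fn max_peers_without_overflow() -> i8 {
    i8::MAX / MAX_BUFFER_DRAIN_TICK_INTERVALS_PER_PEER
}

/// Returns the limit an `i8` tick counter would have to reach, or `None`
/// if that limit cannot be represented in an `i8`.
pub fn limit_as_i8(peer_count: usize) -> Option<i8> {
    let limit = (MAX_BUFFER_DRAIN_TICK_INTERVALS_PER_PEER as usize).checked_mul(peer_count)?;
    i8::try_from(limit).ok()
}
```

With 31 peers the limit is 124 and still fits; with 32 peers it would be 128, which an `i8` counter can never reach.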
Duncan Dean [Wed, 26 Apr 2023 12:57:18 +0000 (14:57 +0200)]
Use `env::temp_dir()` for BP tests
Currently `BackgroundProcessor` tests create persister directories in the
current working directory and rely on cleaning up in a `Drop` implementation.
Unfortunately, it seems that in the async tests nodes are not
`drop()`ed for some reason and so the directories created by those
tests remain behind in the current working directory.
This commit at least ensures that these test directories are created in
a temporary location for the OS using `temp_dir()`. It doesn't aim to
solve the lack of cleanup in the async tests.
Partial fix for #2224 but I believe it's enough to resolve it as these
temp directories that do remain will be purged by the OS at some stage
and are overwritten by subsequent tests if there is a conflict.
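A minimal sketch of the approach, using only the standard library. The helper name `test_persist_dir` and its `name` parameter are hypothetical, not the names used in the tests:

```rust
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Creates (if needed) a per-test persister directory under the OS temp
/// directory instead of the current working directory.
pub fn test_persist_dir(name: &str) -> io::Result<PathBuf> {
    let mut path = env::temp_dir();
    path.push(name);
    fs::create_dir_all(&path)?;
    Ok(path)
}
```

Reusing the same `name` in a later run simply reuses the existing directory, which matches the note above that leftover directories are overwritten by subsequent tests.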
Matt Corallo [Wed, 26 Apr 2023 05:01:13 +0000 (05:01 +0000)]
Fix a leak in `FutureState` when a `Notifier` is dropped un-woken
If a `Notifier` has an internal `FutureState` which gathers some
sleeper callbacks, but is never actually woken, those callbacks
will leak due to a circular `Arc` reference when the `Notifier` is
`drop`'d.
Because `Notifier`s are rarely `drop`'d in production this isn't a
huge deal, but shows up materially in bindings tests as they spawn
many nodes over the course of a short test.
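The following sketch shows how such a cycle arises and one way to break it on drop. `DemoNotifier` and `State` are hypothetical, heavily simplified stand-ins, and clearing the callbacks in `Drop` is an illustrative approach that may differ from the actual fix:

```rust
use std::sync::{Arc, Mutex, Weak};

type Callback = Box<dyn FnMut() + Send>;

/// Hypothetical, simplified shared state holding registered callbacks.
pub struct State {
    callbacks: Vec<Callback>,
}

/// Hypothetical stand-in for a notifier owning the shared state.
pub struct DemoNotifier {
    state: Arc<Mutex<State>>,
}

impl DemoNotifier {
    pub fn new() -> Self {
        DemoNotifier { state: Arc::new(Mutex::new(State { callbacks: Vec::new() })) }
    }

    /// Registers a callback that itself keeps the state alive, forming the
    /// cycle state -> callback -> state.
    pub fn register_self_referencing_callback(&self) {
        let state_ref = Arc::clone(&self.state);
        let cb: Callback = Box::new(move || {
            let _ = Arc::strong_count(&state_ref);
        });
        self.state.lock().unwrap().callbacks.push(cb);
    }

    pub fn weak_state(&self) -> Weak<Mutex<State>> {
        Arc::downgrade(&self.state)
    }
}

impl Drop for DemoNotifier {
    fn drop(&mut self) {
        // Break the cycle: drop the never-called callbacks so the `Arc`
        // clones they hold are released along with the notifier.
        self.state.lock().unwrap().callbacks.clear();
    }
}
```

Without the `Drop` implementation, the callback's `Arc` clone would keep the state (and therefore the callback itself) alive after the notifier is gone.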
Matt Corallo [Sun, 23 Apr 2023 16:17:29 +0000 (16:17 +0000)]
Don't remove nodes if there's no channel_update for a temp failure
Previously, we were requiring any `UPDATE` onion errors to include
a `channel_update`, as the spec mandates[1]. If we see an onion
error which is missing one we treat it as a misbehaving node that
isn't behaving according to the spec and simply remove the node.
Sadly, it appears at least some versions of CLN are such nodes, and
opt to not include `channel_update` at all if they're returning a
`temporary_channel_failure`. This causes us to completely remove
CLN nodes from our graph after they fail to forward our HTLC.
While CLN is violating the spec here, there's not a lot of reason
to not allow it, so we go ahead and do so here, treating it simply
as any other failure by letting the scorer handle it.
[1] The spec says `Please note that the channel_update field is
mandatory in messages whose failure_code includes the UPDATE flag`
but doesn't repeat it in the requirements section, so it's not
crazy that someone missed it when implementing.
Matt Corallo [Mon, 24 Apr 2023 03:48:42 +0000 (03:48 +0000)]
Check for `background-processor` exit condition before+after sleep
In a synchronous `BackgroundProcessor`, the exit is done by setting
an atomic flag, which is most likely to happen while we're asleep.
Thus, we previously checked for the exit condition after the sleep
(and after we persisted the `ChannelManager`, if required, though
this is no longer required and dates back to when we didn't do a
re-persist after breaking out of the main loop).
For an async `background-processor`, this is also fine, however
because of the relatively longer sleep time, if the exit flag is
set via a sleep check returning true during event processing, we
may end up delaying exit rather substantially.
In order to avoid this, we simply check for the exit condition both
before and immediately after the sleep in `background-processor`.
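The loop shape described above can be sketched as follows. This is a synchronous simplification with a hypothetical `run_loop` helper, not the actual `background-processor` code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Runs `process` until `exit` is set, returning the number of iterations.
/// The exit flag is checked both before and right after the sleep.
pub fn run_loop<F: FnMut()>(exit: &AtomicBool, sleep: Duration, mut process: F) -> usize {
    let mut iterations = 0;
    loop {
        process();
        iterations += 1;
        // Check before sleeping: the flag may have been set during
        // processing, in which case sleeping would only delay the exit.
        if exit.load(Ordering::Acquire) {
            break;
        }
        thread::sleep(sleep);
        // Check again after sleeping: the flag is most likely to be set
        // while we are asleep.
        if exit.load(Ordering::Acquire) {
            break;
        }
    }
    iterations
}
```

If the flag is set during `process`, the loop exits without sleeping at all, which is the delay the pre-sleep check is meant to avoid.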