Matt Corallo [Tue, 26 Oct 2021 22:52:06 +0000 (22:52 +0000)]
Rewrite InvoicePayer retry to correctly handle MPP partial failures
This rewrites a good chunk of the retry logic in `InvoicePayer` to
address two issues:
* it was not considering the return value of `send_payment` (and
`retry_payment`) may indicate a failure on some paths but not
others,
* it was not considering that more failures may still come later
when removing elements from the retry count map. This could
result in us seeing an MPP-partial-failure, failing to retry,
removing the retries count entry, and then retrying other parts,
potentially forever.
Matt Corallo [Wed, 27 Oct 2021 22:12:07 +0000 (22:12 +0000)]
Dont unwrap `RouteParameter::expiry_time` as users can set it
Users can provide anything they want as `RouteParameters` so we
shouldn't assume any fields are set any particular way, including
`expiry_time` set at all.
Jeffrey Czyz [Wed, 27 Oct 2021 15:39:22 +0000 (10:39 -0500)]
Penalize failed channels in Scorer
As payments fail, the channel responsible for the failure may be
penalized. Implement Scorer::payment_path_failed to penalize the failed
channel using a configured penalty. As time passes, the penalty is
reduced using exponential decay, though penalties will accumulate if the
channel continues to fail. The decay interval is also configurable.
Jeffrey Czyz [Thu, 14 Oct 2021 18:04:39 +0000 (13:04 -0500)]
Notify scorer of failing payment path and channel
Upon receiving a PaymentPathFailed event, the failing payment may be
retried on a different path. To avoid using the channel responsible for
the failure, a scorer should be notified of the failure before being
used to find a new route.
Add a payment_path_failed method to routing::Score and call it in
InvoicePayer's event handler. Introduce a LockableScore parameterization
to InvoicePayer so the scorer is locked only once before calling
find_route.
Matt Corallo [Thu, 21 Oct 2021 22:33:42 +0000 (22:33 +0000)]
Give peers one timer tick to finish handshake before disconnecting
This ensures we don't let a hung connection stick around forever if
the peer never completes the initial handshake.
This also resolves a race where, on receiving a second connection
from a peer, we may reset their_node_id to None to prevent sending
messages even though the `channel_encryptor`
`is_ready_for_encryption()`. Sending pings only checks the
`channel_encryptor` status, not `their_node_id` resulting in an
`unwrap` on `None` in `enqueue_message`.
Jeffrey Czyz [Tue, 24 Aug 2021 05:08:15 +0000 (00:08 -0500)]
Add InvoicePayer for retrying failed payments
When a payment fails, it's useful to retry the payment once the network
graph and channel scores are updated. InvoicePayer is a utility for
making payments which will retry any failed payment paths for a payment
up to a configured number of total attempts. It is parameterized by a
Payer and Router for ease of customization and testing.
Implement EventHandler for InvoicePayer as a decorator that intercepts
PaymentPathFailed events and retries that payment using the parameters
from the event. It delegates to the decorated EventHandler after retries
have been exhausted and for other events.
Jeffrey Czyz [Mon, 25 Oct 2021 23:48:52 +0000 (18:48 -0500)]
Unify route finding methods
An upcoming Router interface will be used for finding a Route both when
initially sending a payment and also when retrying failed payment paths.
Unify the three varieties of get_route so the interface can consist of a
single method implemented by the new `find_route` method. Give get_route
pub(crate) visibility so it can still be used in tests.
Jeffrey Czyz [Mon, 4 Oct 2021 14:20:49 +0000 (09:20 -0500)]
Rewrite Invoice's interface in terms of msats
InvoiceBuilder's interface was changed recently to work in terms of
msats. Update Invoice's interface to return the amount in msats, too,
and make amount_pico_btc private.
Jeffrey Czyz [Sun, 17 Oct 2021 22:21:01 +0000 (17:21 -0500)]
Add PaymentId to PaymentSent event
The payment_hash may not uniquely identify the payment if it has been
reused. Include the payment_id in PaymentSent events so it can
correlated with the send_payment call.
Matt Corallo [Mon, 25 Oct 2021 04:46:26 +0000 (04:46 +0000)]
Store `Payee` information in `HTLCSource::OutboundRoute`.
This stores and tracks HTLC payee information with HTLCSource info,
allowing us to provide it back to the user if the HTLC fails and
ensuring persistence by keeping it with the HTLC itself as it
passes between Channel and ChannelMonitor.
Matt Corallo [Mon, 25 Oct 2021 17:52:30 +0000 (17:52 +0000)]
Make `Payee::pubkey` pub.
`Payee` is expected to be used by users to get routes for payment
retries, potentially with their own router. Thus, its helpful if it
is pub, even if it is redundant with the last hop in the `path`
field in `Events::PaymentPathFailed`.
Jeffrey Czyz [Fri, 22 Oct 2021 05:27:58 +0000 (00:27 -0500)]
Use option TLV decoding for short_channel_id
Using ignorable TLV decoding is only applicable for an Option containing
an enum, but short_channel_id is an Option<u64>. Use option TLV encoding
instead.
Jeffrey Czyz [Thu, 21 Oct 2021 22:52:53 +0000 (17:52 -0500)]
Include PaymentPathRetry data in PaymentPathFailed
When a payment path fails, it may be retried. Typically, this means
re-computing the route after updating the NetworkGraph and channel
scores in order to avoid the failing hop. The last hop in
PaymentPathFailed's path field contains the pubkey, amount, and CLTV
values needed to pass to get_route. However, it does not contain the
payee's features and route hints from the invoice.
Include the entire set of parameters in PaymentPathRetry and add it to
the PaymentPathFailed event. Add a get_retry_route wrapper around
get_route that takes PaymentPathRetry. This allows an EventHandler to
retry failed payment paths using the payee's route hints and features.
Jeffrey Czyz [Wed, 20 Oct 2021 14:15:31 +0000 (09:15 -0500)]
Define Payee abstraction for use in get_route
A payee can be identified by a pubkey and optionally have an associated
set of invoice features and route hints. Use this in get_route instead
of three separate parameters. This may be included in PaymentPathFailed
later to use when finding a new route.
Matt Corallo [Sun, 10 Oct 2021 23:36:44 +0000 (23:36 +0000)]
Reload pending payments from ChannelMonitor HTLC data on reload
If we go to send a payment, add the HTLC(s) to the channel(s),
commit the ChannelMonitor updates to disk, and then crash, we'll
come back up with no pending payments but HTLC(s) ready to be
claim/failed.
This makes it rather impractical to write a payment sender/retryer,
as you cannot guarantee atomicity - you cannot guarantee you'll
have retry data persisted even if the HTLC(s) are actually pending.
Because ChannelMonitors are *the* atomically-persisted data in LDK,
we lean on their current HTLC data to figure out what HTLC(s) are a
part of an outbound payment, rebuilding the pending payments list
on reload.
Matt Corallo [Sun, 3 Oct 2021 22:33:12 +0000 (22:33 +0000)]
Track payments after they resolve until all HTLCs are finalized
In the next commit, we will reload lost pending payments from
ChannelMonitors during restart. However, in order to avoid
re-adding pending payments which have already been fulfilled, we
must ensure that we do not fully remove pending payments until all
HTLCs for the payment have been fully removed from their
ChannelMonitors.
We do so here, introducing a new PendingOutboundPayment variant
called `Completed` which only tracks the set of pending HTLCs.
Matt Corallo [Sat, 2 Oct 2021 22:35:07 +0000 (22:35 +0000)]
Inform ChannelManager when fulfilled HTLCs are finalized
When an HTLC has been failed, we track it up until the point there
exists no broadcastable commitment transaction which has the HTLC
present, at which point Channel returns the HTLCSource back to the
ChannelManager, which fails the HTLC backwards appropriately.
When an HTLC is fulfilled, however, we fulfill on the backwards path
immediately. This is great for claiming upstream HTLCs, but when we
want to track pending payments, we need to ensure we can check with
ChannelMonitor data to rebuild pending payments. In order to do so,
we need an event similar to the HTLC failure event, but for
fulfills instead.
Specifically, if we force-close a channel, we remove its off-chain
`Channel` object entirely, at which point, on reload, we may notice
HTLC(s) which are not present in our pending payments map (as they
may have received a payment preimage, but not fully committed to
it). Thus, we'd conclude we still have a retryable payment, which
is untrue.
This commit does so, informing the ChannelManager via a new return
element where appropriate of the HTLCSource corresponding to the
failed HTLC.
Matt Corallo [Thu, 14 Oct 2021 23:38:08 +0000 (23:38 +0000)]
Always release `MonitorEvent`s to `ChannelManager` after 3 blocks
If we have a `ChannelMonitor` update from an on-chain event which
returns a `TemporaryFailure`, we block `MonitorEvent`s from that
`ChannelMonitor` until the update is persisted. This prevents
duplicate payment send events to the user after payments get
reloaded from monitors on restart.
However, if the event being avoided isn't going to generate a
PaymentSent, but instead result in us claiming an HTLC from an
upstream channel (ie the HTLC was forwarded), then the result of a
user delaying the event is that we delay getting our money, not a
duplicate event.
Because user persistence may take an arbitrary amount of time, we
need to bound the amount of time we can possibly wait to return
events, which we do here by bounding it to 3 blocks.
Matt Corallo [Sun, 10 Oct 2021 18:02:17 +0000 (18:02 +0000)]
Update test_dup_htlc_onchain_fails_on_reload for new persist API
ChannelMonitors now require that they be re-persisted before
MonitorEvents be provided to the ChannelManager, the exact thing
that test_dup_htlc_onchain_fails_on_reload was testing for when it
*didn't* happen. As such, test_dup_htlc_onchain_fails_on_reload is
now testing that we bahve correctly when the API guarantees are not
met, something we don't need to do.
Here, we adapt it to test the new API requirements through
ChainMonitor's calls to the Persist trait instead.
Matt Corallo [Wed, 13 Oct 2021 20:05:48 +0000 (20:05 +0000)]
Persist `ChannelMonitor`s after new blocks are connected
This resolves several user complaints (and issues in the sample
node) where startup is substantially delayed as we're always
waiting for the chain data to sync.
Further, in an upcoming PR, we'll be reloading pending payments
from ChannelMonitors on restart, at which point we'll need the
change here which avoids handling events until after the user
has confirmed the `ChannelMonitor` has been persisted to disk.
It will avoid a race where we
* send a payment/HTLC (persisting the monitor to disk with the
HTLC pending),
* force-close the channel, removing the channel entry from the
ChannelManager entirely,
* persist the ChannelManager,
* connect a block which contains a fulfill of the HTLC, generating
a claim event,
* handle the claim event while the `ChannelMonitor` is being
persisted,
* persist the ChannelManager (before the CHannelMonitor is
persisted fully),
* restart, reloading the HTLC as a pending payment in the
ChannelManager, which now has no references to it except from
the ChannelMonitor which still has the pending HTLC,
* replay the block connection, generating a duplicate PaymentSent
event.
Matt Corallo [Thu, 7 Oct 2021 23:59:47 +0000 (23:59 +0000)]
Use an opaque type to describe monitor updates in Persist
In the next commit, we'll be originating monitor updates both from
the ChainMonitor and from the ChannelManager, making simple
sequential update IDs impossible.
Further, the existing async monitor update API was somewhat hard to
work with - instead of being able to generate monitor_updated
callbacks whenever a persistence process finishes, you had to
ensure you only did so at least once all previous updates had also
been persisted.
Here we eat the complexity for the user by moving to an opaque
type for monitor updates, tracking which updates are in-flight for
the user and only generating monitor-persisted events once all
pending updates have been committed.
Matt Corallo [Thu, 7 Oct 2021 18:51:49 +0000 (18:51 +0000)]
Move ChannelManager::monitor_updated to a MonitorEvent
In the next commit we'll need ChainMonitor to "see" when a monitor
persistence completes, which means `monitor_updated` needs to move
to `ChainMonitor`. The simplest way to then communicate that
information to `ChannelManager` is via `MonitorEvet`s, which seems
to line up ok, even if they're now constructed by multiple
different places.
Jeffrey Czyz [Mon, 18 Oct 2021 23:36:35 +0000 (18:36 -0500)]
Add source and target nodes to routing::Score
Expand routing::Score::channel_penalty_msat to include the source and
target node ids of the channel. This allows scorers to avoid certain
nodes altogether if desired.
Jeffrey Czyz [Sat, 16 Oct 2021 02:31:33 +0000 (21:31 -0500)]
Simplify prefers_shorter_route_with_higher_fees
In order to make the scoring tests easier to read, only check the
relevant RouteHop fields. The remaining fields are tested elsewhere.
Expand the test to show the path used without scoring.
Failed payments may be retried, but calling get_route may return a Route
with the same failing path. Add a routing::Score trait used to
parameterize get_route, which it calls to determine how much a channel
should be penalized in terms of msats willing to pay to avoid the
channel.
Also, add a Scorer struct that implements routing::Score with a constant
constant penalty. Subsequent changes will allow for more robust scoring
by feeding back payment path success and failure to the scorer via event
handling.
Matt Corallo [Fri, 8 Oct 2021 06:16:28 +0000 (06:16 +0000)]
Use Persister to return errors in tests not chain::Watch
As ChainMonitor will need to see those errors in a coming PR,
we need to return errors via Persister so that our ChainMonitor
chain::Watch implementation sees them.
Matt Corallo [Fri, 8 Oct 2021 20:40:34 +0000 (20:40 +0000)]
Handle Persister returning TemporaryFailure for new channels
Previously, if a Persister returned a TemporaryFailure error when
we tried to persist a new channel, the ChainMonitor wouldn't track
the new ChannelMonitor at all, generating a PermanentFailure later
when the updating is restored.
This fixes that by correctly storing the ChannelMonitor on
TemporaryFailures, allowing later update restoration to happen
normally.
This is (indirectly) tested in the next commit where we use
Persister to return all monitor-update errors.
Matt Corallo [Fri, 8 Oct 2021 05:17:48 +0000 (05:17 +0000)]
Simplify channelmonitor tests which use chain::Watch and Persister
test_simple_monitor_permanent_update_fail and
test_simple_monitor_temporary_update_fail both have a mode where
they use either chain::Watch or persister to return errors.
As we won't be doing any returns directly from the chain::Watch
wrapper in a coming commit, the chain::Watch-return form of the
test will no longer make sense.
Matt Corallo [Fri, 8 Oct 2021 19:07:00 +0000 (19:07 +0000)]
Make `ChainMonitor::monitors` private and expose monitor via getter
Exposing a `RwLock<HashMap<>>` directly was always a bit strange,
and in upcoming changes we'd like to change the internal
datastructure in `ChainMonitor`.
Further, the use of `RwLock` and `HashMap` meant we weren't able
to expose the ChannelMonitors themselves to users in bindings,
leaving a bindings/rust API gap.
Thus, we take this opportunity go expose ChannelMonitors directly
via a wrapper, hiding the internals of `ChainMonitor` behind
getters. We also update tests to use the new API.
The interface for get_route will change to take a scorer. Using
get_route_and_payment_hash whenever possible allows for keeping the
scorer inside get_route_and_payment_hash rather than at every call site.
Replace get_route with get_route_and_payment_hash wherever possible.
Additionally, update get_route_and_payment_hash to use the known invoice
features and the sending node's logger.
Matt Corallo [Wed, 13 Oct 2021 21:14:35 +0000 (21:14 +0000)]
Return the temporary channel id in success from `create_channel`
This makes it more practical for users to track channels prior to
funding, especially if the channel fails because the peer rejects
it for a parameter mismatch.
Matt Corallo [Mon, 11 Oct 2021 23:46:51 +0000 (23:46 +0000)]
Expose ReadOnlyNetworkGraph::get_addresses to C by cloning result
We cannot expose ReadOnlyNetworkGraph::get_addresses as is in C as
it returns a list of references to an enum, which the bindings
dont support. Instead, we simply clone the result so that it
doesn't contain references.