Matt Corallo [Sat, 8 Oct 2022 19:54:43 +0000 (19:54 +0000)]
Stop timing out payments automatically, requiring abandon_payment
When the `abandon_payment` flow was added there was some concern
that upgrading users may not migrate to the new flow, causing
memory leaks in the pending-payment tracking.
While this is true, now that we're relying on the
pending_outbound_payments map for `send_payment` idempotency, the
risk of removing a payment prematurely goes up from "spurious
retry failure" to "sending a duplicative payment", which is much
worse.
Thus, we simply remove the automated payment timeout here,
explicitly requiring that users call `abandon_payment` when they
give up retrying a payment.
Matt Corallo [Sat, 8 Oct 2022 23:26:18 +0000 (23:26 +0000)]
Delay removal of fulfilled outbound payments for a few timer ticks
Previously, once a fulfilled outbound payment completed and all
associated HTLCs were resolved, we'd immediately remove the payment
entry from the `pending_outbound_payments` map.
Now that we're using the `pending_outbound_payments` map for send
idempotency, this presents a race condition - if the user makes a
redundant `send_payment` call at the same time that the original
payment's last HTLC is resolved, the user would reasonably expect
the `send_payment` call to fail due to our idempotency guarantees.
However, because the `pending_outbound_payments` entry is being
removed, if it completes first the `send_payment` call will
succeed even though the user has not had a chance to see the
corresponding `Event::PaymentSent`.
Instead, here, we delay removal of `Fulfilled`
`pending_outbound_payments` entries until several timer ticks have
passed without any corresponding event or HTLC pending.
Matt Corallo [Thu, 6 Oct 2022 21:31:02 +0000 (21:31 +0000)]
Allow users to specify the `PaymentId` for new outbound payments
In c986e52ce83e9aeaa9447abebc5f6600470337cf, an `MppId` was added
to `HTLCSource` objects as a way of correlating HTLCs which belong
to the same payment when the `ChannelManager` sees an HTLC
succeed/fail. This allows it to have awareness of the state of all
HTLCs in a payment when it generates the ultimate user-facing
payment success/failure events. This was used in the same PR to
avoid generating duplicative success/failure events for a single
payment.
Because the field was only used as an internal token to correlate
HTLCs, and retries were not supported, it was generated randomly by
calling the `KeysInterface`'s 32-byte random-fetching function.
This also provided a backwards-compatibility story as the existing
HTLC randomization key was re-used for older clients.
In 28eea12bbe0d78d256f79ec725cf02366dce4e36 `MppId` was renamed to
the current `PaymentId` which was then used expose the
`retry_payment` interface, allowing users to send new HTLCs which
are considered a part of an existing payment.
At no point has the payment-sending API seriously considered
idempotency, a major drawback which leaves the API unsafe in most
deployments. Luckily, there is a simple solution - because the
`PaymentId` must be unique, and because payment information for a
given payment is held for several blocks after a payment
completes/fails, it represents an obvious idempotency token.
Here we simply require the user provide the `PaymentId` directly in
`send_payment`, allowing them to use whatever token they may
already have for a payment's idempotency token.
Matt Corallo [Thu, 29 Sep 2022 15:22:47 +0000 (15:22 +0000)]
Add `WriteableScore` bindings impl for `MultiThreadedLockableScore`
In 56b07e52aabdaca521987e765f1fa864966a5d49 we made
`MultiThreadedLockableScore` fully bindings-compatible. However, it
did not add a `WriteableScore` implementation for it. This was an
oversight as it is a `WriteableScore` in Rust and needs to be for
use in other parts of the API.
Here we add the required impl in a way that the bindings generator
is able to handle it and add conversion utilities.
Implement sending and receiving custom onion messages
This uses the work done in the preceding commits to implement encoding a user's
custom TLV in outbound onion messages, and decoding custom TLVs in inbound
onion messages, to be provided to the new CustomOnionMessageHandler.
As we're moving towards monitor update async being a supported
use-case, we shouldn't call an async monitor update "failed", but
rather "in progress". This simply updates the internal channel.rs
enum name to reflect the new thinking.
Matt Corallo [Sat, 20 Aug 2022 01:03:27 +0000 (01:03 +0000)]
Handle async initial ChannelMonitor persistence failing on restart
If the initial ChannelMonitor persistence is done asynchronously
but does not complete before the node restarts (with a
ChannelManager persistence), we'll start back up with a channel
present but no corresponding ChannelMonitor.
Because the Channel is pending-monitor-update and has not yet
broadcasted its initial funding transaction or sent channel_ready,
this is not a violation of our API contract nor a safety violation.
However, the previous code would refuse to deserialize the
ChannelManager treating it as an API contract violation.
The solution is to test for this case explicitly and drop the
channel entirely as if the peer disconnected before we received
the funding_signed for outbound channels or before sending the
channel_ready for inbound channels.
Matt Corallo [Thu, 13 Oct 2022 21:57:09 +0000 (21:57 +0000)]
Include all channel route hints if no connected channels exist
Mobile clients often take a second or two before they reconnect to
their peers as its not always clear immediately that connections
have been killed (especially on iOS). This can cause us to
spuriously fail to include route hints in our invoices if we're on
mobile.
The fix is simple, if we're selecting channels to include in route
hints and we're not not connected to the peer for any of our
channels, we should simply include the hints for all channels, even
though we're disconencted.
As we integrate the support of anchor outputs, we'll want to know if
each input we're working with came from an anchor outputs channel.
Instead of threading through a `opt_anchors` boolean across several
methods on `PackageSolvingData` and `PackageTemplate`, we decide to
store a reference in each `PackageSolvingData` variant instead that
features a change in behavior between channels with and without anchor
outputs.
Gursharan Singh [Tue, 11 Oct 2022 19:20:33 +0000 (15:20 -0400)]
Dedupe gossip msgs while updating networkGraph from RGS
While applying gossip info from RGS-server, number of harmless
errors might arise which should be ignored. E.g. client should not
fail if there is a duplicate gossip for same channel or duplicate
update.
Update send_onion_message API to take new OnionMessageContents enum
OnionMessageContents specifies the data TLV that the sender wants in the onion
message. This enum only has one variant for now, Custom. When offers are added,
additional variants for invoice, invoice_request, and invoice_error will be
added.
This commit does not actually implement sending the custom OM contents, just
the API change.
Parameterize OnionMessenger by new CustomOnionMessageHandler trait
OnionMessenger::new will now take a custom onion message handler trait
implementation. This handler will be used in upcoming commit(s) to handle
inbound custom onion messages.
The new trait also specifies what custom messages are supported via its
associated type, CustomMessage. This associated type must implement a new
CustomOnionMessagesContents trait, which requires custom messages to support
being written, being read, and supplying their TLV type.
Useful in decoding a custom message, so (a) the message type can be provided to
the handler and (b) None can be returned if the message type is unknown.
Used in upcoming commit(s) to support custom onion messages.
Matt Corallo [Wed, 5 Oct 2022 19:23:56 +0000 (19:23 +0000)]
Increase the default `liquidity_offset_half_life` to six hours
Even at relatively high payment volumes, decaying knowledge of each
individual channel every hour causes aggressive retrying of
channels as we quickly forget the state of a channel. Even with the
historical tracker, this isn't fully remedied, as we'll track the
history bounds with the decayed value.
Instead, we decay every six hours here, reducing how often we'll
retry a channel due to decay.
In addition to this, the decay likely needs to be substantially
more linear, as tracked in #1752.
Matt Corallo [Wed, 5 Oct 2022 19:10:06 +0000 (19:10 +0000)]
Rewrite documentation some on `ProbabilisticScorer`
We had some user confusion on how the probabilistic scorer works,
especially in reference to the half-life parameter. This attempts
to clarify how the bounds work, and how they are decayed.
Elias Rohrer [Mon, 10 Oct 2022 18:47:18 +0000 (14:47 -0400)]
Add `futures` CI check
This adds testing of the `futures` feature to CI. In order to avoid
introducing a dependency on `std`, we switch to using the `futures-util`
crate directly enabling only a subset of features. To this end, we also
switch to using the `select_biased!` macro, which is equivalent to
`select!` except that it doesn't choose ready futures pseudo-randomly
at runtime.
Matt Corallo [Thu, 6 Oct 2022 15:44:03 +0000 (15:44 +0000)]
Test a full `no-std` build in the `no-std-check` crate in CI
Rust is incredibly forgiving in attempts to access `std`, making it
rather difficult to test `no-std` properly. In practice, the only
decent way to do so is to actually build for a platform that does
not have `std`, which we do here by building the `no-std-check`
crate for `arm-thumbv7m-none-eabi`.
Duncan Dean [Fri, 26 Aug 2022 10:54:16 +0000 (12:54 +0200)]
Remove and track perm failed nodes & channels
We never had the `NetworkGraph::node_failed` method implemented. The
scorer now handles non-permanent failures by downgrading nodes, so we
don't want that implemented.
The method is renamed to `node_failed_permanent` to explicitly indicate
that this is the only case it handles. We also add tracking in the form
of two maps as fields of `NetworkGraph`, namely, `removed_nodes` and
`removed_channels`. We track these removed graph entries to ensure we
don't just resync them right away from gossip soon after removing them.
We stop tracking these removed nodes whenever `remove_stale_channels_and_tracking()`
is called and the entries have been tracked for longer than
`REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS` which is currently set to one
week.
Matt Corallo [Fri, 7 Oct 2022 01:38:56 +0000 (01:38 +0000)]
Drop the subsecond part of timestamps when constructing invoices
We had a user who was confused why an invoice failed to round-trip
(i.e. was not `PartialEq` with itself after write/read) when a
subsecond creation time was used (e.g. via the `from_system_time`
constructor).
This commit should address this confusion by dropping subsecond
parts in easily-accessible constructors when creating BOLT 11
invoices.
Matt Corallo [Mon, 22 Aug 2022 22:41:31 +0000 (22:41 +0000)]
Decay historical liquidity tracking when no new data is added
To avoid scoring based on incredibly old historical liquidity data,
we add a new half-life here which is used to (very slowly) decay
historical liquidity tracking buckets.
Matt Corallo [Mon, 22 Aug 2022 22:40:02 +0000 (22:40 +0000)]
Track a reference to scoring parameters in DirectedChannelLiquidity
This simplifies adding additional half lives in
DirectedChannelLiquidity by simply storing a reference to the full
parameter set rather than only the single current half-life.
Matt Corallo [Thu, 21 Jul 2022 02:08:28 +0000 (02:08 +0000)]
Calculate a new penalty based on historical channel liquidity range
Our current `ProbabilisticScorer` attempts to build a model of the
current liquidity across the payment channel network. This works
fine to ignore channels we *just* tried to pay through, but it
fails to remember patterns over longer time horizons.
Specifically, there are *many* channels within the network that are
very often either fully saturated in one direction, or are
regularly rebalanced and rarely saturated. While our model may
discover that, when it decays its offsets or if there is a
temporary change in liquidity, it effectively forgets the "normal"
state of the channel.
This causes substantially suboptimal routing in practice, and
avoiding discarding older knowledge when new datapoints come in is
a potential solution to this.
Here, we implement one such design, using the decaying buckets
added in the previous commit to calculate a probability of payment
success based on a weighted average of recent liquidity estimates
for a channel.
For each min/max liquidity bucket pair (where the min liquidity is
less than the max liquidity), we can calculate the probability that
a payment succeeds using our traditional `amount / capacity`
formula. From there, we weigh the probability by the number of
points in each bucket pair, calculating a total probability for the
payment, and assigning a penalty using the same log-probability
calculation used for the non-historical penalties.
Matt Corallo [Tue, 19 Jul 2022 22:37:16 +0000 (22:37 +0000)]
Track history of where channel liquidities have been in the past
This introduces two new fields to the `ChannelLiquidity` struct -
`min_liquidity_offset_history` and `max_liquidity_offset_history`,
both an array of 8 `u16`s. Each entry represents the proportion of
time that we spent with the min or max liquidity offset in the
given 1/8th of the channel's liquidity range. ie the first bucket
in `min_liquidity_offset_history` represents the proportion of time
we've thought the channel's minimum liquidity is lower than 1/8th's
the channel's capacity.
Each bucket is stored, effectively, as a fixed-point number with
5 bits for the fractional part, which is incremented by one (ie 32)
each time we update our liquidity estimates and decide our
estimates are in that bucket. We then decay each bucket by
2047/2048.
Thus, memory of a payment sticks around for more than 8,000
data points, though the majority of that memory decays after 1,387
data points.
Matt Corallo [Wed, 5 Oct 2022 02:11:00 +0000 (02:11 +0000)]
Correct the directionality of liquidity non-update messages
When we log liquidity updates, if we decline to update anything as
the new bounds are already within the old bounds, the
directionality of the log entries was reversed.
Matt Corallo [Thu, 6 Oct 2022 15:42:41 +0000 (15:42 +0000)]
Correct `rapid-gossip-sync` `no-std` build and test
While `rapid-gossip-sync` recently gained a `no-std` feature, it
didn't actually work, as there were still dangling references to
`std` and prelude assumptions. This makes `rapid-gossip-sync`
build (and test) properly in `no-std`.
Matt Corallo [Tue, 16 Aug 2022 21:58:06 +0000 (21:58 +0000)]
Add a TODO for an important issue for making async mon updates safe
If we receive a monitor event from a forwarded-to channel which
contains a preimage for an HTLC, we have to propogate that preimage
back to the forwarded-from channel monitor. However, once we have
that update, we're running in a relatively unsafe state - we have
the preimage in memory, but if we were to crash the forwarded-to
channel monitor will not regenerate the update with the preimage
for us. If we haven't managed to write the monitor update to the
forwarded-from channel by that point, we've lost the preimage, and,
thus, money!
Matt Corallo [Mon, 18 Jul 2022 01:32:27 +0000 (01:32 +0000)]
Rework `chain::Watch` return types to make async updates less scary
When a `chain::Watch` `ChannelMonitor` update method is called, the
user has three options:
(a) persist the monitor update immediately and return success,
(b) fail to persist the monitor update immediately and return
failure,
(c) return a flag indicating the monitor update is in progress and
will complete in the future.
(c) is rather harmless, and in some deployments should be expected
to be the return value for all monitor update calls, but currently
requires returning `Err(ChannelMonitorUpdateErr::TemporaryFailure)`
which isn't very descriptive and sounds scarier than it is.
Instead, here, we change the return type used to be a single enum
(rather than a Result) and rename `TemporaryFailure`
`UpdateInProgress`.
Matt Corallo [Fri, 23 Sep 2022 21:08:26 +0000 (21:08 +0000)]
Add a bindings-only version of `Future::register_callback`
While we could, in theory, add support to the bindings logic to map
`Box<dyn Trait>`, there isn't a whole lot of use doing so when its
incredibly trivial to do directly.
This adds a trivial wrapper around `Future::register_callback` that
is only built in bindings and which is linked in the
`register_callback` docs for visibility.
Matt Corallo [Thu, 22 Sep 2022 14:07:25 +0000 (14:07 +0000)]
Downgrade `hashbrown` to meet MSRV
`hashbrown` depends on `ahash` which depends on `once_cell`. Sadly,
in https://github.com/matklad/once_cell/issues/201 the `once_cell`
maintainer decided they didn't want to do the work of having an
MSRV policy for `once_cell`, making `ahash`, and thus `hashbrown`
require the latest compiler. I've reached out to `ahash` to suggest
they drop the dependency (as they could trivially work around not
having it), but until then we simply downgrade `hashbrown`.
`rust-bitcoin` also requires an older `hashbrown` so we're actually
reducing our total `no-std` code here anyway.
Matt Corallo [Sun, 18 Sep 2022 13:55:08 +0000 (13:55 +0000)]
Avoid returning a reference to a u64.
In c353c3ed7c40e689a3b9fb6730c6dabbd3c92cc5 an accessor method was
added which returns an `Option<&u64>`. While this allows Rust to
return an 8-byte object, returning a reference to something
pointer-sized is a somewhat strange API.
Instead, we opt for a straight `Option<u64>`, which is sadly
somewhat larger on the stack, but is simpler and already supported
in the bindings generation.