Matt Corallo [Fri, 8 Dec 2023 19:43:17 +0000 (19:43 +0000)]
Consolidate `candidate` access in `add_entry` during routing
Because fetching fields from the `$candidate` often implies an
indirect read, grouping them together may result in one or two
fewer memory loads, so we do so here.
Matt Corallo [Fri, 8 Dec 2023 05:44:32 +0000 (05:44 +0000)]
Somewhat optimize the generic `Features::requires_unknown_bits`
It turns out we spend several percent of our routefinding time just
checking if nodes and channels require unknown features
byte-by-byte. While the cost is almost certainly dominated by the
memory read latency, avoiding doing the checks byte-by-byte should
reduce the branch count slightly, which may reduce the overhead.
Matt Corallo [Fri, 8 Dec 2023 01:49:20 +0000 (01:49 +0000)]
Store source/target `node_counter`s in `DirectionalChannelInfo`
Because we now have some slack space in `PathBuildingHop`, we can
use it to cache some additional hot values. Here we use it to
cache the source and target `node_counter`s for public channels,
effectively prefetching the values from the channel state.
Matt Corallo [Fri, 8 Dec 2023 01:49:08 +0000 (01:49 +0000)]
DRY redundant calls to `$candidate.htlc_minimum_msat()` in routing
While LLVM should inline and elide the redundant calls, because the
router is rather large LLVM can decide against inlining in some
cases where it would be an nice win.
Thus, its worth DRY'ing the redundant calls explicitly.
Matt Corallo [Sun, 10 Dec 2023 03:28:37 +0000 (03:28 +0000)]
Cache whether a node is a first-hop target in the per-node state
When processing the main loop during routefinding, for each node,
we check whether it happens to be our peer in one of our channels.
This ensures we never fail to find a route that takes a hop through
a private channel of ours, to a private node, then through
invoice-provided route hints to reach the ultimate payee.
Because this is incredibly hot code, doing a full `HashMap` lookup
to check if each node is a first-hop target ends up eating a good
chunk of time during routing. Luckily, we can trivially avoid this
cost.
Because we're already looking up the per-node state in the `dist`
map, we can store a bool in each first-hop target's state, avoiding
the lookup unless we know its going to succeed.
This requires storing a dummy entry in `dist`, which feels somewhat
strange, but is ultimately fine as we should never be looking at
per-node state unless we've already found a path to that node,
updating the fields in doign so.
Matt Corallo [Wed, 10 Jul 2024 01:22:32 +0000 (01:22 +0000)]
Move blinded path introduction point resolution to a helper method
This marginally reduces the size of `get_route` by moving a the
blinded path introduction point resolution and blinded path checks
into a helper method.
Matt Corallo [Thu, 7 Dec 2023 23:40:26 +0000 (23:40 +0000)]
Align `PathBuildingHop` to 128b, now that we store them in a `Vec`
Now that `PathBuildingHop` is stored in a `Vec` (as `Option`s),
rather than `HashMap` entries, they can grow to fill a full two
cache lines without a memory access performance cost. In the next
commit we'll take advantage of this somewhat, but here we update
the assertions and drop the `repr(C)`, allowing rust to lay the
memory out as it wishes.
Matt Corallo [Mon, 24 Jun 2024 23:50:29 +0000 (23:50 +0000)]
Drop the `dist` `HashMap` in routing, replacing it with a `Vec`.
Now that we have unique, dense, 32-bit identifiers for all the
nodes in our network graph, we can store the per-node information
when routing in a simple `Vec` rather than a `HashMap`. This avoids
the overhead of hashing and table scanning entirely, for a nice
"simple" optimization win.
Matt Corallo [Sat, 1 Jun 2024 22:45:04 +0000 (22:45 +0000)]
Use `NodeCounters` `NodeId`s as the blinded path intro references
The router's `introduction_node_id_cache` is used to cache the
`NodeId`s of blinded path introduction points so that we don't
have to look them up every time we go around the main router loop.
When using it, if the introduction point isn't a public node we
then look up the introduction in our first-hops map.
In either case, we have to end up with a reference to a `NodeId`
that outlives our `dist` map.
Here we consolidate both the initial cache building and the
first-hops map lookup to one place, storing only a reference to a
`NodeId` either in the `NetworkGraph` or in the new `NodeCounters`
to get the required lifetime without needing to reference into the
first-hops map.
We then take this opportunity to avoid `clone`ing the first-hops
map entries as we now no longer reference into it.
Matt Corallo [Sat, 1 Jun 2024 18:19:45 +0000 (18:19 +0000)]
Drop `private_hop_key_cache` in favor of `NodeCounters`
With the new `NodeCounters` have have a all the `NodeId`s we'll
need during routing, so there's no need to keep the
`private_hop_key_cache` which existed to provide references to
`NodeId`s which are needed during routing.
Matt Corallo [Tue, 19 Mar 2024 19:29:06 +0000 (19:29 +0000)]
Add a new `NodeCounters` utility to track counters when routing
In the next commit we'll stop using `NodeId`s to look up nodes when
routing, instead using the new per-node counters. Here we take the
first step, adding a local struct which tracks temporary counters
for route hints/source/destination nodes.
Because we must ensure we have a 1-to-1 mapping from node ids to
`node_counter`s, even across first-hop and last-hop hints, we have
to be careful to check the network graph first, then a new
`private_node_id_to_node_counter` map to ensure we only ever end up
with one counter per node id.
shaavan [Sat, 15 Jun 2024 13:52:56 +0000 (19:22 +0530)]
Allow create_blinded_paths functions to accept MessageContext as input field
- Enabled `create_blinded_paths` to accept `MessageContext` TLVs as
an input field.
- `MessageContext` is intended to be sent along with the `reply_path`
to the counterparty.
- Added `MessageContext` in the `create_blinded_paths` flow, optionally
appending it within the `reply_path`.
- Updated tests to verify the new feature.
shaavan [Sat, 15 Jun 2024 13:52:28 +0000 (19:22 +0530)]
Update handle_message to accept OffersContext data as an input field
1. Handling Offers Data:
- Updated `handle_message` to accept `OffersContext` data as an input field.
- If it is present, it will be utilized by the handler to
abandon outbound payments that have failed for any reason.
2. Consistency in Custom Message Handling:
- Updated `handle_custom_message` to accept optional custom data.
for consistency.
- Note: `custom_data` will remain unused in this PR.
shaavan [Fri, 14 Jun 2024 12:36:05 +0000 (18:06 +0530)]
Introduce MessageContext in ReceiveTlvs
1. New Enum for Enhanced Data Handling:
- Introduced the `MessageContext` enum, which allows the node to include
additional context data along with the `reply_path` sent to the counterparty.
- The node anticipates receiving this data back for further processing.
2. Variants in MessageContext:
- The `MessageContext` enum includes two variants: "Offers" and
"Context"
- One of the variants, `Offers`, holds the `payment_id`
of the associated Outbound BOLT12 Payment.
3. Future Usage:
- This enum will be utilized in a subsequent commit to abandon outbound
payments that have failed to complete for various reasons.
Matt Corallo [Thu, 13 Jun 2024 00:29:01 +0000 (00:29 +0000)]
Block monitor updates to ensure preimages are in each MPP part
If we claim an MPP payment and only persist some of the
`ChannelMonitorUpdate`s which include the preimage prior to
shutting down, we may be in a state where some of our
`ChannelMonitor`s have the preimage for a payment while others do
not.
This, it turns out, is actually mostly safe - on startup
`ChanelManager` will detect there's a payment it has as unclaimed
but there's a corresponding payment preimage in a `ChannelMonitor`
and go claim the other MPP parts. This works so long as the
`ChannelManager` has been persisted after the payment has been
received but prior to the `PaymentClaimable` event being processed
(and the claim itself occurring). This is not always true and
certainly not required on our API, but our
`lightning-background-processor` today does persist prior to event
handling so is generally true subject to some race conditions.
In order to address this we need to use copy payment preimages
across channels irrespective of the `ChannelManager`'s payment
state, but this introduces another wrinkle - if one channel makes
substantial progress while other channel(s) are still waiting to
get the payment preimage in `ChannelMonitor`(s) while the
`ChannelManager` hasn't been persisted after the payment was
received, we may end up without the preimage on disk.
Here, we address this issue by using the new
`RAAMonitorUpdateBlockingAction` variant for this specific case. We
block persistence of an RAA `ChannelMonitorUpdate` which may remove
the preimage from disk until all channels have had the preimage
added to their `ChannelMonitor`.
We do this only in-memory (and not on disk) as we can recreate this
blocker during the startup re-claim logic.
This will enable us to claim MPP parts without using the
`ChannelManager`'s payment state in later work.
Matt Corallo [Tue, 14 May 2024 00:03:39 +0000 (00:03 +0000)]
Add a `RAAMonitorUpdateBlockingAction::ClaimedMPPPayment`
If we claim an MPP payment and only persist some of the
`ChannelMonitorUpdate`s which include the preimage prior to
shutting down, we may be in a state where some of our
`ChannelMonitor`s have the preimage for a payment while others do
not.
This, it turns out, is actually mostly safe - on startup
`ChanelManager` will detect there's a payment it has as unclaimed
but there's a corresponding payment preimage in a `ChannelMonitor`
and go claim the other MPP parts. This works so long as the
`ChannelManager` has been persisted after the payment has been
received but prior to the `PaymentClaimable` event being processed
(and the claim itself occurring). This is not always true and
certainly not required on our API, but our
`lightning-background-processor` today does persist prior to event
handling so is generally true subject to some race conditions.
In order to address this race we need to use copy payment preimages
across channels irrespective of the `ChannelManager`'s payment
state, but this introduces another wrinkle - if one channel makes
substantial progress while other channel(s) are still waiting to
get the payment preimage in `ChannelMonitor`(s) while the
`ChannelManager` hasn't been persisted after the payment was
received, we may end up without the preimage on disk.
Here, we address this issue with a new
`RAAMonitorUpdateBlockingAction` variant for this specific case. We
block persistence of an RAA `ChannelMonitorUpdate` which may remove
the preimage from disk until all channels have had the preimage
added to their `ChannelMonitor`.
We do this only in-memory (and not on disk) as we can recreate this
blocker during the startup re-claim logic.
This will enable us to claim MPP parts without using the
`ChannelManager`'s payment state in later work.
Matt Corallo [Wed, 12 Jun 2024 20:49:07 +0000 (20:49 +0000)]
Track the cp `node_id` which originated an HTLC in the `HTLCSource`
Because we track pending `ChannelMonitorUpdate`s per-peer, we
really need to know what peer an HTLC came from when we go to claim
it on-chain, allowing us to look up any blocked actions in the
`PeerState`. While we could do this by moving the blocked actions
to some global "closed-channel update queue", its simpler to do it
this way.
While this trades off `ChannelMonitorUpdate` size somewhat (the
`HTLCSource` is included in many of them), which we should be
sensitive to, it will also allow us to (eventually) remove the
SCID -> peer + channel_id lookups we do when claiming or failing
today, which are somewhat annoying to deal with.
Matt Corallo [Tue, 18 Jun 2024 17:24:21 +0000 (17:24 +0000)]
Add skipping variants to `impl_writeable_tlv_based_enum_upgradable`
In some cases, we have variants of an enum serialized using
`impl_writeable_tlv_based_enum_upgradable` which we don't want to
write/read. Here we add support for such variants by writing them
using the (odd) type 255 without any contents and using
`MaybeReadable` to decline to read them.
Duncan Dean [Thu, 4 Jul 2024 13:48:01 +0000 (15:48 +0200)]
Skip incremental-mutants job for main
There is no diff when running against main, so we skip incremental-mutants.
Before this commit, we'd get `Error: Failed to parse diff: Line 1: Error while parsing:`
as the diff file was empty.
Matt Corallo [Tue, 25 Jun 2024 16:20:59 +0000 (16:20 +0000)]
(Re-)add handling for `ChannelUpdate::message_flags`
When the `htlc_maximum_msat` field was made mandatory in
`ChannelUpdate` (in b0e8b739b73cc25f3e1ab00695a14d972162a140) we
started ignoring the `message_flags` field entirely and always
writing `1`. The comment updates indicated that the `message_flags`
field was deprecated, but this is not true - only the
`htlc_maximum_msat` indicator bit was deprecated, requiring it to
be 1.
If a node creates a `channel_update` with `message_flags` bits set
other than the low bit, this will cause us to spuriously reject
the message with an invalid signature error as we will check the
message against the wrong hash.
With today's current spec this is totally fine - the only other bit
defined for `message_flags` is the `dont_forward` bit, which when
set indicates we shouldn't accept the message into our gossip store
anyway (though this may lead to spurious `warning` messages being
sent to peers). However, in the future this may mean we start
rejecting valid `channel_update`s if new bits are defiend.
Alec Chen [Tue, 18 Jun 2024 18:01:59 +0000 (11:01 -0700)]
Async signing test util follow ups
- Remove unused unavailable signers in TestKeysInterface
- Remove unnecessary display implementation for SignerOp
- Move set of test disabled signer ops to EnforcementState
- Add method to enable/disable all signer ops at once
Duncan Dean [Thu, 30 Nov 2023 11:31:24 +0000 (13:31 +0200)]
Introduce basic incremental mutation testing
We introduce a CI job for mutation testing of PR diffs using cargo-mutants.
Missed cases do not trigger a fail of this job yet as we just introduce it
now for visibility. We may start enforcing stricter rules at a later stage.
Matt Corallo [Wed, 26 Jun 2024 14:16:50 +0000 (14:16 +0000)]
Handle feerates of `u32::MAX` without overflowing
Though we generally shouldn't be seeing these, and the
`get_dust_buffer_feerate` implementation will still return
`u32::MAX` in spite of the overflow, we should handle the overflow
to avoid panic when `debug_assertions` are enabled.
AsyncPaymentsMessageHandler trait for OnionMessenger
Add a trait for handling async payments messages to OnionMessenger. This allows
users to either provide their own custom handling for async payments messages
or rely on a version provided by LDK.
Willem Van Lint [Tue, 11 Jun 2024 23:05:57 +0000 (16:05 -0700)]
Implement non-strict forwarding
This change implements non-strict forwarding, allowing the node to
forward an HTLC along a channel other than the one specified
by short_channel_id in the onion message, so long as the receiver has
the same node public key intended by short_channel_id
([BOLT](https://github.com/lightning/bolts/blob/57ce4b1e05c996fa649f00dc13521f6d496a288f/04-onion-routing.md#non-strict-forwarding)).
This can improve payment reliability when there are multiple channels
with the same peer e.g. when outbound liquidity is replenished by
opening a new channel.
The implemented forwarding strategy now chooses the channel with the
lowest outbound liquidity that can forward an HTLC to maximize the
probability of being able to successfully forward a subsequent HTLC.
Matt Corallo [Thu, 7 Dec 2023 04:32:06 +0000 (04:32 +0000)]
Store the source and destination `node_counter`s in `ChannelInfo`
In the next commit, we'll use the new `node_counter`s to remove a
`HashMap` from the router, using a `Vec` to store all our per-node
information. In order to make finding entries in that `Vec` cheap,
here we store the source and destintaion `node_counter`s in
`ChannelInfo`, givind us the counters for both ends of a channel
without doing a second `HashMap` lookup.
Matt Corallo [Thu, 7 Dec 2023 04:12:04 +0000 (04:12 +0000)]
Track a `counter` for each node in our network graph
These counters are simply a unique number describing each node.
They have no specific meaning, but start at 0 and count up, with
counters being reused after a node has been deleted.
G8XSU [Fri, 14 Jun 2024 23:56:36 +0000 (16:56 -0700)]
Optimize ChannelMonitor persistence on block connections.
Currently, every block connection triggers the persistence of all
ChannelMonitors with an updated best_block. This approach poses
challenges for large node operators managing thousands of channels.
Furthermore, it leads to a thundering herd problem
(https://en.wikipedia.org/wiki/Thundering_herd_problem), overwhelming
the storage with simultaneous requests.
To address this issue, we now persist ChannelMonitors at a
regular cadence, spreading their persistence across blocks to
mitigate spikes in write operations.
Outcome: After doing this, Ldk's IO footprint should be reduced
by ~50 times. The processing time required to sync each block
will be significantly reduced, particularly for nodes with 1000s
of channels, as write latency plays a significant role in this process.
As a result, the Node/ChainMonitor will be blocked for a shorter
duration, leading to further efficiency gains.
Jeffrey Czyz [Mon, 17 Jun 2024 19:41:32 +0000 (14:41 -0500)]
Relax channel count check for unannounced nodes
When creating blinded paths, introduction nodes are limited to peers
with at least three channels to prevent easily guessing the recipient.
Relax this check when the recipient is unannounced since they won't be
in the NetworkGraph.
Jeffrey Czyz [Thu, 13 Jun 2024 22:15:19 +0000 (17:15 -0500)]
Advance self blinded payment paths
DefaultRouter will ignore blinded paths where the sender is the
introduction node. Similar to message paths, advance such paths by one
so that payments may be sent to them.
Jeffrey Czyz [Fri, 14 Jun 2024 21:57:41 +0000 (16:57 -0500)]
Add advance_path_by_one for blinded payment paths
Similar to blinded paths for onion messages, if given a blinded payment
path where we are the introduction node, the path must be advanced by
one in order to use it.
Jeffrey Czyz [Fri, 14 Jun 2024 14:04:47 +0000 (09:04 -0500)]
Don't modify path when advance_path_by_one errors
When using advance_path_by_one when we are the introduction node, any
error will result having the first hop of the input blinded path
removed. Instead, only remove the first hop on success. Otherwise, the
path will be invalid.
Jeffrey Czyz [Wed, 12 Jun 2024 21:54:19 +0000 (16:54 -0500)]
Blinded payments to unannounced introduction node
When creating blinded paths for receiving onion payments, allow using
the recipient's only peer as the introduction node when the recipient is
unannounced. This allows for sending payments without going through an
intermediary, which is useful for testing or when only connected to an
LSP with an empty NetworkGraph.
Jeffrey Czyz [Wed, 12 Jun 2024 15:58:18 +0000 (10:58 -0500)]
BlindedPath with unannounced introduction node
When creating blinded paths for receiving onion messages, allow using
the recipient's only peer as the introduction node when the recipient is
unannounced. This allows for sending messages without going through an
intermediary, which is useful for testing or when only connected to an
LSP with an empty NetworkGraph.