Matt Corallo [Sun, 3 Oct 2021 21:44:52 +0000 (21:44 +0000)]
Wake reader future when we fail to flush socket buffer
This avoids any extra calls to `read_event` after a write fails to
flush the write buffer fully, as is required by the PeerManager
API (though it isn't critical).
Matt Corallo [Wed, 6 Oct 2021 04:45:07 +0000 (04:45 +0000)]
Limit blocked PeerManager::process_events waiters to two
Only one instance of PeerManager::process_events can run at a time,
and each run always finishes all available work before returning.
Thus, having several threads blocked on the process_events lock
doesn't accomplish anything but blocking more threads.
Here we limit the number of blocked calls on process_events to two
- one processing events and one blocked at the top which will
process all available events after the first completes.
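The two-waiter limit described above can be sketched with a try-lock plus a single "someone is already waiting" flag. This is a minimal illustration under assumptions, not the actual `PeerManager` code: `EventProcessor`, its fields, and the `bool` return value are hypothetical names introduced here.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for the event-processing part of a peer manager.
struct EventProcessor {
    /// Held for the duration of one processing run.
    processing_lock: Mutex<()>,
    /// Set while one caller is already blocked waiting for the lock.
    waiter_present: AtomicBool,
    /// Number of completed runs, kept only for demonstration.
    runs: AtomicU32,
}

impl EventProcessor {
    fn new() -> Self {
        EventProcessor {
            processing_lock: Mutex::new(()),
            waiter_present: AtomicBool::new(false),
            runs: AtomicU32::new(0),
        }
    }

    /// Returns true if this call performed a run, false if it returned
    /// early because another caller was already waiting to do one.
    fn process_events(&self) -> bool {
        let _guard = match self.processing_lock.try_lock() {
            Ok(guard) => guard,
            Err(TryLockError::Poisoned(poisoned)) => poisoned.into_inner(),
            Err(TryLockError::WouldBlock) => {
                // A run is in progress. If a waiter already exists, it will
                // pick up all available work, so there is nothing to add.
                if self.waiter_present.swap(true, Ordering::AcqRel) {
                    return false;
                }
                let guard = self.processing_lock.lock().unwrap();
                // Clear the flag before working so that work arriving from
                // now on can queue a fresh waiter.
                self.waiter_present.store(false, Ordering::Release);
                guard
            }
        };
        self.runs.fetch_add(1, Ordering::Relaxed);
        true
    }
}
```

Clearing the flag before the run starts (rather than after) is a deliberate choice in this sketch: a caller that arrives mid-run may have queued work the current run will not see, so it must be allowed to wait.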
Matt Corallo [Sat, 25 Sep 2021 22:24:23 +0000 (22:24 +0000)]
Avoid taking the peers write lock during event processing
Because the peers write lock "blocks the world", and happens after
each read event, always taking the write lock has pretty severe
impacts on parallelism. Instead, here, we only take the global
write lock if we have to disconnect a peer.
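As a rough illustration of the locking pattern described above (not the real `PeerManager` internals), the sketch below takes the shared read lock for ordinary per-peer work and only takes the write lock on the disconnect path. The `Peers` and `Peer` types, the `u32` peer id, and the "empty read means disconnect" rule are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Hypothetical per-peer state.
struct Peer {
    bytes_received: usize,
}

/// Hypothetical peer table: an outer RwLock over per-peer mutexes.
struct Peers {
    map: RwLock<HashMap<u32, Mutex<Peer>>>,
}

impl Peers {
    /// Handles a read for one peer. Returns false if the peer is unknown
    /// or was disconnected as a result of this call.
    fn read_event(&self, id: u32, data: &[u8]) -> bool {
        // Common path: only the shared read lock plus this peer's own mutex.
        let peers = self.map.read().unwrap();
        let should_disconnect = match peers.get(&id) {
            Some(peer) => {
                let mut peer = peer.lock().unwrap();
                peer.bytes_received += data.len();
                // Assumption for this sketch: an empty read means "disconnect".
                data.is_empty()
            }
            None => return false,
        };
        drop(peers);

        // Rare path: take the exclusive lock only when removing a peer.
        if should_disconnect {
            self.map.write().unwrap().remove(&id);
        }
        !should_disconnect
    }
}
```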
Matt Corallo [Wed, 6 Oct 2021 04:29:19 +0000 (04:29 +0000)]
[net-tokio] Call PeerManager::process_events without blocking reads
Unlike very ancient versions of lightning-net-tokio, this does not
rely on a single global process_events future, but instead has one
per connection. This could still cause significant contention, so
we'll ensure only two process_events calls can exist at once in
the next few commits.
Matt Corallo [Wed, 6 Oct 2021 06:58:15 +0000 (06:58 +0000)]
Process messages with only the top-level read lock held
Users are required to only ever call `read_event` serially
per-peer, thus we actually don't need any locks while we're
processing messages - we can only be processing messages in one
thread per-peer.
That said, we do need to ensure that another thread doesn't
disconnect the peer we're processing messages for, as that could
result in a peer_disconnected call while we're processing a
message for the same peer - somewhat nonsensical.
This significantly improves parallelism especially during gossip
processing as it avoids waiting on the entire set of individual
peer locks to forward a gossip message while several other threads
are validating gossip messages with their individual peer locks
held.
Matt Corallo [Fri, 30 Jul 2021 18:03:28 +0000 (18:03 +0000)]
Process messages from peers in parallel in `PeerManager`.
This adds the required locking to process messages from different
peers simultaneously in `PeerManager`. Note that channel messages
are still processed under a global lock in `ChannelManager`, and
most work is still processed under a global lock in gossip message
handling, but parallelizing message deserialization and message
decryption is somewhat helpful.
Set `ChannelUpdate` `htlc_maximum_msat` using the peer's value
Use the `counterparty_max_htlc_value_in_flight_msat` value, and not the
`holder_max_htlc_value_in_flight_msat` value when creating the
`htlc_maximum_msat` value for `ChannelUpdate` messages.
BOLT 7 specifies that the field MUST be less than or equal to
`max_htlc_value_in_flight_msat` received from the peer, which we
currently are not guaranteed to adhere to by using the holder value.
Add a config field
`ChannelHandshakeConfig::max_inbound_htlc_value_in_flight_percent_of_channel`
which sets the percentage of the channel value we cap the total value of
outstanding inbound HTLCs to.
This field can be set to a value from 1 to 100, expressed as a whole
percentage of the channel value.
Note that:
* If configured to a value other than the default of 10, any new
channels created with the non-default value will cause versions of LDK
prior to 0.0.104 to refuse to read the `ChannelManager`.
* This caps the total value for inbound HTLCs in-flight only, and
there's currently no way to configure the cap for the total value of
outbound HTLCs in-flight.
* The requirements for your node being online to ensure the safety of
HTLC-encumbered funds are different from the non-HTLC-encumbered funds.
This makes this an important knob to restrict exposure to loss due to
being offline for too long. See
`ChannelHandshakeConfig::our_to_self_delay` and
`ChannelConfig::cltv_expiry_delta` for more information.
Default value: 10.
Minimum value: 1, any values less than 1 will be treated as 1 instead.
Maximum value: 100, any values larger than 100 will be treated as 100
instead.
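The clamping rules above can be sketched as a small helper. The function name and signature are hypothetical and chosen for illustration only; the 1 and 100 bounds come from the text above.

```rust
/// Hypothetical helper: converts the configured percentage into an inbound
/// in-flight cap in millisatoshis, treating values below 1 as 1 and values
/// above 100 as 100.
fn max_inbound_htlc_value_in_flight_msat(
    channel_value_satoshis: u64,
    configured_percent: u8,
) -> u64 {
    let percent = u64::from(configured_percent.clamp(1, 100));
    let channel_value_msat = channel_value_satoshis * 1000;
    // Dividing first is exact because the msat value is a multiple of 1000.
    channel_value_msat / 100 * percent
}
```

For a 1,000,000 sat channel, the default of 10 gives a cap of 100,000,000 msat, while a configured 0 is treated as 1 and gives 10,000,000 msat.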
Matt Corallo [Thu, 28 Apr 2022 17:10:04 +0000 (10:10 -0700)]
Avoid storing a full FinalOnionHopData in OnionPayload::Invoice
We only use it to check the amount when processing MPP parts, but
store the full object (including new payment metadata) in it.
Because we now store the amount in the parent structure, there is
no need for it at all in the `OnionPayload`. Sadly, for
serialization compatibility, we need it to continue to exist, at
least temporarily, but we can avoid populating the new fields in
that case.
Matt Corallo [Tue, 21 Dec 2021 22:10:43 +0000 (22:10 +0000)]
Store total payment amount in ClaimableHTLC explicitly
...instead of accessing it via the `OnionPayload::Invoice` form.
This may be useful if we add MPP keysend support, but is directly
useful to allow us to drop `FinalOnionHopData` from `OnionPayload`.
Matt Corallo [Thu, 28 Apr 2022 19:46:13 +0000 (19:46 +0000)]
Reject channels if the total reserves are larger than the funding
The `full_stack_target` fuzzer managed to find a subtraction
underflow in the new `Channel::get_htlc_maximum` function where we
subtract both sides' reserve values from the channel funding. Such
a channel is obviously completely useless, so we should reject it
during opening instead of integer-underflowing later.
Thanks to Chaincode Labs for providing the fuzzing resources which
found this bug!
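A minimal sketch of the kind of check described above, using checked arithmetic so that the subtraction can never underflow. The `OpenError` type and the function name are hypothetical and do not reflect the actual `Channel` code.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum OpenError {
    ReservesExceedFunding,
}

/// Returns the value left once both sides' reserves are set aside, or an
/// error if the reserves together are larger than the funding amount.
fn value_after_reserves_satoshis(
    funding_satoshis: u64,
    holder_reserve_satoshis: u64,
    counterparty_reserve_satoshis: u64,
) -> Result<u64, OpenError> {
    let total_reserves = holder_reserve_satoshis
        .checked_add(counterparty_reserve_satoshis)
        .ok_or(OpenError::ReservesExceedFunding)?;
    funding_satoshis
        .checked_sub(total_reserves)
        .ok_or(OpenError::ReservesExceedFunding)
}
```

Doing this once at channel open means later code that subtracts the reserves from the funding amount can rely on the result being non-negative.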
Matt Corallo [Mon, 25 Apr 2022 22:51:02 +0000 (22:51 +0000)]
Add test coverage for failure of inconsistent MPP parts
When we receive multiple HTLCs which claim to be a part of the same
MPP but which are inconsistent for some reason, we should fail the
inconsistent HTLCs but keep the first HTLCs up until the first
inconsistency.
This works, but it turns out there was no test coverage, so we add
some here.
Matt Corallo [Tue, 26 Apr 2022 15:03:39 +0000 (15:03 +0000)]
Expand `chain::Listen` trivially to accept filtered block data
The `chain::Listen` interface provides a block-connection-based
alternative to the `chain::Confirm` interface, which supports
providing transaction data at a time separate from the block
connection time.
For users who are downloading the full headers tree (e.g. from a
node over the Bitcoin P2P protocol) but who are not downloading
full blocks (e.g. because they're using BIP 157/158 filtering)
there is no API that matches exactly their event stream -
`chain::Listen` requires full blocks for each block,
`chain::Confirm` requires breaking each connection event into two
calls.
Given it's incredibly trivial to take a `TransactionData` in
addition to a `Block` in `chain::Listen` we do so here, adding a
default-implementation `block_connected` which simply creates the
`TransactionData`, which ultimately all of the `chain::Listen`
implementations currently do anyway.
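The shape of this change can be sketched with simplified stand-in types. `Header`, `Transaction`, `Block`, the `filtered_block_connected` method name, and `Recorder` below are hypothetical placeholders for illustration, not the real `bitcoin` or `lightning` definitions; only the idea of a default `block_connected` that builds the `TransactionData` comes from the text above.

```rust
use std::cell::RefCell;

/// Simplified, hypothetical stand-ins for the real Bitcoin types.
struct Header { nonce: u32 }
struct Transaction { id: u32 }
struct Block { header: Header, txdata: Vec<Transaction> }

/// (index within the block, transaction) pairs.
type TransactionData<'a> = [(usize, &'a Transaction)];

trait Listen {
    /// Required: handle a connected block given only the matched transactions.
    fn filtered_block_connected(&self, header: &Header, txdata: &TransactionData, height: u32);

    /// Provided: full-block callers get the filtered form built for them.
    fn block_connected(&self, block: &Block, height: u32) {
        let txdata: Vec<_> = block.txdata.iter().enumerate().collect();
        self.filtered_block_connected(&block.header, &txdata, height);
    }
}

/// Hypothetical listener that records (height, index, id) for each transaction.
struct Recorder { seen: RefCell<Vec<(u32, usize, u32)>> }

impl Listen for Recorder {
    fn filtered_block_connected(&self, _header: &Header, txdata: &TransactionData, height: u32) {
        for (index, tx) in txdata {
            self.seen.borrow_mut().push((height, *index, tx.id));
        }
    }
}
```

With this layout an implementor writes only the filtered method, and callers with full blocks keep calling `block_connected` unchanged.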
Matt Corallo [Thu, 21 Apr 2022 02:30:16 +0000 (02:30 +0000)]
Reorder the BP loop to make manager persistence more reliable
The main loop of the background processor has this line:
`peer_manager.process_events(); // Note that this may block on ChannelManager's locking`
which does, indeed, sometimes block waiting on the `ChannelManager`
to finish whatever it's doing. Specifically, it's the only place in
the background processor loop that we block waiting on the
`ChannelManager`, so if the `ChannelManager` is relatively busy, we
may end up being blocked there most of the time.
This should be fine, except today we had a user whose node was
particularly slow in processing some channel updates, resulting in
the background processor being blocked there (as expected). Then,
when the channel updates were completed (and persisted) the next
thing the background processor did was hand the user events to
process, creating yet more channel updates. Ultimately, the user's
node crashed before finishing the event processing. This left us
with an updated monitor on disk and an outdated manager, and they
lost the channel on startup.
Here we simply move the above quoted line to after the normal event
processing, ensuring the next thing we do after blocking on
`ChannelManager` locks is persist the manager, prior to event
handling.
MAX_FUNDING_SATOSHIS will no longer be accurately named once wumbo is merged.
Also, we'll want to check that wumbo channels don't exceed the total bitcoin supply.
Matt Corallo [Sat, 16 Apr 2022 20:07:34 +0000 (20:07 +0000)]
Separate `ChannelDetails`' outbound capacity from the next HTLC max
`ChannelDetails::outbound_capacity_msat` describes the total amount
available for sending across several HTLCs, basically just our
balance minus the reserve value maintained by our counterparty.
However, when routing we use it to guess the maximum amount we can
send in a single additional HTLC, which it is not.
There are numerous reasons why our balance may not match the amount
we can send in a single HTLC, whether the HTLC in-flight limit, the
channel's HTLC maximum, or our feerate buffer.
This commit splits the `outbound_capacity_msat` field into two -
`outbound_capacity_msat` and `outbound_htlc_limit_msat`, setting us
up for correctly handling our next-HTLC-limit in the future.
This also addresses the first of the reasons why the values may
not match - the max-in-flight limit. The inaccuracy is ultimately
tracked as #1126.
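To illustrate why the two fields can differ, here is a rough sketch that accounts only for the max-in-flight limit mentioned above. The function and its parameters are hypothetical; the real calculation lives in `Channel` and may consider more inputs.

```rust
/// Hypothetical helper: the most we could send in one additional HTLC when
/// only the counterparty's max-in-flight limit is taken into account.
fn next_outbound_htlc_limit_msat(
    outbound_capacity_msat: u64,
    counterparty_max_htlc_value_in_flight_msat: u64,
    pending_outbound_htlcs_msat: u64,
) -> u64 {
    // Room left under the in-flight cap after already-pending HTLCs.
    let in_flight_room_msat = counterparty_max_htlc_value_in_flight_msat
        .saturating_sub(pending_outbound_htlcs_msat);
    outbound_capacity_msat.min(in_flight_room_msat)
}
```

With 900,000 msat of capacity, a 500,000 msat in-flight cap, and 200,000 msat already pending, the next HTLC is limited to 300,000 msat even though the capacity is three times that.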
Default to creating BOLT4 tlv payload format onions
Default to creating tlv onions for nodes for which we haven't received
any features through node announcements or which aren't in the
`network_graph`, and where no other features are known such as invoice
features nor features in the init msg for nodes we have channels to.
Matt Corallo [Sun, 3 Apr 2022 21:21:54 +0000 (21:21 +0000)]
Lower-bound the log approximation and stop using it > ~98.5%
When we start getting a numerator and divisor particularly close to
each other, the log approximation starts to get very noisy. In
order to avoid applying scores that are basically noise (and can
range upwards of 2x the default per-hop penalty), simply consider
such cases as having a success probability of 100%.
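The cutoff can be sketched as follows. The exact threshold is an assumption here: this sketch treats a failure fraction below 1/64 (a success probability above roughly 98.4%) as certain success, and it uses floating-point `log10` rather than the integer approximation used by the scorer.

```rust
/// Assumed cutoff: failure fractions below 1/64 are treated as zero.
const CUTOFF_DIVISOR: u64 = 64;

/// Returns -log10(numerator / denominator), or 0.0 when the ratio is so
/// close to 1 that an approximated result would be mostly noise.
/// Expects 0 < numerator <= denominator.
fn negative_log10_with_cutoff(numerator: u64, denominator: u64) -> f64 {
    let failure_part = denominator.saturating_sub(numerator);
    if failure_part < denominator / CUTOFF_DIVISOR {
        // Treat as a success probability of 100%, i.e. no penalty.
        return 0.0;
    }
    -(numerator as f64 / denominator as f64).log10()
}
```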
Matt Corallo [Sun, 3 Apr 2022 20:16:03 +0000 (20:16 +0000)]
Expand the precision of our log10 lookup tables + add precision
When we send values over channels of rather substantial size, the
imprecision of our log lookup tables creates a rather substantial
non-linearity between values that round up or down one bit.
For example, with the default scoring values, sending 100k sats
over channels with 1m, 2m, 3m, and 4m sats of capacity score
rather drastically differently: 3645, 2512, 500, and 1442 msat.
Here we expand the precision of our log lookup tables rather
substantially by: (a) making the multiplier 2048 instead of 1024,
which still fits inside a u16, and (b) quadrupling the size of the
lookup table to look at the top 6 bits after the most-significant
bit of an input instead of the top 4.
This makes the scores of the same channels substantially more
linear, with values of 3613, 1977, 1474, and 1223 msat.
The same channels would be scored at 3611, 1972, 1464, and 1216
msat with a non-approximating scorer.
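The lookup scheme described above (index by the most-significant bit, then by the next six bits, with results scaled by 2048) can be sketched like this. As a simplification, the sketch computes each "table entry" on the fly with floating-point `log10` instead of using a precomputed table, and the function names are hypothetical.

```rust
/// Number of bits below the most-significant bit that select an entry.
const LOWER_BITS: u32 = 6;

/// Approximates log10(x) * 2048 using only the position of the
/// most-significant bit of `x` and the six bits directly below it.
fn approx_log10_times_2048(x: u64) -> u16 {
    assert!(x != 0, "log10 of zero is undefined");
    let most_significant_bit = 63 - x.leading_zeros();
    let shift = most_significant_bit.saturating_sub(LOWER_BITS);
    // Zero every bit below the six-bit window. The result depends only on
    // (most_significant_bit, lower six bits), like a table entry would.
    let truncated = (x >> shift) << shift;
    // log10(2^64) * 2048 is about 39457, which fits in a u16.
    ((truncated as f64).log10() * 2048.0) as u16
}

/// -log10(numerator / denominator) * 2048 for numerator <= denominator.
fn approx_negative_log10_times_2048(numerator: u64, denominator: u64) -> u64 {
    let den = approx_log10_times_2048(denominator);
    let num = approx_log10_times_2048(numerator);
    u64::from(den.saturating_sub(num))
}
```

Because at most a 1/64 fraction of the input is discarded by the truncation, each approximated logarithm undershoots the exact value by fewer than about 15 units out of 2048 per decade.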
Matt Corallo [Mon, 4 Apr 2022 02:51:22 +0000 (02:51 +0000)]
Move lightning-invoice deser errors to lib.rs instead of `pub use`
Having public types in a private module is somewhat awkward from a
readability standpoint, but, more importantly, the bindings logic
has a relatively rough go of converting them - it doesn't implement
`pub use` as its "implement this function" logic is all within the
context of a module. We'd need to keep a set of re-exported things
to implement them when parsing modules...or we could just move two
enums from `de.rs` to `lib.rs` here, which is substantially less
work.
Jeffrey Czyz [Mon, 14 Feb 2022 21:31:59 +0000 (15:31 -0600)]
Immutable BlockSource interface
Querying a BlockSource is a logically immutable operation. Use non-mut
references in its interface to reflect this, which allows for users to
hold multiple references if desired.
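A simplified sketch of what a `&self`-based source interface permits. The trait below is a hypothetical, synchronous, single-method stand-in for illustration only (it is not the real `BlockSource` trait); any bookkeeping the source needs is moved behind interior mutability.

```rust
use std::cell::Cell;

/// Hypothetical, heavily simplified block source interface.
trait SimpleBlockSource {
    /// Querying takes `&self`: it is logically a read-only operation.
    fn best_height(&self) -> u32;
}

/// A source that counts queries using interior mutability.
struct CountingSource {
    height: u32,
    queries: Cell<u32>,
}

impl SimpleBlockSource for CountingSource {
    fn best_height(&self) -> u32 {
        self.queries.set(self.queries.get() + 1);
        self.height
    }
}

/// Takes two shared references, which may point at the same source.
fn heights_agree(a: &dyn SimpleBlockSource, b: &dyn SimpleBlockSource) -> bool {
    a.best_height() == b.best_height()
}
```

With `&mut self` methods, passing the same source as both arguments of `heights_agree` would not compile; with `&self` it is straightforward.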
Matt Corallo [Sun, 3 Apr 2022 01:04:26 +0000 (01:04 +0000)]
Pipe filesystem writes in `lightning-persister` through `BufWriter`
We generally make no effort to ensure all writes are buffered in
lower-level objects, so wrapping write calls in `BufWriter` may
substantially improve performance in some cases. This is especially
important now that we block the sample node exit until the
`NetworkGraph` has been written out, which includes many small-ish
writes.
With this change, shutdown of the sample node on a relatively
underpowered device went from 15-30 seconds of CPU time to a second
or two, plus IO sync time.
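The effect described above comes from batching many small writes into fewer system calls. A minimal sketch using only the standard library follows; the function name and the fixed 8-byte record layout are hypothetical and unrelated to the actual `lightning-persister` code.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes many small records through a `BufWriter` so that they reach the
/// file in a few large writes rather than one write per record.
fn write_records(path: &Path, records: &[[u8; 8]]) -> io::Result<()> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);
    for record in records {
        // Each call only copies into the in-memory buffer until it fills.
        writer.write_all(record)?;
    }
    // Push any buffered bytes to the file, then sync it to disk.
    writer.flush()?;
    writer.get_ref().sync_all()
}
```

Flushing explicitly before the writer is dropped matters: errors during the implicit flush in `Drop` are silently ignored.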
Jeffrey Czyz [Thu, 31 Mar 2022 13:13:10 +0000 (08:13 -0500)]
Add an amount penalty to ProbabilisticScorer
The cost of large payments tends to be dominated by the channel fees. To
avoid this, add an amount penalty to ProbabilisticScorer with a user
configurable multiplier. The multiplier is applied for every 2^20th of
the amount weighted by the negative log10 of the channel's success
probability for the payment.
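Read literally, the description above corresponds to a penalty of `multiplier * (amount_msat / 2^20) * -log10(success_probability)`. The sketch below expresses that formula with floating-point arithmetic for clarity; the function name and signature are hypothetical, and the scorer itself may use integer approximations instead.

```rust
/// Hypothetical helper implementing
/// multiplier * (amount_msat / 2^20) * -log10(success_probability).
fn amount_penalty_msat(
    amount_msat: u64,
    amount_penalty_multiplier_msat: u64,
    success_probability: f64,
) -> u64 {
    let negative_log10 = -success_probability.log10();
    let amount_in_2_20ths = amount_msat as f64 / (1u64 << 20) as f64;
    (amount_penalty_multiplier_msat as f64 * amount_in_2_20ths * negative_log10).round() as u64
}
```

The penalty grows linearly with the amount, so unlike a fixed per-hop penalty it stays significant relative to fees on large payments.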
Jeffrey Czyz [Thu, 31 Mar 2022 02:20:58 +0000 (21:20 -0500)]
Avoid retrying over recently failed channels
In ProbabilisticScorer, the channel liquidity balance is reduced
whenever a payment fails at the corresponding channel. The payment may
still be retried through the channel, however, because the liquidity
penalty is capped. Use u64::max_value instead in this situation to avoid
retrying over the same path. This effectively makes u64::max_value the
penalty for amounts exceeding the upper bound, as well.
As an edge case, avoid using u64::max_value on attempts where the amount
is equal to the effective capacity, which may be the HTLC maximum when
the channel capacity is unknown.
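The cases described above can be laid out as a small decision function. This is a sketch under assumptions, not the scorer's actual code: the function name, the parameter list, the floating-point `log10`, and the cap of twice the multiplier are all chosen here for illustration.

```rust
/// Hypothetical liquidity penalty with the "never retry" cases made explicit.
fn liquidity_penalty_msat(
    amount_msat: u64,
    min_liquidity_msat: u64,
    max_liquidity_msat: u64,
    effective_capacity_msat: u64,
    multiplier_msat: u64,
) -> u64 {
    // Assumed cap for ordinary (non-failed) cases.
    let capped_penalty_msat = multiplier_msat.saturating_mul(2);
    if amount_msat <= min_liquidity_msat {
        // Known to fit: no penalty.
        0
    } else if amount_msat > max_liquidity_msat {
        // Exceeds the upper bound: never use this channel.
        u64::MAX
    } else if amount_msat == max_liquidity_msat {
        if max_liquidity_msat != effective_capacity_msat {
            // The bound was lowered by an earlier failure: avoid retrying.
            u64::MAX
        } else {
            // Edge case: amount equals the effective capacity and nothing
            // is known about the upper bound, so use the capped penalty.
            capped_penalty_msat
        }
    } else {
        let numerator = (max_liquidity_msat - amount_msat + 1) as f64;
        let denominator = (max_liquidity_msat - min_liquidity_msat + 1) as f64;
        let penalty = (-(numerator / denominator).log10() * multiplier_msat as f64) as u64;
        penalty.min(capped_penalty_msat)
    }
}
```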
Matt Corallo [Thu, 24 Mar 2022 18:38:43 +0000 (18:38 +0000)]
Don't consider a path as having hit HTLC-min if it isn't sufficient
During the first pass of path finding, we seek a single path with the
exact payment amount, and only seek additional paths if (a) no single
path can carry the entire balance of the payment or (b) we found a good
path, but along the way we found candidate paths with the potential to
result in a lower total fee. This commit fixes the behavior of (b) -- we
were previously considering some paths to be candidates for a lower fee
when in fact they never would have worked. This caused us to re-run
Dijkstra's when it might not have been beneficial.
Jurvis Tan [Tue, 22 Mar 2022 03:13:14 +0000 (20:13 -0700)]
Add NetworkGraph persistence
Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.
Matt Corallo [Tue, 15 Mar 2022 23:53:01 +0000 (23:53 +0000)]
Drop the `Writeable::encode_with_len` method in non-test builds
There's not a lot of reason to keep it given it's used in one place
outside of tests, and this lets us clean up some of the byte_utils
calls that are still lying around.
Matt Corallo [Tue, 8 Mar 2022 21:55:02 +0000 (21:55 +0000)]
Use the correct SCID when failing HTLCs to aliased channels
When we fail an HTLC which was destined for a channel that the HTLC
sender didn't know the real SCID for, we should ensure we continue
to use the alias in the channel_update we provide them. Otherwise
we will leak the channel's real SCID to HTLC senders.