I recently saw the following panic on one of my test nodes:
```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()`
on an `Err` value: Os { code: 107, kind: NotConnected, message:
"Transport endpoint is not connected" }',
rust-lightning/lightning-net-tokio/src/lib.rs:250:38
```
Presumably what happened is somehow the connection was closed in
between us accepting it and us going to start processing it. While
this is a somewhat surprising race, its clearly reachable. The fix
proposed here is quite trivial - simply don't `unwrap` trying to
fetch our peer's socket address, instead treat the peer address as
`None` and discover the disconnection later when we go to read.
Default to creating BOLT4 tlv payload format onions
Default to creating tlv onions for nodes for which we haven't received
any features through node announcements or which aren't in the
`network_graph`, and where no other features are known such as invoice
features nor features in the init msg for nodes we have channels to.
Matt Corallo [Sun, 3 Apr 2022 21:21:54 +0000 (21:21 +0000)]
Lower-bound the log approximation and stop using it > ~98.5%
When we start getting a numerator and divisor particularly close to
each other, the log approximation starts to get very noisy. In
order to avoid applying scores that are basically noise (and can
range upwards of 2x the default per-hop penalty), simply consider
such cases as having a success probability of 100%.
Matt Corallo [Sun, 3 Apr 2022 20:16:03 +0000 (20:16 +0000)]
Expand the precision of our log10 lookup tables + add precision
When we send values over channels of rather substantial size, the
imprecision of our log lookup tables creates a rather substantial
non-linearity between values that round up or down one bit.
For example, with the default scoring values, sending 100k sats
over channels with 1m, 2m, 3m, and 4m sats of capacity score
rather drastically differently: 3645, 2512, 500, and 1442 msat.
Here we expand the precision of our log lookup tables rather
substantially by: (a) making the multiplier 2048 instead of 1024,
which still fits inside a u16, and (b) quadrupling the size of the
lookup table to look at the top 6 bits after the most-significant
bit of an input instead of the top 4.
This makes the scores of the same channels substantially more
linear, with values of 3613, 1977, 1474, and 1223 msat.
The same channels would be scored at 3611, 1972, 1464, and 1216
msat with a non-approximating scorer.
Matt Corallo [Mon, 4 Apr 2022 02:51:22 +0000 (02:51 +0000)]
Move lightning-invoice deser errors to lib.rs instead of `pub use`
Having public types in a private module is somewhat awkward from a
readability standpoint, but, more importantly, the bindings logic
has a relatively rough go of converting them - it doesn't implement
`pub use` as its "implement this function" logic is all within the
context of a module. We'd need to keep a set of re-exported things
to implement them when parsing modules...or we could just move two
enums from `de.rs` to `lib.rs` here, which is substantially less
work.
Jeffrey Czyz [Mon, 14 Feb 2022 21:31:59 +0000 (15:31 -0600)]
Immutable BlockSource interface
Querying a BlockSource is a logically immutable operation. Use non-mut
references in its interface to reflect this, which allows for users to
hold multiple references if desired.
Matt Corallo [Sun, 3 Apr 2022 01:04:26 +0000 (01:04 +0000)]
Pipe filesystem writes in `lightning-persister` through `BufWriter`
We generally make no effort to ensure all writes are buffered in
lower-level objects, so wrapping write calls in `BufWriter` may
substantially improve performance in some cases. This is especially
important now that we block the sample node exit until the
`NetworkGraph` has been written out, which includes many small-ish
writes.
With this change, shutdown of the sample node on a relatively
underpowered device went from 15-30 seconds of CPU time to a second
or two, plus IO sync time.
Jeffrey Czyz [Thu, 31 Mar 2022 13:13:10 +0000 (08:13 -0500)]
Add an amount penalty to ProbabilisticScorer
The cost of large payments tends to be dominated by the channel fees. To
avoid this, add an amount penalty to ProbabilisticScorer with a user
configurable multiplier. The multiplier is applied for every 2^20th of
the amount weighted by the negative log10 of the channel's success
probability for the payment.
Jeffrey Czyz [Thu, 31 Mar 2022 02:20:58 +0000 (21:20 -0500)]
Avoid retrying over recently failed channels
In ProbabilisticScorer, the channel liquidity balance is reduced
whenever a payment fails at the corresponding channel. The payment may
still be retried through the channel, however, because the liquidity
penalty is capped. Use u64::max_value instead in this situation to avoid
retrying over the same path. This effectively makes u64::max_value the
penalty for amounts exceeding the upper bound, as well.
As an edge case, avoid using u64::max_value on attempts where the amount
is equal to the effective capacity, which may be the HTLC maximum when
the channel capacity is unknown.
Matt Corallo [Thu, 24 Mar 2022 18:38:43 +0000 (18:38 +0000)]
Don't consider a path as having hit HTLC-min if it isn't sufficient
During the first pass of path finding, we seek a single path with the
exact payment amount, and only seek additional paths if (a) no single
path can carry the entire balance of the payment or (b) we found a good
path, but along the way we found candidate paths with the potential to
result in a lower total fee. This commit fixes the behavior of (b) -- we
were previously considering some paths to be candidates for a lower fee
when in fact they never would have worked. This caused us to re-run
Dijkstra's when it might not have been beneficial.
Jurvis Tan [Tue, 22 Mar 2022 03:13:14 +0000 (20:13 -0700)]
Add NetworkGraph persistence
Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.
Matt Corallo [Tue, 15 Mar 2022 23:53:01 +0000 (23:53 +0000)]
Drop the `Writeable::encode_with_len` method in non-test buidls
There's not a lot of reason to keep it given its used in one place
outside of tests, and this lets us clean up some of the byte_utils
calls that are still lying around.
Matt Corallo [Tue, 8 Mar 2022 21:55:02 +0000 (21:55 +0000)]
Use the correct SCID when failing HTLCs to aliased channels
When we fail an HTLC which was destined for a channel that the HTLC
sender didn't know the real SCID for, we should ensure we continue
to use the alias in the channel_update we provide them. Otherwise
we will leak the channel's real SCID to HTLC senders.
Matt Corallo [Fri, 4 Mar 2022 20:20:19 +0000 (20:20 +0000)]
Make all callsites to `get_channel_update_for_unicast` fallible
This reduces unwraps in channelmanager by a good bit, providing
robustness for the upcoming 0conf changes which allow SCIDs to be
missing after a channel is in use, making
`get_channel_update_for_unicast` more fallible.
This also serves as a useful refactor for the next commit,
consolidating the channel_update creation sites which are changed
in the next commit.
Matt Corallo [Thu, 17 Feb 2022 22:13:54 +0000 (22:13 +0000)]
Negotiate `scid_alias` for private channels based on a new config
Because negotiating `scid_alias` for all of our channels will cause
us to create channels which LDK versions prior to 0.0.106 do not
understand, we disable `scid_alias` negotiation by default.
Matt Corallo [Tue, 1 Feb 2022 17:23:52 +0000 (17:23 +0000)]
Add support for the SCIDAlias feature bit in incoming channels
This does not, however, ever send the scid_alias feature bit for
outgoing channels, as that would cause the immediately prior
version of LDK to be unable to read channel data.
Matt Corallo [Wed, 16 Feb 2022 04:21:29 +0000 (04:21 +0000)]
Expose chan type in `Event::OpenChannelRequest` & `ChannelDetails`
As we add new supported channel types, inbound channels which use
new features may cause backwards-compatibility issues for clients.
If a new channel is opened using new features while a client still
wishes to ensure support for downgrading to a previous version of
LDK, that new channel may cause the `ChannelManager` to fail
deserialization due to unsupported feature flags.
By exposing the channel type flags to the user in channel requests,
users wishing to support downgrading to previous versions of LDK
can reject channels which use channel features which previous
versions of LDK do not understand.
Matt Corallo [Sun, 27 Mar 2022 17:11:22 +0000 (17:11 +0000)]
Bump `check_commits` CI job rustc to 1.57
1.51 (and other earlier versions of `rustc`) appear to refuse to
accept our documentation links due to a bogus failure to resolve
`ChannelTypeFeatures::supports_scid_privacy`.
Jeffrey Czyz [Thu, 24 Mar 2022 23:21:29 +0000 (18:21 -0500)]
Fix overflow in ProbabilisticScorer
When a routing hint is given in an invoice, the effective capacity of
the channel is assumed to be infinite (i.e., u64::max_value) if the hop
is private. Adding 1 to this in the success probability calculation will
cause an overflow and ultimately an `index out of bounds panic` in
log10_times_1024. This was not an issue with using log10 because the use
of f64 would give infinite which casts to 0 for u64.
Jeffrey Czyz [Tue, 22 Mar 2022 00:11:48 +0000 (19:11 -0500)]
Add a base penalty to ProbabilisticScorer
ProbabilisticScorer tends to prefer longer routes to shorter ones. Make
the default scoring behavior include a customizable base penalty to
avoid longer routes.
Matt Corallo [Wed, 23 Mar 2022 22:10:15 +0000 (22:10 +0000)]
Skip `channel_update` signature checks if we have a newer state
`channel_update` messages already have their signatures checked
with the network graph write lock held, so there's no reason to
check the signatures before doing other quicker checks first,
including checking if we're already aware of a newer update for the
channel.
This reduces common-case CPU usage as `channel_update`s are sent
rather liberally over the p2p network to gossip them.
Matt Corallo [Fri, 18 Mar 2022 23:19:51 +0000 (23:19 +0000)]
Avoid needless MPP on multiple channels to the same first-hop
When we have many channels to the same first-hop, many of which do
not have sufficient balance to make the requested payment, but when
some do, instead of simply using the available channel balance we
may switch to MPP, potentially with many, many paths.
Instead, we should seek to use the smallest channel which can
easily handle the requested payment, which we do here by sorting
the first_hops in our router before beginning the graph search.
Note that the "real" fix for this should be to instead decide which
channel to use at HTLC-send time, as most other nodes do during
relay, but this provides a minimal fix without needing to do the
rather-large work of refactoring our HTLC send+relay pipelines.
Issues with overly-aggressive MPP on many channels were reported by
Cash App.
Matt Corallo [Fri, 18 Mar 2022 03:28:25 +0000 (03:28 +0000)]
Tag some type aliases with `(C-not exported)`
Type aliases are now more robustly being exported in the C bindings
generator, which requires ensuring we don't include some type
aliases which make no sense in bindings.
Matt Corallo [Thu, 17 Mar 2022 22:14:43 +0000 (22:14 +0000)]
Send a `gossip_timestamp_filter` on connect to enable gossip sync
On connection, if our peer supports gossip queries, and we never
send a `gossip_timestamp_filter`, our peer is supposed to never
send us gossip outside of explicit queries. Thus, we'll end up
always having stale gossip information after the first few
connections we make to peers.
The solution is to send a dummy `gossip_timestamp_filter`
immediately after connecting to peers.