Arik Sosman [Thu, 9 May 2024 06:58:29 +0000 (23:58 -0700)]
Detect beginning of static channel status streaks.
Previously, whenever a channel hadn't received updates in
6+ days, we would automatically send incremental reminders
that only contain the update flags.
However, if a channel has received updates
within that six-day timeframe, but all those updates were
identical, it would result in no updates being added to the
snapshot because the mutation set ended up empty and that
data would get purged from the serialization.
In order to avoid the reminder logic being duped by
channels simply being consistent, we now look up the
beginning of the latest continuous stretch of non-mutating
channel updates. If a channel's details have not been
altered in more than six days, we now send reminders no
matter the frequency with which channel updates have been
received since.
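A minimal sketch of the streak lookup described above, under stated assumptions: `UpdateRecord`, `static_streak_start`, and `needs_reminder` are hypothetical names, and the updates are held in an in-memory slice ordered oldest to newest rather than being read from the database as the server does.

```rust
/// Hypothetical, simplified view of a stored channel update
/// (an assumption for illustration, not the server's schema).
#[derive(Clone, Debug, PartialEq)]
struct UpdateRecord {
    /// Unix timestamp at which the update was received.
    seen: u64,
    /// Stand-in for the mutable channel details (e.g. CLTV delta, min HTLC, fee).
    details: (u16, u64, u32),
}

const SIX_DAYS_SECS: u64 = 6 * 24 * 60 * 60;

/// Returns the `seen` timestamp of the first update in the latest continuous
/// run of updates whose details match the most recent update.
/// `updates` must be ordered from oldest to newest.
fn static_streak_start(updates: &[UpdateRecord]) -> Option<u64> {
    let latest = updates.last()?;
    let mut start = latest.seen;
    // Walk backwards until the details differ from the latest update.
    for update in updates.iter().rev() {
        if update.details != latest.details {
            break;
        }
        start = update.seen;
    }
    Some(start)
}

/// A reminder is due once the details have been static for more than six
/// days, no matter how many identical updates arrived in the meantime.
fn needs_reminder(updates: &[UpdateRecord], now: u64) -> bool {
    match static_streak_start(updates) {
        Some(start) => now.saturating_sub(start) > SIX_DAYS_SECS,
        None => false,
    }
}
```

In this sketch, a channel that keeps re-broadcasting the same details every few days still triggers a reminder, because the age is measured from the start of the streak rather than from the most recent update.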
Matt Corallo [Mon, 29 Jan 2024 17:24:32 +0000 (17:24 +0000)]
Do DB insertions in parallel
When inserting new gossip into the DB, we block the LDK peer
handling if we get behind. This is mostly okay, but can cause ping
timeouts and reconnections, which isn't ideal. To limit how often
we should see this, here we move to doing the new gossip insertions
in parallel.
Arik Sosman [Sat, 4 Nov 2023 06:05:26 +0000 (23:05 -0700)]
Include old updates when necessary.
When a channel has only recently become bidirectional,
but there has not been a new update in the old direction
since the last sync, the latest update in the old direction
must still be included in full because it is the first time
the full channel is being snapshotted.
Arik Sosman [Tue, 29 Aug 2023 01:01:03 +0000 (18:01 -0700)]
Send full updates after old last seen updates.
Previously, whenever we saw that there was a previous update that a
client would have seen, we simply calculated the delta set based on
which properties have changed, and would most likely send an
incremental update set (excepting the case of a new or newly sent
announcement, in which case all sent updates are full).
However, if the last seen update was old, and there's a chance that
a user may have run RGS since, it is possible that due to the
7-day-backdating-mechanism included on the client, the reference
update would no longer be present.
To fix that, anytime we see that a last seen update is more than six
days old, we automatically include a full update.
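The six-day rule can be sketched as a small decision function. This is an illustration only: `UpdateKind`, `choose_update_kind`, and the constants are hypothetical names, and treating a missing last-seen update as "send full" is an assumption.

```rust
const DAY_SECONDS: u64 = 24 * 60 * 60;
/// Hypothetical constant: last-seen updates older than this get a full update.
const FULL_UPDATE_THRESHOLD_SECS: u64 = 6 * DAY_SECONDS;

#[derive(Debug, PartialEq)]
enum UpdateKind {
    /// Every field is serialized.
    Full,
    /// Only the fields that changed relative to the last seen update.
    Incremental,
}

/// Decides how to serialize an update given when the client's reference
/// update was last seen (Unix seconds). `None` means there is no reference
/// update at all, which this sketch assumes also requires a full update.
fn choose_update_kind(last_seen: Option<u64>, now: u64) -> UpdateKind {
    match last_seen {
        // Recent enough that the client should still hold the reference update.
        Some(seen) if now.saturating_sub(seen) <= FULL_UPDATE_THRESHOLD_SECS => {
            UpdateKind::Incremental
        }
        // More than six days old (or absent): the reference may have been
        // dropped by the client's backdating, so send everything.
        _ => UpdateKind::Full,
    }
}
```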
Previously, we had hard-coded factors for the default snapshot
generation interval, which also served as the minimum snapshot
scope. In this commit, we substitute that with a doubling
mechanism that stops once it reaches or exceeds the
21-day-mark, which can be configured using an additional flag.
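A rough sketch of such a doubling mechanism follows. The function name and parameters are hypothetical, and starting the sequence at the base interval is an assumption; the commit only states that doubling stops once the configured mark is reached or exceeded.

```rust
/// Builds snapshot scopes (in days) by doubling from `base_interval_days`
/// until a value reaches or exceeds `max_scope_days` (21 by default in the
/// text above). Both parameter names are assumptions for illustration.
fn snapshot_day_factors(base_interval_days: u64, max_scope_days: u64) -> Vec<u64> {
    let mut factors = Vec::new();
    // Guard against a zero interval, which would never grow by doubling.
    let mut current = base_interval_days.max(1);
    loop {
        factors.push(current);
        if current >= max_scope_days {
            break;
        }
        // Saturate rather than overflow for very large limits.
        current = current.saturating_mul(2);
    }
    factors
}
```

With a one-day interval and a 21-day limit, this sketch yields scopes of 1, 2, 4, 8, 16, and 32 days; the last value is the first one to reach or exceed the limit.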
Arik Sosman [Mon, 28 Aug 2023 16:07:19 +0000 (09:07 -0700)]
Fix multiplication overflow bug.
The `snapshot_sync_day_factors` array is sorted in
ascending order, so find() will return the first
element that is at least equal to the requested
interval.
However, the last value in the array is `u64::MAX`,
which means that multiplying it with DAY_SECONDS
will overflow. To avoid that, we use saturating_mul.
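To illustrate, here is a minimal sketch of the lookup with the saturating multiplication. The function `snapshot_scope_seconds` and its signature are hypothetical; only `DAY_SECONDS`, `find()`, and `saturating_mul` come from the description above.

```rust
const DAY_SECONDS: u64 = 60 * 60 * 24;

/// Finds the first day factor that is at least `requested_interval_days`
/// and converts it to seconds. `day_factors` must be sorted ascending.
fn snapshot_scope_seconds(day_factors: &[u64], requested_interval_days: u64) -> Option<u64> {
    day_factors
        .iter()
        .find(|&&factor| factor >= requested_interval_days)
        // A plain `factor * DAY_SECONDS` would overflow for `u64::MAX`
        // (panicking in debug builds, wrapping in release builds);
        // `saturating_mul` clamps the result to `u64::MAX` instead.
        .map(|factor| factor.saturating_mul(DAY_SECONDS))
}
```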
Matt Corallo [Sun, 16 Jul 2023 17:20:56 +0000 (17:20 +0000)]
Drop overly optimistic index
The `channel_updates_id_with_scid_dir_blob` index allows the
intermediate-row-fetching logic to be index-only, but there's very
little reason to do so - we now use subqueries to build the exact
set of rows we want, by id, and then fetch various columns. Having
an index that lets us look up those columns without hitting the
regular table is fine, but there's not a ton of cost to hitting the
table by primary key and maintaining yet another index isn't free.
Matt Corallo [Sun, 16 Jul 2023 03:20:52 +0000 (03:20 +0000)]
Don't hold the `NetworkGraph` read lock across an await point
Holding the `NetworkGraph` read lock across a query await point
can cause a deadlock if another task tries to handle a gossip
message at the same time.
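The general pattern is to copy what is needed out of the graph inside a narrow scope, so the guard is dropped before the slow call. The sketch below is a synchronous analogue using only the standard library, not the server's async code: `Graph` and `scids_then_query` are hypothetical, and the `query` closure stands in for the awaited database query.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the network graph: SCID -> channel capacity.
type Graph = RwLock<BTreeMap<u64, u64>>;

/// Copies the SCIDs out under the read lock, releases the lock, and only
/// then runs `query`, which stands in for the awaited database call.
fn scids_then_query<R>(graph: &Graph, query: impl FnOnce(&[u64]) -> R) -> R {
    let scids: Vec<u64> = {
        let guard = graph.read().unwrap();
        guard.keys().copied().collect()
    }; // The read guard is dropped here, before the slow call starts.
    query(&scids)
}
```

Because the guard's scope ends before `query` runs, a writer (such as a gossip handler in the real server) can take the write lock while the query is in flight.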
Matt Corallo [Sun, 16 Jul 2023 00:37:05 +0000 (00:37 +0000)]
Switch to streaming queries
In order to use streaming queries we have to use `tokio-postgres`'s
`query_raw` command, rather than `query`. This should reduce our
memory footprint from 10+GB to well under one.
The `consider_intermediate_updates` flag is always set, and must be
set for correctness, so we remove it. Further, we optimize the
query that hung on it somewhat by removing an unnecessary
`ORDER BY` clause which was only necessary if
`consider_intermediate_updates` were unset.
Matt Corallo [Sat, 15 Jul 2023 06:42:33 +0000 (06:42 +0000)]
Substantially optimize reference-row-fetching
By first fetching the rows we need from a smaller index, we avoid
walking a large index which contained the full `blob_signed`. This
reduces reference-row-fetching from 680 seconds to 152 seconds when
searching today for reference rows against 7 days ago.
Old:
```
ln-gossip=# EXPLAIN ANALYZE SELECT DISTINCT ON (short_channel_id, direction) id, blob_signed, direction
FROM channel_updates
WHERE seen < '2023-07-07 00:00:00' AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-07 00:00:00'
)
ORDER BY short_channel_id ASC, direction ASC, seen DESC;
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=186279.46..11921204.82 rows=168910 width=161) (actual time=732.365..680504.173 rows=129985 loops=1)
-> Merge Join (cost=186279.46..11632998.93 rows=57641177 width=161) (actual time=732.364..679193.755 rows=31714061 loops=1)
Merge Cond: (channel_updates.short_channel_id = channel_updates_1.short_channel_id)
-> Index Only Scan using channel_updates_scid_dir_seen on channel_updates (cost=0.56..10718853.69 rows=57641177 width=161) (actual time=0.638..673675.749 rows=57408667 loops=1)
Index Cond: (seen < '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Unique (cost=186278.90..192574.84 rows=84455 width=8) (actual time=478.881..750.241 rows=68210 loops=1)
-> Sort (cost=186278.90..189426.87 rows=1259188 width=8) (actual time=478.878..653.035 rows=1452661 loops=1)
Sort Key: channel_updates_1.short_channel_id
Sort Method: external merge Disk: 17680kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_1 (cost=0.56..41481.08 rows=1259188 width=8) (actual time=0.885..264.333 rows=1504495 loops=1)
Index Cond: (seen >= '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 2273
Planning Time: 0.164 ms
JIT:
Functions: 9
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 21.265 ms, Inlining 37.914 ms, Optimization 113.040 ms, Emission 101.901 ms, Total 274.121 ms
Execution Time: 680601.155 ms
(19 rows)
```
New:
```
ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates
WHERE id IN (
SELECT DISTINCT ON (short_channel_id, direction) id
FROM channel_updates
WHERE seen < '2023-07-07 00:00:00'
ORDER BY short_channel_id ASC, direction ASC, seen DESC
) AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-07 00:00:00'
);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2942503.92..2943867.77 rows=169870 width=145) (actual time=22862.627..152436.685 rows=130116 loops=1)
Hash Cond: (channel_updates.short_channel_id = channel_updates_2.short_channel_id)
-> Nested Loop (cost=2738282.26..2739200.18 rows=169870 width=153) (actual time=22141.452..151504.140 rows=393250 loops=1)
-> HashAggregate (cost=2738281.69..2738283.69 rows=200 width=4) (actual time=22139.440..22339.035 rows=393250 loops=1)
Group Key: channel_updates_1.id
Batches: 1 Memory Usage: 45089kB
-> Result (cost=0.56..2736158.32 rows=169870 width=21) (actual time=0.102..21984.409 rows=393250 loops=1)
-> Unique (cost=0.56..2736158.32 rows=169870 width=21) (actual time=0.074..21943.089 rows=393250 loops=1)
-> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..2448011.03 rows=57629457 width=21) (actual time=0.073..19776.181 rows=57408667 loops=1)
Index Cond: (seen < '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=153) (actual time=0.328..0.328 rows=1 loops=393250)
Index Cond: (id = channel_updates_1.id)
Heap Fetches: 0
-> Hash (cost=203159.97..203159.97 rows=84935 width=8) (actual time=721.105..721.107 rows=70731 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3787kB
-> Unique (cost=195708.67..202310.62 rows=84935 width=8) (actual time=552.965..713.465 rows=70731 loops=1)
-> Sort (cost=195708.67..199009.65 rows=1320391 width=8) (actual time=552.962..650.323 rows=1537141 loops=1)
Sort Key: channel_updates_2.short_channel_id
Sort Method: external merge Disk: 18064kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43421.19 rows=1320391 width=8) (actual time=66.736..324.130 rows=1537141 loops=1)
Index Cond: (seen >= '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 68
Planning Time: 0.520 ms
JIT:
Functions: 21
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.643 ms, Inlining 7.055 ms, Optimization 33.167 ms, Emission 25.782 ms, Total 66.648 ms
Execution Time: 152458.777 ms
(29 rows)
```
Matt Corallo [Thu, 6 Jul 2023 16:43:05 +0000 (16:43 +0000)]
Require DB insertions to complete in fifteen seconds
For some reason the mainnet server hung, seemingly on the DB
insertion task. This will improve debugging by simply crashing if
an insertion takes longer than fifteen seconds.
Matt Corallo [Sun, 2 Jul 2023 17:17:07 +0000 (17:17 +0000)]
Build reminder updates with correct SCID field
When the reminder updates were added, a dummy `ChannelUpdate` with
a number of zero'd fields was created under the assumption that
the zero'd fields would be ignored downstream when building
serialized updates. However, the SCID field was `assert`'ed on (and
serialized in the update), causing any reminder updates to cause an
assertion panic.
Instead, we do it the Right Way (tm) here and move the
only-sometimes-available fields into the update type enum, ensuring
we can't access "poison" fields downstream.
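A minimal sketch of the idea, with hypothetical type and field names that do not mirror the server's actual definitions: the fields that only exist for real updates live inside the variant that has them, so a reminder cannot expose zero'd placeholders.

```rust
/// Hypothetical, simplified channel update (an assumption for illustration).
#[derive(Debug, PartialEq)]
struct ChannelUpdateSketch {
    short_channel_id: u64,
    flags: u8,
    fee_base_msat: u32,
}

/// Each variant carries exactly the fields that are valid for it.
#[derive(Debug, PartialEq)]
enum UpdateSerialization {
    /// A real update with every field available.
    Full(ChannelUpdateSketch),
    /// A reminder only knows the channel and its flags.
    Reminder { short_channel_id: u64, flags: u8 },
}

impl UpdateSerialization {
    /// The SCID is always available, for either variant.
    fn scid(&self) -> u64 {
        match self {
            UpdateSerialization::Full(update) => update.short_channel_id,
            UpdateSerialization::Reminder { short_channel_id, .. } => *short_channel_id,
        }
    }

    /// The flags are also available for either variant.
    fn flags(&self) -> u8 {
        match self {
            UpdateSerialization::Full(update) => update.flags,
            UpdateSerialization::Reminder { flags, .. } => *flags,
        }
    }

    /// Fee data exists only for full updates; reminders return `None`
    /// instead of a zero'd placeholder.
    fn fee_base_msat(&self) -> Option<u32> {
        match self {
            UpdateSerialization::Full(update) => Some(update.fee_base_msat),
            UpdateSerialization::Reminder { .. } => None,
        }
    }
}
```

With this shape, code that needs a field missing from reminders has to handle the `None` case explicitly, so a dummy value can no longer reach an `assert` or the serializer.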