Matt Corallo [Mon, 29 Jan 2024 17:24:32 +0000 (17:24 +0000)]
Do DB insertions in parallel
When inserting new gossip into the DB, we block the LDK peer
handling if we get behind. This is mostly okay, but can cause ping
timeouts and reconnections, which isn't ideal. To limit how often
we should see this, here we move to doing the new gossip insertions
in parallel.
Arik Sosman [Sat, 4 Nov 2023 06:05:26 +0000 (23:05 -0700)]
Include old updates when necessary.
When a channel has only recently become bidirectional,
but there has not been a new update in the old direction
since the last sync, the latest update in the old direction
must still be included in full because it is the first time
the full channel is being snapshotted.
Arik Sosman [Tue, 29 Aug 2023 01:01:03 +0000 (18:01 -0700)]
Send full updates after old last seen updates.
Previously, whenever we saw that there was a previous update that a
client would have seen, we simply calculated the delta set based on
which properties have changed, and would most likely send an
incremental update set (excepting the case of a new or newly sent
announcement, in which case all sent updates are full).
However, if the last seen update was old, and there's a chance that
a user may have run RGS since, it is possible that, due to the
7-day backdating mechanism included on the client, the reference
update is no longer present.
To fix that, anytime we see that a last seen update is more than six
days old, we automatically include a full update.
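A minimal sketch of the age check described above, not the server's actual code. The names `FULL_UPDATE_THRESHOLD_SECONDS` and `requires_full_update` are hypothetical, and timestamps are assumed to be plain UNIX seconds:

```rust
/// Seconds in one day.
const DAY_SECONDS: u64 = 60 * 60 * 24;

/// Hypothetical name for the six-day cutoff mentioned in the commit message.
const FULL_UPDATE_THRESHOLD_SECONDS: u64 = 6 * DAY_SECONDS;

/// Returns true when the last seen update is more than six days old, in which
/// case a full update should be sent instead of an incremental one.
fn requires_full_update(last_seen_timestamp: u64, now: u64) -> bool {
    // saturating_sub avoids underflow if `last_seen_timestamp` is in the future.
    now.saturating_sub(last_seen_timestamp) > FULL_UPDATE_THRESHOLD_SECONDS
}
```

An update seen exactly six days ago still qualifies for an incremental update under this sketch; one second older and it is sent in full.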
Previously, we had hard-coded factors for the default snapshot
generation interval, which also served as the minimum snapshot
scope. In this commit, we replace that with a doubling
mechanism that stops once it reaches or exceeds the
21-day mark, which can be configured using an additional flag.
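The doubling could look roughly like the following. This is a sketch under assumptions: `snapshot_scopes`, `base_interval` and `max_scope` are hypothetical names, and both values are taken as parameters because the message does not say which one the flag controls.

```rust
/// Seconds in one day.
const DAY_SECONDS: u64 = 60 * 60 * 24;

/// Builds the list of snapshot scopes (in seconds) by doubling the base
/// interval until a scope reaches or exceeds `max_scope`.
fn snapshot_scopes(base_interval: u64, max_scope: u64) -> Vec<u64> {
    assert!(base_interval > 0, "a zero interval would never grow");
    let mut scopes = Vec::new();
    let mut current = base_interval;
    loop {
        scopes.push(current);
        // The first scope that reaches or exceeds the limit is kept, then we stop.
        if current >= max_scope {
            break;
        }
        // saturating_mul guarantees termination even for very large limits.
        current = current.saturating_mul(2);
    }
    scopes
}
```

With a one-day base interval and a 21-day limit, this yields scopes of 1, 2, 4, 8, 16 and 32 days.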
Arik Sosman [Mon, 28 Aug 2023 16:07:19 +0000 (09:07 -0700)]
Fix multiplication overflow bug.
The `snapshot_sync_day_factors` array is sorted in
ascending order, so find() returns the first element
that is at least equal to the requested interval.
However, the last value in the array is u64::MAX,
which means that multiplying it by DAY_SECONDS
will overflow. To avoid that, we use saturating_mul.
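To illustrate the overflow and the fix, here is a small sketch. The factor values below are made up for the example (only the trailing `u64::MAX` sentinel is taken from the message), and `scope_seconds_for` is a hypothetical helper:

```rust
/// Seconds in one day.
const DAY_SECONDS: u64 = 60 * 60 * 24;

/// Illustrative day factors, sorted ascending and ending in a catch-all
/// sentinel. The real array contents are not shown in the commit message.
const SNAPSHOT_SYNC_DAY_FACTORS: [u64; 6] = [1, 2, 3, 7, 14, u64::MAX];

/// Picks the first factor that is at least `requested_days` and converts it
/// to seconds without overflowing.
fn scope_seconds_for(requested_days: u64) -> Option<u64> {
    SNAPSHOT_SYNC_DAY_FACTORS
        .iter()
        .find(|&&factor| factor >= requested_days)
        // A plain `factor * DAY_SECONDS` would overflow for the sentinel;
        // saturating_mul clamps the result to u64::MAX instead.
        .map(|&factor| factor.saturating_mul(DAY_SECONDS))
}
```

For a request that falls past the last real factor, the sentinel is selected and the result is clamped to `u64::MAX` rather than panicking (debug builds) or wrapping (release builds).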
Matt Corallo [Sun, 16 Jul 2023 17:20:56 +0000 (17:20 +0000)]
Drop overly optimistic index
The `channel_updates_id_with_scid_dir_blob` index allows the
intermediate-row-fetching logic to be index-only, but there's very
little reason to do so - we now use subqueries to build the exact
set of rows we want, by id, and then fetch various columns. Having
an index that lets us look up those columns without hitting the
regular table is fine, but there's not a ton of cost to hitting the
table by primary key and maintaining yet another index isn't free.
Matt Corallo [Sun, 16 Jul 2023 03:20:52 +0000 (03:20 +0000)]
Don't hold the `NetworkGraph` read lock across an await point
Holding the `NetworkGraph` read lock across a query await point
can cause a deadlock if another task tries to handle a gossip
message at the same time.
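The general pattern is to copy what the query needs while the lock is held and drop the guard before waiting. The sketch below uses only the standard library, so the awaited query is represented by a plain closure; `Graph` and `lookup_channels` are hypothetical stand-ins, not the real `NetworkGraph` API:

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the network graph: a list of channel IDs.
type Graph = Vec<u64>;

/// Copies the needed data under the read lock, releases the lock, and only
/// then runs the (potentially slow) query.
fn lookup_channels<F>(graph: &RwLock<Graph>, run_query: F) -> usize
where
    F: FnOnce(&[u64]) -> usize,
{
    let channel_ids: Vec<u64> = {
        let read_guard = graph.read().unwrap();
        read_guard.to_vec()
    }; // `read_guard` is dropped here, before any waiting happens.

    // In async code, this is where the query would be `.await`ed. Writers
    // (such as gossip handling) can take the lock in the meantime.
    run_query(&channel_ids)
}
```

The inner block is what matters: the guard's scope ends before the slow call, so a writer is never blocked for the duration of the query.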
Matt Corallo [Sun, 16 Jul 2023 00:37:05 +0000 (00:37 +0000)]
Switch to streaming queries
In order to use streaming queries we have to use `tokio-postgres`'s
`query_raw` method, rather than `query`. This should reduce our
memory footprint from 10+GB to well under 1GB.
The `consider_intermediate_updates` flag is always set, and must be
set for correctness, so we remove it. Further, we optimize the
query that hung on it somewhat by removing an unnecessary
`ORDER BY` clause which was only necessary if
`consider_intermediate_updates` were unset.
Matt Corallo [Sat, 15 Jul 2023 06:42:33 +0000 (06:42 +0000)]
Substantially optimize reference-row-fetching
By first fetching the rows we need from a smaller index, we avoid
walking a large index which contained the full `blob_signed`. This
reduces reference-row-fetching from 680 seconds to 152 seconds when
searching today for reference rows against 7 days ago.
Old:
```
ln-gossip=# EXPLAIN ANALYZE SELECT DISTINCT ON (short_channel_id, direction) id, blob_signed, direction
FROM channel_updates
WHERE seen < '2023-07-07 00:00:00' AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-07 00:00:00'
)
ORDER BY short_channel_id ASC, direction ASC, seen DESC;
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=186279.46..11921204.82 rows=168910 width=161) (actual time=732.365..680504.173 rows=129985 loops=1)
-> Merge Join (cost=186279.46..11632998.93 rows=57641177 width=161) (actual time=732.364..679193.755 rows=31714061 loops=1)
Merge Cond: (channel_updates.short_channel_id = channel_updates_1.short_channel_id)
-> Index Only Scan using channel_updates_scid_dir_seen on channel_updates (cost=0.56..10718853.69 rows=57641177 width=161) (actual time=0.638..673675.749 rows=57408667 loops=1)
Index Cond: (seen < '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Unique (cost=186278.90..192574.84 rows=84455 width=8) (actual time=478.881..750.241 rows=68210 loops=1)
-> Sort (cost=186278.90..189426.87 rows=1259188 width=8) (actual time=478.878..653.035 rows=1452661 loops=1)
Sort Key: channel_updates_1.short_channel_id
Sort Method: external merge Disk: 17680kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_1 (cost=0.56..41481.08 rows=1259188 width=8) (actual time=0.885..264.333 rows=1504495 loops=1)
Index Cond: (seen >= '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 2273
Planning Time: 0.164 ms
JIT:
Functions: 9
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 21.265 ms, Inlining 37.914 ms, Optimization 113.040 ms, Emission 101.901 ms, Total 274.121 ms
Execution Time: 680601.155 ms
(19 rows)
```
New:
```
ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates
WHERE id IN (
SELECT DISTINCT ON (short_channel_id, direction) id
FROM channel_updates
WHERE seen < '2023-07-07 00:00:00'
ORDER BY short_channel_id ASC, direction ASC, seen DESC
) AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-07 00:00:00'
);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2942503.92..2943867.77 rows=169870 width=145) (actual time=22862.627..152436.685 rows=130116 loops=1)
Hash Cond: (channel_updates.short_channel_id = channel_updates_2.short_channel_id)
-> Nested Loop (cost=2738282.26..2739200.18 rows=169870 width=153) (actual time=22141.452..151504.140 rows=393250 loops=1)
-> HashAggregate (cost=2738281.69..2738283.69 rows=200 width=4) (actual time=22139.440..22339.035 rows=393250 loops=1)
Group Key: channel_updates_1.id
Batches: 1 Memory Usage: 45089kB
-> Result (cost=0.56..2736158.32 rows=169870 width=21) (actual time=0.102..21984.409 rows=393250 loops=1)
-> Unique (cost=0.56..2736158.32 rows=169870 width=21) (actual time=0.074..21943.089 rows=393250 loops=1)
-> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..2448011.03 rows=57629457 width=21) (actual time=0.073..19776.181 rows=57408667 loops=1)
Index Cond: (seen < '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=153) (actual time=0.328..0.328 rows=1 loops=393250)
Index Cond: (id = channel_updates_1.id)
Heap Fetches: 0
-> Hash (cost=203159.97..203159.97 rows=84935 width=8) (actual time=721.105..721.107 rows=70731 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3787kB
-> Unique (cost=195708.67..202310.62 rows=84935 width=8) (actual time=552.965..713.465 rows=70731 loops=1)
-> Sort (cost=195708.67..199009.65 rows=1320391 width=8) (actual time=552.962..650.323 rows=1537141 loops=1)
Sort Key: channel_updates_2.short_channel_id
Sort Method: external merge Disk: 18064kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43421.19 rows=1320391 width=8) (actual time=66.736..324.130 rows=1537141 loops=1)
Index Cond: (seen >= '2023-07-07 00:00:00'::timestamp without time zone)
Heap Fetches: 68
Planning Time: 0.520 ms
JIT:
Functions: 21
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.643 ms, Inlining 7.055 ms, Optimization 33.167 ms, Emission 25.782 ms, Total 66.648 ms
Execution Time: 152458.777 ms
(29 rows)
```
Matt Corallo [Thu, 6 Jul 2023 16:43:05 +0000 (16:43 +0000)]
Require DB insertions to complete in fifteen seconds
For some reason the mainnet server hung, seemingly on the DB
insertion task. This will improve debugging by simply crashing if
an insertion takes longer than fifteen seconds.
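A deadline of this kind can be sketched with the standard library alone. This is an assumption-laden stand-in: the server is async and would more likely wrap the insertion future in a runtime timeout, and `run_with_deadline` is a hypothetical helper:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `operation` on a worker thread and waits at most `limit` for its result.
fn run_with_deadline<T, F>(limit: Duration, operation: F) -> Result<T, mpsc::RecvTimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = sender.send(operation());
    });
    receiver.recv_timeout(limit)
}
```

Calling `.expect("DB insertion exceeded its deadline")` on the result gives the crash-on-timeout behaviour the message describes, which makes a hung insertion visible immediately instead of stalling silently.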
Matt Corallo [Sun, 2 Jul 2023 17:17:07 +0000 (17:17 +0000)]
Build reminder updates with correct SCID field
When the reminder updates were added, a dummy `ChannelUpdate` with
a number of zero'd fields was created under the assumption that
the zero'd fields would be ignored downstream when building
serialized updates. However, the SCID field was `assert`'ed on (and
serialized in the update), so any reminder update triggered an
assertion panic.
Instead, we do it the Right Way (tm) here and move the
only-sometimes-available fields into the update type enum, ensuring
we can't access "poison" fields downstream.
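The idea can be sketched as follows. The type and field names here are hypothetical and much smaller than the real update types; the point is only that a reminder carries no fee fields at all, so there is nothing zero'd to misread:

```rust
/// Fields that only exist on a real channel update (hypothetical subset).
struct FullUpdate {
    short_channel_id: u64,
    direction: u8,
    fee_base_msat: u32,
}

/// Each variant carries exactly the fields that are valid for it.
enum UpdateSerialization {
    Full(FullUpdate),
    /// A reminder only needs to identify the channel and direction.
    Reminder { short_channel_id: u64, direction: u8 },
}

impl UpdateSerialization {
    /// The SCID is available, and correct, for every variant.
    fn short_channel_id(&self) -> u64 {
        match self {
            UpdateSerialization::Full(update) => update.short_channel_id,
            UpdateSerialization::Reminder { short_channel_id, .. } => *short_channel_id,
        }
    }

    /// The direction is likewise available for every variant.
    fn direction(&self) -> u8 {
        match self {
            UpdateSerialization::Full(update) => update.direction,
            UpdateSerialization::Reminder { direction, .. } => *direction,
        }
    }

    /// Fee data exists only on full updates, so callers must handle `None`.
    fn fee_base_msat(&self) -> Option<u32> {
        match self {
            UpdateSerialization::Full(update) => Some(update.fee_base_msat),
            UpdateSerialization::Reminder { .. } => None,
        }
    }
}
```

Because the compiler forces every `match` to cover the `Reminder` variant, downstream code cannot accidentally read a placeholder value.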
Matt Corallo [Mon, 5 Jun 2023 23:26:38 +0000 (23:26 +0000)]
Remove `unused_mut` rejection and fix some unused `mut`s
Making warnings a hard failure is generally bad practice as it can
result in new compiler versions failing to compile
otherwise-totally-acceptable code. In this case that is happening
on rustc beta, which now warns about new cases of unused `mut`.
Andrei [Tue, 16 May 2023 00:00:00 +0000 (00:00 +0000)]
Fix dummy symlink
This commit changes the symlink for the snapshot for the current date
from `./res/snapshots_pending/empty_delta.lngossip` to
`../snapshots/empty_delta.lngossip` so that nginx does not
return 404, but 200 with the dummy snapshot.