Further optimize intermediate row fetching by filtering scids first
Postgresql actually thinks this version is more expensive, but when
I run it its about 4x as fast:
```
ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates
WHERE id IN (
SELECT DISTINCT ON (short_channel_id, direction) id
FROM channel_updates
WHERE seen < '2023-07-08 00:00:00' AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-08 00:00:00'
)
ORDER BY short_channel_id ASC, direction ASC, seen DESC
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=
3771382.74..
3772300.66 rows=172186 width=145) (actual time=25270.727..34954.993 rows=129921 loops=1)
-> HashAggregate (cost=
3771382.18..
3771384.18 rows=200 width=4) (actual time=25267.060..25313.378 rows=129921 loops=1)
Group Key: channel_updates_1.id
Batches: 1 Memory Usage: 12321kB
-> Unique (cost=189510.13..
3769229.85 rows=172186 width=21) (actual time=693.154..25177.896 rows=129921 loops=1)
-> Merge Join (cost=189510.13..
3480148.53 rows=
57816264 width=21) (actual time=693.152..23970.052 rows=
31781122 loops=1)
Merge Cond: (channel_updates_1.short_channel_id = channel_updates_2.short_channel_id)
-> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..
2560497.30 rows=
57816264 width=21) (actual time=0.154..19198.277 rows=
57605314 loops=1)
Index Cond: (seen < '2023-07-08 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Unique (cost=189509.57..195871.76 rows=86093 width=8) (actual time=594.634..787.364 rows=68181 loops=1)
-> Sort (cost=189509.57..192690.67 rows=
1272439 width=8) (actual time=594.621..722.739 rows=
1452420 loops=1)
Sort Key: channel_updates_2.short_channel_id
Sort Method: external merge Disk: 17664kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43091.07 rows=
1272439 width=8) (actual time=3.461..348.146 rows=
1502697 loops=1)
Index Cond: (seen >= '2023-07-08 00:00:00'::timestamp without time zone)
Heap Fetches: 40000
-> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=145) (actual time=0.074..0.074 rows=1 loops=129921)
Index Cond: (id = channel_updates_1.id)
Heap Fetches: 0
Planning Time: 20.757 ms
JIT:
Functions: 18
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.770 ms, Inlining 51.503 ms, Optimization 25.848 ms, Emission 21.086 ms, Total 99.208 ms
Execution Time: 34995.683 ms
(26 rows)
ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates
WHERE id IN (
SELECT DISTINCT ON (short_channel_id, direction) id
FROM channel_updates
WHERE seen < '2023-07-08 00:00:00'
ORDER BY short_channel_id ASC, direction ASC, seen DESC
) AND short_channel_id IN (
SELECT DISTINCT ON (short_channel_id) short_channel_id
FROM channel_updates
WHERE seen >= '2023-07-08 00:00:00'
);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=
3049540.37..
3050910.30 rows=172186 width=145) (actual time=18697.102..123187.066 rows=129921 loops=1)
Hash Cond: (channel_updates.short_channel_id = channel_updates_2.short_channel_id)
-> Nested Loop (cost=
2851731.51..
2852649.44 rows=172186 width=153) (actual time=18029.529..122265.952 rows=393938 loops=1)
-> HashAggregate (cost=
2851730.95..
2851732.95 rows=200 width=4) (actual time=18027.857..18309.986 rows=393938 loops=1)
Group Key: channel_updates_1.id
Batches: 5 Memory Usage: 16449kB Disk Usage: 7104kB
-> Result (cost=0.56..
2849578.62 rows=172186 width=21) (actual time=0.070..17851.034 rows=393938 loops=1)
-> Unique (cost=0.56..
2849578.62 rows=172186 width=21) (actual time=0.044..17809.361 rows=393938 loops=1)
-> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..
2560497.30 rows=
57816264 width=21) (actual time=0.043..15596.783 rows=
57605314 loops=1)
Index Cond: (seen < '2023-07-08 00:00:00'::timestamp without time zone)
Heap Fetches: 0
-> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=153) (actual time=0.263..0.263 rows=1 loops=393938)
Index Cond: (id = channel_updates_1.id)
Heap Fetches: 0
-> Hash (cost=196732.69..196732.69 rows=86093 width=8) (actual time=662.832..662.833 rows=70509 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3779kB
-> Unique (cost=189509.57..195871.76 rows=86093 width=8) (actual time=542.007..657.269 rows=70509 loops=1)
-> Sort (cost=189509.57..192690.67 rows=
1272439 width=8) (actual time=542.004..610.886 rows=
1502697 loops=1)
Sort Key: channel_updates_2.short_channel_id
Sort Method: external merge Disk: 17664kB
-> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43091.07 rows=
1272439 width=8) (actual time=67.791..274.989 rows=
1502697 loops=1)
Index Cond: (seen >= '2023-07-08 00:00:00'::timestamp without time zone)
Heap Fetches: 39999
Planning Time: 0.324 ms
JIT:
Functions: 23
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.976 ms, Inlining 8.080 ms, Optimization 37.602 ms, Emission 26.389 ms, Total 73.048 ms
Execution Time: 123206.373 ms
(29 rows)
```