From fee45d58d49ae14299c849abe0a3eb1b5f9de060 Mon Sep 17 00:00:00 2001 From: Matt Corallo Date: Sun, 16 Jul 2023 17:13:10 +0000 Subject: [PATCH] Further optimize intermediate row fetching by filtering scids first Postgresql actually thinks this version is more expensive, but when I run it its about 4x as fast: ``` ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates WHERE id IN ( SELECT DISTINCT ON (short_channel_id, direction) id FROM channel_updates WHERE seen < '2023-07-08 00:00:00' AND short_channel_id IN ( SELECT DISTINCT ON (short_channel_id) short_channel_id FROM channel_updates WHERE seen >= '2023-07-08 00:00:00' ) ORDER BY short_channel_id ASC, direction ASC, seen DESC ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=3771382.74..3772300.66 rows=172186 width=145) (actual time=25270.727..34954.993 rows=129921 loops=1) -> HashAggregate (cost=3771382.18..3771384.18 rows=200 width=4) (actual time=25267.060..25313.378 rows=129921 loops=1) Group Key: channel_updates_1.id Batches: 1 Memory Usage: 12321kB -> Unique (cost=189510.13..3769229.85 rows=172186 width=21) (actual time=693.154..25177.896 rows=129921 loops=1) -> Merge Join (cost=189510.13..3480148.53 rows=57816264 width=21) (actual time=693.152..23970.052 rows=31781122 loops=1) Merge Cond: (channel_updates_1.short_channel_id = channel_updates_2.short_channel_id) -> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..2560497.30 rows=57816264 width=21) (actual time=0.154..19198.277 rows=57605314 loops=1) Index Cond: (seen < '2023-07-08 00:00:00'::timestamp without time zone) Heap Fetches: 0 -> Unique (cost=189509.57..195871.76 rows=86093 width=8) (actual time=594.634..787.364 rows=68181 loops=1) -> Sort (cost=189509.57..192690.67 rows=1272439 width=8) (actual time=594.621..722.739 rows=1452420 loops=1) Sort Key: channel_updates_2.short_channel_id Sort Method: external merge Disk: 17664kB -> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43091.07 rows=1272439 width=8) (actual time=3.461..348.146 rows=1502697 loops=1) Index Cond: (seen >= '2023-07-08 00:00:00'::timestamp without time zone) Heap Fetches: 40000 -> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=145) (actual time=0.074..0.074 rows=1 loops=129921) Index Cond: (id = channel_updates_1.id) Heap Fetches: 0 Planning Time: 20.757 ms JIT: Functions: 18 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.770 ms, Inlining 51.503 ms, Optimization 25.848 ms, Emission 21.086 ms, Total 99.208 ms Execution Time: 34995.683 ms (26 rows) ln-gossip=# EXPLAIN ANALYZE SELECT id, direction, blob_signed FROM channel_updates WHERE id IN ( SELECT DISTINCT ON (short_channel_id, direction) id FROM channel_updates WHERE seen < '2023-07-08 00:00:00' ORDER BY short_channel_id ASC, direction ASC, seen DESC ) AND short_channel_id IN ( SELECT DISTINCT ON (short_channel_id) short_channel_id FROM channel_updates WHERE seen >= '2023-07-08 00:00:00' ); ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=3049540.37..3050910.30 rows=172186 width=145) (actual time=18697.102..123187.066 rows=129921 loops=1) Hash Cond: (channel_updates.short_channel_id = channel_updates_2.short_channel_id) -> Nested Loop (cost=2851731.51..2852649.44 rows=172186 width=153) (actual time=18029.529..122265.952 rows=393938 loops=1) -> HashAggregate (cost=2851730.95..2851732.95 rows=200 width=4) (actual time=18027.857..18309.986 rows=393938 loops=1) Group Key: channel_updates_1.id Batches: 5 Memory Usage: 16449kB Disk Usage: 7104kB -> Result (cost=0.56..2849578.62 rows=172186 width=21) (actual time=0.070..17851.034 rows=393938 loops=1) -> Unique (cost=0.56..2849578.62 rows=172186 width=21) (actual time=0.044..17809.361 rows=393938 loops=1) -> Index Only Scan using channel_updates_scid_dir_seen_desc_with_id on channel_updates channel_updates_1 (cost=0.56..2560497.30 rows=57816264 width=21) (actual time=0.043..15596.783 rows=57605314 loops=1) Index Cond: (seen < '2023-07-08 00:00:00'::timestamp without time zone) Heap Fetches: 0 -> Index Only Scan using channel_updates_id_with_scid_dir_blob on channel_updates (cost=0.56..4.60 rows=1 width=153) (actual time=0.263..0.263 rows=1 loops=393938) Index Cond: (id = channel_updates_1.id) Heap Fetches: 0 -> Hash (cost=196732.69..196732.69 rows=86093 width=8) (actual time=662.832..662.833 rows=70509 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 3779kB -> Unique (cost=189509.57..195871.76 rows=86093 width=8) (actual time=542.007..657.269 rows=70509 loops=1) -> Sort (cost=189509.57..192690.67 rows=1272439 width=8) (actual time=542.004..610.886 rows=1502697 loops=1) Sort Key: channel_updates_2.short_channel_id Sort Method: external merge Disk: 17664kB -> Index Only Scan using channel_updates_seen_scid on channel_updates channel_updates_2 (cost=0.56..43091.07 rows=1272439 width=8) (actual time=67.791..274.989 rows=1502697 loops=1) Index Cond: (seen >= '2023-07-08 00:00:00'::timestamp without time zone) Heap Fetches: 39999 Planning Time: 0.324 ms JIT: Functions: 23 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.976 ms, Inlining 8.080 ms, Optimization 37.602 ms, Emission 26.389 ms, Total 73.048 ms Execution Time: 123206.373 ms (29 rows) ``` --- src/lookup.rs | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/lookup.rs b/src/lookup.rs index 263514d..c554f9f 100644 --- a/src/lookup.rs +++ b/src/lookup.rs @@ -225,12 +225,12 @@ pub(super) async fn fetch_channel_updates(delta_set: &mut DeltaSet, client: &Cli WHERE id IN ( SELECT DISTINCT ON (short_channel_id, direction) id FROM channel_updates - WHERE seen < $1 + WHERE seen < $1 AND short_channel_id IN ( + SELECT DISTINCT ON (short_channel_id) short_channel_id + FROM channel_updates + WHERE seen >= $1 + ) ORDER BY short_channel_id ASC, direction ASC, seen DESC - ) AND short_channel_id IN ( - SELECT DISTINCT ON (short_channel_id) short_channel_id - FROM channel_updates - WHERE seen >= $1 ) ", [last_sync_timestamp_object]).await.unwrap(); let mut pinned_rows = Box::pin(reference_rows); -- 2.39.5