3 \begin{enumerate}[label=(\alph*)]
4 \item Hash Join in Merge Join.
6 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
7 SELECT a,b,c FROM r NATURAL JOIN s NATURAL JOIN t;
12 Merge Join (cost=126688.84..189746.85 rows=4812309 width=12)
13 (actual time=1223.252..3564.720 rows=4790517 loops=1)
15 Merge Cond: ((t.a = r.a) AND (t.c = s.c))
16 Buffers: shared hit=138, temp read=1320 written=3593
17 -> Sort (cost=809.39..834.39 rows=10000 width=8)
18 (actual time=6.542..9.442 rows=10000 loops=1)
21 Sort Method: quicksort Memory: 853kB
22 Buffers: shared hit=45
23 -> Seq Scan on public.t (cost=0.00..145.00 rows=10000 width=8)
24 (actual time=0.013..1.698 rows=10000 loops=1)
26 Buffers: shared hit=45
27 -> Materialize (cost=125878.37..130770.59 rows=978445 width=12)
28 (actual time=1216.698..1848.300 rows=4790518 loops=1)
30 Buffers: shared hit=93, temp read=1320 written=3593
31 -> Sort (cost=125878.37..128324.48 rows=978445 width=12)
32 (actual time=1216.694..1265.485 rows=105629 loops=1)
35 Sort Method: external merge Disk: 21312kB
36 Buffers: shared hit=93, temp read=1320 written=3593
37 -> Hash Join (cost=270.00..11799.45 rows=978445 width=12)
38 (actual time=5.707..326.759 rows=993774 loops=1)
40 Hash Cond: (r.b = s.b)
41 Buffers: shared hit=93
42 -> Seq Scan on public.r (cost=0.00..145.00 rows=10000 width=8)
43 (actual time=0.013..2.200 rows=10000 loops=1)
45 Buffers: shared hit=45
46 -> Hash (cost=145.00..145.00 rows=10000 width=8)
47 (actual time=5.636..5.636 rows=10000 loops=1)
49 Buckets: 16384 Batches: 1 Memory Usage: 519kB
50 Buffers: shared hit=45
51 -> Seq Scan on public.s (cost=0.00..145.00 rows=10000 width=8)
52 (actual time=0.012..2.656 rows=10000 loops=1)
54 Buffers: shared hit=45
55 Planning time: 0.448 ms
56 Execution time: 4093.957 ms
63 set enable\_hashjoin=1; set enable\_mergejoin=0; set enable\_nestloop=0;
64 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
65 SELECT a,b,c FROM r NATURAL JOIN s NATURAL JOIN t;
70 Hash Join (cost=565.00..382720.42 rows=4812309 width=12)
71 (actual time=10.065..2336.284 rows=4790517 loops=1)
72 Execution time: 2839.835 ms
77 set enable\_hashjoin=0; set enable\_mergejoin=1; set enable\_nestloop=0;
78 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
79 SELECT a,b,c FROM r NATURAL JOIN s NATURAL JOIN t;
84 Merge Join (cost=127309.80..190367.82 rows=4812309 width=12)
85 (actual time=1400.115..3961.542 rows=4790517 loops=1)
86 Planning time: 0.374 ms
87 Execution time: 4215.626 ms
92 set enable\_hashjoin=0; set enable\_mergejoin=0; set enable\_nestloop=1;
93 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
94 SELECT a,b,c FROM r NATURAL JOIN s NATURAL JOIN t;
99 Nested Loop (cost=0.00..172728360.00 rows=4812309 width=12)
100 Execution time: query-timeout
104 Changing the create numbers definitely impacts performance. The question is what is part of the assignment and what not.
108 \item create indices.
110 create index i1 on r(a);
111 create index i2 on r(b);
112 create index i3 on s(b);
113 create index i4 on s(c);
114 create index i5 on t(a);
115 create index i6 on t(c);
116 create index ii1 on r(a,b);
117 create index ii2 on s(b,c);
118 create index ii3 on t(a,c);
119 REINDEX DATABASE u00726236; VACUUM ANALYZE;
122 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
123 SELECT a,b,c FROM r NATURAL JOIN s NATURAL JOIN t;
126 New query plan for merge join.
128 Merge Join (cost=126115.03..189598.35 rows=4812309 width=12)
129 (actual time=1471.775..3798.577 rows=4790517 loops=1)
130 Output: r.a, r.b, s.c
131 Merge Cond: ((t.a = r.a) AND (t.c = s.c))
132 Buffers: shared hit=7289 read=58, temp read=1325 written=3598
133 -> Index Only Scan using ii3 on public.t (cost=0.29..449.50 rows=10000 width=8)
134 (actual time=0.034..8.250 rows=10000 loops=1)
137 Buffers: shared hit=6299 read=29
138 -> Materialize (cost=126114.75..131006.97 rows=978445 width=12)
139 (actual time=1471.732..2093.037 rows=4790518 loops=1)
140 Output: r.a, r.b, s.c
141 Buffers: shared hit=990 read=29, temp read=1325 written=3598
142 -> Sort (cost=126114.75..128560.86 rows=978445 width=12)
143 (actual time=1471.727..1519.766 rows=105629 loops=1)
144 Output: r.a, r.b, s.c
146 Sort Method: external merge Disk: 21312kB
147 Buffers: shared hit=990 read=29, temp read=1325 written=3598
148 -> Merge Join (cost=809.67..12035.83 rows=978445 width=12)
149 (actual time=5.865..568.309 rows=993774 loops=1)
150 Output: r.a, r.b, s.c
151 Merge Cond: (r.b = s.b)
152 Buffers: shared hit=990 read=29
153 -> Index Scan using i2 on public.r (cost=0.29..449.80 rows=10000 width=8)
154 (actual time=0.026..7.013 rows=10000 loops=1)
156 Buffers: shared hit=945 read=29
157 -> Sort (cost=809.39..834.39 rows=10000 width=8)
158 (actual time=5.836..189.024 rows=993775 loops=1)
161 Sort Method: quicksort Memory: 853kB
162 Buffers: shared hit=45
163 -> Seq Scan on public.s (cost=0.00..145.00 rows=10000 width=8)
164 (actual time=0.013..2.654 rows=10000 loops=1)
166 Buffers: shared hit=45
167 Planning time: 1.060 ms
168 Execution time: 4323.022 ms
171 It seems like no performance improvement is made: Query planning and execution time is bigger with indices.
174 Suddenly nested loop joins become feasible.
176 Nested Loop (cost=0.57..498778.26 rows=4812309 width=12)
177 (actual time=0.096..8351.002 rows=4790517 loops=1)
178 Output: r.a, r.b, s.c
179 Buffers: shared hit=7476815 read=11
180 -> Nested Loop (cost=0.29..30520.00 rows=1007007 width=12)
181 (actual time=0.076..1017.339 rows=987869 loops=1)
182 Output: r.a, r.b, t.c
183 Buffers: shared hit=950972 read=4
184 -> Seq Scan on public.t (cost=0.00..145.00 rows=10000 width=8)
185 (actual time=0.015..2.720 rows=10000 loops=1)
187 Buffers: shared hit=45
188 -> Index Only Scan using ii1 on public.r (cost=0.29..2.05 rows=99 width=8)
189 (actual time=0.007..0.076 rows=99 loops=10000)
191 Index Cond: (r.a = t.a)
193 Buffers: shared hit=950927 read=4
194 -> Index Only Scan using ii2 on public.s (cost=0.29..0.41 rows=5 width=8)
195 (actual time=0.003..0.006 rows=5 loops=987869)
197 Index Cond: ((s.b = r.b) AND (s.c = t.c))
198 Heap Fetches: 4790517
199 Buffers: shared hit=6525843 read=7
200 Planning time: 0.604 ms
201 Execution time: 8884.429 ms
209 RESET ALL; EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
210 SELECT a,b,c FROM r NATURAL JOIN t WHERE b IN (SELECT s.b FROM s);
215 Hash Join (cost=578.59..12368.66 rows=1007007 width=12)
216 (actual time=25.336..340.303 rows=987869 loops=1)
217 Output: r.a, r.b, t.c
218 Hash Cond: (t.a = r.a)
219 Buffers: shared hit=135
220 -> Seq Scan on public.t (cost=0.00..145.00 rows=10000 width=8)
221 (actual time=0.035..1.820 rows=10000 loops=1)
223 Buffers: shared hit=45
224 -> Hash (cost=453.59..453.59 rows=10000 width=8)
225 (actual time=25.235..25.235 rows=10000 loops=1)
227 Buckets: 16384 Batches: 1 Memory Usage: 519kB
228 Buffers: shared hit=90
229 -> Hash Join (cost=172.27..453.59 rows=10000 width=8)
230 (actual time=14.607..22.222 rows=10000 loops=1)
232 Hash Cond: (r.b = s.b)
233 Buffers: shared hit=90
234 -> Seq Scan on public.r (cost=0.00..145.00 rows=10000 width=8)
235 (actual time=0.026..1.943 rows=10000 loops=1)
237 Buffers: shared hit=45
238 -> Hash (cost=171.01..171.01 rows=101 width=4)
239 (actual time=14.558..14.559 rows=101 loops=1)
241 Buckets: 1024 Batches: 1 Memory Usage: 12kB
242 Buffers: shared hit=45
243 -> HashAggregate (cost=170.00..171.01 rows=101 width=4)
244 (actual time=14.423..14.486 rows=101 loops=1)
247 Buffers: shared hit=45
248 -> Seq Scan on public.s (cost=0.00..145.00 rows=10000 width=4)
249 (actual time=0.028..6.220 rows=10000 loops=1)
251 Buffers: shared hit=45
252 Planning time: 0.626 ms
253 Execution time: 439.179 ms
256 It now takes only 439.7 ms to execute the query, instead of the previous 4323 ms.
259 \item This was the case with (d) already.\\
260 But if we create indexes we get a merge join + hash semi join combination, which is a little bit slower.
262 Merge Join (cost=1118.26..12199.68 rows=1007007 width=12)
263 (actual time=23.151..517.241 rows=987869 loops=1)
264 Output: r.a, r.b, t.c
265 Merge Cond: (t.a = r.a)
266 Buffers: shared hit=585 read=29
267 -> Index Scan using i5 on public.t (cost=0.29..448.89 rows=10000 width=8)
268 (actual time=0.038..4.266 rows=10000 loops=1)
270 Buffers: shared hit=495 read=29
271 -> Sort (cost=1117.98..1142.98 rows=10000 width=8)
272 (actual time=23.107..186.813 rows=987870 loops=1)
275 Sort Method: quicksort Memory: 853kB
276 Buffers: shared hit=90
277 -> Hash Join (cost=172.27..453.59 rows=10000 width=8)
278 (actual time=11.223..19.180 rows=10000 loops=1)
280 Hash Cond: (r.b = s.b)
281 Buffers: shared hit=90
282 -> Seq Scan on public.r (cost=0.00..145.00 rows=10000 width=8)
283 (actual time=0.019..2.069 rows=10000 loops=1)
285 Buffers: shared hit=45
286 -> Hash (cost=171.01..171.01 rows=101 width=4)
287 (actual time=11.183..11.183 rows=101 loops=1)
289 Buckets: 1024 Batches: 1 Memory Usage: 12kB
290 Buffers: shared hit=45
291 -> HashAggregate (cost=170.00..171.01 rows=101 width=4)
292 (actual time=11.079..11.126 rows=101 loops=1)
295 Buffers: shared hit=45
296 -> Seq Scan on public.s (cost=0.00..145.00 rows=10000 width=4)
297 (actual time=0.019..4.724 rows=10000 loops=1)
299 Buffers: shared hit=45
300 Planning time: 1.320 ms
301 Execution time: 624.356 ms
308 RESET ALL; EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
309 SELECT a,b,c FROM r NATURAL JOIN s WHERE a IN (SELECT t.a FROM t);
312 Without index: hash join + hash semi join
314 Hash Join (cost=440.25..1943.04 rows=112490 width=12)
315 (actual time=20.751..57.164 rows=105628 loops=1)
316 Planning time: 0.836 ms
317 Execution time: 67.660 ms
320 With index: hash join + merge join.
322 Hash Join (cost=440.59..1817.14 rows=112490 width=12)
323 (actual time=19.917..53.721 rows=105628 loops=1)
324 Planning time: 1.938 ms
325 Execution time: 65.131 ms
331 RESET ALL; EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
332 SELECT a,b,c FROM s NATURAL JOIN t WHERE a IN (SELECT r.a FROM r);
335 Without index: hash join + hash semi join
337 Hash Join (cost=442.27..56192.50 rows=4884102 width=12)
338 (actual time=17.134..1482.398 rows=4884102 loops=1)
339 Planning time: 0.547 ms
340 Execution time: 1980.768 ms
343 With index: hash join + hash join.
345 Hash Join (cost=442.27..56192.50 rows=4884102 width=12)
346 (actual time=17.631..1541.867 rows=4884102 loops=1)
347 Planning time: 0.978 ms
348 Execution time: 2040.407 ms
351 We suspect that performance varies likely because the dataset is not a balanced triangle and in some cases results can be discarded earlier. Also resultset deduplication may play a role.