From 89d695058df25d3dade11aa91fb10432da52ec2a Mon Sep 17 00:00:00 2001
From: David Kaufmann
Date: Wed, 15 May 2019 04:06:40 +0200
Subject: [PATCH] add ex2.2

---
 ex2/main_2.tex | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/ex2/main_2.tex b/ex2/main_2.tex
index c98c97c..1d03463 100644
--- a/ex2/main_2.tex
+++ b/ex2/main_2.tex
@@ -1 +1,28 @@
 %ex2.2
+
+\begin{enumerate}[label=(\alph*)]
+    \item
+    Yes. Some words (e.g.\ ``I'', ``a'', ``are'', \ldots) occur very
+    often, so the reducers handling those keys take a long time to
+    process all their values, while reducers handling less frequent
+    words finish far sooner.
+    \item
+    With fewer reducers the total time goes up, but the skew goes down,
+    as the long lists are more likely to end up on a reducer that has
+    already finished its short lists.
+    With 10000 reducers the skew goes up, as a few of those reducers
+    have to handle very long lists while the others finish early and
+    idle. This still assumes that no combiner is used.
+    \item
+    There will be much less skew than without a combiner. Many words
+    appear many times on a single page, so pre-aggregating them with a
+    combiner reduces the number of values per reducer significantly,
+    which also reduces the communication cost.
+    \item
+    Communication cost will increase, as the reducer size can be set
+    to lower values; this also reduces the skew in the times taken by
+    the reducers.
+    The replication rate stays the same, as it is the number of mapped
+    key-value pairs divided by the number of inputs (about 2500 per
+    average web page). Reducer size can go down to reduce runtime skew.
+\end{enumerate}
-- 
2.43.0
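
As a rough, standalone illustration of the combiner argument in (c), not part
of the patch itself: a minimal Python sketch with made-up helper names
(map_page, combine, reduce_all) that mimics the map/combine/shuffle/reduce
steps and shows how per-page pre-aggregation shrinks the value lists the
reducers receive.

# Minimal sketch (illustrative only) of how a combiner shrinks the
# value lists seen by the reducers in a word-count job.
from collections import Counter
from itertools import groupby
from operator import itemgetter

def map_page(page_text):
    # One (word, 1) pair per occurrence -- this is what skews the reducers.
    return [(word.lower(), 1) for word in page_text.split()]

def combine(pairs):
    # Pre-aggregate per page/mapper: one (word, partial_count) pair per
    # distinct word, which is what cuts communication cost and skew.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reduce_all(pairs):
    # Shuffle (sort by key) and sum the partial counts per word.
    result = {}
    for word, group in groupby(sorted(pairs, key=itemgetter(0)),
                               key=itemgetter(0)):
        result[word] = sum(n for _, n in group)
    return result

pages = ["a rose is a rose is a rose", "I think therefore I am"]
without_combiner = [p for page in pages for p in map_page(page)]
with_combiner = [p for page in pages for p in combine(map_page(page))]
print(len(without_combiner), len(with_combiner))  # 13 vs. 7
print(reduce_all(with_combiner))

On these two sample pages the mappers emit 13 (word, 1) pairs, while the
combined output is only 7 (word, partial_count) pairs; this drop is the
reduction in communication cost and reducer load that the answer refers to.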