\begin{enumerate}[label=(\alph*)]
\item\textbf{Suppose we do not use a combiner at the Map tasks. Do you expect there to be skew in
the times taken by the various reducers to process their value lists? Why or why not?}\\
Yes. Some words (e.g.\ ``I'', ``a'', ``are'') occur very often, so the
reducers handling those keys will take very long to process all their values.
Other reducers, handling less frequently used words, will be finished in
far less time.
\item\textbf{If we combine the reducer functions into a small number of Reduce tasks, say 10 tasks,
at random, do you expect the skew to be significant? What if we instead combined the
reducers into 10,000 Reduce tasks?}\\
With fewer Reduce tasks the total running time will go up, but the skew will be
smaller: each task handles many keys, so a long list is likely to end up
in the same task as many short lists, and the total loads even out.
With 10,000 Reduce tasks the skew will go up, as a few of those tasks
will have to handle the very long lists, while the others will finish early
and sit idle. This still assumes that we do not use a combiner.
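
A rough variance estimate makes the dependence of the skew on the number of Reduce tasks concrete. As a modelling assumption (not part of the exercise), let each key $i$ with value-list length $n_i$ be assigned independently and uniformly at random to one of $k$ Reduce tasks, and write $N=\sum_i n_i$ and $L_j$ for the total list length handled by task $j$; these symbols are introduced here only for illustration. Then
\[
  \mathrm{E}[L_j] = \frac{N}{k}, \qquad
  \mathrm{Var}(L_j) = \frac{1}{k}\Bigl(1-\frac{1}{k}\Bigr)\sum_i n_i^2, \qquad
  \frac{\sqrt{\mathrm{Var}(L_j)}}{\mathrm{E}[L_j]} = \sqrt{k-1}\;\frac{\sqrt{\sum_i n_i^2}}{N}.
\]
Under this model the relative spread of the task loads grows like $\sqrt{k-1}$, so going from $k=10$ to $k=10{,}000$ increases it by a factor of $\sqrt{9999/9}\approx 33$.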
\item\textbf{Suppose we do use a combiner at the 100 Map tasks. Do you expect the skew to be
significant? Why or why not?}\\
The skew will be much smaller than without a combiner. Many
words appear many times on a page, so pre-aggregating them with a
combiner will significantly reduce the number of values sent to the reducers.
This will also reduce the communication cost.
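
The effect of the combiner on the list lengths can be written as a simple bound. As notation introduced here for illustration, let $c_{w,t}$ be the number of occurrences of word $w$ in the input of Map task $t$, with $m=100$ Map tasks, and let $n_w$ and $n'_w$ be the length of the value list for $w$ without and with a combiner. Assuming the combiner fully aggregates each Map task's output per word,
\[
  n_w = \sum_{t=1}^{m} c_{w,t}, \qquad
  n'_w = \bigl|\{\, t : c_{w,t} > 0 \,\}\bigr| \le m = 100 .
\]
So with a combiner every list has between 1 and 100 values, and the longest list is at most 100 times as long as the shortest, however frequent the word is.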
\item\textbf{Suppose we increase the number of Reduce tasks significantly, say to 1,000,000. Do you
expect the communication cost to increase? What about the replication rate or reducer
size? Justify your answers.}\\
Communication cost will increase, as the reducer size can be set to
lower values. This will also reduce the skew in the times taken by the
reducers.
The replication rate will stay the same, as it is determined by the number of mapped
values divided by the number of inputs (about 2500 for an average web page).
The reducer size can go down, in order to reduce the skew in reducer running times.
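
The replication rate used above can be written as a formula. Reading the figure of 2500 as the average number of words per page (an interpretation, since the text does not say so explicitly), and with $P$ denoting the number of input pages (a symbol introduced here), the rate without a combiner is approximately
\[
  r = \frac{\mbox{key-value pairs emitted by Map}}{\mbox{inputs}}
    \approx \frac{2500\,P}{P} = 2500,
\]
which contains no term depending on the number of Reduce tasks.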