
%ex2.2

\begin{enumerate}[label=(\alph*)]
        \item
                Yes. Some words (e.g.\ ``I'', ``a'', ``are'', \dots) occur very
                often, so the reducers handling those keys receive very long value
                lists and take a long time to process them. Reducers handling less
                frequent words finish far sooner.
        \item
                With fewer Reduce tasks the total time will go up, since less work
                runs in parallel, but the skew will be smaller: each task receives a
                random mix of many keys, so the few very long value lists end up on
                tasks that otherwise hold mostly short lists, and the work evens out
                across tasks.
                With 10\,000 Reduce tasks the skew will go up, as a few of those
                tasks will have to handle very long lists while the others finish
                early and sit idle. This still assumes that we do not use a combiner.
        \item
                Skew will be much less significant than without a combiner. Many
                words appear many times on a single page, so a combiner that sums
                their counts at each Map task significantly reduces the number of
                values the reducers receive: each key gets at most one value per Map
                task instead of one per occurrence. This also reduces the
                communication cost.
        \item
                The reducer size can be set to lower values, which reduces the skew
                in the times taken by the reducers; as a consequence, the
                communication cost will increase.
                The replication rate stays the same, as it is determined by the
                number of key-value pairs produced by the Map tasks divided by the
                number of inputs (about 2500 for an average web page, i.e.\ one pair
                per word on the page).
\end{enumerate}
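
The value-list lengths behind items (a) and (c), and the replication rate in item (d), can be made explicit with a little notation. The symbols are introduced only for this sketch: $M$ is the number of Map tasks, $P$ the number of input pages, $n_w$ the total number of occurrences of word $w$, $m_w$ the number of Map tasks whose input contains $w$ at least once, and $|p|$ the number of words on page $p$. It assumes the standard word-count Map function, which emits one pair $(w,1)$ per word occurrence.
\[
\begin{array}{ll}
  \mbox{without a combiner:} & |\mathrm{values}(w)| = n_w,\\[4pt]
  \mbox{with a combiner:}    & |\mathrm{values}(w)| = m_w \le M,\\[4pt]
  \mbox{replication rate:}   & r = \displaystyle\frac{\mbox{number of pairs emitted by the Map tasks}}{\mbox{number of inputs}}
                               = \frac{1}{P}\sum_{p=1}^{P} |p| \approx 2500.
\end{array}
\]
Since $n_w$ for a very common word can exceed that of a rare word by many orders of magnitude, the first line is the source of the skew in item (a), while the bound $m_w \le M$ in the second line is why a combiner keeps every value list short.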