From ccde6c717ed627ce4ae8dd25b8124a9b4ebb20ee Mon Sep 17 00:00:00 2001 From: David Kaufmann Date: Sun, 7 Apr 2019 23:55:27 +0200 Subject: [PATCH] cleanup --- ex1/main_3.tex | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/ex1/main_3.tex b/ex1/main_3.tex index 04a071a..7aa5f22 100644 --- a/ex1/main_3.tex +++ b/ex1/main_3.tex @@ -44,22 +44,23 @@ Known Distribution Table: \\ \begin{enumerate}[label=\roman*)] \item -Known Excluded by $ votes != 9 $: $320$ \\ -Known Included by $ votes < 30 $: $440 + 320 + 120 = 880$ \\ +Known Excluded by $votes$ $!=$ $9$: $320$ \\ +Known Included by $votes$ $<$ $30$: $440 + 320 + 120 = 880$ \\ Assumed uniform distributed values: $ numRecords - knownExcluded - knownIncluded = 4000 - 880 - 320 = 2800$\\ \\ Number of values below 30: $30$\\ Number of known values below 30: $4$\\ -Average votes per record: $\frac{2800}{175-4} \approx 16.37$\\ +Average rows per unique votes value:\\ +$avgRowsPerVotes = \frac{2800}{175-4} \approx 16.37$\\ Assumed uniform values below 30:\\ -$avgVotesPerRecord * (30-4) = 16.37 * 26 \approx 425.73$\\ +$avgRowsPerVotes * (30-4) = 16.37 * 26 \approx 425.73$\\ Total values below 30: $knownIncluded + assumedUniformValues = 880 + 425.73 = 1305.73$\\ $selectivity = totalValues / numRecords = 1305.73 / 4000 \approx 0.326 \approx 32.6\%$\\ \item Known rows from first filter (votes = 15): 320\\ -Assumed rows $> 50$: $avgVotesPerRecord * (175-50) = 16.37 * 125 \approx 2046.78$\\ +Assumed rows $> 50$: $avgRowsPerVote * (175-50) = 16.37 * 125 \approx 2046.78$\\ Assume no overlaps, so just sum up both selectivities:\\ $assumedRows = knownIsFifteen + assumedOverFifty = 320 + 2046.78 = 2366.78$\\ $selectivity = assumedRows / numRecords = 2366.78 / 4000 = 0.5916 \approx 59.2\%$\\ @@ -69,27 +70,39 @@ $selectivity = assumedRows / numRecords = 2366.78 / 4000 = 0.5916 \approx 59.2\% \begin{enumerate}[label=\roman*)] \item Selector over course:\\ Filter: $coursename$ $=$ $\textquotedbl$ADBS$\textquotedbl$ $OR$ $ects > 6$\\ -\\ -Again assume no overlap (as ADBS has 6 ects), so OR just gets summed up.\\ -\\ + +Again assume no overlap (as ADBS has 6 ects), so OR just gets summed up. + +We'd assume the same even without knowledge of how many ECTS ADBS has. + Assume uniform distribution over $course.coursename$:\\ $numADBSCourses = (1 / distinctCourses) * numRows = (1/490) * 540 = 1.102$\\ $bucketSize = (numRows / numBuckets) = 540 / 5 = 108$\\ -$numberOfBucketsOverSix = 1 + \frac{2}{3} = \frac{5}{3}$\\ -$numCoursesOverSixECTS = numberOfBucketsOverSix * bucketSize = \frac{5}{3} * 108 = 180$\\ -$selectivity = (1.102 + 180) / 540 = 181.102 / 540 = 0.335 \approx 33.5\%$\\ +$bucketsOverSix = 1 + \frac{2}{3} = \frac{5}{3}$\\ +$coursesOverSix = bucketsOverSix * bucketSize = \frac{5}{3} * 108 = 180$\\ +$selectivity = (1.102 + 180) / 540 = 181.102 / 540 = 0.335 \approx 33.5\%$ + \item Selector over room:\\ Filter: $capacity < 300$ $AND$ $building$ $!=$ \textquotedbl$Freihaus$\textquotedbl\\ -\\ + $bucketSize = (numRows / numBuckets) = 1700 / 5 = 340$\\ $bucketsBelow300 = 4.5$\\ $roomsBelow300 = bucketsBelow300 * bucketSize = 4.5 * 340 = 1530$\\ $remainingBuildings = 10$\\ $roomsNotInFreihaus = remainingBuildings / numBuildings * numRooms = 1545.45$\\ $selectivity = math.min(roomsBelow300, roomsNotInFreihaus) / numRooms$\\ -$selectivity = math.min(1530, 1545.45) / 1700 = 1530 / 1700 = 0.9 = 90\%$\\ +$selectivity = math.min(1530, 1545.45) / 1700 = 1530 / 1700 = 0.9 = 90\%$ \end{enumerate} +Join Order:\\ +Everything should be joined to $reservation$, as there are no common +columns for $room$ and $course$. + +The table $reservation$ is the largest table without any WHERE clauses, +$course$ has the lowest selectivity. + +Therefor we'd join first $reservation$ and $course$, and then join to $room$. + % (d) \item \begin{verbatim} -- 2.43.0