Database System Catalog
To estimate a cost of Physical operators in a DBMS we use the following statistics:
- $B(R)$ - # of blocks that relation $R$ holds
- $T(R)$ - # of tuples in $R$
- typically can be used to calculate $B(R)$ when we know how many bytes we have per block
-
$V(R, A_1, …, A_n) = \delta \pi_{A_1, …, A_n} (R) $ - # of distinct values
This statistics in DBMS is a ‘‘system catalog’’
- they are regularly collected (when needed, scheduled, etc)
- and regularly revisited
- note that this data is kept only for base relations, not for subresults of a query| | |
Statistics
For base relations we typically have some Histograms that show how values are distributed
Equal-Width Histogram
- In this type of histograms the values are grouped in equal-width buckets
- We assume that the values are distributed uniformly within there buckets
Example
range | [1, 10) | [11, 20) | [21, 30) | [31, 40) | [41, 50) | # of tuples | 50 | 2000 | 2000 | 3000 | 2950 |
See Also
Sources
- Database Systems Architecture (ULB)
- Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom