Database System Catalog
To estimate a cost of Physical operators in a DBMS we use the following statistics:
- $B(R)$ - # of blocks that relation $R$ holds
- $T(R)$ - # of tuples in $R$
- typically can be used to calculate $B(R)$ when we know how many bytes we have per block
- $V(R, A_1, ..., A_n) = | \delta \pi_{A_1, ..., A_n} (R) |$ - # of distinct values
This statistics in DBMS is a system catalog
- they are regularly collected (when needed, scheduled, etc)
- and regularly revisited
- note that this data is kept only for base relations, not for subresults of a query!
Statistics
For base relations we typically have some Histograms that show how values are distributed
- In this type of histograms the values are grouped in equal-width buckets
- We assume that the values are distributed uniformly within there buckets
Example
range
|
[1, 10) |
[11, 20) |
[21, 30) |
[31, 40) |
[41, 50)
|
# of tuples
|
50 |
2000 |
2000 |
3000 |
2950
|
See Also
Sources