# ML Wiki

## Database System Catalog

To estimate a cost of Physical operators in a DBMS we use the following statistics:

• $B(R)$ - # of blocks that relation $R$ holds
• $T(R)$ - # of tuples in $R$
• typically can be used to calculate $B(R)$ when we know how many bytes we have per block
• $V(R, A_1, ..., A_n) = | \delta \pi_{A_1, ..., A_n} (R) |$ - # of distinct values

This statistics in DBMS is a system catalog

• they are regularly collected (when needed, scheduled, etc)
• and regularly revisited
• note that this data is kept only for base relations, not for subresults of a query!

## Statistics

For base relations we typically have some Histograms that show how values are distributed

### Equal-Width Histogram

• In this type of histograms the values are grouped in equal-width buckets
• We assume that the values are distributed uniformly within there buckets

Example

 range # of tuples [1, 10) [11, 20) [21, 30) [31, 40) [41, 50) 50 2000 2000 3000 2950

## Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".

Share your opinion