Pipelining

Sometimes the output of one physical operator can be used directly as the input of another operator. This technique is called pipelining.

  • the output of an operator is stored in a buffer that serves as the input of the next operator
  • results are computed as early as possible, i.e. as soon as enough data is available
  • there is no need to wait until the previous operator finishes its work
  • this can speed up execution considerably, since intermediate results do not have to be written to disk and read back
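
The idea can be sketched with Python generators, where each operator pulls one tuple at a time from the operator below it. This is only an illustration: the operator names (`scan`, `select`, `project`), the sample table, and the `log` list are assumptions made for the example, and a real engine would exchange pages through memory buffers rather than single tuples.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict


def scan(rows: Iterable[Row], log: List[tuple]) -> Iterator[Row]:
    """Leaf operator: emits the rows of a (hypothetical) table one by one."""
    for row in rows:
        log.append(("scan", row["id"]))
        yield row


def select(rows: Iterable[Row], predicate: Callable[[Row], bool],
           log: List[tuple]) -> Iterator[Row]:
    """Selection: passes a row on as soon as it satisfies the predicate."""
    for row in rows:
        if predicate(row):
            log.append(("select", row["id"]))
            yield row


def project(rows: Iterable[Row], columns: List[str],
            log: List[tuple]) -> Iterator[Row]:
    """Projection: keeps only the requested columns of each incoming row."""
    for row in rows:
        log.append(("project", row["id"]))
        yield {c: row[c] for c in columns}


table = [{"id": 1, "x": 10}, {"id": 2, "x": 25}, {"id": 3, "x": 30}]
log: List[tuple] = []

# Building the plan does no work yet; rows flow only when the top is pulled.
plan = project(select(scan(table, log), lambda r: r["x"] > 20, log), ["id"], log)

# Ask for the first result only: row 3 has not been scanned at this point.
first = next(plan)
```

Here the first result is produced after only two rows have been scanned, which is the "as early as possible" behaviour described above.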


Operators

Operators that can usually be pipelined:

  • projections
  • selections
  • renaming
  • bag-based union
  • merge joins whose inputs are known to be sorted

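An operator that cannot be pipelined is called blocking. Sorting is a typical example: it has to read its entire input before it can output the first tuple.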
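
A minimal sketch of a blocking operator, using sorting as the example. The names `source` and `sort_blocking` and the `log` list are hypothetical and exist only to make the order of events visible.

```python
from typing import Iterable, Iterator, List


def source(values: Iterable[int], log: List[str]) -> Iterator[int]:
    """Input operator: records every value it hands to its consumer."""
    for v in values:
        log.append(f"read {v}")
        yield v


def sort_blocking(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Blocking operator: must consume all input before emitting anything."""
    buffered = sorted(rows)  # drains the whole input here
    log.append("sorted")
    yield from buffered


log: List[str] = []
out = sort_blocking(source([3, 1, 2], log), log)

# Requesting just one tuple still forces the entire input to be read.
smallest = next(out)
```

Unlike a selection or projection, the sort cannot hand anything to its consumer until the last input value has arrived, because that last value might be the smallest one.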


Example

pipelining-ex.png

  • the output of the index scan on $R$ can be pipelined to the filter
  • the filter output can be pipelined to the union
  • the union result can be pipelined to the projection
  • (provided that enough memory buffers are available)
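
The chain above could look roughly like the following sketch. The figure is not reproduced here, so the second union input `S`, the column names `a` and `b`, the predicates, and all function names are assumptions made purely for illustration; a sorted list stands in for the index on $R$.

```python
from bisect import bisect_left
from itertools import chain
from typing import Callable, Iterable, Iterator, List

Row = dict


def index_scan(sorted_rows: List[Row], key: str, low: int) -> Iterator[Row]:
    """Yield rows with row[key] >= low from a list kept sorted on key."""
    keys = [row[key] for row in sorted_rows]
    start = bisect_left(keys, low)
    for i in range(start, len(sorted_rows)):
        yield sorted_rows[i]


def filter_rows(rows: Iterable[Row],
                predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection: forwards each matching row immediately."""
    return (row for row in rows if predicate(row))


def bag_union(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Bag-based union: concatenates the inputs and keeps duplicates."""
    return chain(left, right)


def projection(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Projection: keeps only the listed columns."""
    return ({c: row[c] for c in columns} for row in rows)


# Hypothetical relations; R is sorted on "a" to mimic an index.
R = [{"a": 1, "b": "x"}, {"a": 5, "b": "y"}, {"a": 9, "b": "z"}]
S = [{"a": 5, "b": "w"}]

plan = projection(
    bag_union(
        filter_rows(index_scan(R, "a", 4), lambda r: r["b"] != "z"),
        iter(S),
    ),
    ["a"],
)
result = list(plan)  # rows flow through all four operators one at a time
```

Because the union is bag-based, it never has to compare rows for duplicates, which is what allows it to forward each row as soon as it arrives.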


Materialization

When we cannot pipeline, we have to materialize the intermediate results, which means writing every intermediate sub-result to disk.

  • materialization.png
  • the next operator also cannot start working until everything has been materialized
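
For contrast with pipelining, here is a rough sketch of materialization. A temporary file plays the role of the disk, and the helper names (`materialize`, `read_materialized`), the JSON-lines format, and the sample rows are all assumptions chosen for the example.

```python
import json
import tempfile
from typing import IO, Iterable, Iterator

Row = dict


def materialize(rows: Iterable[Row], file: IO[str]) -> int:
    """Write the complete intermediate result to the file, one row per line."""
    count = 0
    for row in rows:
        file.write(json.dumps(row) + "\n")
        count += 1
    file.flush()
    file.seek(0)  # rewind so the next operator can read from the start
    return count


def read_materialized(file: IO[str]) -> Iterator[Row]:
    """Read the stored intermediate result back, row by row."""
    for line in file:
        yield json.loads(line)


# Output of some earlier operator (hypothetical data).
rows = ({"id": i, "x": i * 10} for i in range(1, 4))

with tempfile.TemporaryFile(mode="w+") as tmp:
    # Step 1: the producer runs to completion and everything is written out.
    written = materialize(rows, tmp)
    # Step 2: only now can the next operator (a selection here) start.
    result = [row["id"] for row in read_materialized(tmp) if row["x"] > 10]
```

The extra write and read of every intermediate row is the cost that pipelining avoids.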

