Databases

A database

  • is a collection of information that is organized to afford efficient retrieval
  • this collection exists over a long period of time

Data in a database should be self-describing and have a schema

What problems databases solve?

  • Sharing
should support concurrent access between multiple readers and writers
should make sure all applications see clean and organized data
should work with datasets too large to fit into main memory
  • Flexibility
should allow using the data in new unexpected ways


DBMS

  • usually the term database refers to a collection of data that is managed by a DBMS - a tool for managing large amounts of data

A Database Management System (DBMS) is expected to (by Data Model Enforcement)

  • allow users to create DBs and specify the schema - logical structure of the data
(using DDL - data definition language)
  • allows to query and modify the data with some query language or data manipulation language
  • support storing very large amounts of data
  • etc


Classical DBMS Architecture

db-architecture.png


Recovery Manager

deals with Crash Recovery

Concurrency Control

  • Transaction Manager is responsible for receiving read and write requests (SQL is eventually translated to them)
  • it has a Scheduler: a component that schedules commands in some sequence thus creating an impression that all users work in isolation


Query Evaluation Engine

Responsible for Query Processing


File & Access Methods

  • Provides Wrapper Around Buffer Manager
  • here B-Tree and other Indexes are implemented


Buffer Manager

Buffer Manager is mediator between external storage and main memory (see Memory Hierarchy)

Main Responsibility: Partitioning main memory into buffers

  • it maintains a buffer pool
  • it's a collection of memory slots (called buffers)
  • a buffer is a page-sized regions into which disk blocks are transferred
  • disk blocks are brought into memory per request
    • sometimes it may allocate more blocks when asked - in anticipation that some blocks will be needed

db-architecture-buffermanager.png

A replacement policy decides which block gets evicted when the buffer pool is full

  • popular policies" FIFO, Least Recently Used, Clock, etc

Blocks Management

  • Higher levels don't care care if a block in memory or not
    • BM loads it if it's not
    • BM doesn't load if it's already there
    • if no empty buffers, but need to load something, it uses the replacing policy
  • Higher levels also inform when a block is no longer needed
    • so BM can reuse the space
  • pinned block - block that should remain in the memory because it's still needed
    • pinning - making a block pinned
    • unpinning - telling BM that a block is no longer needed
  • if a block is modified, BM makes sure the changes are propagated to dosk


Disk Space Manager

sometimes also Storage Manager

  • controls where the data in main memory or on disk is stored
  • keeps track on locations of data requested by buffer manager
  • deals with requests from upper layers to allocate, deallocate, read and write blocks
  • hides details of underlaying hardware and OS
  • typically uses functionality provided by OS


Stored Information

  • data - content of the DS
  • metadata - DB schema that describes the DB
  • log records - information about recent changes to the database
  • Statistics - sizes, values, relation to other components of DB, stored in Database System Catalog
  • Indexes to support efficient access to data


Databases


Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".

Share your opinion