Databases
A database
- is a collection of information that is organized to afford efficient retrieval
- this collection exists over a long period of time
Data in a database should be self-describing and have a schema
What problems do databases solve?
- should support concurrent access by multiple readers and writers
- should make sure all applications see clean and organized data
- should work with datasets too large to fit into main memory
- should allow using the data in new, unexpected ways
DBMS
- usually the term database refers to a collection of data that is managed by a DBMS - a tool for managing large amounts of data
A Database Management System (DBMS) is expected to (see Data Model Enforcement)
- allow users to create DBs and specify the schema - the logical structure of the data
- (using a DDL - Data Definition Language)
- allow users to query and modify the data with a query language or a DML (Data Manipulation Language)
- support storing very large amounts of data
- etc
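To make the DDL/DML split concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is just a convenient, real DBMS for illustration; the `employee` table and its columns are made up. The last statement also shows the DBMS enforcing the declared schema.

```python
import sqlite3

# In-memory SQLite database, used only as a handy DBMS for the example
conn = sqlite3.connect(":memory:")

# DDL: define the schema (the logical structure of the data)
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
)

# DML: modify the data
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Alice", "R&D"), ("Bob", "Sales"), ("Carol", "R&D")],
)
conn.execute("UPDATE employee SET dept = 'Marketing' WHERE name = 'Bob'")

# Query: retrieve the data
rows = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("R&D",)
).fetchall()
print(rows)  # [('Alice',), ('Carol',)]

# Schema enforcement: name is declared NOT NULL, so the DBMS rejects this row
rejected = False
try:
    conn.execute("INSERT INTO employee (name, dept) VALUES (NULL, 'R&D')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```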
Classical DBMS Architecture
Recovery Manager
deals with Crash Recovery
- the Transaction Manager is responsible for receiving read and write requests (SQL statements are eventually translated into them)
- it has a Scheduler: a component that orders the commands of concurrent transactions so that each user has the impression of working in isolation
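The notes don't say which correctness criterion the Scheduler uses. One standard criterion is conflict serializability: an interleaved schedule is acceptable if its precedence graph (an edge Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj) has no cycle, because then it is equivalent to some serial order. The sketch below is only a checker for that property, not a scheduler; the `(transaction, action, item)` tuple format is an assumption made for illustration.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """schedule: list of (txn, action, item) with action in {"R", "W"}."""
    edges = defaultdict(set)
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    txns = {t for t, _, _ in schedule}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color = {t: WHITE for t in txns}

    def has_cycle(u):
        color[u] = GREY
        for v in edges[u]:
            if color[v] == GREY:  # back edge: cycle found
                return True
            if color[v] == WHITE and has_cycle(v):
                return True
        color[u] = BLACK
        return False

    # Serializable exactly when the precedence graph is acyclic
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)


# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2
s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# Classic lost update: T1 reads A, T2 writes A, then T1 overwrites it
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(is_conflict_serializable(s1), is_conflict_serializable(s2))  # True False
```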
Query Evaluation Engine
Responsible for Query Processing
File & Access Methods
- provides a wrapper around the Buffer Manager
- B-Trees and other indexes are implemented here
Buffer Manager
The Buffer Manager is the mediator between external storage and main memory (see Memory Hierarchy)
Main responsibility: partitioning main memory into buffers
- it maintains a buffer pool
- the buffer pool is a collection of memory slots (called buffers)
- a buffer is a page-sized region into which disk blocks are transferred
- disk blocks are brought into memory on request
- sometimes it may load more blocks than requested, in anticipation that they will be needed soon (prefetching)
A replacement policy decides which block gets evicted when the buffer pool is full
- popular policies: FIFO, Least Recently Used (LRU), Clock, etc.
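As an illustration of one of these policies, here is a minimal sketch of Clock (also called second chance). It only does the bookkeeping of which page sits in which frame; there is no real data and no pinning. Frames are visited in a circle, a frame whose reference bit is set gets its bit cleared and is skipped once, and the first frame found with a cleared bit is evicted. The class and method names are hypothetical.

```python
class ClockReplacer:
    """Clock (second-chance) replacement over a fixed number of frames."""

    def __init__(self, num_frames):
        self.ref_bit = [False] * num_frames
        self.page_in_frame = [None] * num_frames
        self.frame_of = {}  # page id -> frame index
        self.hand = 0       # the clock hand: next frame to inspect

    def access(self, page_id):
        """Return (frame, evicted_page) for a request of page_id.

        evicted_page is None on a hit or when an empty frame was used."""
        if page_id in self.frame_of:  # hit: just mark the frame as recently used
            frame = self.frame_of[page_id]
            self.ref_bit[frame] = True
            return frame, None
        frame = self._find_victim()
        evicted = self.page_in_frame[frame]
        if evicted is not None:
            del self.frame_of[evicted]
        self.page_in_frame[frame] = page_id
        self.frame_of[page_id] = frame
        self.ref_bit[frame] = True
        return frame, evicted

    def _find_victim(self):
        n = len(self.ref_bit)
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % n
            if self.page_in_frame[frame] is None:  # empty frame: use it directly
                return frame
            if self.ref_bit[frame]:
                self.ref_bit[frame] = False        # give it a second chance
            else:
                return frame                       # not used since last sweep: evict


clock = ClockReplacer(3)
for p in "ABC":
    clock.access(p)
print(clock.access("D"))  # (0, 'A'): every bit was set, so the hand sweeps once
print(clock.access("B"))  # (1, None): hit, B's reference bit is set again
print(clock.access("E"))  # (2, 'C'): B is skipped thanks to its bit, C is evicted
```

Clock is usually described as a cheap approximation of LRU: it needs one bit per frame instead of maintaining a full recency ordering on every access.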
Blocks Management
- Higher levels don't care whether a block is in memory or not
- BM loads it if it's not
- BM doesn't load if it's already there
- if there are no empty buffers but a block must be loaded, the BM uses the replacement policy to choose a block to evict
- Higher levels also inform the BM when a block is no longer needed
- so BM can reuse the space
- pinned block - a block that must remain in memory because it's still needed; the BM won't evict it
- pinning - making a block pinned
- unpinning - telling BM that a block is no longer needed
- if a block is modified, the BM makes sure the changes are propagated to disk
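The block-management rules above can be sketched as a toy buffer manager. This is a minimal sketch under several assumptions: `disk` is a plain dict standing in for the Disk Space Manager, eviction is LRU among unpinned blocks, and the class, method and field names are hypothetical. It shows loading on demand, no reload for blocks already in memory, pin counts, and writing dirty blocks back on eviction.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Frame:
    data: bytearray
    pin_count: int = 0
    dirty: bool = False


class BufferManager:
    """Toy buffer manager: pin counts, dirty flags, LRU among unpinned frames."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # assumption: dict of block id -> bytes
        self.frames = OrderedDict()  # block id -> Frame, least recently used first

    def pin(self, block_id):
        frame = self.frames.get(block_id)
        if frame is None:  # not in memory: load it, evicting if the pool is full
            if len(self.frames) >= self.capacity:
                self._evict()
            frame = Frame(bytearray(self.disk[block_id]))
            self.frames[block_id] = frame
        # already in memory (or just loaded): mark as most recently used
        self.frames.move_to_end(block_id)
        frame.pin_count += 1
        return frame

    def unpin(self, block_id, dirty=False):
        frame = self.frames[block_id]
        if frame.pin_count == 0:
            raise ValueError(f"block {block_id} is not pinned")
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        # Oldest unpinned frame; pinned blocks must stay in memory
        victim = next((b for b, f in self.frames.items() if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all buffers are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            self.disk[victim] = bytes(frame.data)  # propagate changes to disk


disk = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
bm = BufferManager(capacity=2, disk=disk)
page = bm.pin(1)
page.data[0:1] = b"A"    # modify the in-memory copy
bm.unpin(1, dirty=True)  # done with block 1, and it was changed
bm.pin(2)                # block 2 stays pinned
bm.pin(3)                # pool full: block 1 is the only unpinned one, so it is evicted
print(disk[1])           # b'Aaaa' (written back on eviction)
```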
Disk Space Manager
sometimes also called the Storage Manager
- controls where data is stored, in main memory or on disk
- keeps track of the locations of data requested by the Buffer Manager
- deals with requests from upper layers to allocate, deallocate, read and write blocks
- hides details of the underlying hardware and OS
- typically uses functionality provided by OS
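A minimal sketch of the four requests (allocate, deallocate, read, write) on top of plain OS file I/O. The single data file, the `BLOCK_SIZE` of 4096 bytes and the in-memory free list are assumptions of this sketch; a real DBMS would also persist its free-space information and handle many files.

```python
import os


class DiskSpaceManager:
    """Fixed-size blocks stored in one file (hypothetical, simplified)."""

    BLOCK_SIZE = 4096  # assumption: one block = 4 KiB

    def __init__(self, path):
        # Open for read/write, creating the file if it doesn't exist yet
        self.f = open(path, "r+b" if os.path.exists(path) else "w+b")
        self.f.seek(0, os.SEEK_END)
        self.num_blocks = self.f.tell() // self.BLOCK_SIZE
        self.free = []  # deallocated block ids, reused before growing the file

    def _check(self, block_id):
        if not 0 <= block_id < self.num_blocks:
            raise IndexError(f"no such block: {block_id}")

    def allocate(self):
        if self.free:
            return self.free.pop()
        block_id = self.num_blocks
        self.num_blocks += 1
        self.write(block_id, bytes(self.BLOCK_SIZE))  # zero-fill the new block
        return block_id

    def deallocate(self, block_id):
        self._check(block_id)
        self.free.append(block_id)

    def read(self, block_id):
        self._check(block_id)
        self.f.seek(block_id * self.BLOCK_SIZE)
        return self.f.read(self.BLOCK_SIZE)

    def write(self, block_id, data):
        self._check(block_id)
        if len(data) > self.BLOCK_SIZE:
            raise ValueError("data larger than a block")
        self.f.seek(block_id * self.BLOCK_SIZE)
        self.f.write(data.ljust(self.BLOCK_SIZE, b"\0"))  # pad to a full block
        self.f.flush()
        os.fsync(self.f.fileno())  # relies on the OS to push the bytes to the device

    def close(self):
        self.f.close()
```

Callers only see block ids; byte offsets, padding and the OS calls stay hidden inside the class, which is the "hides details of the underlying hardware and OS" point above.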
Stored Information
- data - content of the DB
- metadata - DB schema that describes the DB
- log records - information about recent changes to the database
- statistics - sizes, values, relations to other components of the DB; stored in the Database System Catalog
- Indexes to support efficient access to data
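As one concrete example of metadata and statistics living next to the data, SQLite keeps its schema as rows of the catalog table `sqlite_master`, and the `ANALYZE` command stores statistics in another catalog table, `sqlite_stat1`. The sketch below uses Python's standard `sqlite3` module; the `account` table and `account_owner` index are made up, and other DBMSs organize their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
conn.execute("CREATE INDEX account_owner ON account (owner)")  # an index
conn.executemany(
    "INSERT INTO account (owner, balance) VALUES (?, ?)",
    [("ann", 10.0), ("ben", 20.0), ("cat", 30.0)],
)

# Metadata: the schema itself is stored as rows in the catalog table sqlite_master
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('table', 'account'), ('index', 'account_owner')]

# Statistics: ANALYZE gathers them into another catalog table, sqlite_stat1;
# the first number in `stat` is the approximate number of rows in the index
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

The query planner reads these statistics when choosing how to evaluate a query, for example whether using an index is worthwhile.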