Binary search trees

Binary search trees are Trees with rank = 2.

Operations on a sorted array

  • Search $\Theta(\log n)$
  • Select $O(1)$
  • min/max $O(1)$
  • pred/succ $O(1)$
  • rank $O(\log n)$
  • insertion/deletion $\Theta(n)$

Is there a data structure that allows better insertion and deletes?

Balanced trees:

  • operations like on sorted arrays
  • but with fast (logarithmic) inserts and deletes

Basic version of node

  • left child pointer
  • right child pointer
  • parent pointer

Search tree property: for a node with element $x$

  • on the left - all elements are less than $x$
  • on the right - all elements are greater than $x$


Height

  • from $\approx \log_2 n$ to $\approx n$
  • worst case - $\approx n$, like a chain
  • to avoid the worst case we need trees that can rebalance themselves


Operations

Search

Search($k$):

  • start at the root
  • if $k < \text{key}$, go left
  • if $k > \text{key}$, go right
  • return node with key $k$ or $\text{null}$


Insert

Insert($k$):

  • search for $k$
  • rewrite final $\text{null}$ pointer to point to new node with key $k$

worst-case running time $O(\text{height})$


Min (Max)

Min($k$)/Max($k$):

  • start at root
  • and follow left (right) child pointer


Pred

Pred should return next smallest element after given

Pred($k$):

  • if $k$'s subtree is not empty, return the max key in the left subtree
  • or follow parent pointer until you get a key less than $k$

In-Order Traversal

goal: to print out keys in increasing order

Traverse():

  • recurse on the left tree
  • print current node's key
  • recurse on the right tree

running time $O(n)$


Deletion

Delete($k$):

  • search for $k$
  • if $k$ has no children
    • just delete the node
  • if $k$ has one child
    • the child gets the pointer of $k$
  • if $k$ has 2 children
    • compute $k$'s predecessor $l$
      • traverse $k$'s non-NULL left child pointer
      • then right-child pointer
      • until no longer possible
    • swap $k$ and $l$
    • in a new position it's easy to delete $k$
      • it has no left child


Select

goal: to retrieve $i$th order statistics

Need to store additional information for that

  • $\text{size}(x)$ - number of subtree nodes at subtree rooted at $x$
  • $\text{size}(x) = \text{size}(l) + \text{size}(r) + 1$

Select($i$):

  • start at root $x$ with left child $\text{lt}$ and right child $\text{rt}$
  • $a = \text{size(lt)}$
  • if $a = i - 1$
    • return $x$'s key
  • if $a \geqslant i$
    • recursively compute $i$th order statistics on $\text{lt}$
  • if $a \leqslant i - 1$
    • recursively compute $(i - a - 1)$th order statistics on $r$

running time $\Theta(\text{height})$


Rank

Goal: to compute how many keys are less or equal to that value

Rank:

  • $a$ = size(lt)
  • return $a + 1$


See also

Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".

Share your opinion