Binary Search Tree

Introduction

A Binary Search Tree (BST) is a binary tree organized so that every node’s key is greater than all keys in its left subtree and less than all keys in its right subtree. This simple ordering enables fast search, insertion, and deletion without shifting elements in memory. In-order traversal of a BST always visits keys in ascending order, which makes it a natural structure for maintaining a dynamic sorted set. The running time of core operations depends on tree height: balanced trees achieve logarithmic time, while highly skewed trees degrade to linear time. BSTs form the conceptual base of many indexed containers and are stepping stones to self-balancing variants such as AVL and Red-Black Trees. In practice, BSTs handle operations like membership checks, range queries, and order-statistics efficiently when kept reasonably balanced. This article explains the core theory, step-by-step algorithms, worked examples, practical pros/cons, and real applications, followed by clean C and Python implementations.

Theory

A BST enforces the binary search tree property at every node: keys in the left subtree are strictly smaller than the node’s key, and keys in the right subtree are strictly greater. This rule holds recursively for all subtrees. A direct consequence is that an in-order traversal (Left, Root, Right) yields keys in strictly increasing order, which doubles as a way to verify BST ordering. The height of a node is the maximum number of edges on a path from that node to a leaf. Operation costs are proportional to height: O(log n) when the height is logarithmic (balanced), and O(n) when the tree becomes a chain (skewed). Handling duplicates is a design choice: either disallow duplicates, or direct equal keys consistently to one side (e.g., to the left) to preserve an unambiguous structure. Common terminology includes root, internal nodes, leaves, subtree, successor (next key in in-order), and predecessor (previous key in in-order).
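The verification idea above can be sketched directly: collect the in-order sequence and check that it is strictly increasing. The Node record and helper names here are minimal stand-ins for illustration, not the article's full implementations, which appear later.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder_keys(node):
    """Collect keys in Left, Root, Right order."""
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)

def is_bst(root):
    """With distinct keys, a tree satisfies the BST property
    iff its in-order sequence is strictly increasing."""
    keys = inorder_keys(root)
    return all(a < b for a, b in zip(keys, keys[1:]))

# 7 < 13 < 15: a valid BST
good = Node(13, Node(7), Node(15))
# 20 in the left subtree of 13 violates the ordering
bad = Node(13, Node(20), Node(15))
```

This check is O(n), so it is a testing aid rather than something to run on every operation.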

Algorithm Steps

BST procedures follow the ordering property to eliminate half the search space at each decision point. Search compares the target with the current node and branches left or right until the match is found or a null link is encountered. Insertion mirrors search and creates a new leaf at the first null position. Deletion maintains ordering with three distinct situations: removing a leaf, removing a node with one child (bypass it), or removing a node with two children (replace with in-order successor or predecessor, then delete that node). Traversals visit nodes in structured orders: in-order (sorted output), pre-order (root before subtrees), and post-order (root after subtrees). Minimum and maximum are the left-most and right-most nodes respectively. Each operation’s complexity is governed by the number of edges descended, i.e., the tree’s height.

Search

Start from the root and compare the target key k with the current node’s key x. If k == x, return the node. If k < x, move to the left child; if k > x, move to the right child. Repeat until you find the key or reach a null pointer. In a balanced tree this logic prunes half of the remaining keys at each step, which is why search runs in O(log n) time when the height is logarithmic. An iterative approach avoids recursion overhead, while a recursive approach is concise and mirrors the recursive BST property. If the search descends past a leaf to a null link, the key is not in the tree. Insertion reuses this same path to place a new node at the discovered null link, keeping the shape consistent with the ordering property.
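The iterative loop described above can be sketched as follows; the bare Node record is an illustrative stand-in for the class-based implementation given later.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search_iterative(root, k):
    """Walk down from the root, branching by comparison, until
    the key is found or a null link is reached (key absent)."""
    cur = root
    while cur is not None:
        if k == cur.key:
            return cur
        cur = cur.left if k < cur.key else cur.right
    return None

# Upper levels of the worked example: 13 with subtrees under 7 and 15
root = Node(13, Node(7, Node(3), Node(8)), Node(15, Node(14), Node(19)))
```

Searching for 8 follows the path 13 → 7 → 8; searching for an absent key such as 10 falls off a null link and returns None.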

Insertion

Compare the new key with the current node, branching left for smaller and right for larger keys. When a null position is reached, create a node there. This inserts at a leaf without disturbing existing links. The algorithm is naturally recursive: set node->left = insert(node->left, key) or node->right = insert(node->right, key) accordingly, and return node to preserve parent links during unwinding. If duplicates are disallowed, stop when an equal key is encountered; if duplicates are allowed by policy, direct equal keys to a chosen side consistently. Insertion cost matches the search path length, so it is O(height). With balanced height this is O(log n); with sorted input the tree degenerates into a chain, the height grows linearly, and insertion degrades to O(n). Simple mitigations include randomizing the insertion order or switching to self-balancing trees.
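The degradation under sorted input, and the shuffle mitigation, can be demonstrated with a small experiment. This is a sketch with illustrative helper names; the height convention counts edges, matching the Theory section.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    # Create a leaf at the first null position; equal keys are ignored.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    # Edges on the longest root-to-leaf path; empty tree has height -1.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

keys = list(range(1, 128))       # 127 keys in sorted order
skewed = None
for k in keys:                   # sorted insertion builds a right chain
    skewed = insert(skewed, k)

random.seed(0)
random.shuffle(keys)             # random order keeps expected height O(log n)
shuffled = None
for k in keys:
    shuffled = insert(shuffled, k)
```

With 127 keys the chain has height 126, while the shuffled tree lands near the logarithmic regime (log2 128 = 7, so a height in the teens is typical for a random BST of this size).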

Deletion

Locate the node to delete, then handle one of three cases. Case 1: if it is a leaf, remove it by setting the parent’s corresponding child pointer to null. Case 2: if it has exactly one child, splice it out by linking its parent directly to that child. Case 3: if it has two children, replace its key with that of its in-order successor (the smallest key in its right subtree) or in-order predecessor (the largest key in its left subtree), then delete that successor/predecessor node, which is guaranteed to be a leaf or to have a single child and is therefore simpler to remove. This preserves BST ordering. The traversal to find the node and the successor/predecessor is O(height). Like the other operations, performance is O(log n) when balanced and O(n) when skewed. Careful pointer updates ensure structural integrity during the operation.
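The successor lookup used in case 3 can be sketched in isolation: the in-order successor of a node with two children is the left-most node of its right subtree, which by construction has no left child. The Node record here is a bare stand-in for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def min_node(node):
    # The left-most node holds the smallest key in the subtree;
    # it can never have a left child, so deleting it is simple.
    while node.left is not None:
        node = node.left
    return node

# In the worked example, deleting 15 inspects its right subtree
# (19 with left child 18); the in-order successor is 18.
right_subtree = Node(19, Node(18))
```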

Example

Insert the keys [13, 7, 15, 3, 8, 14, 19, 18] in that order. The root becomes 13. Keys less than 13 (7, 3, 8) form the left subtree; keys greater than 13 (15, 14, 19, 18) form the right subtree. In-order traversal visits [3, 7, 8, 13, 14, 15, 18, 19], which confirms sorting. The minimum is the left-most node (3) and the maximum is the right-most node (19). Deleting 15 (two children) uses the in-order successor approach: find the smallest key in its right subtree (18), copy 18 over 15, then delete the original 18 node (a leaf or single-child). The tree remains ordered without shifting data in contiguous memory. As a contrast, inserting keys in strictly increasing order like [1, 2, 3, 4, 5] creates a right-skewed chain resembling a linked list, causing operations to degrade to O(n). Balanced insertion orders or self-balancing trees avoid this pitfall.

Code (C & Python)

The following reference implementations cover search, insert, delete, and in-order traversal. Both versions use the in-order successor to handle deletion for nodes with two children. The recursive style aligns with the BST’s recursive definition and keeps code minimal. Each operation descends along a single path based on comparisons, giving O(height) time. In practice, consider input shuffling, randomized insertion, or a self-balancing variant to maintain good performance across workloads. The C version uses pointers and manual memory management; the Python version wraps logic in a class for clarity. In-order traversal is included to demonstrate the sorted output property and to aid testing. Always free nodes (C) or let the garbage collector reclaim memory (Python) after deletions.


// C implementation of a Binary Search Tree
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

Node* newNode(int key) {
    Node* n = (Node*)malloc(sizeof(Node));
    n->key = key; n->left = n->right = NULL;
    return n;
}

Node* insert(Node* root, int key) {
    if (root == NULL) return newNode(key);
    if (key < root->key) root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    // if equal, ignore or choose a side consistently
    return root;
}

Node* search(Node* root, int key) {
    if (!root || root->key == key) return root;
    if (key < root->key) return search(root->left, key);
    return search(root->right, key);
}

Node* minValueNode(Node* root) {
    Node* cur = root;
    while (cur && cur->left) cur = cur->left;
    return cur;
}

Node* delete(Node* root, int key) {
    if (!root) return NULL;
    if (key < root->key) {
        root->left = delete(root->left, key);
    } else if (key > root->key) {
        root->right = delete(root->right, key);
    } else {
        // Found node
        if (!root->left) {
            Node* r = root->right;
            free(root);
            return r;
        } else if (!root->right) {
            Node* l = root->left;
            free(root);
            return l;
        } else {
            Node* succ = minValueNode(root->right);
            root->key = succ->key;
            root->right = delete(root->right, succ->key);
        }
    }
    return root;
}

void inorder(Node* root) {
    if (!root) return;
    inorder(root->left);
    printf("%d ", root->key);
    inorder(root->right);
}

void freeTree(Node* root) {
    if (!root) return;
    freeTree(root->left);
    freeTree(root->right);
    free(root);
}

int main() {
    int keys[] = {13,7,15,3,8,14,19,18};
    int n = sizeof(keys)/sizeof(keys[0]);
    Node* root = NULL;
    for (int i=0; i<n; ++i) root = insert(root, keys[i]);

    printf("In-order: ");
    inorder(root); printf("\n");

    root = delete(root, 15);
    printf("After deleting 15: ");
    inorder(root); printf("\n");

    Node* f = search(root, 8);
    printf("Search 8: %s\n", f ? "found" : "not found");

    freeTree(root);
    return 0;
}

# Python implementation of a Binary Search Tree
from typing import Optional, List

class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None

class BST:
    def __init__(self):
        self.root: Optional[Node] = None

    def insert(self, key: int) -> None:
        def _ins(node: Optional[Node], key: int) -> Node:
            if node is None:
                return Node(key)
            if key < node.key:
                node.left = _ins(node.left, key)
            elif key > node.key:
                node.right = _ins(node.right, key)
            return node
        self.root = _ins(self.root, key)

    def search(self, key: int) -> Optional[Node]:
        cur = self.root
        while cur:
            if key == cur.key:
                return cur
            cur = cur.left if key < cur.key else cur.right
        return None

    def _min_node(self, node: Node) -> Node:
        cur = node
        while cur.left:
            cur = cur.left
        return cur

    def delete(self, key: int) -> None:
        def _del(node: Optional[Node], key: int) -> Optional[Node]:
            if node is None:
                return None
            if key < node.key:
                node.left = _del(node.left, key)
            elif key > node.key:
                node.right = _del(node.right, key)
            else:
                if node.left is None:
                    return node.right
                if node.right is None:
                    return node.left
                succ = self._min_node(node.right)
                node.key = succ.key
                node.right = _del(node.right, succ.key)
            return node
        self.root = _del(self.root, key)

    def inorder(self) -> List[int]:
        res: List[int] = []
        def _in(node: Optional[Node]):
            if node is None:
                return
            _in(node.left)
            res.append(node.key)
            _in(node.right)
        _in(self.root)
        return res

if __name__ == "__main__":
    bst = BST()
    for k in [13,7,15,3,8,14,19,18]:
        bst.insert(k)
    print("In-order:", bst.inorder())
    bst.delete(15)
    print("After deleting 15:", bst.inorder())
    print("Search 8:", "found" if bst.search(8) else "not found")

Dry Run

Build the tree by inserting 13 → root, 7 → left of 13, 15 → right of 13, 3 → left of 7, 8 → right of 7, 14 → left of 15, 19 → right of 15, 18 → left of 19. In-order traversal visits 3, 7, 8, 13, 14, 15, 18, 19. Now delete 15. Locate 15 at the root’s right child. It has two children, so find its in-order successor: the smallest node in its right subtree. Move to 19, then left to 18; 18 is the successor. Copy 18 into the node that currently holds 15 (that node’s key becomes 18). Next, delete the original 18 node from that node’s right subtree (rooted at 19). The original 18 is the left child of 19 and has no left child, so replace it with its right child (null), effectively removing it. The final in-order traversal is 3, 7, 8, 13, 14, 18, 19, confirming the tree remains ordered. Searches follow the same left/right decisions: searching for 8 goes 13 → left to 7 → right to 8.

Time Complexity

Operation            | Best | Average  | Worst | Notes
Search               | O(1) | O(log n) | O(n)  | O(1) when the key is at the root; otherwise proportional to height
Insert               | O(1) | O(log n) | O(n)  | Leaf insertion after a search path
Delete               | O(1) | O(log n) | O(n)  | Successor/predecessor step is still O(height)
Find Min/Max         | O(1) | O(log n) | O(n)  | Left-most or right-most descent
Traversal (in-order) | O(n) | O(n)     | O(n)  | Visits each node exactly once
Space (extra)        | O(1) | O(1)     | O(1)  | Plus O(height) recursion stack if recursive

Advantages & Disadvantages

  • Advantages:
    • Maintains a dynamic set in sorted order; in-order traversal yields sorted output.
    • Search/insert/delete run in O(log n) when height is logarithmic; no element shifting like arrays.
    • Supports range queries (report all keys in [L, R]) efficiently via in-order traversal.
    • Easy to extend with order-statistics (k-th smallest) and predecessor/successor queries.
  • Disadvantages:
    • Performance degrades to O(n) with skewed shapes (e.g., sorted insert order).
    • Pointer-heavy; higher memory overhead than arrays and may be less cache-friendly.
    • Requires careful handling of deletion cases to preserve ordering.
    • For guaranteed bounds, a self-balancing variant (AVL, Red-Black) is preferable.
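The range-query advantage mentioned above comes from pruning: an in-order walk can skip any subtree that cannot contain keys in [L, R]. The sketch below uses illustrative helper names and a bare Node record, not the article's class-based implementation.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def range_query(node, lo, hi, out=None):
    """In-order visit restricted to [lo, hi]; subtrees that cannot
    contain qualifying keys are never entered."""
    if out is None:
        out = []
    if node is None:
        return out
    if lo < node.key:             # left subtree may hold keys >= lo
        range_query(node.left, lo, hi, out)
    if lo <= node.key <= hi:
        out.append(node.key)
    if node.key < hi:             # right subtree may hold keys <= hi
        range_query(node.right, lo, hi, out)
    return out

root = None
for k in [13, 7, 15, 3, 8, 14, 19, 18]:
    root = insert(root, k)
```

Because the visit order is in-order, reported keys come out already sorted; the cost is O(height + m) for m reported keys in a balanced tree.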

Applications

  • Symbol tables and dictionaries where keys must remain sorted for iteration.
  • Indexing for in-memory databases or compilers (identifier tables, constant pools).
  • Range queries and reporting (e.g., list all values between L and R).
  • Order-statistics (k-th smallest/largest, rank/select) with minor augmentation.
  • Autocomplete or spell-check when used with strings and comparison-based ordering.
  • Event schedulers and interval handling after augmentation (e.g., interval trees built on BSTs).
  • Foundation for balanced search trees (AVL, Red-Black) used in production map/set containers.
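The order-statistics application can be sketched without augmentation by counting during an in-order traversal; an augmented BST storing subtree sizes would answer the same query in O(height). The helper names and Node record here are illustrative assumptions.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def kth_smallest(node, k):
    """1-based k-th smallest via an iterative in-order traversal;
    returns None if the tree has fewer than k keys."""
    stack, cur = [], node
    while stack or cur:
        while cur:                # descend to the left-most pending node
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()         # next key in sorted order
        k -= 1
        if k == 0:
            return cur.key
        cur = cur.right
    return None

root = None
for key in [13, 7, 15, 3, 8, 14, 19, 18]:
    root = insert(root, key)
```

On the worked example (sorted order 3, 7, 8, 13, 14, 15, 18, 19), the 3rd smallest key is 8 and the 8th is 19.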

Conclusion

A Binary Search Tree organizes keys so that each comparison steers the search into a single subtree, making core operations efficient when the tree is balanced. The in-order traversal property provides a direct path to sorted output and underpins common tasks such as range reporting, predecessor/successor lookup, and order-statistics with simple augmentations. The critical factor governing performance is height: careful input ordering or self-balancing techniques prevent pathological cases that resemble linked lists. When guaranteed logarithmic bounds are required, choose AVL or Red-Black Trees; when simplicity and clarity are the priority, a plain BST is a great fit for learning and for moderate-sized workloads. The C and Python implementations above capture the essential algorithms used in real systems and are a solid foundation for exploring advanced, balanced variants.
