Chapter 21: Union-Find (Disjoint Set Union): Efficient Set Operations

Welcome back, future algorithm master! In our journey through the fascinating world of Data Structures and Algorithms, we’ve encountered many ways to organize and manipulate data. Today, we’re going to dive into a truly elegant and powerful data structure known as Union-Find, also frequently called Disjoint Set Union (DSU).

This chapter will equip you with the knowledge to efficiently manage collections of elements that are partitioned into a number of non-overlapping (disjoint) sets. You’ll learn the core operations, understand the subtle but critical optimizations that make it incredibly fast, implement it step-by-step in TypeScript, and explore its practical applications in areas like graph algorithms and network connectivity. Get ready to add another powerful tool to your algorithmic toolkit!

What is a Disjoint Set?

Before we jump into the Union-Find structure itself, let’s make sure we’re clear on what a “disjoint set” is. Imagine you have a collection of items, say numbers [0, 1, 2, 3, 4, 5]. If we partition these items into sets such that no item belongs to more than one set, and all items are accounted for, then these are disjoint sets.

For example, [0, 1], [2, 3], and [4, 5] are three disjoint sets. No element is shared between them, and together they cover all elements from 0 to 5.

The Union-Find data structure is designed to perform two primary operations on these disjoint sets:

find(element): Determine which set an element belongs to. This is usually done by returning a “representative” or “root” element of that set. If find(A) and find(B) return the same representative, it means A and B are in the same set.
union(elementA, elementB): Merge the sets containing elementA and elementB into a single, combined set.

Think of it like managing friendships. Initially, everyone is in their own “friend group.” If Alice and Bob become friends, their groups merge. If Bob and Charlie were already friends, and now Alice and Bob are friends, then Alice, Bob, and Charlie are all in one big friend group. Union-Find helps us efficiently track these groups.

Core Concepts: Building Our Union-Find Structure

Let’s start with a basic idea of how we might represent these sets. We can use an array, let’s call it parent, where parent[i] stores the parent of element i.

Initializing the Sets

When we start, every element is in its own distinct set. How can we represent this? Simple: each element is its own parent.

If we have N elements (say, from 0 to N-1), our parent array would look like this:

parent = [0, 1, 2, ..., N-1]

Here, parent[i] = i means that i is the “root” or “representative” of its own set.

The Naive `find` Operation

To find the representative of an element i, we simply follow the parent links until we reach an element that is its own parent (i.e., parent[j] === j). This element is the root of the set.

// Conceptual Naive Find
function findNaive(i: number, parent: number[]): number {
    while (parent[i] !== i) {
        i = parent[i];
    }
    return i;
}

The Naive `union` Operation

To merge the sets containing elementA and elementB, we first find the representatives of their respective sets using findNaive. Let’s say rootA is the representative of elementA’s set, and rootB is the representative of elementB’s set.

If rootA is different from rootB (meaning they are in different sets), we then “unite” them by making one root the parent of the other. For simplicity, let’s say we always make rootB’s parent rootA.

// Conceptual Naive Union
function unionNaive(elementA: number, elementB: number, parent: number[]): void {
    const rootA = findNaive(elementA, parent);
    const rootB = findNaive(elementB, parent);

    if (rootA !== rootB) {
        parent[rootB] = rootA; // Make rootA the parent of rootB
    }
}

Why Naive is Not Enough: The Problem of Skewed Trees

Consider this sequence of union operations on elements 0 through 4:

union(0, 1): parent = [0, 0, 2, 3, 4] (0 is parent of 1)
union(1, 2): parent = [0, 0, 1, 3, 4] (1’s root is 0, 2’s root is 2. Make 0 parent of 2. parent[2] becomes 0) Wait, parent[2] should become root of 1 which is 0. So parent[2] = 0. Let’s trace: union(0, 1): find(0) is 0, find(1) is 1. parent[1] = 0. parent = [0, 0, 2, 3, 4] union(1, 2): find(1) is 0, find(2) is 2. parent[2] = 0. parent = [0, 0, 0, 3, 4] union(2, 3): find(2) is 0, find(3) is 3. parent[3] = 0. parent = [0, 0, 0, 0, 4] union(3, 4): find(3) is 0, find(4) is 4. parent[4] = 0. parent = [0, 0, 0, 0, 0]

Now, if we call find(4), it will go 4 -> parent[4] (0). That’s just one step. But what if we made i+1 the parent of i?

union(0, 1): parent[0] = 1. parent = [1, 1, 2, 3, 4] (1 is parent of 0)
union(1, 2): root(1) is 1, root(2) is 2. parent[1] = 2. parent = [1, 2, 2, 3, 4] Wait, this is wrong. find(0) is 1, find(1) is 1, find(2) is 2. union(0,1): rootA = 0, rootB = 1. parent[rootB] = rootA -> parent[1] = 0. parent = [0, 0, 2, 3, 4] union(1,2): rootA = find(1) = 0, rootB = find(2) = 2. parent[rootB] = rootA -> parent[2] = 0. parent = [0, 0, 0, 3, 4] union(2,3): rootA = find(2) = 0, rootB = find(3) = 3. parent[rootB] = rootA -> parent[3] = 0. parent = [0, 0, 0, 0, 4] union(3,4): rootA = find(3) = 0, rootB = find(4) = 4. parent[rootB] = rootA -> parent[4] = 0. parent = [0, 0, 0, 0, 0]

This is a flat tree, which is good. But what if the union operation always made the first element’s root the parent of the second? Let’s try union(i+1, i): Initial: parent = [0, 1, 2, 3, 4]

union(1, 0): rootA = 1, rootB = 0. parent[0] = 1. parent = [1, 1, 2, 3, 4] (0 points to 1)
union(2, 1): rootA = 2, rootB = find(1) = 1. parent[1] = 2. parent = [1, 2, 2, 3, 4] (1 points to 2, 0 points to 1 which points to 2)
union(3, 2): rootA = 3, rootB = find(2) = 2. parent[2] = 3. parent = [1, 2, 3, 3, 4] (2 points to 3, 1 points to 2, 0 points to 1)
union(4, 3): rootA = 4, rootB = find(3) = 3. parent[3] = 4. parent = [1, 2, 3, 4, 4] (3 points to 4, 2 points to 3, 1 points to 2, 0 points to 1)

Now, if you call find(0), it would trace 0 -> 1 -> 2 -> 3 -> 4. This is a path of length 4, which is O(N) for N elements. In the worst case, find and union operations could take O(N) time, making M operations take O(M*N) time. This is not efficient for large datasets!

We need optimizations to keep our trees shallow.

Optimization 1: Path Compression (for `find`)

Path compression is a brilliant idea that flattens the tree structure during a find operation. When we traverse up the tree to find the root, why not make every node we visit point directly to the root?

Let’s say find(X) traces X -> P1 -> P2 -> Root. After finding Root, we can go back and set parent[X] = Root, parent[P1] = Root, parent[P2] = Root.

The next time we call find(X), it will immediately point to the root, taking O(1) time! This doesn’t change the outcome of find (the representative is still the same), but it dramatically speeds up future find operations for elements on that path.

Here’s how find with path compression looks:

// Find with Path Compression
function findOptimized(i: number, parent: number[]): number {
    if (parent[i] === i) { // If i is its own parent, it's the root
        return i;
    }
    // Recursively find the root and set i's parent directly to the root
    parent[i] = findOptimized(parent[i], parent);
    return parent[i];
}

Notice how elegant the recursive solution is! Once findOptimized(parent[i], parent) returns the true root, we assign it directly to parent[i]. This “compresses” the path.

Optimization 2: Union by Rank or Size (for `union`)

When we merge two sets, we have two roots, rootA and rootB. Which one should become the parent of the other? To keep the trees shallow, it’s always better to attach the root of the “smaller” tree to the root of the “larger” tree.

We can define “size” as the number of elements in the set, or “rank” as an upper bound on the height of the tree. Both strategies work similarly:

Union by Size: Keep track of the number of elements in each set. When uniting, make the root of the set with fewer elements point to the root of the set with more elements.
Union by Rank: Keep track of the “rank” (approximate height) of each tree. When uniting, if rank[rootA] < rank[rootB], make rootA point to rootB. If rank[rootB] < rank[rootA], make rootB point to rootA. If ranks are equal, pick one (e.g., rootA points to rootB) and increment the rank of the new root (rootB).

Union by rank is often preferred in theoretical analysis because it provides a tighter bound on tree height. For practical purposes, union by size is also very effective. We will use Union by Rank in our implementation.

Combined Power: Amortized O(α(N))

When both path compression and union by rank/size are used together, the amortized time complexity for both find and union operations becomes incredibly efficient: nearly constant time, specifically O(α(N)), where α is the inverse Ackermann function. The inverse Ackermann function grows so slowly that for any practical input size N, α(N) is less than 5. This makes Union-Find one of the most efficient data structures for its purpose!

Step-by-Step Implementation in TypeScript

Let’s put these concepts into a UnionFind class.

First, create a new file, say UnionFind.ts.

// src/UnionFind.ts

/**
 * Represents a Union-Find (Disjoint Set Union) data structure.
 * It manages a collection of elements partitioned into a number of disjoint (non-overlapping) sets.
 * Offers efficient operations for finding the representative of a set and merging sets.
 */
class UnionFind {
    private parent: number[]; // parent[i] stores the parent of element i
    private rank: number[];   // rank[i] stores the rank (approximate height) of the tree rooted at i
    private numSets: number;  // Keeps track of the total number of disjoint sets

    /**
     * Initializes the Union-Find structure with 'numElements' elements.
     * Each element is initially in its own set.
     * @param numElements The total number of elements (0 to numElements-1).
     */
    constructor(numElements: number) {
        this.parent = new Array(numElements);
        this.rank = new Array(numElements);
        this.numSets = numElements;

        // Initially, each element is its own parent and has a rank of 0
        for (let i = 0; i < numElements; i++) {
            this.parent[i] = i;
            this.rank[i] = 0;
        }
    }

    /**
     * Finds the representative (root) of the set containing element 'i'.
     * Applies path compression to flatten the tree during traversal.
     * @param i The element to find the set representative for.
     * @returns The representative (root) of the set.
     */
    find(i: number): number {
        // If i is its own parent, it is the root of its set
        if (this.parent[i] === i) {
            return i;
        }

        // Otherwise, recursively find the root and compress the path
        // by making i's parent point directly to the root.
        this.parent[i] = this.find(this.parent[i]);
        return this.parent[i];
    }

    /**
     * Merges the sets containing element 'i' and element 'j'.
     * Uses union by rank to keep the trees as flat as possible.
     * @param i An element in the first set.
     * @param j An element in the second set.
     * @returns True if the sets were merged, false if they were already in the same set.
     */
    union(i: number, j: number): boolean {
        const rootI = this.find(i); // Find the representative of i's set
        const rootJ = this.find(j); // Find the representative of j's set

        // If both elements are already in the same set, do nothing
        if (rootI === rootJ) {
            return false; // No merge occurred
        }

        // Union by Rank: Attach the smaller rank tree under the root of the larger rank tree
        if (this.rank[rootI] < this.rank[rootJ]) {
            this.parent[rootI] = rootJ; // rootI's tree becomes a child of rootJ's tree
        } else if (this.rank[rootJ] < this.rank[rootI]) {
            this.parent[rootJ] = rootI; // rootJ's tree becomes a child of rootI's tree
        } else {
            // If ranks are equal, pick one (e.g., rootJ becomes child of rootI)
            // and increment the rank of the new root (rootI)
            this.parent[rootJ] = rootI;
            this.rank[rootI]++;
        }

        this.numSets--; // A successful union reduces the total number of disjoint sets
        return true; // Merge occurred
    }

    /**
     * Checks if two elements are in the same set.
     * @param i An element.
     * @param j Another element.
     * @returns True if i and j are in the same set, false otherwise.
     */
    isConnected(i: number, j: number): boolean {
        return this.find(i) === this.find(j);
    }

    /**
     * Returns the current number of disjoint sets.
     * @returns The count of disjoint sets.
     */
    countSets(): number {
        return this.numSets;
    }
}

export default UnionFind;

Now, let’s create a small test file to see it in action. Create index.ts in the same src directory.

// src/index.ts
import UnionFind from './UnionFind';

console.log("--- Union-Find (Disjoint Set Union) Example ---");

// Initialize Union-Find for 5 elements (0, 1, 2, 3, 4)
const uf = new UnionFind(5);
console.log(`Initial number of sets: ${uf.countSets()}`); // Expected: 5

console.log("\n--- Performing Union Operations ---");
console.log("Union(0, 1):", uf.union(0, 1)); // Merges set of 0 and set of 1
console.log("Union(2, 3):", uf.union(2, 3)); // Merges set of 2 and set of 3
console.log("Union(0, 2):", uf.union(0, 2)); // Merges set containing 0 (and 1) with set containing 2 (and 3)
console.log("Union(4, 4):", uf.union(4, 4)); // Attempt to union an element with itself (no change)
console.log(`Number of sets after unions: ${uf.countSets()}`); // Expected: 2 (sets: {0,1,2,3}, {4})

console.log("\n--- Checking Connectivity ---");
console.log("Are 0 and 1 connected?", uf.isConnected(0, 1)); // Expected: true
console.log("Are 1 and 3 connected?", uf.isConnected(1, 3)); // Expected: true (0-1, 2-3, 0-2 -> 0-1-2-3)
console.log("Are 0 and 4 connected?", uf.isConnected(0, 4)); // Expected: false
console.log("Are 4 and 4 connected?", uf.isConnected(4, 4)); // Expected: true

console.log("\n--- Further Union ---");
console.log("Union(0, 4):", uf.union(0, 4)); // Merges the two remaining sets
console.log(`Number of sets after final union: ${uf.countSets()}`); // Expected: 1
console.log("Are 0 and 4 connected?", uf.isConnected(0, 4)); // Expected: true

console.log("\n--- Demonstrating Path Compression (Conceptual) ---");
// After find(0) was called, the path from 0 to its root was compressed.
// Let's assume the root of {0,1,2,3,4} is 0.
// If we had a path like 4 -> 3 -> 2 -> 1 -> 0 before the last union,
// now find(4) would make 4 point directly to 0.
// Subsequent calls to find(4) would be O(1).
console.log("Calling find(4) again to see effects of path compression:", uf.find(4));
console.log("Calling find(1) again:", uf.find(1));
// While we can't directly "see" the parent array changing from here,
// these calls are now much faster due to the internal flattening.

To run this, make sure you have Node.js and TypeScript set up. As of 2026-02-16, a stable Node.js LTS version is 24.13.0 and the current version is 25.6.1. We’ll use the LTS for stability. TypeScript is typically at version 5.x.

Install Node.js (if you haven’t yet): Download the LTS version from the official Node.js website: https://nodejs.org/en/download/ Verify installation:
```
node -v # Should show v24.13.0 or similar
npm -v # Should show a recent npm version (e.g., 10.x)
```
Initialize a TypeScript project:
```
mkdir union-find-project
cd union-find-project
npm init -y
npm install typescript@latest ts-node @types/node --save-dev
npx tsc --init
```
This creates tsconfig.json. You might want to adjust outDir to "./dist" and rootDir to "./src" in tsconfig.json for better project structure. Also, ensure "esModuleInterop": true is uncommented for import/export syntax.

Create src directory and files:

mkdir src
# Create src/UnionFind.ts and src/index.ts with the code above

Run the code:
```
npx ts-node src/index.ts
```
You should see the output demonstrating the Union-Find operations.

Mini-Challenge: Implementing `getSetElements`

The UnionFind class efficiently tells us if two elements are connected and how many sets there are. But what if you wanted to see all the elements belonging to a specific set?

Challenge: Add a new method getSetElements(element: number): number[] to the UnionFind class. This method should return an array of all elements that belong to the same set as the given element.

Hint: You’ll need to iterate through all elements in the parent array and check their root. Remember that find() will give you the representative of a set.

What to Observe/Learn: This challenge reinforces your understanding of how find identifies a set’s representative and how the parent array implicitly defines set membership. It also highlights that while Union-Find is fast for find and union, reconstructing entire sets can be more complex, often requiring iteration or additional data structures if this is a frequent requirement.

// Add this method to your UnionFind class in src/UnionFind.ts
// ... inside the class ...

    /**
     * Returns an array of all elements belonging to the same set as the given element.
     * Note: This operation can be O(N) where N is the total number of elements,
     * as it iterates through all elements to check their set membership.
     * @param element An element whose set members are to be retrieved.
     * @returns An array of numbers representing all elements in the same set.
     */
    getSetElements(element: number): number[] {
        const rootOfElement = this.find(element);
        const setMembers: number[] = [];

        for (let i = 0; i < this.parent.length; i++) {
            if (this.find(i) === rootOfElement) {
                setMembers.push(i);
            }
        }
        return setMembers;
    }

Now, you can add a test in index.ts:

// Add to src/index.ts after the previous example
console.log("\n--- Testing getSetElements ---");
const uf2 = new UnionFind(6);
uf2.union(0, 1);
uf2.union(1, 2); // Set: {0, 1, 2}
uf2.union(3, 4); // Set: {3, 4}
// Element 5 is still in its own set {5}

console.log("Elements in set of 0:", uf2.getSetElements(0)); // Expected: [0, 1, 2]
console.log("Elements in set of 3:", uf2.getSetElements(3)); // Expected: [3, 4]
console.log("Elements in set of 5:", uf2.getSetElements(5)); // Expected: [5]

Run npx ts-node src/index.ts again to see the output.

Common Pitfalls & Troubleshooting

Forgetting Path Compression or Union by Rank/Size: This is the most common mistake. If you implement find or union without these optimizations, your code will work for small inputs but will suffer from O(N) performance for operations in worst-case scenarios, leading to O(M*N) total time for M operations. Always include both optimizations for a truly efficient Union-Find.
Incorrect Array Initialization: Ensure your parent and rank arrays are correctly sized and initialized in the constructor. Each element i should initially have parent[i] = i and rank[i] = 0.
Off-by-One Errors with Indexing: If your problem statement uses 1-based indexing for elements, but your arrays are 0-indexed, remember to adjust your element values (element - 1) before using them as array indices. Our implementation assumes 0-indexed elements.
Confusing numSets with parent.length: The numSets counter correctly tracks the number of disjoint groups. parent.length is just the total number of elements the structure was initialized with.
Infinite Loops in find: This usually happens if parent[i] is not correctly initialized or updated, or if there’s a cycle created somehow (which shouldn’t happen with correct union logic). Always ensure parent[root] === root for roots.

Real-World Use Cases

Union-Find is not just an academic exercise; it’s a workhorse in many practical algorithms and systems:

Connected Components in Graphs: This is perhaps its most direct and intuitive application. Given a graph, Union-Find can quickly determine if two nodes are connected (i.e., belong to the same connected component) or count the total number of connected components. This is crucial in network analysis, social graph algorithms, and image processing (e.g., grouping connected pixels).
Kruskal’s Algorithm for Minimum Spanning Tree (MST): Kruskal’s algorithm builds an MST by iteratively adding the cheapest edges that do not form a cycle. Union-Find is used to efficiently detect if adding an edge (u, v) would create a cycle: if u and v are already in the same set, then adding an edge between them would form a cycle. Otherwise, they are in different sets, and we union them.
Network Connectivity: Imagine a network of computers. You can use Union-Find to quickly check if two computers are connected or if the entire network is fully connected. Each computer is an element, and a connection between two computers implies a union operation.
Percolation Simulation: In scientific simulations, such as modeling fluid flow through porous materials, Union-Find can track connected paths of “wet” cells.
Maze Generation: Some maze generation algorithms (like Kruskal’s for mazes) use Union-Find to ensure that all parts of the maze are connected while avoiding cycles.

Summary

In this chapter, we’ve explored the powerful Union-Find (Disjoint Set Union) data structure. Here’s a quick recap of what we’ve covered:

Disjoint Sets: Collections of non-overlapping sets of elements.
Core Operations: find(element) to get the set representative, and union(elementA, elementB) to merge sets.
Naive Implementation: Using a simple parent array, which can lead to O(N) performance in worst-case scenarios due to skewed trees.
Optimizations:
- Path Compression: During a find operation, make all visited nodes point directly to the root, significantly flattening the tree.
- Union by Rank (or Size): When merging two sets, attach the root of the tree with smaller rank to the root of the tree with larger rank, minimizing tree height.
Amortized Complexity: With both optimizations, Union-Find achieves nearly constant time (O(α(N))) for find and union operations, making it incredibly efficient.
Practical Applications: Crucial for graph algorithms (connected components, Kruskal’s MST), network analysis, and various simulations.

You now have a solid understanding and a working TypeScript implementation of one of the most efficient data structures for managing dynamic set memberships. This will be invaluable as you tackle more complex algorithmic challenges, especially those involving graphs.

Next, we’ll continue our exploration of graph algorithms, building on concepts like connected components to solve problems like shortest paths and graph traversals!

References

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.