Life is chaos and pain. Interview prep doesn't have to be.

Get instant access to anonymous mock interviews, salary negotiation, and the world's largest library of interview replays.

Depth-First Search (DFS)

By Kenny Polyak and Mike Mroczka | Published:

An essential aspect of working with graphs and trees is understanding how to traverse the search space. Traversal is the process of systematically visiting each node exactly once following a specific order or pattern. This allows us to search for a node or to trace a specific path through the data structure.

Unlike a linear data structure like an array or a linked list - where each node points to only one subsequent node - graphs and trees offer multiple distinct paths to take through the structure. The example below illustrates the many paths that exist through a tree and the single path through a linked list.

A simple graph consisting of seven nodes highlighting the various distinct paths through the graph. A singly-linked linked-list consisting of four nodes, highlighting the single linear path through the list

Different traversal algorithms will produce different traversal orders - knowing which to deploy and when enables us to solve problems more efficiently, and sometimes offers the only way to solve a particular problem. Let's look at how a specific traversal algorithm can be used! At the highest level, there are two main traversal algorithms: Depth-First Search (DFS), which is further distinguished with pre-order, in-order, and post-order traversal when specifically considering DFS in a binary tree, and Breadth-First Search (BFS). In this article, we'll focus on DFS.

Keep in mind that while we can manipulate the traversal path with different algorithms, unless the data structure is ordered in a particular way (like a BST) or our algorithm applies additional logic to omit certain paths, each traversal algorithm will ultimately visit each node once. DFS and BFS are considered blind search algorithms as they do not apply any domain-driven heuristic. Instead, the algorithms only apply traversal rules and a terminal case to determine if a goal state is reached.

Note: Since trees are merely directional, acyclic graphs, we'll just refer to both as graphs in the remainder of this article. Everything discussed below is relevant to trees as well as generic graphs, and in cases where that's not true, it will be pointed out as such.

What is Depth-First Search (DFS)?

Depth-First Search (DFS) is a type of search algorithm that explores a graph by traversing as far as possible along each branch before backtracking. It's considered "depth-first" because at each node, if there are children, the algorithm always explores deeper paths until a leaf node is finally visited or some condition is met. The algorithm starts at the root node and explores the left and right subtrees by going deeper into the tree as it processes nodes, instead of visiting all the children of a given node before moving on.

There are many paths that DFS can take, given that each node can have multiple children. In fact, DFS is commonly used as a fundamental technique for backtracking algorithms - backtracking involves exploring all possible solutions to a problem (represented as paths through the data structure) by incrementally building a candidate path and undoing or "backtracking" when a solution is found to be invalid.

For binary trees, a very specific kind of graph, the order of the DFS has a big impact on the ultimate traversal order, and is very important when considering a DFS implementation. Learn more about pre-order, in-order, or post-order traversal in a binary tree here.

Companies That Ask DFS Questions

Amazon

Interview process
& questions

Watch 33
interview replays

Airbnb

Interview process
& questions

Watch 3
interview replays

DFS Implementation

Recursive DFS

DFS is typically implemented with recursion, which works naturally with the recursive structure of a graph. We define a function that takes a node as an argument, which handles doing work on the current node (eg. printing the value) and initiating subsequent recursive calls on its children. Note that in some implementations, we will also use a set to track already visited nodes - this is important if we don't know if the graph is acyclic, otherwise we'll cause an infinite loop.

Here are the algorithm steps:

Create a visited set to keep track of visited nodes to avoid revisiting them.
Define a recursive function (dfs) that takes the current Node as a parameter.
Mark the current node as visited by adding it to the visited set.
Process the current node. This could involve performing any desired operations on the node or checking if it matches some search criteria.
Iterate through the neighbors of the current node. For each unvisited neighbor, recursively call the dfs function with that neighbor as the argument.
Repeat steps 3 to 5 for all unvisited neighbors until there are no more unvisited neighbors or the search criteria are met.

Imagine we have a graph with the following node structure:

class Node:
    def __init__(self, id):
        self._id = id
        self._neighbors = []
class Node:
    def __init__(self, id):
        self._id = id
        self._neighbors = []

DFS will recursively call itself on each neighbor of the node. Below, we implement the algorithm using this adjacency list graph, but the same idea would apply for graphs represented using an adjacency matrix or a Von Neumann neighborhood. The neighbor retrieval would only be different.

Additionally, changing the order in which we iterate over the neighbors will also change the ultimate traversal order. This is especially important when dealing with binary search trees, where the nodes are sorted in some way.

class Graph:
    def __init__(self):
        self.visited = set()

    def dfs(self, node):
        # Mark the current node as visited
        self.visited.add(node)

        # Process the current node
        print(node)

        # Iterate through neighbors
        for neighbor in node._neighbors:
            if neighbor not in self.visited:
                # Recursive call with unvisited neighbors
                self.dfs(neighbor)

# Create a Graph object
g = Graph()
# Assuming that startNode is already defined
g.dfs(startNode)
class Graph:
    def __init__(self):
        self.visited = set()

    def dfs(self, node):
        # Mark the current node as visited
        self.visited.add(node)

        # Process the current node
        print(node)

        # Iterate through neighbors
        for neighbor in node._neighbors:
            if neighbor not in self.visited:
                # Recursive call with unvisited neighbors
                self.dfs(neighbor)

# Create a Graph object
g = Graph()
# Assuming that startNode is already defined
g.dfs(startNode)

We begin the traversal by passing the root in as an argument. The function calls itself by passing in the current node's neighbors as arguments for subsequent calls, and a new instance of the function is instantiated as a new frame on the call stack.

Diagram illustrating how the call stack keeps track of the path taken by a recursive DFS traversal

By using a recursive function, the algorithm keeps track of the nodes that need to be visited with the use of the call stack. When each call eventually returns – meaning, all child nodes have been traversed – the algorithm is effectively transported back to the moment that a given call was initialized.

Iterative DFS

Although the structure of graphs lend themselves to recursive algorithms, this does not mean that one cannot also use iteration to traverse. Recursion is used in order to take advantage of the call stack - but we can also implement a stack of our own, removing the need for recursion.

Iterative approaches to DFS traversal involve using a loop and a stack for traversal. Using a while loop to iterate, a stack can be used to keep track of the nodes that still need to be visited, ensuring that the most recently added nodes, the closest parents, are visited first. This approach allows for efficient traversal of the tree and requires slightly less memory than recursive approaches, albeit no asymptotic change occurs.

class Graph:
    def __init__(self):
        self.visited = set()

    def dfs(self, node):
        stack = [node]

        while stack:
            # Pop the top node from the stack
            current_node = stack.pop()

            if current_node not in self.visited:
                # Mark the current node as visited
                self.visited.add(current_node)

                # Process the current node
                print(current_node)

                for neighbor in current_node._neighbors:
                    # Push unvisited neighbors onto the stack
                    stack.append(neighbor)

# Create a Graph object
g = Graph()
# Assuming that startNode is already defined
g.dfs(startNode)
class Graph:
    def __init__(self):
        self.visited = set()

    def dfs(self, node):
        stack = [node]

        while stack:
            # Pop the top node from the stack
            current_node = stack.pop()

            if current_node not in self.visited:
                # Mark the current node as visited
                self.visited.add(current_node)

                # Process the current node
                print(current_node)

                for neighbor in current_node._neighbors:
                    # Push unvisited neighbors onto the stack
                    stack.append(neighbor)

# Create a Graph object
g = Graph()
# Assuming that startNode is already defined
g.dfs(startNode)

Time and Space Complexity

Time complexity: O(V + E), where V is the number of vertices and E is the number of edges, as it must process each vertex and each edge exactly once. Space complexity: O(V), where V is the number of vertices, as it requires a position in the stack for each vertex.

When to Use DFS in Technical Interviews

Recall that in an acyclic graph, or a tree, all traversal algorithms will visit each node at least once, so for many problems that require basic search, both DFS and BFS will suffice. But there are certain problems where DFS is preferable:

DFS is particularly useful for backtracking algorithms: When we need to explore all possible paths, and reverse course when a leaf node or some other condition is met, DFS allows for a convenient way to analyze various paths. Examples of problems like this are those where we are calculating the path sum - adding together the values of the nodes - either searching for the max or optimal sum.
When there could be an opportunity to cache duplicate operations: If you're dealing with a graph or a tree that might include overlapping subproblems, you'll want to implement your traversal with DFS so it's easy to cache or memoize the sub-solutions. A classic example of a memoization use-case is when solving fibonacci with recursion - this algorithm effectively imagines the fibonacci resolution as a decision tree which is traversed with a DFS-like recursive algorithm.
If there is an exponential branching factor: Memoized DFS is also important to consider if the branching factor of the recursion is exponential, as the volume of computations could be costly.
In most cases, DFS is slightly more memory efficient than BFS, (though not always): This is worth mentioning if you're in a situation where both algorithms are applicable. In a tree, the space usage for DFS is defined as O(height), where height is effectively the number of levels in the tree. BFS, on the other hand, uses O(branching factor ^ height), where branching factor is the max or average number of children per node. As such, if height < branchingFactor ^ height, then DFS is more space efficient. This is rarely not the case. The absolute time complexity is unlikely to change here though, so the difference in an asymptotic sense is negligible.

Common Mistakes in Interviews Featuring DFS

Misunderstanding the tradeoffs and benefits between DFS and BFS: Remember that in many cases both will do just fine, with little to no tradeoffs, but there are specific cases when we should use BFS (finding the shortest distance between two nodes, for example).
Not using the correct supporting data structure: BFS is made possible with a queue data structure, while DFS relies on a stack. It's important to be able to identify this when implementing, and to be sure not to use the wrong supporting data structure.
Forgetting to track visited nodes: When dealing with a graph that is not guaranteed to be directed and acyclic, it is essential to track visited nodes in order to avoid an infinite loop. Interviewers will be looking for this as a common pitfall. One way to ensure you don't forget is to perform a manual dry-run of your algorithm.
Choosing BFS over DFS when it's not needed: Although in many cases these are both interchangeable, one big benefit of using DFS for backtracking algorithms is that memoization can be implemented easily.

Common Depth-First Search (DFS) interview Questions

MEDIUM