Solving a Graph Problem with Go

During the last couple of days, I have been watching live mocking interviews and even though I am not in the market for a new job (I am very happy at Google), it results interesting to me that so many people struggle with the same problem hidden behind some complicated adornments.

One of the example mock questions said something like this:

Imagine you have images, and you can duplicate any image you want, creating another image. The new image (the duplicate) can also be copied, and so on. Although the new image looks the same, the name is different, and you can also trace it back to the parent image.

Now, we want to delete all the images, but we cannot delete an image if it has duplicates. You must implement a function that returns the order in which the images can be deleted.

The function to implement is the following (we are using Go for all code in this post):

func deleteOrder(imgs []*Image, cps []*CPOps) []*Image {...}

type Image struct {
  Name string
}

// CPOps represent a copy operation. 
type CPOps struct {
  ParentImage *Image //Original images being copied. 
  CpImages []*Image  //Copies of the original image.
}

The function receives a list of images, and the copy operations between them, then it must return the order we must follow to delete all images based on the above rule.

Let’s look at one example:

Images = [A, B, C, D]

CPOps = [
  {ParentImage:A, CPImages:[B, C]},
  {ParentImage:B, CPImages:[D]},
]

This is saying that A was the original image, and we made two copies, B and C. Then we made a copy of B into D. The out of our functiondeleteOrder should be [D, B, C, A] or [D, C, B, A] or [C, D, B, A]. Notice that multiple solutions are possible, but all of them are such that every image is deleted only after its copies are deleted.

Solving the Problem

This problem is a classic Topological Sort algorithm where some kind of dependencies between objects is established. The same problem with a different phrasing was presented to the candidates of the mock interviews, but in all cases, the problem could be reduced to a Topological Sort.

The solution we present in this post is written so that each piece can be independently explained. However, in an interview is acceptable to write a less readable code for the sake of time.

Building the Graph

I hope that by now, the reader can realize this is a graph problem, which in turn should be the first signal to the interviewer that the candidate understands what is going on.

If we build the graph using the sample above, we should get something like:

A --> B --> D
 \  
  --> C

Now, let’s create a function that given the input, we can build a graph that can be used to solve the problem.

type Graph map[string][]string

func buildGraph(imgs []*Image, copos []*CPOps) Graph {
    graph := make(Graph)

    // Add all images to the graph.
    for _, img := range imgs {
        if _, ok := graph[img.Name]; !ok {
            graph[img.Name] = []string{}
        }
    }

    // Add dependencies between images.
    for _, cp := range cpops {
        if _, ok := graph[cp.ParentImage.Name]; !ok {
            graph[cp.ParentImage.Name] = []string{}
        }
        for _, child := range cp.CPImages {
            graph[cp.ParentImage.Name] = append(graph[cp.ParentImage.Name], child.Name)
        }
    }

    return graph
}

Notice that we are adding all nodes (images) to the graph first, this is important since there might be images that have never been copied, hence, they will never be in the given relations. Then, we establish the dependencies between the nodes.

Now, we can implement the main part of the algorithm, the Topological Sort.

Topological Sort

The topological sort is a simple DFS that traverses the graph using the dependencies as directed edges and returns the reverse order. In our case, we are interested in the reversed order of the dependencies since A — > B, we need to delete B first, then A.

func dfs(n string, graph Graph, visited map[string]bool, path []string) []string {
    for _, child := range graph[n] {
        if _, ok := visited[child]; ok {
            continue
        }
        visited[child] = true
        path = dfs(child, graph, visited, path)
    }
    return append(path, n)
}

The DFS traverses the graph given an initial node and returns the dependencies in reversed order. The visited variable keeps track of the nodes we have already visited in the graph (to not repeat them), and the variable path keeps track of the reversed dependencies.

Now that we have a way to build the graph and traverse it, we must put everything together at once.

// DeletionOrder returns the order the images should be deleted. 
func DeletionOrder((imgs []*Image, cps []*CPOps) []*Image {
    graph := buildGraph(imgs, cps)

    visited := map[string]bool{}
    result := []string{}
    for k := range graph {
        if _, ok := visited[k]; !ok {
            visited[k] = true
            result = dfs(k, graph, visited, result)
        }
    }

    imgsMap := map[string]*Image{}
    for _, img := range imgs {
        imgsMap[img.Name] = img
    }
    imgsOrder := []*Image{}
    for _, name := range result {
        imgsOrder = append(imgsOrder, imgsMap[name])
    }
    return imgsOrder
}

The DeletionOrder function builds the graph with the given input, for each node that has not been visited the dfs is executed. The result variable will contain the image names in the correct deletion order.

The rest of the DeleteOrder function (line 14 and on) is just making sure that we select the images from the image names (outside the graph algorithm).

The bulk of the algorithm does not look that complex, and yet so many people fail to solve this problem, which in my opinion is doable in a 45 min interview. My observation is that most people fail to identify that graph problem quickly enough to have time to implement a working solution.

Other Considerations

The same problem can be taken in different directions depending on how the interviewer wants to look at it.

For instance, the question of scaling the algorithm up, when the graph has thousands of nodes, is an interesting one. Although they might look simple, there are some tricky cases to be considered.

The problem of detecting if the given input is correct could also be an interesting one. For the graph to be sortable, hence the topological sort to work correctly, the graph must not have cycles. Detecting cycles is not a particularly complex issue, but doing it at scale could be a bit more problematic.

These are interesting questions and the reader should take a moment to think about them and the implications of each.

In the end, in a 45 min interview, there is only available time to solve a few of these question marks only after the main problem has been coded correctly. However, we hope that after this post, the reader can look at some problems and easily reduce them into this graph category of problems solvable using topological sort.

Good luck interviewing.

Nicolas Perez's blog