Image Segmentation via Min Cut

By DHRUV JANI




Introduction

Graph cut algorithms are a class of optimization algorithms used in computer vision and image processing for tasks such as image segmentation and object detection.

By deleting a small set of edges, these algorithms seek to divide a graph into two disjoint sets, one containing a designated source vertex and the other a designated sink vertex, while minimizing the total weight of the deleted edges.

This project focuses on using graph cuts to divide an image into background and foreground segments. The framework consists of two parts. First, a network flow graph is built based on the input image. Then a max-flow algorithm is run on the graph in order to find the min-cut, which produces the optimal segmentation.

Network Flow

A flow network $G=(V,E)$ is a directed graph where each edge has a capacity and carries a flow. Two vertices in the network are designated as the source vertex $s$ and the sink vertex $t$. The goal is to find the maximum amount of flow that can be delivered from $s$ to $t$ while satisfying two constraints: the flow on each edge cannot exceed that edge's capacity, and at every vertex other than $s$ and $t$ the total inflow equals the total outflow.
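
In standard notation, writing $f(u,v)$ for the flow on edge $(u,v)$ and $c(u,v)$ for its capacity, these two constraints are:

$$ 0 \le f(u,v) \le c(u,v) \quad \forall (u,v)\in E $$

$$ \sum_{(u,v)\in E} f(u,v) = \sum_{(v,w)\in E} f(v,w) \quad \forall\, v \in V\setminus\{s,t\} $$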

The value of a flow is the net amount sent from $s$ to $t$, which, by conservation of flow, equals the net outflow of $s$ (or, equivalently, the net inflow of $t$). An $s/t$ cut is a partition of the vertices into two disjoint subsets such that one contains $s$ and the other contains $t$. The value of an $s/t$ cut is the total capacity of the edges crossing the cut from the source side to the sink side.

As stated by the max-flow min-cut theorem, the maximum amount of flow passing from the source to the sink equals the total capacity of the edges in the minimum cut. So by solving the max-flow problem, we directly solve the min-cut problem as well. We will discuss algorithms for finding the max-flow/min-cut in a later section.
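
As a concrete illustration of this duality, here is a minimal sketch using the networkx library; the toy graph and its capacities are arbitrary and not part of the segmentation pipeline:

import networkx as nx

# A tiny directed flow network with arbitrary capacities
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
assert flow_value == cut_value  # both are 5 here, as the theorem guarantees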

Image to Graph

One of the most challenging parts of this project is transforming an image into a graph. In "Graph cuts and efficient N-D image segmentation", Boykov and Funka-Lea describe in great detail how to define a graph based on an image, and my implementation closely follows their construction. For simplicity, we will use grayscale square images, although the same idea extends easily to color images (given a suitable inter-pixel similarity measure) and to rectangular images.

In this image-graph transformation scheme, a pixel in the image is a vertex in the graph labeled in row-major order. These vertices are called pixel vertices. In addition, there are two extra vertices that serve as the (invisible) source and the sink.

There are two types of edges in our graph. The first type is an $n$-link, which connects neighboring pixel vertices in a 4-neighborhood system. The second type is a $t$-link: $t$-links connect the source or sink vertex to the pixel vertices.

The $n$-link edges must have carefully computed weights that reflect inter-pixel similarity. Concretely, we want the weight of an edge to be large when the two pixels are similar, and small when they are quite different. One idea is to compute the weight from a boundary penalty function that maps two pixel intensities to a nonnegative integer. Let $I_p$ be the brightness, or intensity, of the pixel vertex $p$. For any edge $(p,q)\in E$, the boundary penalty $B(I_p, I_q)$ is defined as: $$ B(I_p, I_q) = 100\cdot \exp\Bigg(\frac{-(I_p-I_q)^2}{2\sigma ^2}\Bigg) $$

The penalty is high when $|I_p-I_q|<\sigma$ and nearly negligible when $|I_p-I_q|>\sigma$. The value $\sigma=30$ was chosen empirically through trial and error. The factor of 100 and the cast to an integer are needed because our network flow model requires discrete rather than continuous capacities.
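
A direct translation of this penalty into code might look like the following sketch (numpy assumed; the function name is mine, not from the paper):

import numpy as np

SIGMA = 30  # chosen empirically, as discussed above

def boundary_penalty(ip, iq, sigma=SIGMA):
    # Large when intensities are similar, near zero when they differ by
    # much more than sigma; scaled by 100 and cast to int because the
    # network flow model needs discrete capacities.
    return int(100 * np.exp(-((float(ip) - float(iq)) ** 2) / (2 * sigma ** 2)))

print(boundary_penalty(100, 105))  # 98: strong n-link between similar pixels
print(boundary_penalty(100, 200))  # 0: likely an object boundary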

To give the model suitable $t$-links, the user is prompted to highlight at least one pixel vertex as a background pixel and at least one as a foreground pixel. These pixel vertices are called seeds. For every background seed, an edge is added from the source vertex to that seed with capacity $\mathcal{K}$, defined as follows: $$ \mathcal {K} = \max(\{B(I_p, I_q)\mid(p, q)\in E\}) $$ In a similar fashion, an edge to the sink vertex is added for every foreground seed, also with capacity $\mathcal{K}$.

Because these seeds share a maximum-capacity edge with the source or sink vertex, they are effectively hard-coded as background or foreground, respectively.

Figure: An example network defined on a simple 3x3 image with a background seed and a foreground seed (Boykov and Funka-Lea, 2006).
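
Putting the pieces together, a hedged sketch of the construction might look like this (networkx assumed; boundary_penalty is the illustrative function from the previous section, and the seed lists are assumed to come from the user's clicks):

import networkx as nx

def build_graph(img, bg_seeds, fg_seeds):
    # img: 2-D numpy array of pixel intensities
    # bg_seeds / fg_seeds: lists of (row, col) pixels marked by the user
    h, w = img.shape
    G = nx.DiGraph()

    # n-links: 4-neighborhood edges weighted by the boundary penalty
    max_penalty = 0
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                nr, nc = r + dr, c + dc
                if nr < h and nc < w:
                    cap = boundary_penalty(img[r, c], img[nr, nc])
                    max_penalty = max(max_penalty, cap)
                    u, v = r * w + c, nr * w + nc  # row-major labels
                    G.add_edge(u, v, capacity=cap)
                    G.add_edge(v, u, capacity=cap)

    # t-links: capacity K = max n-link weight, tying seeds to source/sink
    K = max_penalty
    for r, c in bg_seeds:
        G.add_edge("s", r * w + c, capacity=K)  # source -> background seed
    for r, c in fg_seeds:
        G.add_edge(r * w + c, "t", capacity=K)  # foreground seed -> sink
    return G

Calling nx.minimum_cut(G, "s", "t") on the result then partitions the pixel vertices into background (source side) and foreground (sink side).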

Now with the graph fully defined, we can run a graph cut algorithm to find the maximum flow/minimum cut.

Algorithms

For computing the minimum cut/maximum flow, we can use algorithms like Ford-Fulkerson, Dinic's algorithm, and Karger's algorithm. In this project, I will be utilizing the GrabCut method, Felzenszwalb's algorithm, and the Boykov-Kolmogorov algorithm for graph-based image segmentation. Their time complexities vary widely.

GrabCut

The GrabCut algorithm is commonly used for image segmentation tasks. Rather than a single min-cut computation, it employs an iterative optimization technique in which a graph cut is solved at each step.

Figure: An example of the GrabCut method for image segmentation (Huang et al., 2017).

The algorithm used in GrabCut is an iterative optimization process that alternates between estimating the foreground and background regions of the image and updating the probability distributions based on these estimates. This process is guided by a graph-based energy function that incorporates both color similarity and spatial coherence constraints.

# Simple pseudocode for GrabCut
function grabCut(image, mask):
    initialize foreground and background models
    repeat until convergence:
        assign pixels to foreground or background based on mask
        update the foreground and background models
        minimize the energy via a graph cut
    return segmented image

Initializing the foreground and background models typically has a time complexity of $O(N)$, where $N$ is the number of pixels in the image. Updating the Gaussian models for foreground and background also involves iterating over all pixels, again $O(N)$. Minimizing the energy involves solving an optimization problem, here a graph cut, whose cost depends on the specific technique used; denote it $O(E)$. With $K$ iterations of the loop (not to be confused with the capacity $\mathcal{K}$ above), the overall time complexity of GrabCut can be approximated as $O(N + K \cdot (N + E))$.
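
In practice, GrabCut is rarely implemented from scratch; OpenCV ships an implementation. A minimal usage sketch (the file name and rectangle are placeholders, not values from this project):

import cv2
import numpy as np

img = cv2.imread("input.jpg")  # placeholder file name
mask = np.zeros(img.shape[:2], np.uint8)

# Models that OpenCV updates internally between iterations
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# Rough bounding box around the foreground object: (x, y, width, height)
rect = (50, 50, 300, 300)  # placeholder values

# Five iterations of the estimate-models / graph-cut loop
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels labeled definite or probable foreground
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
segmented = img * fg[:, :, np.newaxis]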

Felzenszwalb's Algorithm

The Felzenszwalb algorithm is an image segmentation algorithm developed by Pedro F. Felzenszwalb and Daniel P. Huttenlocher. It is widely used for segmenting images into meaningful regions based on differences in intensity or color.

The Felzenszwalb algorithm is based on a graph-based approach where pixels are represented as nodes in a graph, and edges are defined based on the similarity between neighboring pixels. The algorithm then iteratively merges regions based on a criterion that combines both color similarity and spatial proximity.

One of the advantages of the Felzenszwalb algorithm is its ability to produce hierarchical segmentations, where segments can be merged or split at different scales. This allows for a more flexible representation of image regions compared to traditional segmentation methods.

function felzenszwalbSegmentation(image, scale, sigma, min_size):
    smoothImage(image, sigma)        # Gaussian pre-smoothing controlled by sigma
    constructGraph(image)            # nodes = pixels, edge weights = neighbor dissimilarity
    sortEdgesByWeight()              # process edges in ascending weight order
    mergeSegments(scale)             # merge two components when the connecting edge is
                                     # cheaper than their internal difference + scale/|component|
    postProcessSegments(min_size)    # absorb components smaller than min_size
    return segmentedImage

The Felzenszwalb algorithm runs in approximately $O(N \log N)$ time, where $N$ is the number of pixels in the image, dominated by sorting the edges. This is a substantial improvement over GrabCut's iterative optimization.
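
scikit-image provides a ready-made implementation; a minimal usage sketch (parameter values are illustrative, not tuned for any particular image):

from skimage import data
from skimage.segmentation import felzenszwalb

img = data.astronaut()  # sample RGB image bundled with scikit-image

# scale biases segment size, sigma controls pre-smoothing,
# min_size sets the post-processing merge threshold
segments = felzenszwalb(img, scale=100, sigma=0.8, min_size=50)

print("number of segments:", segments.max() + 1)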

Boykov-Kolmogorov Algorithm

The Boykov-Kolmogorov algorithm (the BK algorithm or BK max-flow/min-cut algorithm) is a powerful and efficient method for solving the minimum cut problem and, by extension, the maximum flow problem in graphs. It was introduced by Vladimir Kolmogorov and Yuri Boykov.

This algorithm has high efficiency and accuracy in solving large-scale graph cut problems, making it widely used in computer vision, image segmentation, and other areas where graph-based optimization is required.

The Boykov-Kolmogorov algorithm builds on the max-flow min-cut theorem, which states that the maximum flow passing through a network equals the capacity of the minimum cut. Like other augmenting-path methods, it repeatedly finds augmenting paths and updates the flow; its key optimization is growing two search trees, one from the source and one from the sink, and reusing them across iterations instead of rebuilding them from scratch.

The worst-case time complexity reported by Boykov and Kolmogorov is $O(V^2 E |C|)$, where $V$ is the number of vertices, $E$ the number of edges, and $|C|$ the cost of the minimum cut. In practice, however, the algorithm runs far faster than this bound on the sparse, grid-like graphs that arise in vision problems, which is the main reason for its popularity.
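
The authors' reference implementation is exposed to Python through the PyMaxflow package; a hedged sketch of a grid-graph cut (the uniform edge weight and intensity-based t-links are illustrative stand-ins, not the boundary penalties defined above):

import numpy as np
import maxflow

img = np.random.randint(0, 256, (32, 32))  # stand-in grayscale image

g = maxflow.Graph[int]()
nodes = g.add_grid_nodes(img.shape)  # one node per pixel

# n-links: uniform 4-neighborhood smoothness weight (illustrative)
g.add_grid_edges(nodes, 50)

# t-links: bright pixels bind to the source, dark ones to the sink (illustrative)
g.add_grid_tedges(nodes, img, 255 - img)

flow = g.maxflow()  # runs the BK algorithm
segments = g.get_grid_segments(nodes)  # True where a pixel falls on the sink side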

Limitations

While the Ford-Fulkerson algorithm is effective for finding maximum flows, it can be inefficient for large, densely connected networks. In such cases it may take a long time to converge, and with irrational capacities it may fail to terminate altogether. It can also require significant memory when dealing with large networks or graphs.

GrabCut requires an initial rectangle or mask to initialize the segmentation, and the quality of this initialization can significantly affect the result. It may also require substantial memory and processing power for high-resolution images.

The performance of Felzenszwalb's algorithm can be sensitive to parameters such as scale, sigma, and minimum size, and tuning them may be required to achieve good segmentations across different types of images. Its running time grows with the number of pixels and the complexity of the image structure, and for very large images the computational requirements may become prohibitive.

The Boykov-Kolmogorov algorithm computes an exact minimum cut, but an optimal cut does not guarantee a visually correct segmentation; the quality of the result depends on the characteristics of the image, the graph construction, and the chosen parameters. Its computational cost can also be high for large graphs or images with complex structure, which can limit its applicability in real-time or resource-constrained environments.

Results

The source code and other examples are available here.

The following is a visualization of the original image and its segmentation using three different algorithms.

Figures: results.png, results2.png, comparing_algorithms.png.
The following is a plot displaying the running times of the algorithms used for image segmentation.

Based on the results and the visualizations above, we infer that GrabCut is the least time-efficient, while Boykov-Kolmogorov is the best choice for image segmentation.

For example, in our Tesla image segmentation, the GrabCut method takes about 361 seconds (roughly 6 minutes), while the Felzenszwalb algorithm takes 13.78 seconds and the Boykov-Kolmogorov algorithm takes 13.1 seconds.

Conclusion & Improvements

The deterministic max-flow-based minimum-cut approach outperforms Karger's algorithm in optimality and works on directed graphs. Conversely, Karger's randomized contraction algorithm is preferable when speed is prioritized over guaranteed optimality: a single run takes roughly $O(|V|^2)$ time but finds the minimum cut only with some probability, whereas exact max-flow-based methods generally run in about $O(|V|^3)$ time.

GrabCut provides good results for medium-sized images but is computationally expensive for large ones, leading to poor running times.

Felzenszwalb's algorithm does not guarantee an optimal segmentation, but it provides a fast, efficient way to generate superpixels. Its linearithmic time complexity makes it suitable for real-time and interactive segmentation.

Boykov-Kolmogorov offers the most efficient and accurate image segmentation results of the methods compared here, and it can handle complex, large images.

There is scope for improvement by using other efficient algorithms such as the Push-Relabel algorithm, the Mean-Shift algorithm, and CNNs/FCNs (deep-learning segmentation techniques). Improvements can also come from parameter tuning, feature selection, and image pre-processing.

References