S

Srikumar Ramalingam

Google (United States)

ORCID: 0000-0002-2598-0151

Publishes on Advanced Vision and Imaging, Optical measurement and interference techniques, Robotics and Sensor-Based Localization. 73 papers and 3.4k citations.

73Publications
3.4kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Entropy rate superpixel segmentation
Cited by 1k

We propose a new objective function for superpixel segmentation. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid - a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and mono-tonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.

CASENet: Deep Category-Aware Semantic Edge Detection
Zhiding Yu, Chen Feng, Ming-Yu Liu et al.|Unknown|2017
Cited by 308

Boundary and edge cues are highly beneficial in improving a wide variety of vision tasks such as semantic segmentation, object recognition, stereo, and object proposal generation. Recently, the problem of edge detection has been revisited and significant progress has been made with deep learning. While classical edge detection is a challenging binary problem in itself, the category-aware semantic edge detection by nature is an even more challenging multi-label problem. We model the problem such that each edge pixel can be associated with more than one class as they appear in contours or junctions belonging to two or more semantic classes. To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features. We then propose a multi-label loss function to supervise the fused activations. We show that our proposed architecture benefits this problem with better performance, and we outperform the current state-of-the-art semantic edge detection methods by a large margin on standard data sets such as SBD and Cityscapes.

A theory of multi-layer flat refractive geometry
Cited by 209

Flat refractive geometry corresponds to a perspective camera looking through single/multiple parallel flat refractive mediums. We show that the underlying geometry of rays corresponds to an axial camera. This realization, while missing from previous works, leads us to develop a general theory of calibrating such systems using 2D-3D correspondences. The pose of 3D points is assumed to be unknown and is also recovered. Calibration can be done even using a single image of a plane. We show that the unknown orientation of the refracting layers corresponds to the underlying axis, and can be obtained independently of the number of layers, their distances from the camera and their refractive indices. Interestingly, the axis estimation can be mapped to the classical essential matrix computation and 5-point algorithm [15] can be used. After computing the axis, the thicknesses of layers can be obtained linearly when refractive indices are known, and we derive analytical solutions when they are unknown. We also derive the analytical forward projection (AFP) equations to compute the projection of a 3D point via multiple flat refractions, which allows non-linear refinement by minimizing the reprojection error. For two refractions, AFP is either 4th or 12th degree equation depending on the refractive indices. We analyze ambiguities due to small field of view, stability under noise, and show how a two layer system can be well approximated as a single layer system. Real experiments using a water tank validate our theory.

Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Cited by 158

As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features of real images, and those of images generated by the algorithm. We highlight important drawbacks of FID: Inception's poor representation of the rich and varied content generated by modern text-to-image models, incorrect normality assumptions, and poor sample complexity. We call for a reevaluation of FID's use as the primary quality metric for generated images. We empirically demonstrate that FID contradicts human raters, it does not reflect gradual improvement of iterative text-to-image models, it does not capture distortion levels, and that it produces inconsistent results when varying the sample size. We also propose an alternative new metric, CMMD, based on richer CLIP embeddings and the maximum mean discrepancy distance with the Gaussian RBF kernel. It is an unbiased estimator that does not make any assumptions on the probability distribution of the embeddings and is sample efficient. Through extensive experiments and analysis, we demonstrate that FID-based evaluations of text-to-image models may be unreliable, and that CMMD of-fers a more robust and reliable assessment of image quality. A reference implementation of CMMD is available at: https://github.com/google-research/google-research/tree/master/cmmd.