I am a roboticist on the Tesla Optimus team.

My research interests lie at the intersection of Computer Vision, Graphics, and Robotics. My long-term goal is to build intelligent agents that can see (through vision, audio, and other senses), interact (navigate and act in an environment), and reason (plan long-term actions from sparse rewards).

I received my Ph.D. in Artificial Intelligence from the University of Maryland, College Park. I also interned at Google, NVIDIA, and Amazon AI Labs. Before starting my Ph.D., I spent 5 years prototyping and scaling Vision and ML products in both industry (Netradyne, American Express) and academia (NICTA, ITRI, Carnegie Mellon, IIT Delhi). I also co-founded a startup that helped users design their wardrobes by “learning” a fashion knowledge graph from social network data.

Research Projects

SHACIRA - Scalable HAsh-grid Compression for Implicit Neural Representations
ICCV 2023

S. Girish, A. Shrivastava, K. Gupta

An end-to-end compression framework for feature-grid INRs

Web / Code / arXiv

ChopNLearn - Generating Object-State Compositions
ICCV 2023

N. Saini, H. Wang, A. Swaminathan, V. Jayasundara, B. He, K. Gupta, A. Shrivastava

A benchmark for recognizing and generating object-state compositions from images and videos

Web / Code / arXiv

ASIC - Aligning Sparse in-the-wild Image Collections
ICCV 2023

K. Gupta, V. Jampani, C. Esteves, A. Shrivastava, A. Makadia, N. Snavely, A. Kar

Learning dense correspondences for long-tail in-the-wild image collections

Web / Code / arXiv

Teaching Matters - Investigating the Role of Supervision in Vision Transformers
CVPR 2023

M. Walmer, S. Suri, K. Gupta, A. Shrivastava

ViTs trained with different forms of supervision show a diverse range of behaviors in their representations and on downstream tasks.

Web / Code / arXiv

LilNetX - Lightweight Networks with EXtreme Model Compression and Structured Sparsification
ICLR 2023

S. Girish, K. Gupta, S. Singh, A. Shrivastava

A neural network optimization scheme that allows for a trade-off between accuracy and compression during training itself

Web / Code / arXiv

Neural Space-filling Curves
ECCV 2022

H. Wang, K. Gupta, L. Davis, A. Shrivastava

A data-driven approach to infer a context-based scan order for a set of images. This allows for better compression and better sequential generative models

Web / Code / arXiv

PatchGame - Learning to Signal in Referential Games
NeurIPS 2021

K. Gupta, G. Somepalli, A. Gupta, V. Jayasundara, M. Zwicker, A. Shrivastava

Emergent communication via mid-level patches in a referential game played on a large-scale image dataset

Web / Code / arXiv

LayoutTransformer - Layout Generation with Self-attention
ICCV 2021

K. Gupta, A. Achille, J. Lazarow, L. Davis, V. Mahadevan, A. Shrivastava

A generative model for layouts; results on diverse real-world datasets (3D shapes, images, documents, app wireframes)

Web / Code / arXiv

The Lottery Ticket Hypothesis for Object Recognition
CVPR 2021

S. Girish, S. Maiya, K. Gupta, H. Chen, L. Davis, A. Shrivastava

How to find sparse neural networks (with up to 80% overall sparsity) on the tasks of object detection, segmentation, and pose estimation

Web / Code / arXiv

Improved Modeling of 3D Shapes with Multi-view Depth Maps
3DV 2020

K. Gupta, S. Jabbireddy, K. Shah, A. Shrivastava, M. Zwicker

A novel encoder-decoder generative model for 3D shapes using multi-view depth maps; SOTA results on single-view reconstruction and generation

Web / Code / arXiv

PatchVAE - Learning Local Latent Codes for Recognition
CVPR 2020

K. Gupta, S. Singh, A. Shrivastava

A patch-based VAE formulation that learns interesting parts of an image instead of the entire image. Our bottleneck formulation learns better representations for visual recognition tasks

Web / Code / arXiv

A deep dive into location-based communities in social discovery networks
COMCOM 2017

K. Thilakarathna, S. Seneviratne, K. Gupta, M. Kaafar, A. Seneviratne

A study of the characteristics and evolution of location-based social discovery networks

Web / Code / Paper

Global pose estimation with limited GPS and long range visual odometry
ICRA 2012

J. Rehder, K. Gupta, S. Nuske, S. Singh

An approach to estimate and correct for the bias in the motion estimate caused by a lack of close-range features outdoors when using stereo visual odometry

Web / Code / Paper

Modeling and Calibrating Visual Yield Estimates in Vineyards
FSR 2012, CMU Tech Report

S. Stephen, S. Achar, K. Gupta, S. Narasimhan, S. Singh

An approach to predict vineyard yield automatically and non-destructively using images collected from vehicles driving along vineyard rows

Web / Code / Paper

A Compression Scheme for Handwritten Patterns
ICDAR 2011

K. Gupta, M. Bansal, S. Chaudhury

A method to compress handwritten patterns, recorded as strokes in order of their temporal occurrence, using B-spline curves

Web / Code / Paper

Blog

Yet Another Machine Learning blog, written with the intention of:

learning more by coding and writing. I strongly believe that the best way to learn a concept is to either code it myself or write a tutorial about it. This helps both in understanding the nuances of the concept and in retaining it for longer.
keeping notes on the various lectures, articles, papers, books, and ideas I have come across.

Check out the blog page for more.

Lernen durch Codierung

is German for Learning by Coding. It’s a play on Learning by Teaching, a strategy popular in Germany in which students learn by teaching their peers.

Disclaimer

The contents of this blog are the inference of a biological neural network, trained on a very tiny dataset for a very long time. Any statistically significant correlation to existing literature is not coincidental but an outcome of overfitting.

Website

Finally moving from the old page to Jekyll, hoping to do some justice to the new website-cum-blog. Promising myself to be more regular and more meticulous.

Kamal Gupta
