Suman Saha

Postdoctoral Researcher, Computer Vision Lab, ETH Zürich, Switzerland

I am a postdoctoral researcher at the Computer Vision Lab (CVL), ETH Zürich, Switzerland. My research interests lie in computer vision and machine learning; most of my work applies deep learning techniques to computer vision problems. Deep learning is a type of machine learning in which a model learns to recognize patterns in data by using multiple layers of artificial neurons to build a hierarchy of representations. In simpler terms, such models are trained on large amounts of data to make decisions or predictions from their inputs, and their performance improves as they see more data. By stacking multiple layers of neurons, deep learning models can learn to recognize more complex patterns than traditional machine learning models.
My research can be broadly categorized into the following three areas:
The first is unsupervised domain adaptation (UDA), which refers to training a model to work well on new, unseen data from a different domain without requiring labeled data from that domain. In simpler terms, UDA allows a model that was trained on one type of data to work well on a different type of data without needing to label the new data first. This is useful, for example, when a model trained on data from one country or language needs to work in another, or when a model needs to handle data from a different source, such as images from a different camera or sensor. I have studied UDA for semantic and panoptic segmentation, human action detection, and face anti-spoofing.
The second area is self-supervised and semi-supervised learning. I have explored self-supervised learning for monocular depth estimation (also known as self-supervised depth estimation) and semi-supervised learning for semantic segmentation. Self-supervised learning is a type of machine learning in which a model learns to predict some aspect of its input data without requiring explicit supervision or labels. In other words, the model is trained on a large dataset without being told what the correct output should be; instead, it uses the patterns and relationships present in the data to make predictions about the data itself. Semi-supervised learning, by contrast, uses a combination of labeled data (data that has been explicitly marked with the correct output) and unlabeled data to train a model. The model learns to recognize patterns in the labeled data and uses that knowledge to make predictions about the unlabeled data. This approach is especially useful when labeled data is scarce but unlabeled data is plentiful. By leveraging the unlabeled data, the model can learn more about the underlying structure of the data and improve the accuracy of its predictions.
The third is unsupervised deep generative learning. I have studied deep generative learning for human facial behavior analysis; in particular, I built a deep generative model based on the VAE and GAN frameworks. Deep generative learning uses a neural network to learn the underlying patterns and relationships in a set of data and then uses that knowledge to create new data that resembles the original, such as new images or text. Deep generative models can be trained in a variety of ways, for example as generative adversarial networks (GANs) or variational autoencoders (VAEs). They can be used for applications such as creating realistic images for computer graphics, generating new music or speech, or augmenting data to improve the performance of other machine learning models.
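As a rough illustration of the semi-supervised idea described above, the following minimal sketch uses self-training (pseudo-labeling), one common semi-supervised technique, on one-dimensional toy data. It is not taken from any of the papers listed on this page; the nearest-centroid classifier, the `margin` confidence threshold, and all function names and numbers are assumptions chosen only to keep the example small.

```python
# Minimal self-training (pseudo-labeling) sketch on 1-D toy data.
# All names and numbers are hypothetical and for illustration only.

def fit_centroids(points, labels):
    """Return the mean of the points belonging to each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0):
    """Fit on labeled data, pseudo-label confident unlabeled points, refit."""
    centroids = fit_centroids(labeled_x, labeled_y)
    xs, ys = list(labeled_x), list(labeled_y)
    for x in unlabeled_x:
        dists = sorted(abs(x - c) for c in centroids.values())
        # Keep a point only if it is clearly closer to one centroid than to the next.
        if len(dists) < 2 or dists[1] - dists[0] >= margin:
            xs.append(x)
            ys.append(predict(centroids, x))
    return fit_centroids(xs, ys)

# Two labeled points and five unlabeled ones; 5.0 is ambiguous and is skipped.
labeled_x, labeled_y = [0.0, 10.0], ["a", "b"]
unlabeled_x = [1.0, 2.0, 5.0, 8.0, 9.0]
centroids = self_train(labeled_x, labeled_y, unlabeled_x)
```

The labeled points alone place the class centers at 0 and 10; after the confident unlabeled points are added with their predicted labels, the centers move toward where the data actually lies, which is the sense in which unlabeled data reveals more of the underlying structure.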
I have also worked on research problems in multi-task learning (MTL). More specifically, I addressed two common challenges in developing multi-task models: incremental learning and task interference. Multi-task learning is a type of machine learning in which a model is trained to perform multiple tasks simultaneously by sharing some or all of its parameters across the tasks, rather than training a separate model for each task. Through this sharing, the model can learn patterns and relationships that are common to all of the tasks, which can improve its performance on each individual task.
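The parameter sharing described above can be sketched as a single shared encoder feeding one small head per task (often called hard parameter sharing). This is only a toy forward pass under assumed weights, not the method of any paper listed here; the task names `depth` and `segmentation` and all numbers are hypothetical.

```python
# Hard parameter sharing sketch: one shared encoder, one small head per task.
# All weights and task names are hypothetical.

def linear(weights, bias, inputs):
    """Dense layer: out[j] = sum_i weights[j][i] * inputs[i] + bias[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

class MultiTaskModel:
    def __init__(self, shared, heads):
        self.shared = shared  # (weights, bias) used by every task
        self.heads = heads    # task name -> (weights, bias)

    def forward(self, inputs):
        # Shared features are computed once and reused by all task heads.
        features = relu(linear(*self.shared, inputs))
        return {task: linear(w, b, features)
                for task, (w, b) in self.heads.items()}

model = MultiTaskModel(
    shared=([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    heads={
        "depth": ([[2.0, 0.0]], [1.0]),
        "segmentation": ([[0.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),
    },
)
outputs = model.forward([3.0, 1.0])
```

Because every head reads the same features, a training update driven by one task also changes the encoder that the other tasks depend on. That coupling is where the benefit of shared patterns comes from, and it is also the source of task interference.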
Click here to read more about me ...



EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, and Luc Van Gool

2023

|   arXiv  |       Code   |      

Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection

Yifan Lu, Gurkirt Singh, Suman Saha, Luc Van Gool

WACV 2023

|   pdf   |    arXiv  |    Dataset    |    Code   |      

Spatio-Temporal Action Detection Under Large Motion

Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc Van Gool

WACV 2023

|   pdf   |    arXiv  |    Video   |      

Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation

Suman Saha, Anton Obukhov, Danda Pani Paudel, Menelaos Kanakis, Yuhua Chen, Stamatios Georgoulis, Luc Van Gool

CVPR 2021

|   pdf   |    arXiv  |    Code  | Video   |      

Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation

Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc Van Gool

CVPR 2021

|   pdf   |    arXiv  |    Code   |    Video   |      

Unsupervised Compound Domain Adaptation for Face Anti-Spoofing

Ankush Panwar, Pratyush Singh, Suman Saha, Danda Pani Paudel, Luc Van Gool

FG 2021

|   pdf   |    arXiv  |      

Road: The road event awareness dataset for autonomous driving

Gurkirt Singh, Suman Saha, Fabio Cuzzolin et al.


|   pdf   |

Reparameterizing Convolutions for Incremental Multi-Task Learning Without Task Interference

Menelaos Kanakis, David Bruggemann, Suman Saha, Stamatios Georgoulis, Anton Obukhov, Luc Van Gool

ECCV 2020

|   pdf   |    arXiv  |    Code   |   

Domain Agnostic Feature Learning for Image and Video Based Face Anti-spoofing

Suman Saha, Wenhao Xu, Menelaos Kanakis, Stamatios Georgoulis, Yuhua Chen, Danda Pani Paudel, Luc Van Gool


|   Workshop Oral Video   |    Workshop Oral Slides  |    PDF   |    arxiv   |

Book chapter title: Spatio-Temporal Action Instance Segmentation and Localisation

Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

Book title: Modelling Human Motion: From Human Perception to Robot Design

Publisher: Springer International Publishing, pages: 141-161, ISBN: 978-3-030-46732-6, year: 2020.

|   Springer Book   |   Project   |

Two-Stream AMTnet for Action Detection

Suman Saha, Gurkirt Singh, Fabio Cuzzolin

arXiv 2020.

|   arxiv   |

Unsupervised Deep Representations for Learning Audience Facial Behaviors

Suman Saha, Rajitha Navarathna, Leonhard Helminger, Romann M. Weber

CVPR 2018 Workshops

|   pdf   |    arXiv  |    Poster   |

Predicting Action Tubes

Gurkirt Singh, Suman Saha, Fabio Cuzzolin

ECCV 2018 Workshops

|   arXiv   |

Incremental Tube Construction for Human Action Detection

Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr

BMVC 2018 (Oral)

|   arXiv   |

Spatio-temporal Human Action Detection and Instance Segmentation in Videos

Suman Saha

PhD thesis, Oxford Brookes University, United Kingdom, 2018

|   PhD Thesis PDF    |    PhD Thesis Defense Slides   |

TraMNet - Transition Matrix Network for Efficient Action Tube Proposals

Gurkirt Singh, Suman Saha, Fabio Cuzzolin

ACCV 2018

|   arxiv   |

Action Detection from a Robot-Car Perspective

Valentina Fontana, Manuele Di Maio, Stephen Akrigg, Gurkirt Singh, Suman Saha, Fabio Cuzzolin

arXiv 2018

|   arxiv   |

AMTnet: Action-Micro-Tube regression by end-to-end trainable deep architecture

Suman Saha, Gurkirt Singh, Fabio Cuzzolin

ICCV 2017

|   pdf   |   suppl. material   |   arxiv   |   poster   |   Code   |

Online Real-time Multiple Spatiotemporal Action Localisation and Prediction

Gurkirt Singh, Suman Saha, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

ICCV 2017

|   pdf   |   suppl. material   |   arxiv   |   poster   |   ICCV 2017 Demo Video   |   Code   |

Spatio-temporal human action localisation and instance segmentation in temporally untrimmed videos

Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

arXiv, 2017

|   Project Page Link   |   arxiv   |

Metric learning for Parkinsonian identification from IMU gait measurements

Fabio Cuzzolin, Michael Sapienza, Patrick Esser, Suman Saha, Miss Marloes Franssen, Johnny Collett, Helen Dawes

Gait & Posture, Volume 54, May 2017, Pages 127-132

|   ScienceDirect link   |

Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos

Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

BMVC 2016

|   project page link  |   arxiv  |   Full Version  |   code  |   poster   |

A real-time monocular vision-based frontal obstacle detection and avoidance for low cost UAVs in GPS denied environment

Suman Saha, Ashutosh Natraj, Sonia Waharte

Aerospace Electronics and Remote Sensing Technology (ICARES), 2014 IEEE International Conference on

|   pdf  |   project page link   |

Face Recognition using PCA and Multilayer Feedforward Neural Networks

Suman Saha

European Journal of Applied Sciences and Technology [EUJAST] Volume 1 (1), March 2014

|   pdf   |

A Monocular Vision Approach for Obstacle Detection and Collision Avoidance for Low-cost Quadrocopters

Suman Saha

MSc Thesis, University of Bedfordshire, United Kingdom. January 2014

|   MSc Thesis  |   MSc Defense Poster  |   project page link   |

Research Activities and Awards

More About Me ...

Before joining ETH Zürich, I was a Research Associate (postdoctoral researcher) at the Department of Computing and Communication Technologies, Oxford Brookes University, where I spent four wonderful years (including my PhD studies).
I received my PhD degree under the supervision of Professor Fabio Cuzzolin at Oxford Brookes University, United Kingdom. Professor Nigel Crook and Dr Tjeerd Olde Scheper were my PhD co-supervisors.
My PhD thesis topic was Spatio-temporal Human Action Detection and Instance Segmentation in Videos. The two main objectives of my PhD thesis were to propose: (1) efficient algorithms to locate (in space and time) multiple co-occurring human action instances present in realistic videos; (2) powerful video-level deep feature representations to improve on state-of-the-art action detection accuracy.
I was also an active member of the Artificial Intelligence and Vision Research Group led by Professor Fabio Cuzzolin. I consider myself fortunate to have had the opportunity to work closely with the world-renowned Torr Vision Group (TVG) in the Department of Engineering Science at the University of Oxford. More specifically, during my PhD, I worked with my PhD guide Dr Michael Sapienza and Professor Philip H. S. Torr, who is the founder of TVG.
During summer 2017, I received a wonderful opportunity to work with Dr Romann Weber, Senior Research Scientist and Head of the Machine Intelligence and Data Science Group at Disney Research Zurich (DRZ). At DRZ, I worked on a project on unsupervised and semi-supervised learning of audience facial expressions using deep generative models. We improved the classification accuracy by 9% over the existing method.

I completed my Master's studies at the Department of Computer Science and Technology, University of Bedfordshire (UoB), United Kingdom. During my MSc thesis work (2013-2014), I proposed a novel real-time algorithm for frontal obstacle detection and avoidance for low-cost unmanned aerial vehicles (UAVs). The related publication can be accessed using this link. My MSc thesis supervisors were Dr Ashutosh Natraj and Sonia Waharte, postdoctoral researchers in the Department of Computer Science, University of Oxford.

Before pursuing my Master's studies in the UK, I worked as a Software Analyst at the Research and Development and Scientific Services division, Tata Steel Ltd., India. At R&D Tata Steel, I worked under the supervision of Dr Sumitesh Das, Chief (Global Research Programmes) at Tata Steel Ltd. My CV can be viewed by clicking this link.

I received my Polytechnic Diploma in Engineering (Computer Science) from Siddaganga Polytechnic College, India.