Projects

Bitrate Ladder Construction using Visual Information Fidelity

  • Proposed VIF features, drawn from Visual Information Fidelity (VIF) and extracted from uncompressed videos, to predict the visual quality (VMAF) of compressed videos.
  • Presented multiple VIF feature sets extracted from different scales and subbands of a video to tackle the problem of bitrate ladder construction.
  • Achieved significant Bjøntegaard delta gains over a fixed bitrate ladder, with performance close to the optimal per-title bitrate ladder constructed by exhaustive encoding.
  • Submitted to Picture Coding Symposium 2024.
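
A minimal sketch of how predicted quality can be turned into a ladder, assuming the common per-title rule of choosing, at each target bitrate, the resolution with the highest predicted VMAF. The function name `build_ladder`, the bitrates, and the VMAF values are made up for illustration and are not taken from the project.

```python
def build_ladder(bitrates_kbps, predicted_vmaf):
    """Pick the resolution with the highest predicted VMAF at each bitrate.

    predicted_vmaf maps a resolution label to a list of predicted VMAF
    scores, one per entry in bitrates_kbps (hypothetical data layout).
    """
    ladder = []
    for i, rate in enumerate(bitrates_kbps):
        best_res = max(predicted_vmaf, key=lambda res: predicted_vmaf[res][i])
        ladder.append((rate, best_res, predicted_vmaf[best_res][i]))
    return ladder


# Illustrative numbers only, not measured results.
bitrates = [500, 1500, 4000]
predicted = {
    "540p": [62.0, 78.0, 84.0],
    "1080p": [48.0, 80.0, 93.0],
}
ladder = build_ladder(bitrates, predicted)
```

With these toy inputs the lower resolution wins at the lowest bitrate and the higher resolution wins elsewhere, which is the crossover behaviour a per-title ladder is meant to capture.
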
An Efficient Approach to Super-Resolution with Fine-Tuning Diffusion Models

  • Explored the potential of the pre-trained diffusion model SR3 for image super-resolution, specifically through fine-tuning and zero-shot approaches.
  • Demonstrated the generalization ability of the SR3 fine-tuning process, evaluated with limited time steps, iterations, and data samples.
  • Evaluated the zero-shot approach of range-null space decomposition for super-resolution with an unconditional DDPM against a conditional DDPM (SR3) trained from scratch.
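
The data-consistency step behind range-null space decomposition can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the degradation operator A is assumed to be s x s average pooling (so its pseudo-inverse is pixel replication), the diffusion model's clean-image estimate is replaced by random noise, and all function names are hypothetical.

```python
import numpy as np


def downsample(x, s):
    """A: average-pool an (H, W) image by an integer factor s."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def pseudo_inverse(y, s):
    """A^+: replicate each low-resolution pixel into an s x s block."""
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1)


def rnd_project(x, y, s):
    """Return A^+ y + (I - A^+ A) x.

    The range-space part comes from the observation y; the null-space
    part is kept from the current estimate x.
    """
    return pseudo_inverse(y, s) + x - pseudo_inverse(downsample(x, s), s)


rng = np.random.default_rng(0)
s = 4
ground_truth = rng.random((16, 16))
y = downsample(ground_truth, s)          # low-resolution observation
x_est = rng.random((16, 16))             # stand-in for a model's estimate
x_proj = rnd_project(x_est, y, s)
consistent = np.allclose(downsample(x_proj, s), y)
```

Because A A^+ is the identity for this operator, downsampling the projected estimate reproduces y exactly, whatever the model proposed for the null-space content.
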
Optical Flow Less Video Frame Interpolation

  • Designed a lightweight video restoration transformer that captures long-range interactions, enabling fast inference and lower training requirements for video frame interpolation. The model employs self-attention for feature extraction and mutual attention, as a surrogate for motion estimation, to capture temporal information and align features.
  • Created a training procedure that predicts intermediate frames continuous with subsequent frames by looking only at previous frames, thereby preserving causality.
  • Achieved results comparable to other state-of-the-art (SoTA) video interpolation models.
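
The mutual-attention idea mentioned above can be sketched as single-head cross-attention in which queries come from one frame and keys and values from another, so each token in the first frame gathers features from the second without explicit optical flow. The shapes, random weights, and function names here are assumptions for illustration; the project's actual architecture is not reproduced.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mutual_attention(ref_tokens, src_tokens, w_q, w_k, w_v):
    """Queries from the reference frame, keys/values from the source frame."""
    q = ref_tokens @ w_q
    k = src_tokens @ w_k
    v = src_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, dim = 6, 8                      # hypothetical sizes
frame_a = rng.standard_normal((n_tokens, dim))
frame_b = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
aligned, weights = mutual_attention(frame_a, frame_b, w_q, w_k, w_v)
```

Each row of `weights` is a soft correspondence from one token in frame A to all tokens in frame B, which is the role a motion estimate would otherwise play.
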
Similarities between local-patch quality maps of NR-IQA algorithms and saliency maps of computer vision classification models

  • Investigated similarities between how humans and classification models perceive images. NR-IQA models trained on human quality ratings serve as a proxy for human perception, and local-patch quality maps highlight the key areas attended to when rating an image.
  • Used PaQ-2-PiQ to create local-patch quality maps for images. Trained ResNet18 on images that PaQ-2-PiQ rated as good quality and generated saliency maps using Grad-CAM.
  • Compared the variation in local-patch quality maps and saliency maps due to natural scene distortions such as brightness, contrast, JPEG compression, motion blur, and zoom blur.
Reinforcement Learning for Autonomous Navigation of Cars

  • Applied reinforcement learning to autonomous navigation of cars with two objectives: keeping the car in its lane and keeping its speed under the speed limit. Gained insight into the benefits and difficulties of using reinforcement learning techniques for autonomous navigation.
  • Designed reward functions based on visual inputs from the camera, using segmentation and lane detection models. The agent controls the speed and steering angle of the car and is trained using deep Q-learning.
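
A reward that combines the two objectives might look like the sketch below. It assumes the lane-detection model supplies a lateral offset from the lane centre in metres and that the car's speed is known; the function name, default limits, and equal weighting are hypothetical choices, not the project's actual reward design.

```python
def lane_speed_reward(lane_offset_m, speed_kmh,
                      speed_limit_kmh=50.0, half_lane_width_m=1.75):
    """Reward staying near the lane centre and driving at or below the limit.

    All thresholds and weights are illustrative assumptions.
    """
    if abs(lane_offset_m) > half_lane_width_m:
        return -1.0  # the car has left its lane

    # 1.0 at the lane centre, falling linearly to 0.0 at the lane edge.
    lane_term = 1.0 - abs(lane_offset_m) / half_lane_width_m

    if speed_kmh > speed_limit_kmh:
        # Penalty grows with the amount by which the limit is exceeded.
        speed_term = -(speed_kmh - speed_limit_kmh) / speed_limit_kmh
    else:
        # Encourage progress: faster is better up to the limit.
        speed_term = speed_kmh / speed_limit_kmh

    return 0.5 * lane_term + 0.5 * speed_term
```

Rewarding speed up to the limit, instead of only penalising violations, avoids the degenerate policy of standing still in the lane.
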
Effects of reduced frame corruptions on video classification

  • Used a CNN-RNN architecture for classifying videos.
  • Designed various natural and adversarial single-frame corruptions and analyzed their impact on classification.
  • Designed a reduced frame-level adversarial attack to fool the video classification model.