Testing LoViT using Cholec80 data
Attempted to extract spatial features from video input.
IMPLEMENTED:
- version specification for timm and torch
- video preprocessing to get tensor in shape (Batch, Channels, Frame, Height, Width)
- spatial feature extraction to get tensor in shape (Batch, Classes, Frame, 1, 1)
- spatial feature extraction for the first 10 frames of video01.mp4
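The preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the code used in this commit: the helper name `frames_to_clip`, the 224x224 frame size, and the random stand-in data are all assumptions, and any normalization or resizing the real pipeline applies is omitted.

```python
import torch


def frames_to_clip(frames: torch.Tensor) -> torch.Tensor:
    """Convert decoded frames of shape (Frame, Height, Width, Channels), uint8,
    into a float clip of shape (Batch, Channels, Frame, Height, Width) in [0, 1].

    Hypothetical helper for illustration only.
    """
    if frames.ndim != 4:
        raise ValueError(f"expected 4 dimensions (F, H, W, C), got {frames.ndim}")
    # Move channels first: (F, H, W, C) -> (C, F, H, W), then scale to [0, 1].
    clip = frames.permute(3, 0, 1, 2).float() / 255.0
    # Add the batch dimension: (C, F, H, W) -> (1, C, F, H, W).
    return clip.unsqueeze(0)


# Random stand-in for the first 10 decoded RGB frames of a video
# (the 224x224 resolution is an assumption, not taken from the notes above).
dummy_frames = torch.randint(0, 256, (10, 224, 224, 3), dtype=torch.uint8)
clip = frames_to_clip(dummy_frames)
print(clip.shape)  # torch.Size([1, 3, 10, 224, 224])
```

Restricting the input to the first 10 frames keeps the Frame dimension small, which matters because memory use grows with the number of frames passed through the model at once.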
SOLVED: AttributeError: 'GELU' object has no attribute 'approximate'. Fixed by replacing the GELU layers and removing the approximation in them.
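This error commonly appears when a model object saved under an older PyTorch is loaded under a newer one whose GELU forward pass reads `self.approximate`. One way to replace the layers is sketched below. The helper `replace_gelu` and the tiny `nn.Sequential` model are hypothetical stand-ins, not the LoViT code or the exact fix applied here.

```python
import torch
from torch import nn


def replace_gelu(module: nn.Module) -> int:
    """Recursively swap every nn.GELU child for a freshly constructed nn.GELU().

    A fresh instance carries whatever attributes the installed PyTorch expects,
    and the default constructor uses the exact (non-approximated) GELU.
    Returns the number of layers replaced. Hypothetical helper.
    """
    replaced = 0
    # Copy the children into a list so the module can be modified while looping.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.GELU):
            setattr(module, name, nn.GELU())
            replaced += 1
        else:
            replaced += replace_gelu(child)
    return replaced


# Small stand-in model with one top-level and one nested GELU layer.
model = nn.Sequential(
    nn.Linear(4, 4),
    nn.GELU(),
    nn.Sequential(nn.Linear(4, 2), nn.GELU()),
)
n_replaced = replace_gelu(model)
out = model(torch.randn(2, 4))
print(n_replaced, out.shape)
```

Replacing the module rather than patching the missing attribute keeps the fix independent of which PyTorch version originally saved the model.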
UNRESOLVED: OutOfMemoryError: CUDA out of memory. Tried to allocate 24.30 GiB (GPU 0; 14.65 GiB total capacity; 327.30 MiB already allocated; 3.21 GiB free; 382.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. This error has not been fixed yet.
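To put the numbers in the error message in perspective, the arithmetic below estimates how far the failed allocation exceeds the memory that was free. It is a rough sketch only: the helper `tensor_gib` is hypothetical, 4 bytes per element (float32) is an assumption, and the calculation says nothing about which tensor triggered the allocation.

```python
import math


def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor with the given shape.

    bytes_per_element=4 assumes float32 (an assumption, not confirmed above).
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / 2**30


# Values copied from the error message.
requested_gib = 24.30
free_gib = 3.21
total_gib = 14.65

# Smallest whole-number factor by which the single allocation would have to
# shrink to fit into the memory that was free at the time of the failure.
min_reduction = math.ceil(requested_gib / free_gib)

print(f"requested exceeds total capacity: {requested_gib > total_gib}")
print(f"allocation must shrink by at least {min_reduction}x to fit in free memory")
```

Because the requested 24.30 GiB is larger than the GPU's 14.65 GiB total capacity, this single allocation cannot succeed on this device regardless of fragmentation settings; the allocation itself would have to become smaller.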