VLM -- DASYA

Tagged with: VLM

PROPOSAL

Answering Questions from Multimedia Collections

Contrastive learning models have made it easier to find relevant content in multimedia collections through descriptive text queries, reducing the interactivity needed to solve simple tasks. However, complex tasks that do not pertain only to visual elements, or tasks focused on answering questions about the contents of one or more media items (videos/images), still require a fair …
Supervisor: Omar Shahbaz Khan
Semester: Spring 2026
Tags: Multimedia Retrieval, VLM, RAG, Vector Store, Multimedia Indexing