IIT Bombay | B.Tech Chemical Engineering | Minor in Machine Intelligence and Data Science
Shubh Sareen
I build and study AI systems that turn unstructured data into structured, actionable knowledge. My work focuses on LLM pipelines and agentic workflows, and on understanding where they fall short in long-context reasoning, retrieval, and reliability.
I focus on building systems, not just models.
Experience
Machine Learning Intern
Recommendation Systems · Embeddings · Similarity Search
Worked on a content-based recommendation system using embedding-based representations of user interactions. Built a pipeline where user preferences were modeled as weighted combinations of interacted content and recommendations were generated using cosine similarity.
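A minimal sketch of the kind of pipeline described above, not the actual internship code. The catalog, item names, three-dimensional vectors, and interaction weights are all made up for illustration; a real system would use learned embeddings.

```python
import math


def weighted_profile(item_vectors, weights):
    """Build a user vector as the weighted average of interacted item embeddings."""
    total = sum(weights)
    dim = len(item_vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(item_vectors, weights)) / total
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile, catalog, seen, k=2):
    """Rank unseen catalog items by cosine similarity to the user profile."""
    scored = [
        (cosine(profile, vec), name)
        for name, vec in catalog.items()
        if name not in seen
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy catalog: item name -> embedding.
catalog = {
    "cooking_basics": [1.0, 0.0, 0.0],
    "baking_bread": [0.9, 0.1, 0.0],
    "knife_skills": [0.8, 0.2, 0.0],
    "guitar_intro": [0.0, 1.0, 0.0],
    "chess_openings": [0.0, 0.0, 1.0],
}

# Hypothetical interaction weights (e.g. watch time or click count).
interactions = {"cooking_basics": 3.0, "baking_bread": 1.0}

profile = weighted_profile(
    [catalog[name] for name in interactions], list(interactions.values())
)
top = recommend(profile, catalog, seen=set(interactions), k=2)
print(top)  # ['knife_skills', 'guitar_intro']
```

The second result is only weakly related to the profile; with a catalog this small, the ranking simply runs out of close matches.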
Explored limitations of this approach, particularly how aggregating embeddings can collapse multi-interest user behavior and lead to weak or ambiguous representations. Also observed how poor data quality and overlapping categories affected embedding separation and retrieval performance.
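The multi-interest collapse can be illustrated with a toy example. The two-dimensional vectors and item names below are hypothetical: a user with two unrelated interests is averaged into a single vector that sits between them, so an item matching neither interest well can outrank items that match one interest exactly.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Unweighted average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(vec[i] for vec in vectors) / n for i in range(len(vectors[0]))]


# Two unrelated interests, represented as orthogonal toy embeddings.
cooking = [1.0, 0.0]
guitar = [0.0, 1.0]

# A hypothetical item that is only half related to each interest.
blended_item = [0.7, 0.7]

# Aggregating both interests yields a vector halfway between them.
profile = mean_vector([cooking, guitar])

sim_cooking = cosine(profile, cooking)
sim_guitar = cosine(profile, guitar)
sim_blended = cosine(profile, blended_item)

# The blended item scores higher than either genuine interest.
print(round(sim_cooking, 3), round(sim_guitar, 3), round(sim_blended, 3))
```

In this toy case the averaged profile is equally distant from both real interests, which is the ambiguity the paragraph above describes.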
This experience helped me understand that the effectiveness of similarity-based systems depends heavily on representation quality, not just the choice of algorithm.
Key Projects
YouTube AI Helper
Built an LLM-based system that converts long-form YouTube videos and playlists into structured notes, flashcards, and a queryable interface. Designed a modular pipeline for transcript extraction, summarization, and Q&A, with support for multi-video context. Explored limitations of full-context prompting and began working toward retrieval-based approaches for better relevance and scalability.
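As a rough illustration of moving from full-context prompting to retrieval, the sketch below splits a transcript into chunks and passes only the best-matching chunk to the prompt. It is an assumption-laden stand-in, not the project's implementation: the transcript text is invented, the fixed eight-word chunk size is arbitrary, and term-frequency cosine similarity is used in place of whatever embedding model a real system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text, size=8):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tf_cosine(tokens_a, tokens_b):
    """Cosine similarity between term-frequency vectors of two token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        chunks, key=lambda c: tf_cosine(q_tokens, tokenize(c)), reverse=True
    )
    return ranked[:k]


# Invented transcript; each sentence happens to be eight words long.
transcript = (
    "Gradient descent follows the negative gradient of loss. "
    "The learning rate controls how large updates are. "
    "Weight decay is regularization that helps reduce overfitting."
)

chunks = chunk_words(transcript, size=8)
question = "What does the learning rate control?"
context = retrieve(question, chunks, k=1)

# Only the retrieved chunk is placed in the prompt, not the whole transcript.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The prompt size now depends on `k` and the chunk size rather than on the total length of the video or playlist.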
Systems & Experiments
Vision-to-Text SOC
Built a vision-to-text pipeline for extracting structured information from image-heavy inputs.
• Combined OCR and NLP techniques to convert visual data into usable text representations
• Designed the pipeline for practical document understanding workflows
• Explored challenges posed by noisy inputs, layout variation, and downstream text usability
• Focused on making extracted data structured and actionable rather than raw output
WIDS 2025 Agentic AI
Built an agentic AI system exploring multi-step reasoning and tool-based workflows.
• Designed modular agents for task decomposition, routing, and response generation
• Experimented with prompt orchestration and tool-use patterns for real-world tasks
• Explored limitations in agent reliability, control flow, and hallucination handling
• Focused on making agent behavior predictable and useful, not just autonomous
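The decomposition and routing ideas in the bullets above can be sketched as follows. This is a hypothetical toy, not the project's code: the `name: argument; name: argument` request format, the two tools, and the `max_steps` cap are all assumptions, and simple string rules stand in for the LLM calls a real agent would make. The step cap and explicit fallbacks show one way to keep control flow predictable.

```python
from typing import Callable, Dict, List, Tuple


def add_tool(arg: str) -> str:
    """Toy tool: add two integers written as 'a + b'."""
    left, right = arg.split("+")  # raises ValueError if there is no single '+'
    return str(int(left) + int(right))


def shout_tool(arg: str) -> str:
    """Toy tool: uppercase the argument."""
    return arg.upper()


# Hypothetical tool registry: tool name -> callable.
TOOLS: Dict[str, Callable[[str], str]] = {"add": add_tool, "shout": shout_tool}


def decompose(request: str) -> List[str]:
    """Split a request into sub-tasks (stand-in for LLM task decomposition)."""
    return [step.strip() for step in request.split(";") if step.strip()]


def route(step: str) -> Tuple[str, str]:
    """Map a sub-task of the form 'name: argument' to a tool name and argument."""
    name, _, arg = step.partition(":")
    return name.strip(), arg.strip()


def run_agent(request: str, max_steps: int = 5) -> List[str]:
    """Run each sub-task through its tool, with a step cap and explicit fallbacks."""
    results = []
    for step in decompose(request)[:max_steps]:
        name, arg = route(step)
        tool = TOOLS.get(name)
        if tool is None:
            # Refuse unknown tools instead of guessing at an answer.
            results.append(f"unknown tool: {name}")
            continue
        try:
            results.append(tool(arg))
        except ValueError as exc:
            results.append(f"tool error in {name}: {exc}")
    return results


output = run_agent("add: 2 + 3; shout: done; fly: away")
print(output)  # ['5', 'DONE', 'unknown tool: fly']
```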