Abstract : This tutorial aims to provide an in-depth understanding of multimodal language models (MLMs), which integrate modalities such as text, images, and audio to build intelligent applications. The tutorial is structured into three segments: (1) understanding the architecture of transformers, (2) understanding the architecture and working principles of multimodal language models, and (3) hands-on training in developing real-world applications with multimodal models. The tutorial will be interactive, featuring coding exercises with multimodal language models.
Introduction : Traditional NLP models primarily rely on text-based inputs, limiting their understanding of real-world scenarios that involve multiple data sources such as images, videos, and speech. Multimodal language models (MLMs) address this limitation by integrating different data modalities and enhancing applications in image captioning, video understanding, and conversational AI. This tutorial will explore key concepts, architectures, and best practices for implementing multimodal models in production environments.
Outline : Understanding Transformers:
Understanding Multimodal Language Models :
Building Applications with Multimodal AI:
The final part of the tutorial will be a hands-on coding session covering how to get access to APIs, design prompts, work with multimodal language models, and build applications.
Learning Objectives : The tutorial will help participants :
Duration : The tutorial is divided into three sections with a total duration of 6 hours:
Target Audience :
The tutorial is intended for researchers, graduate students, and industry professionals interested in NLP, LLMs, and Generative AI. A basic understanding of deep learning and Python programming is helpful.
Instructors :
Dr. Anukriti Bansal is a data scientist at LUMIQ and a Google Developer Expert in AI/ML with expertise in generative AI, NLP, and computer vision. She has extensive experience in developing AI-driven solutions and has organized multiple workshops and community events on AI and LLMs.
Email : anukriti1107@gmail.com
Dr. Vikas Bajpai is an Assistant Professor in the Department of CSE at The LNM Institute of Information Technology. He also serves as Assistant Dean of Alumni Relations and Engagement and leads the Centre for Sports Technology, Engineering and Management (C-STEAM) at the institute. He is the Lead for Google Developer's Group Jaipur and a Mentor for Google India Community Mentorship. He has guided B.Tech. and M.Tech. students in the areas of Machine Learning and Deep Learning. His research interests span Software Engineering, Requirements Engineering, and the development of deep learning models for prediction and estimation. He is an active speaker, delivering talks, expert lectures, and workshops at both national and international levels. He has received several grants and fellowships, including the Open Data Science Fellowship, sponsorship for Google I/O in 2014 and 2017, and the Science Foundation Ireland Grant. He is an IEEE Senior Member and a Life Member of CSI and ISTE.
Email : vikas.bajpai87@gmail.com
Dr. Nilotpal Chakraborty is an Assistant Professor in Computer Science and Engineering at the Indian Institute of Information Technology Guwahati. He obtained his PhD in Computer Science and Engineering from the Indian Institute of Technology Patna in 2019. He has worked as an IT Solution and Innovation Expert and as a Postdoctoral Researcher at the Department of Computer Science, Aalborg University, Denmark, and at EMAX Group, Belgium. His research interests include scheduling and optimization in smart grids, electric vehicles, unmanned aerial vehicles, AI, and blockchain for cyber-physical systems. He is an IEEE Senior Member.
Email : nilotpal@iiitg.ac.in