TUTORIAL – FULL DAY SESSION
BUILDING INTELLIGENT APPLICATIONS WITH MULTIMODAL LANGUAGE MODELS

Abstract : This tutorial aims to provide an in-depth understanding of multimodal language models (MLMs), which integrate different modalities such as text, images, and audio to create intelligent applications. The tutorial is structured into three segments: (1) understanding the architecture of transformers, (2) understanding the architecture and working principles of multimodal language models, and (3) hands-on training in developing real-world applications using multimodal models. The tutorial will be interactive, featuring coding exercises with multimodal language models.

Introduction : Traditional NLP models primarily rely on text-based inputs, limiting their understanding of real-world scenarios that involve multiple data sources such as images, videos, and speech. Multimodal language models (MLMs) address this limitation by integrating different data modalities and enhancing applications in image captioning, video understanding, and conversational AI. This tutorial will explore key concepts, architectures, and best practices for implementing multimodal models in production environments.

Outline : Understanding Transformers:

  • Understanding the general architecture of transformers
  • Understanding attention mechanism
  • Training transformers

Understanding Multimodal Language Models :

  • Introduction to Multimodal AI
  • Key challenges and solutions in multimodal learning
  • Transformer-based architectures for multimodal tasks
  • Overview of models like Gemini and CLIP

Building Applications with Multimodal AI:

The final part of the tutorial is a hands-on coding session covering how to get access to APIs, design prompts for multimodal language models, and build applications.

Learning Objectives : The tutorial will help participants :

  • Gain a strong foundation in multimodal AI and its real-world applications.
  • Understand different multimodal model architectures and their use cases.
  • Learn practical skills in implementing multimodal models.
  • Develop end-to-end intelligent applications using multimodal AI.

Duration : The tutorial is divided into three sections with a total duration of 6 hours:

  • Understanding transformers (2 hours)
  • Understanding multimodal models (2 hours)
  • Building applications with multimodal language models (2 hours)

Target Audience :

The tutorial is intended for researchers, graduate students, and industry professionals interested in NLP, LLMs, and Generative AI. A basic understanding of deep learning and Python programming is helpful.

Instructors :

Dr. Anukriti Bansal is a data scientist at LUMIQ and a Google Developer Expert in AI/ML with expertise in generative AI, NLP, and computer vision. She has extensive experience in developing AI-driven solutions and has organized multiple workshops and community events on AI and LLMs.

Email : anukriti1107@gmail.com

Dr. Vikas Bajpai is an Assistant Professor in the Department of CSE at The LNM Institute of Information Technology. He also serves as Assistant Dean of Alumni Relations and Engagement and leads the Centre for Sports Technology, Engineering and Management (C-STEAM) at the University. Vikas is the Lead for Google Developer Group Jaipur and a Mentor for Google India Community Mentorship. He has guided B.Tech. and M.Tech. students in the areas of Machine Learning and Deep Learning. His research interests span Software Engineering, Requirements Engineering, and the development of deep learning models for prediction and estimation. Vikas is an active speaker, delivering talks, expert lectures, and workshops at both national and international levels. He has received several grants and fellowships, including the Open Data Science Fellowship, sponsorship for Google I/O in 2014 and 2017, and the Science Foundation Ireland Grant. He is an IEEE Senior Member and a Life Member of CSI and ISTE.

Email : vikas.bajpai87@gmail.com

Dr. Nilotpal Chakraborty is an Assistant Professor in Computer Science and Engineering at the Indian Institute of Information Technology Guwahati. He obtained his PhD in Computer Science and Engineering from the Indian Institute of Technology Patna in 2019. He has worked as an IT Solution and Innovation Expert and as a Postdoctoral Researcher at the Department of Computer Science, Aalborg University, Denmark, and at EMAX Group, Belgium. His research interests include scheduling and optimization in smart grids, electric vehicles, and unmanned aerial vehicles, as well as AI and blockchain for cyber-physical systems. He is an IEEE Senior Member.

Email : nilotpal@iiitg.ac.in