Interspeech 2023 Tutorial
- Interspeech 23 Resource Efficient and Cross-Modal Learning Toward Foundation Modeling Tutorial - Video
- ICASSP 22 Tutorial Neural Model Reprogramming and Prompting for Speech Modeling - Video | Slide
- ICASSP 23 Tutorial Parameter-Efficient Learning (PEL) for Speech and NLP: Adapters, Prompts, and Reprogramming - Slide
Part 1. Overview of Resource Efficient Learning, Dr. Huck Yang
9:00
1.1. Parameter-Efficient Learning
- Background of Frozen Model Adaptation
- Neural Adapter, Reprogramming, Prompting, and Low-Rank Adaptation (LoRA); a minimal adapter sketch follows the paper table below
| Title | Authors | Code | Venue |
| --- | --- | --- | --- |
| Differentially Private Adapters for Parameter Efficient Acoustic Modeling | C.-W. Ho et al. | code | Interspeech 2023 |
| Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | L.-J. Yang et al. | code | Interspeech 2023 |
| A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model | S. Radhakrishnan et al. | code | Interspeech 2023 |
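For reference, here is a minimal sketch of the residual bottleneck adapter named in 1.1, written in PyTorch; the module name, hidden sizes, and insertion point are illustrative assumptions rather than the exact configuration used in the tutorial.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project.

    Only these few parameters are trained; the frozen host model is untouched.
    The dimensions below are illustrative assumptions.
    """
    def __init__(self, d_model: int = 768, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()
        # Initialize the up-projection at zero so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))

# Usage: apply to the output of a frozen transformer block.
adapter = BottleneckAdapter(d_model=768, bottleneck=32)
hidden_states = torch.randn(2, 50, 768)   # (batch, frames, d_model)
out = adapter(hidden_states)
print(out.shape)                          # torch.Size([2, 50, 768])
```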
1.2. Memory-Efficient Learning
- Reducing GPU / TPU memory usage during training (e.g., activation memory)
- Model Serialization
- Efficient On-Device Learning via Feature Reprogramming (CVPR 2022)
- Ladder-Side Tuning (NeurIPS 2022); a minimal sketch of the side-network idea follows below
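A minimal sketch of the ladder-side-tuning idea, assuming a generic transformer backbone: a small trainable side branch consumes detached intermediate activations, so no activation memory is kept for backpropagation through the frozen backbone. The layer sizes, number of taps, and pooling are assumptions for illustration, not the published recipe.

```python
import torch
import torch.nn as nn

class LadderSideHead(nn.Module):
    """Small side network fed by detached backbone activations.

    The frozen backbone runs under torch.no_grad(); only this side branch
    stores activations and receives gradients, which keeps training memory low.
    """
    def __init__(self, d_model=768, d_side=96, num_taps=4, num_classes=10):
        super().__init__()
        self.downs = nn.ModuleList([nn.Linear(d_model, d_side) for _ in range(num_taps)])
        self.mixers = nn.ModuleList([
            nn.Sequential(nn.LayerNorm(d_side), nn.Linear(d_side, d_side), nn.GELU())
            for _ in range(num_taps)
        ])
        self.classifier = nn.Linear(d_side, num_classes)

    def forward(self, taps):
        # taps: list of (batch, frames, d_model) hidden states, already detached.
        side = 0.0
        for down, mix, h in zip(self.downs, self.mixers, taps):
            side = mix(side + down(h))
        return self.classifier(side.mean(dim=1))

# Usage sketch with stand-in backbone activations (batch=2, frames=50, d_model=768).
taps = [torch.randn(2, 50, 768).detach() for _ in range(4)]
head = LadderSideHead()
logits = head(taps)
print(logits.shape)  # torch.Size([2, 10])
```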
1.3. How to Estimate Which Layer or Which Model to Tune?
- Universal Approximation Theory (IEEE TIP 1993)
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning (ICML 2021); a simple layer-ranking probe is sketched after the table below
- Latent Space Alignment in "Voice2Series: Reprogramming Acoustic Models for Time Series Classification" (ICML 2021)
| Title | Authors | Code | Venue |
| --- | --- | --- | --- |
| How to Estimate Model Transferability of Pre-Trained Speech Models? | Z.-C. Chen et al. | code | Interspeech 2023 |
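As a companion to the LogME item in 1.3, here is a much simpler layer-ranking proxy, not the LogME estimator itself and not the method of the paper above: each candidate layer's frozen features are scored with a cross-validated ridge-regression probe, and layers are ranked by probe accuracy. The layer names and feature sizes are synthetic assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

def rank_layers_by_probe(layer_features: dict, labels: np.ndarray) -> list:
    """Rank candidate layers by cross-validated linear-probe accuracy.

    layer_features maps a layer name to an (N, D) array of frozen features
    extracted from that layer for N labelled utterances.
    """
    scores = {}
    for name, feats in layer_features.items():
        probe = RidgeClassifier(alpha=1.0)
        scores[name] = cross_val_score(probe, feats, labels, cv=3).mean()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Usage with synthetic features from three hypothetical encoder layers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=120)
fake = {f"layer_{i}": rng.normal(size=(120, 256)) for i in (4, 8, 12)}
for name, score in rank_layers_by_probe(fake, labels):
    print(f"{name}: probe accuracy {score:.3f}")
```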
1.4. Advanced Low-Rank Adaptation (LoRA) Techniques
- Cross-Modal Merging
- Low-Rank Adaptation (LoRA) for Foundation Modeling and Pre-Training; a weight-merging sketch follows below
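A hedged sketch for 1.4 showing (a) folding a trained LoRA update back into its frozen base weight and (b) a naive cross-adapter merge by convex combination of two weight deltas. The `alpha / r` scaling follows common LoRA practice, and the simple interpolation is an illustrative assumption rather than a specific published merging recipe.

```python
import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float = 16.0, r: int = 8) -> torch.Tensor:
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A.

    W is (d_out, d_in), A is (r, d_in), B is (d_out, r).
    """
    return W + (alpha / r) * (B @ A)

def interpolate_adapters(delta_a: torch.Tensor, delta_b: torch.Tensor,
                         lam: float = 0.5) -> torch.Tensor:
    """Naive cross-adapter merge: convex combination of two weight deltas."""
    return lam * delta_a + (1.0 - lam) * delta_b

# Usage with random weights (d_out = d_in = 1024, rank r = 8).
W = torch.randn(1024, 1024)
A, B = torch.randn(8, 1024), torch.zeros(1024, 8)
W_merged = merge_lora(W, A, B)       # equal to W here, since B is zero-initialised
print(torch.allclose(W, W_merged))   # True
```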
1.5. Community Service
- Special Session in ICASSP 2024: In-Context Learning for Speech and Language Processing
- [email protected]
Break: Hands-on Session 1 (5 min)
- How to Train Your Whisper with Neural Adapter and LoRA (a LoRA setup sketch follows below)
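As a preview of the hands-on session, here is a minimal sketch of wrapping Whisper with LoRA via Hugging Face `transformers` and `peft`; the checkpoint name, target modules, and hyperparameters are assumptions, and the actual hands-on notebook may differ.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load a small Whisper checkpoint; only the injected LoRA weights will be trained.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections inside Whisper
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a small fraction of parameters are trainable

# From here, training proceeds with a regular Seq2SeqTrainer or a custom loop.
```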
Part 2: Trustworthy AI and Cross-Modal Learning in the Era of Foundation Models, Dr. Pin-Yu Chen
11:00 to 11:45
Part 3: Multimodal Pre-Training for Automatic Speech Recognition and Vision Sharing, Dr. Shalini Ghosh
11:45 to 12:20
Spotlight Invited Talk, "Prompting LLM for ASR," by Dr. Chunyang Wu, Meta AI
12:20 to 12:30
Q&A and Plenary Discussion
12:40 to 12:45