Luxshan Thavarasa
CSE, University of Moratuwa · Independent researcher in speech & language
I work on speech emotion recognition and multilingual speech processing, with a focus on low-resource languages. My Interspeech 2026 paper, KuralHub, maps how multilingual SER holds up across typologically diverse languages. I'm now building compact, unified models that push this toward 29 languages — and beyond speech toward more general representations.
Research
I am a Computer Science & Engineering graduate of the University of Moratuwa, now a machine learning engineer at H2O.ai while pursuing independent research toward a thesis-based master's. My work centres on making speech technology work for the languages it usually leaves behind — starting with emotion, and broadening toward general, transferable speech understanding.
Building on KuralHub, I'm developing a compact ("tiny") model: a single speech emotion recognition system that covers 29 languages and prioritises the hardest low-resource cases. I also aim to extend it beyond speech toward more general, transferable representations. Alongside this I'm exploring Dravidian language technologies and interpretability for text-based language models.
- Multilingual & low-resource speech processing
- Speech emotion recognition
- Compact and efficient speech models
- Representation learning beyond speech
- Dravidian language technologies
- Interpretability for language models
Publications
KuralHub: Exposing Typological Capability Frontiers in Multilingual Speech Emotion Recognition
Examines how multilingual speech emotion recognition generalises across typologically diverse languages, identifying where current models reach their capability limits — with direct implications for low-resource languages.
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
A hand-built commonsense-reasoning benchmark across 116 language varieties, 14 language families and 23 writing systems; LLMs lag on lower-resource languages by up to 37%. I contributed the Sri Lankan Tamil subset.
Incepto@DravidianLangTech 2025: Detecting Abusive Tamil and Malayalam Text Targeting Women on YouTube
A multilingual model for abusive-content detection in low-resource, code-mixed text using transfer learning and multi-head attention. Macro-F1 of 0.79 (Tamil) and 0.71 (Malayalam).
EmoTa: A Tamil Emotional Speech Dataset
The first emotional speech dataset for Tamil — 936 utterances from 22 native Sri Lankan Tamil speakers across five emotions (Fleiss' κ = 0.74), with emotion-classification F1 up to 0.91.
Experience
H2O.ai
- H2OGPTe & Agentic AI: Build agentic GenAI features end to end — agent-interaction UI/UX, the agent framework, and scalable backends (React, FastAPI, PostgreSQL).
- ChurnApp: Customer-churn prediction trained with H2O Driverless AI and deployed in an MLOps pipeline.
- Olympic App: Built a full hackathon platform solo with H2O Wave and PostgreSQL.
aaivu · Full-stack Developer
- Built project and conference modules for a research group's web platform and mentored junior contributors.
Education
BSc Engineering (Honours), Computer Science & Engineering
- Minor: Mathematics
- Standing: Second Class, Upper Division (GPA 3.47). Degree taught entirely in English.
- Final-Year Project — Multilingual Universal Speech Emotion Recognition Model: a unified SER model spanning multiple languages; the precursor to KuralHub.
- Coursework: Neural Networks & Fuzzy Logic, Machine Vision, Image Processing, Introduction to Machine Learning, Advanced Algorithms, Linear Models & Multivariate Statistics.
G.C.E. Advanced Level — Physical Science
- Three A grades · Island Rank 209 · Z-Score 2.4704.
Selected projects
DravidaKavacham
FastMCP File Server
QuickChat
Awards & service
- 2026 · Interspeech 2026: paper accepted to the main track.
- 2025 · Reviewer, DravidianLangTech @ NAACL 2025.
- 2025 · Peer-reviewed papers at COLING 2025 and NAACL 2025 workshops.
- 2018 · All-Island Mathematics Competition: 2nd Runner-Up, Northern Province team.
Certifications: Fundamentals of Deep Learning, NVIDIA (2025) · LLMs L1, H2O.ai (2024) · Machine Learning A–Z, Udemy (2024) · Meta Front-End Development, Coursera (2023).
Contact
Open to research collaborations and graduate (MASc/MSc) opportunities in speech and language processing.
- Email: luxshan2327@gmail.com
- GitHub: github.com/luxshan2000
- LinkedIn: linkedin.com/in/lux-thavarasa
- Location: Mullaitivu, Northern Province, Sri Lanka
- Languages: Tamil (native), English (professional)