Luxshan Thavarasa - Speech & Language Researcher

I work on speech emotion recognition (SER) and multilingual speech processing, with a focus on low-resource languages. My Interspeech 2026 paper, KuralHub, maps how multilingual SER holds up across typologically diverse languages. I'm now building compact, unified models that extend this work toward 29 languages and, beyond speech, toward more general representations.

Research

I am a Computer Science & Engineering graduate of the University of Moratuwa, now a machine learning engineer at H2O.ai while pursuing independent research toward a thesis-based master's. My work centres on making speech technology work for the languages it usually leaves behind — starting with emotion, and broadening toward general, transferable speech understanding.

Current work — the next phase of KuralHub

Building on KuralHub, I'm developing a compact ("tiny") model intended to serve as a single speech emotion recognition system across 29 languages, with the hardest low-resource cases as the priority, and extending it beyond speech toward more general, transferable representations. Alongside this, I'm exploring Dravidian language technologies and interpretability for text-based language models.

Interests

Multilingual & low-resource speech processing
Speech emotion recognition
Compact and efficient speech models
Representation learning beyond speech
Dravidian language technologies
Interpretability for language models

Publications

Interspeech 2026 · Main track · Sydney

KuralHub: Exposing Typological Capability Frontiers in Multilingual Speech Emotion Recognition

Luxshan Thavarasa et al.

Examines how multilingual speech emotion recognition generalises across typologically diverse languages, identifying where current models reach their capability limits — with direct implications for low-resource languages.

PDF (coming soon)

arXiv preprint · 2025

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Community collaboration of 335 researchers (incl. Luxshan Thavarasa)

A hand-built commonsense-reasoning benchmark across 116 language varieties, 14 language families and 23 writing systems; LLMs lag on lower-resource languages by up to 37%. I contributed the Sri Lankan Tamil subset.

arXiv

DravidianLangTech @ NAACL 2025

Incepto@DravidianLangTech 2025: Detecting Abusive Tamil and Malayalam Text Targeting Women on YouTube

Luxshan Thavarasa, Sivasuthan Sukumar, Jubeerathan Thevakumar

A multilingual model for abusive-content detection in low-resource, code-mixed text using transfer learning and multi-head attention. Macro-F1 of 0.79 (Tamil) and 0.71 (Malayalam).

ACL Anthology

CHiPSAL @ COLING 2025

EmoTa: A Tamil Emotional Speech Dataset

Jubeerathan Thevakumar, Luxshan Thavarasa, Thanikan Sivatheepan, Sajeev Kugarajah, Uthayasanker Thayasivam

The first emotional speech dataset for Tamil — 936 utterances from 22 native Sri Lankan Tamil speakers across five emotions (Fleiss' κ = 0.74), with emotion-classification F1 up to 0.91.

ACL Anthology · Dataset

Experience

Nov 2023 — Present

H2O.ai

Software Engineering Intern → Software Engineer → Software Engineer, Machine Learning (Level II) · Colombo

H2OGPTe & Agentic AI: Build agentic GenAI features end to end — agent-interaction UI/UX, the agent framework, and scalable backends (React, FastAPI, PostgreSQL).
ChurnApp: Customer-churn prediction trained with H2O Driverless AI and deployed in an MLOps pipeline.
Olympic App: Built a full hackathon platform solo with H2O Wave and PostgreSQL.

Jan 2023 — Jun 2025

aaivu

Full-stack Developer · Colombo

Built project and conference modules for a research group's web platform and mentored junior contributors.

Education

Mar 2021 — Jun 2025

BSc Engineering (Honours), Computer Science & Engineering

University of Moratuwa, Sri Lanka

Minor: Mathematics
Standing: Second Class, Upper Division (GPA 3.47). Degree taught entirely in English.
Final-Year Project — Multilingual Universal Speech Emotion Recognition Model: a unified SER model spanning multiple languages; the precursor to KuralHub.
Coursework: Neural Networks & Fuzzy Logic, Machine Vision, Image Processing, Introduction to Machine Learning, Advanced Algorithms, Linear Models & Multivariate Statistics.

2019

G.C.E. Advanced Level — Physical Science

Mu/Visuvamadu Maha Vidyalayam, Mullaitivu

Three A grades · Island Rank 209 · Z-Score 2.4704.

Selected projects

2025

DravidaKavacham

Open-source abusive-content detection for Tamil & Malayalam (DravidianLangTech @ NAACL 2025).

Dec 2024

FastMCP File Server

Secure file server implementing the Model Context Protocol for AI assistants. Python, FastAPI.

2024

QuickChat

Chrome-extension help chatbot with RAG over scraped site content. OpenAI embeddings, Chroma DB, FastAPI.

Awards & service

2026 · Interspeech 2026: paper accepted to the main track.
2025 · Reviewer, DravidianLangTech @ NAACL 2025.
2025 · Peer-reviewed papers at COLING 2025 and NAACL 2025 workshops.
2018 · All-Island Mathematics Competition: 2nd Runner-Up, Northern Province team.

Certifications

Fundamentals of Deep Learning — NVIDIA (2025) · LLMs L1 — H2O.ai (2024) · Machine Learning A–Z — Udemy (2024) · Meta Front-End Development — Coursera (2023).

Contact

Open to research collaborations and graduate (MASc/MSc) opportunities in speech and language processing.

Email: luxshan2327@gmail.com
GitHub: github.com/luxshan2000
LinkedIn: linkedin.com/in/lux-thavarasa
Location: Mullaitivu, Northern Province, Sri Lanka
Languages: Tamil (native), English (professional)