Luxshan Thavarasa

Software Engineer, Machine Learning (Level II) at H2O.ai
CSE, University of Moratuwa  ·  Independent researcher in speech & language

I work on speech emotion recognition (SER) and multilingual speech processing, with a focus on low-resource languages. My Interspeech 2026 paper, KuralHub, maps how multilingual SER holds up across typologically diverse languages. I'm now building compact, unified models that extend this work toward 29 languages and, beyond speech, toward more general representations.

Research

I am a Computer Science & Engineering graduate of the University of Moratuwa, now a machine learning engineer at H2O.ai while pursuing independent research toward a thesis-based master's. My work centres on making speech technology work for the languages it usually leaves behind — starting with emotion, and broadening toward general, transferable speech understanding.

Current work — the next phase of KuralHub

Building on KuralHub, I'm developing a compact ("tiny") model as a step toward a single speech-emotion-recognition system that works across 29 languages, prioritising the hardest low-resource cases, and extending it beyond speech toward more general, transferable representations. Alongside this, I'm exploring Dravidian language technologies and interpretability for text-based language models.

Interests

Publications

Interspeech 2026 Main track · Sydney

KuralHub: Exposing Typological Capability Frontiers in Multilingual Speech Emotion Recognition

Luxshan Thavarasa et al.

Examines how multilingual speech emotion recognition generalises across typologically diverse languages, identifying where current models reach their capability limits — with direct implications for low-resource languages.

arXiv preprint · 2025

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Community collaboration of 335 researchers (incl. Luxshan Thavarasa)

A hand-built commonsense-reasoning benchmark across 116 language varieties, 14 language families and 23 writing systems; LLMs lag on lower-resource languages, with an accuracy gap of up to 37%. I contributed the Sri Lankan Tamil subset.

DravidianLangTech @ NAACL 2025

Incepto@DravidianLangTech 2025: Detecting Abusive Tamil and Malayalam Text Targeting Women on YouTube

Luxshan Thavarasa, Sivasuthan Sukumar, Jubeerathan Thevakumar

A multilingual model for abusive-content detection in low-resource, code-mixed text using transfer learning and multi-head attention. Macro-F1 of 0.79 (Tamil) and 0.71 (Malayalam).

CHiPSAL @ COLING 2025

EmoTa: A Tamil Emotional Speech Dataset

Jubeerathan Thevakumar, Luxshan Thavarasa, Thanikan Sivatheepan, Sajeev Kugarajah, Uthayasanker Thayasivam

The first emotional speech dataset for Tamil — 936 utterances from 22 native Sri Lankan Tamil speakers across five emotions (Fleiss' κ = 0.74), with emotion-classification F1 up to 0.91.

Experience

Nov 2023 — Present

H2O.ai

Software Engineering Intern → Software Engineer → Software Engineer, Machine Learning (Level II) · Colombo
  • H2OGPTe & Agentic AI: Build agentic GenAI features end to end — agent-interaction UI/UX, the agent framework, and scalable backends (React, FastAPI, PostgreSQL).
  • ChurnApp: Customer-churn prediction trained with H2O Driverless AI and deployed in an MLOps pipeline.
  • Olympic App: Built a full hackathon platform solo with H2O Wave and PostgreSQL.

Jan 2023 — Jun 2025

aaivu

Full-stack Developer · Colombo
  • Built project and conference modules for a research group's web platform and mentored junior contributors.

Education

Mar 2021 — Jun 2025

BSc Engineering (Honours), Computer Science & Engineering

University of Moratuwa

  • Minor: Mathematics
  • Standing: Second Class, Upper Division (GPA 3.47). Degree taught entirely in English.
  • Final-Year Project — Multilingual Universal Speech Emotion Recognition Model: a unified SER model spanning multiple languages; the precursor to KuralHub.
  • Coursework: Neural Networks & Fuzzy Logic, Machine Vision, Image Processing, Introduction to Machine Learning, Advanced Algorithms, Linear Models & Multivariate Statistics.

2019

G.C.E. Advanced Level — Physical Science

Mu/Visuvamadu Maha Vidyalayam, Mullaitivu
  • Three A grades · Island Rank 209 · Z-Score 2.4704.

Selected projects

2025

DravidaKavacham

Open-source abusive-content detection for Tamil & Malayalam (DravidianLangTech @ NAACL 2025).

Dec 2024

FastMCP File Server

Secure file server implementing the Model Context Protocol for AI assistants. Python, FastAPI.

2024

QuickChat

Chrome-extension help chatbot with RAG over scraped site content. OpenAI embeddings, Chroma DB, FastAPI.

Awards & service

Certifications

Fundamentals of Deep Learning — NVIDIA (2025) · LLMs L1 — H2O.ai (2024) · Machine Learning A–Z — Udemy (2024) · Meta Front-End Development — Coursera (2023).

Contact

Open to research collaborations and graduate (MASc/MSc) opportunities in speech and language processing.

Email
luxshan2327@gmail.com
GitHub
github.com/luxshan2000
LinkedIn
linkedin.com/in/lux-thavarasa
Location
Mullaitivu, Northern Province, Sri Lanka
Languages
Tamil (native), English (professional)