Software Engineer @ H2O.ai | Research Enthusiast | CSE @ University of Moratuwa
Location: Visuvamadu East, Visuvamadu, Mullaitivu, Northern Province, Sri Lanka
Email: luxshan2327 [AT] gmail.com
Final-year Computer Science and Engineering student at the University of Moratuwa with a passion for Full Stack Development and Artificial Intelligence. My research focuses on Speech Emotion Recognition (SER) for low-resource languages, aiming to bring AI advancements to diverse linguistic communities.
I enjoy working in collaborative environments and thrive on creating innovative solutions that make complex data accessible and meaningful to users. I'm eager to connect with like-minded individuals and work on projects that push the boundaries of technology.
Research Focus: Speech Emotion Recognition · Natural Language Processing · Agentic AI Systems · Full-Stack Development
Languages: Tamil (Native), English (Professional Working)
Current Role: Software Engineer at H2O.ai, developing frontend and backend solutions for H2OGPTe and Agentic Applications.
H2OGPTe: Developing the frontend with a strong focus on UI/UX for agent interactions, while contributing to backend improvements to ensure seamless functionality.
Rtuthaya Website: Contributed to the development of Dr. Uthayasankar Thayasivam's website. Managed content creators and mentored new junior contributors as they enhanced the application. Key contributions included developing the project and conference pages using PHP, MySQL, HTML, CSS, JavaScript, and Bootstrap.
G.C.E. (A/L) Examination 2019: 3A grades | Physical Science stream | Island Rank: 209 | Z-Score: 2.4704
G.C.E. (O/L) Examination 2016: 8A, 1B | Including Mathematics, Science, English, and Business & Accounts
To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.
This study introduces a novel multilingual model designed to address the challenges of detecting abusive content in low-resource, code-mixed languages. Using transfer learning and multi-head attention mechanisms, our model achieved a macro F1 score of 0.7864 on the Tamil dataset and 0.7058 on the Malayalam dataset.
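As a rough illustration of the metric reported above, macro F1 is the unweighted mean of per-class F1 scores, so a minority class counts as much as a majority class. The sketch below is not the paper's evaluation code; the function name `macro_f1` and the sample labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not taken from the Tamil or Malayalam datasets.
gold = ["abusive", "abusive", "non-abusive", "non-abusive"]
pred = ["abusive", "non-abusive", "non-abusive", "non-abusive"]
print(round(macro_f1(gold, pred), 4))
```

Averaging per class rather than per example is what makes the score sensitive to how well the rarer class is detected.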
This paper introduces EmoTa, the first emotional speech dataset in Tamil, designed to reflect the linguistic diversity of Sri Lankan Tamil speakers. EmoTa comprises 936 utterances from 22 native Tamil speakers across five primary emotions, with a substantial Fleiss' Kappa agreement score of 0.74 and F1-scores of 0.91 and 0.90.
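For readers unfamiliar with the agreement statistic above, here is a minimal sketch of how Fleiss' Kappa is computed from a table of rating counts. It is not the annotation pipeline used for EmoTa; the function name `fleiss_kappa` and the toy table are hypothetical, and the sketch assumes every item was rated by the same number of annotators.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # assumed identical for every item
    n_cats = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:  # every rating fell into a single category
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: 3 utterances, 3 raters, 2 emotion categories.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as substantial agreement, which is the band the reported 0.74 falls into.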
A versatile, secure file server implementing the Model Context Protocol (MCP) that provides AI assistants with safe file operations. Features multiple connection modes (stdio, HTTP, public access via ngrok), configurable access levels, and security controls for various deployment scenarios. Supports file operations, advanced text manipulation, file analysis, batch operations, archives, and format conversion.
Technologies: Python, FastAPI, Model Context Protocol (MCP), ngrok, Security & Authentication, File I/O
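One common building block for safe file operations of this kind is confining every requested path to a configured root directory. The sketch below shows that idea only; it is not the project's actual code, and the helper name `resolve_inside` and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_inside(root, requested):
    """Resolve `requested` relative to `root`, rejecting paths that escape it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root_path / requested).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the allowed root")
    return candidate


if __name__ == "__main__":
    import tempfile

    sandbox = tempfile.mkdtemp()  # stand-in for a configured root directory
    print(resolve_inside(sandbox, "notes/todo.txt"))
    try:
        resolve_inside(sandbox, "../secrets.txt")
    except PermissionError as err:
        print("blocked:", err)
```

Resolving before comparing matters: a plain string-prefix check would accept `root/../elsewhere`, whereas the resolved path makes the escape visible.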
Developed an open-source tool for detecting abusive Tamil and Malayalam content targeting women. Built as part of DravidianLangTech @ NAACL 2025, leveraging NLP and machine learning for accurate text classification.
Technologies: Python, Machine Learning, Transfer Learning, Attention Mechanisms
Developed a Chrome extension that adds a help chatbot to the bottom-right corner of any website. The backend scrapes the site's content, embeds it with OpenAI embeddings, stores the vectors in Chroma DB, and generates real-time responses with GPT-3.5.
Technologies: JavaScript, HTML/CSS, Python, FastAPI, Web Scraping, OpenAI Embeddings, Chroma DB, Vector Databases, RAG
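The retrieval step of a RAG pipeline like the one described above can be sketched in a few lines: embed the question, rank stored chunks by cosine similarity, and pass the top matches to the language model as context. The sketch uses hand-written toy vectors instead of real OpenAI embeddings and a plain list instead of Chroma DB, so the names `cosine` and `top_k` and all data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: (scraped text chunk, stand-in embedding vector).
index = [
    ("Pricing starts at the basic tier.", [1.0, 0.0, 0.0]),
    ("Contact support by email.", [0.0, 1.0, 0.0]),
    ("Refunds follow the pricing policy.", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], index))
```

In a real deployment the vector store performs this ranking, and the retrieved chunks are inserted into the prompt sent to the chat model.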
Developed a user-friendly web portal and mobile application for driver's license exam preparation, with comprehensive study materials, practice tests, and performance feedback.
Technologies: HTML, CSS, JavaScript, Bootstrap, React JS, Express JS, MongoDB, React Native, Git, CI/CD, Docker, AWS EC2, Redux.js
Developed a web-based SCMS for online shopping in a team of five. Designed the relational database for efficient data management and implemented triggers and functions for enhanced database performance.
Technologies: HTML, CSS, JavaScript, Bootstrap, PHP, MySQL
Developed a console-based email client in Java, applying design patterns and OOP concepts. Features include contact management, email sending, automated birthday wishes, and email history.
Technologies: Java, Object-Oriented Programming, Design Patterns
Credential ID: 1tO0Ys3ITkGJkXM3sgBKrQ
Skills: Deep Learning, Machine Learning, PyTorch, Neural Networks
Skills: Large Language Models, Generative AI, Natural Language Processing
Credential ID: 60b2314b-e9de-4b14-af89-b301ec16a5ed
Skills: CNN, Deep Learning, Supervised/Unsupervised Learning, PyTorch, NLP
Comprehensive Program Including: Advanced React, React Basics, HTML and CSS in depth, Programming with JavaScript, UX Design Foundations, Version Control
Contributed as a volunteer paper reviewer for the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages. Evaluated research submissions focused on hate speech detection across text and speech data.
Email: luxshan2327@gmail.com
Mobile: +94-77-3038402
Address: Visuvamadu East, Visuvamadu, Mullaitivu, Northern Province, Sri Lanka
LinkedIn: linkedin.com/in/lux-thavarasa
GitHub: github.com/luxshan2000
"Let's shape the future together! I'm eager to connect with like-minded individuals and work on projects that push the boundaries of technology."