About

Learn more about me

Backend & Data Engineer

Backend engineer and data specialist with 8+ years of experience designing scalable systems, high-throughput data pipelines, and intelligent applications. Proficient in Java, Python, and Spring Boot, with hands-on expertise in Elasticsearch, ETL architecture, and machine learning. MSc in Computing (Data Analytics) from Dublin City University. Passionate about clean architecture, measurable performance improvements, and bridging the gap between engineering and data science.

Publications

A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?

Authors: Siddhant Jaydeep Mahajani, Shashank Srivastava, Alan F. Smeaton.
Conference: 31st Irish Conference on Artificial Intelligence and Cognitive Science. December 7th-8th, 2023.
Cite as: https://doi.org/10.48550/arXiv.2311.06221

Things I do

Backend Engineering

  • Build scalable REST APIs, batch processing pipelines, and microservices using Java, Spring Boot, and Spring Batch
  • Design and optimise JPA/Hibernate queries and application performance, consistently achieving 40–70% throughput improvements
  • Containerise application services using Docker for consistent, repeatable deployments

Data Engineering & Search

  • Architect ETL pipelines integrating ERP and non-ERP data sources (SAP, Oracle EBS, QlikView, Workday)
  • Build Elasticsearch-backed analytics and full-text search solutions for real-time operational insights
  • Work with Hadoop, Hive, PostgreSQL, MongoDB, and DynamoDB across structured and unstructured data

Machine Learning & Research

  • Develop NLP pipelines for sentiment analysis and text classification using Python and Scikit-Learn
  • Build computer vision models and image classifiers with TensorFlow and OpenCV
  • Published research on lexicon vs. ML-based sentiment analysis at AICS 2023

Cloud & DevOps

  • Deploy and manage workloads on AWS (Lambda, DynamoDB, S3) and Google Cloud Platform
  • Automate CI/CD workflows with Jenkins and manage builds with Maven and Gradle
  • Implement encryption solutions (OpenSSL, PGP) and data security best practices

Technology Stack

Languages

Java • Python • JavaScript • R • Go • SQL

Backend & Frameworks

Spring Boot • Spring Batch • Spring MVC • JPA/Hibernate • Flask • REST APIs • Microservices • Quartz

Data & Search

Elasticsearch • Apache Hadoop • Hive • ETL Pipelines • Pandas • NumPy • SciPy • Tableau

Machine Learning

TensorFlow • Scikit-Learn • OpenCV • Matplotlib • Seaborn • NLTK

Databases

PostgreSQL • Oracle • MongoDB • AWS DynamoDB • Redis

Infrastructure & DevOps

Docker • Jenkins • Maven • Gradle • Git • SVN • Tomcat • JIRA

Cloud

AWS (Lambda, DynamoDB, S3) • Google Cloud Platform

Security & Protocols

OpenSSL • PGP Encryption • OAuth 2.0 • REST • AJAX

Achievements

Certifications

Badges

  • Google Data Analytics badge
  • Data Science Foundations - Level 1 badge
  • Data Science Tools badge
  • Data Science Methodologies badge
  • Python for Data Science badge

Resume

Check My Resume

Education

MSc in Computing (Data Analytics)

September 2022 - August 2023

Relevant Coursework: Machine Learning, Artificial Intelligence, Data Analytics & Data Mining, Cloud Computing

Dublin City University, Dublin, Ireland

Master of Computer Applications

June 2015 - June 2018

Relevant Coursework: Software Engineering, Advanced Data Structures, Design & Analysis of Algorithms, Object Oriented Analysis & Design, Service Oriented Architecture, Big Data Analytics, Cloud Computing

Savitribai Phule Pune University, Pune, India

Bachelor of Computer Applications

June 2012 - May 2015

Relevant Coursework: Software Engineering, Data Structures, Object Oriented Programming, Introduction to System Programming & Operating Systems

Savitribai Phule Pune University, Pune, India

Work Experience

Software Developer

June 2023 - Present

Saadian Technologies, Dublin, Ireland

  • Engineered high-throughput Spring Batch jobs processing 100K+ records per run, reducing data ingestion time by 60%
  • Integrated Elasticsearch for full-text search and analytics dashboards, delivering real-time insights for case management workflows
  • Optimised Spring/Hibernate queries and data access patterns, improving dashboard load times from 25 seconds to under 10 seconds (a reduction of more than 60%)
  • Built a computer vision proof of concept (POC) using OpenCV for automated data extraction, reducing manual data entry effort by 40%
  • Containerised application services with Docker, standardising deployment across environments and eliminating configuration drift

Teaching Assistant

October 2022 - May 2023

Dublin City University, Dublin, Ireland

  • Department: School of Electronic Engineering
  • Teaching Assistant for EE417B and EE417 (Web Application Development)

Software Development Engineer II

October 2018 - June 2022

Pathlock (formerly Greenlight Technologies), Pune, India

  • Designed and implemented ETL pipelines to distribute and process data from ERP and non-ERP sources (SAP, Oracle EBS, QlikView, Workday), improving distribution performance by 40% and enabling near real-time file processing
  • Integrated OpenSSL and PGP encryption/decryption for ERP file security, ensuring compliance with data protection requirements across client environments
  • Developed import/export functionality for cross-system data migration, reducing human intervention and errors by 30%
  • Re-engineered core application modules with code optimisation and performance enhancements, improving overall throughput by 20–30%
  • Collaborated with clients in requirement gathering and architecture design discussions for ETL pipeline features

Software Developer Intern

October 2017 - September 2018

Pathlock (formerly Greenlight Technologies), Pune, India

  • Investigated and resolved production stability issues, contributing to improved application reliability
  • Developed alternative solutions to optimise critical code paths and reduce response latency

Projects

Sentiment Analysis Pipeline

2023

  • NLP research pipeline comparing lexicon-based and ML-based sentiment analysis, identifying systematic outlier words across methods
  • Published at the 31st Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2023)
  • Stack: Python, Scikit-Learn, NLTK, Pandas, Matplotlib
  • arXiv preprint: https://doi.org/10.48550/arXiv.2311.06221
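
The idea of an "outlier word" can be illustrated with a toy sketch. This is not the method from the paper: the lexicon, the labelled snippets, and the function names `learned_polarity` and `outlier_words` are all hypothetical, and a smoothed log ratio of word counts stands in for a trained ML model. The sketch flags lexicon words whose fixed polarity disagrees with the polarity suggested by labelled data.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0, "sick": -1.0}

# Hypothetical labelled snippets: (text, label) with 1 = positive, 0 = negative.
DOCS = [
    ("great phone good battery", 1),
    ("sick design great screen", 1),
    ("that camera is sick", 1),
    ("awful battery bad screen", 0),
    ("bad support awful phone", 0),
]


def learned_polarity(docs):
    """Score each word by the add-one smoothed log ratio of its counts
    in positive versus negative documents (a crude stand-in for a model)."""
    pos, neg = Counter(), Counter()
    for text, label in docs:
        (pos if label == 1 else neg).update(text.lower().split())
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in set(pos) | set(neg)}


def outlier_words(lexicon, learned):
    """Return lexicon words whose sign disagrees with the data-driven score."""
    flagged = []
    for word, lex_score in lexicon.items():
        score = learned.get(word)
        if score is None or score == 0:
            continue  # unseen or perfectly balanced word: no evidence either way
        if (lex_score > 0) != (score > 0):
            flagged.append(word)
    return sorted(flagged)


print(outlier_words(LEXICON, learned_polarity(DOCS)))
```

In this toy data the slang use of "sick" appears only in positive snippets, so it is the kind of word a fixed lexicon would misjudge.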

E-Commerce Image Classifier

2023

  • CNN-based image classification model for automated product categorisation, achieving 94% accuracy on a multi-class dataset
  • Implemented data augmentation and transfer learning to improve generalisation on limited labelled data
  • Stack: Python, TensorFlow, Keras, OpenCV, NumPy

Recommendation Engine

2023

  • Collaborative filtering recommendation system using matrix factorisation for personalised item recommendations
  • Exposed predictions via a lightweight Flask REST API for integration with front-end clients
  • Stack: Python, Scikit-Learn, NumPy, Flask, Pandas
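
As a rough illustration of matrix factorisation for collaborative filtering, the sketch below fits user and item factor vectors with stochastic gradient descent. It is a minimal pure-Python example under stated assumptions, not the project's code (which used Scikit-Learn and NumPy): the names `factorise`, `predict`, and `rmse`, the hyperparameters, and the sample ratings are all hypothetical.

```python
import random


def factorise(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit k-dimensional user (P) and item (Q) factors to observed
    (user, item, rating) triples using SGD with L2 regularisation."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating is the dot product of the user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical sparse ratings: (user index, item index, rating on a 1-5 scale).
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]

P, Q = factorise(RATINGS, n_users=3, n_items=3)
print("training RMSE:", round(rmse(P, Q, RATINGS), 3))
print("predicted rating for user 0, item 2:", round(predict(P, Q, 0, 2), 2))
```

A prediction for an unrated pair, such as user 0 and item 2 here, is what a REST endpoint would return to a front-end client.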
