narain@portfolio:~$

Software Engineer | ML & Cloud Developer

about_me

I am a software engineer who builds intelligent, scalable applications using AI, cloud platforms, and performance optimization.

/* Tech Stack */

AWS Azure C/C++ CUDA Golang JavaScript MySQL Python PyTorch Spark

achievements

Winner, AMD Synthetic Data Hackathon

  • Fine-tuned LLM reasoning via reinforcement learning for a question-answering agent.
Winner LLM Reinforcement Learning GPUs

experience

AI Engineer /* formerly MLOps & AI Development Engineer */

ASU Enterprise Technology
Jan 2024 - Present
  • Built a dynamic agentic DAG workflow engine that supports heterogeneous LLMs and tools, allowing on-the-fly composition of complex, multi-step workflows with adaptive routing and tool selection.
  • Built LLM infrastructure on ASU's SOL Cluster using Kubernetes, enabling dynamic resource scaling for inference.
  • Trained embedding models for domain-specific documents using masked language modeling (MLM) pretraining with the DeBERTa architecture.
  • Developed the backend for CreateAI, an LLM-powered chat assistant featuring multi-model support, rate monitoring, RAG, semantic caching, and agentic evaluation to prevent hallucinations.
  • Engineered an indexing system for multimodal question answering, integrating text, video, and image representations for data retrieval.
  • Implemented serverless inference of ML models on AWS Lambda using GGML, enabling deployment of custom models at scale and at lower cost than established model providers.

Software Engineer

Fidelity Investments
Apr 2022 - Jul 2023
  • Built infrastructure and pipelines for deploying machine learning models on AWS.
  • Engineered online feature stores to support real-time data and optimized batch inference to handle large-scale data.
  • Contributed to the development and deployment of diverse machine learning models, including document layout models.
  • Led the deployment and performance optimization of large language models, including BLOOM and Flan-T5, using tensor and model parallelism to accelerate token generation.
  • Collaborated on deploying scalable machine learning workflows, ensuring efficient model and data lifecycle management.

AI Researcher

QPiAI Technologies
May 2021 - Apr 2022
  • Implemented a Graph Convolutional Network (GCN) trained with graph contrastive learning on circuit layouts, achieving high recall in identifying similar layouts.
  • Introduced custom ranking algorithms for product recommendations, supporting automated price estimation for sales.
  • Helped develop camera-based safety monitoring systems for warehouses to ensure compliance with social distancing.
  • Developed an AutoML platform for ML models, enabling automatic hyperparameter tuning and model selection.
  • Contributed to the hardware/software co-design process, focusing on algorithmic optimizations to maximize inference throughput.

Data Analyst

Thoughtware Analytics
Jan 2020 - May 2021
  • Conducted demand forecasting for manufacturing and service-oriented businesses, focusing on supply chain improvements.
  • Developed computer vision-based quality assurance for manufacturing processes, reducing material waste.
  • Conducted operational research to optimize logistics, improving route planning and resource allocation to reduce costs and delivery times.

education

MS Computer Engineering

Arizona State University
2023 - 2025

/* Algorithms, Digital Image Processing, Data Intensive Systems for Machine Learning */

B.Tech Mechanical Engineering

Amrita Vishwa Vidyapeetham
2016 - 2020

projects

Robotic Arm Maneuvering Using Deep Reinforcement Learning
B.Tech Thesis Project at Amrita Vishwa Vidyapeetham

  • Designed a Unity3D simulation with PyTorch and built a physical prototype with 3D-printed components.
  • Trained deep RL agents (DDPG, A2C, PPO) for continuous control tasks with >30 DoF, using neural policies to process raw sensor and image inputs. Successfully transferred simulation-trained policies to real-world hardware.
PyTorch Deep RL Unity3D Robotics

Whisper Training on Indian Languages

  • Enhanced Whisper v3 medium model for South Asian language speech recognition. Adopted a new tokenizer, incorporated audio augmentations, and expanded the dataset with high-confidence transcriptions to significantly reduce word error rate.
CUDA Deep Learning NLP

[github]

Steganalysis Detection

  • CNN-based model to detect steganographic content in digital images. Modified ResNet architecture with Efficient Channel Attention (ECA) and reduced convolution stride for enhanced sensitivity to concealed data.
Python Deep Learning Computer Vision

[github] [demo]

Indic LLaMA

  • Enhanced LLaMA model for multiple languages and domains. Integrated a new tokenizer and trained on diverse translated instruction datasets and domain-specific dialogues.
WASM LLM JavaScript Deep Learning

[demo] [github]

LLM Model Inference in Browser

  • Zero-installation LLM inference (Gemma, LLaMA, Mistral) directly in the browser using WebAssembly SIMD.
JavaScript WebAssembly C GPU

[demo]

Compiler System for Deep Learning

  • Developed a compiler system for mobile backends using TVM and CUDA. Designed SIMD kernels for ARM NEON and custom CUDA kernels for Jetson Orin, outperforming OpenCV and TorchScript in benchmarks.
TVM LLVM MLIR NEON CUDA