About Me
Hey, I'm Arup Chauhan. I'm a software engineer primarily working on distributed systems, search engines, and databases.
I completed my Master of Science in Computer Science at Illinois Tech, where I worked with the IIT-DB Group on optimizing database queries using machine learning techniques.
Recently, I have been working on database internals and search systems, especially query planning, execution behavior, and hybrid retrieval with Lucene and Solr. My focus has been on keeping relevance high and latency low at scale.
Earlier in my career, I worked extensively with Java, Spring Boot, Kafka, and AWS to build event-driven backend systems with a strong focus on reliability, maintainability, and observability.
I am also active in open source, contributing to projects at Microsoft, Meta, Apache, and Spotify. Outside of coding, I follow footwear tech, read about culinary history, and enjoy cooking and music.
Experience
Software Engineer (Distributed DB Systems)
IIT-DB Group Research Lab, Illinois Tech | SciDB, PostgreSQL, Python, C++, TensorFlow
Chicago, IL | May 2024 - Nov 2025
Built a machine learning-based query optimizer using Deep Q-Networks and Support Vector Machines to predict optimal execution paths for database queries, significantly reducing latency in large-scale workloads powered by Apache Spark. Implemented parallel data ingestion with sharded I/O using Apache Arrow and Parquet formats, allowing each node to read and write its own shard independently without coordinator bottlenecks.
Deployed a multi-tenant control plane on AWS EKS with SLA-aware routing and per-tenant rate limiting, managing the complete ML model lifecycle with TensorFlow Serving. Set up comprehensive monitoring with Prometheus and Grafana to track latency distributions, throughput metrics, and resource utilization across the distributed environment.
Application Engineer
Hindustan Times through Four C Plus (Internet) Co. Ltd. | Java, Spring Boot, Kafka, ActiveMQ, Hibernate
New Delhi, India | Jul 2018 - Apr 2022
Built an event-driven publishing platform using Spring Boot and Kafka that automated content scheduling and distribution for editorial workflows, with adaptive rate-limiting to handle backpressure during peak publishing periods. Designed concurrent data ingestion pipelines using Java ExecutorService and JMS/ActiveMQ, shifting from serialized processing to batch-oriented workflows with transaction monitoring for reliability.
Integrated social media publishing capabilities with OAuth 2.0 authentication, implementing automatic content fan-out with quota controls across multiple platforms. Maintained high availability with exactly-once delivery guarantees and dead-letter queue replay mechanisms, supporting thousands of daily transactions. Conducted chaos engineering testing using Chaos Monkey to validate system resilience and improve incident response through better observability.
Open Source
Microsoft C++ Standard Library (STL)
C++, LLVM, Concurrency, Memory Debugging
Microsoft
Working on the Windows C++ runtime by building test cases for thread-exit APIs and memory debugging with heap checks. Contributing fixes for thread synchronization edge cases in LLVM-based tests and improving the maintainability of STL versioning headers.
Meta Velox Query Execution Engine
C++, Meta Folly, Presto Integration
Meta (facebookincubator)
Extending Velox's type system to support time zone fields, ensuring compatibility with Presto's query semantics. Also simplifying code by replacing library-specific macros with standard C++ features to improve portability. Velox powers vectorized query processing in Presto and Apache Gluten.
Backstage Framework
PostgreSQL, TypeScript, React, Node.js, React Router, Monorepo (Yarn/NPM)
Spotify/Cloud Native Computing Foundation (CNCF)
Adding multi-language search support to Backstage's PostgreSQL backend, including Chinese and other locales, to enable better service catalog discovery. Also improved catalog search with multi-attribute queries and fixed navigation issues in distributed deployments. Backstage is used as an internal developer portal at companies like Spotify and Netflix.
Apache Airflow
Python, Apache Airflow Provider, Snowflake Connector, Cloud IAM (AWS/Azure/GCP)
Apache Software Foundation
Adding support for Snowflake Workload Identity Federation to enable secure, credential-free authentication in Airflow workflows. Working with maintainers to roll this out across AWS, Azure, and GCP, establishing modern authentication patterns for the workflow orchestration platform used at Airbnb and Stripe.
Elastic UI Framework
TypeScript, React, Storybook, Web Content Accessibility Guidelines (WCAG)
Elastic
Refactoring accessibility checks in components like EuiBadge and EuiAvatar to centralize WCAG utilities and reduce code duplication. Working with maintainers to align with the new theming system, improving accessibility compliance in the component library used across Elastic's products.
Projects
Java (Spring Boot), Apache Solr/SolrCloud, PostgreSQL (pgvector), Redis, Kafka
A hybrid search system that combines Solr's traditional keyword search with vector similarity using pgvector. Built to improve relevance for natural language queries, with Redis caching for hot queries and full monitoring via Prometheus and Grafana. Deployed on Kubernetes for high availability.
Java (Spring Boot), Apache Lucene, Redis, PostgreSQL, gRPC, Docker, Kubernetes
A high-availability search engine built with Lucene that distributes queries across sharded pods and merges results at the coordinator level. Includes Redis-based caching for frequently accessed queries and PostgreSQL for metadata filtering, with comprehensive monitoring of search performance.
Java (Spring Boot), Kafka, Apache Flink, PostgreSQL, Redis, gRPC, Docker, Kubernetes
A streaming analytics pipeline that serves dashboard queries from continuously updated aggregates rather than scanning raw events. Uses Kafka for event ingestion, Flink for windowed aggregations, and Redis for caching hot queries, with real-time tracking of data freshness.
C++, Redis Streams, Apache Cassandra, WebSockets, JWT, Docker, Kubernetes
A notification delivery system built in C++ that isolates tenants using JWT authentication. Uses Redis Streams for message distribution, Cassandra for persistence, and WebSockets for real-time delivery. Includes per-tenant quotas, dead-letter handling, and comprehensive monitoring.
Golang, MySQL, Redis, Docker
A real-time notification service using Redis pub-sub for event propagation and MySQL for persistence. Built in Golang with a focus on low-latency delivery while maintaining consistency across distributed components.
C++, Qt Framework
An image preprocessing tool built with Qt that improves OCR accuracy by analyzing pixel density and identifying connected components. Helps prepare documents for text recognition workflows.
JavaScript (React), Node.js, Redis, Docker
A route optimization application using Dijkstra's Algorithm for pathfinding. Uses Redis for caching route computations to enable instant re-routing. Built with a React frontend and Node.js backend.
Education & Skills
Education
Master of Science - Computer Science
Illinois Institute of Technology | Chicago, IL
Bachelor of Technology - Computer Science & Engineering
Dr. A.P.J. Abdul Kalam Technical University | Lucknow, India
Skills
Languages & Databases
Platforms & Tools
Engineering Practices
Achievements
Dan Kohn Scholarship
The Linux Foundation
- Awarded for cloud-native and open-source contributions
- Sponsored participation at KubeCon
Graduate Pathway Scholarship
Illinois Institute of Technology
- Merit-based scholarship for academic excellence
- Recognized for leadership potential
Founding Member, ML Club @ IIT
Illinois Institute of Technology
- Founding member of the Graduate Executive Team
- Helped grow the Machine Learning Club to 400+ active members
Professional Memberships
ACM @ Illinois Tech, CodePath Alumni Association, Headstarter Fellowship
- Active member of academic and industry developer communities
