Chess.com
Built Chess.com's entire data infrastructure from the ground up: a petabyte-scale medallion architecture with real-time Kafka streaming, a dbt transformation layer that prepares data for AI/ML use cases, and BigQuery analytics powering recommendation systems and player insights.
Designed and implemented a complete medallion architecture (bronze/silver/gold layers) processing petabytes of chess game data. Real-time streaming pipelines handle millions of concurrent games, while the transformation layer prepares data for AI/ML models that power Chess.com's recommendation engine and player analysis features.
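As a rough illustration of the bronze/silver/gold flow described above, here is a minimal in-memory sketch. The field names, the sample rows, and the validation rule are assumptions made for the example, not details of the production pipeline.

```python
from collections import defaultdict

# Hypothetical raw events; field names are illustrative only.
BRONZE = [
    {"game_id": "g1", "white": "alice", "black": "bob", "result": "1-0"},
    {"game_id": "g1", "white": "alice", "black": "bob", "result": "1-0"},  # duplicate delivery
    {"game_id": "g2", "white": "bob", "black": "carol", "result": "??"},   # malformed result
    {"game_id": "g3", "white": "carol", "black": "alice", "result": "1-0"},
]

VALID_RESULTS = {"1-0", "0-1", "1/2-1/2"}


def to_silver(bronze):
    """Silver layer: deduplicate by game_id and drop rows that fail validation."""
    seen, silver = set(), []
    for row in bronze:
        if row["game_id"] in seen or row["result"] not in VALID_RESULTS:
            continue
        seen.add(row["game_id"])
        silver.append(row)
    return silver


def to_gold(silver):
    """Gold layer: aggregate wins per player for downstream consumers."""
    wins = defaultdict(int)
    for row in silver:
        if row["result"] == "1-0":
            wins[row["white"]] += 1
        elif row["result"] == "0-1":
            wins[row["black"]] += 1
    return dict(wins)
```

Bronze keeps everything as received, silver holds only clean and unique games, and gold exposes a business-level aggregate.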
Confluent Kafka infrastructure processing millions of events per second: live game moves, user interactions, and system telemetry. Custom producers and consumers tuned for that throughput, with exactly-once semantics and sub-100ms end-to-end latency for real-time features.
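One ingredient of exactly-once delivery is that a retried send must not create a duplicate record. The toy model below sketches that idea with a per-producer sequence number. It is a simplified simulation in plain Python, not the Confluent client API, and it does not model transactions or consumer offsets. `PartitionLog` and `send_with_retry` are hypothetical names.

```python
class PartitionLog:
    """Toy stand-in for a broker partition that rejects duplicate sends."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, value):
        """Append unless this producer already had seq (or a later one) accepted."""
        if seq <= self._last_seq.get(producer_id, -1):
            return False  # duplicate caused by a retry; ignore it
        self._last_seq[producer_id] = seq
        self.records.append(value)
        return True


def send_with_retry(log, producer_id, seq, value, lost_acks=1):
    """Simulate retries where the write lands but the acknowledgement is lost."""
    for _ in range(lost_acks + 1):
        log.append(producer_id, seq, value)
```

However many times a move is resent under the same sequence number, the log stores it once.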
Petabyte-scale BigQuery warehouse serving as the analytics backbone for Chess.com's AI systems. Optimized partitioning and clustering strategies enable sub-second queries across billions of games. Powers recommendation engines, player matching algorithms, and content personalization for 200M+ users.
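To show why partitioning and clustering reduce the data a query touches, here is a small analogy in plain Python. It does not use BigQuery or reproduce its storage engine. The table layout (partitioned by play date, sorted by player within each partition) is an assumption chosen for the example.

```python
import bisect
from collections import defaultdict


class PartitionedTable:
    """Toy table: one partition per date, rows kept sorted by player_id."""

    def __init__(self):
        # ISO date string -> sorted list of (player_id, game_id)
        self.partitions = defaultdict(list)

    def insert(self, played_on, player_id, game_id):
        bisect.insort(self.partitions[played_on], (player_id, game_id))

    def query(self, start, end, player_id):
        """Return (game ids, rows scanned) for one player in a date range."""
        hits, scanned = [], 0
        for day, part in self.partitions.items():
            if not (start <= day <= end):
                continue  # partition pruning: skip dates outside the range
            # Clustering: jump straight to this player's block of rows.
            i = bisect.bisect_left(part, (player_id, ""))
            while i < len(part) and part[i][0] == player_id:
                hits.append(part[i][1])
                scanned += 1
                i += 1
        return sorted(hits), scanned
```

In this sketch a query for one player over one month reads only that player's rows in that month's partitions, rather than the whole table.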
Comprehensive dbt transformation framework preparing data for machine learning models. 500+ models with automated testing, documentation, and data contracts. Feature engineering pipelines feed recommendation systems, player skill estimation models, and content personalization algorithms. Incremental processing ensures fresh features for real-time ML inference.
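The incremental processing mentioned above can be pictured as a high-watermark upsert: each run handles only rows newer than what the target already holds, and merges them on a unique key. The sketch below is a Python analogy of that pattern, not dbt code. The `game_id` and `updated_at` column names are assumptions.

```python
def incremental_merge(target, source, key="game_id", cursor="updated_at"):
    """Upsert into target only the source rows newer than target's watermark.

    target is a dict mapping key -> row; returns how many rows were processed.
    """
    watermark = max((row[cursor] for row in target.values()), default=None)
    processed = 0
    for row in source:
        if watermark is not None and row[cursor] <= watermark:
            continue  # already covered by an earlier run
        target[row[key]] = row  # insert a new key or overwrite an existing one
        processed += 1
    return processed
```

The first run against an empty target processes everything. Later runs touch only rows past the watermark, which keeps rebuild cost proportional to new data.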
Workflow orchestration with custom operators for data processing. Dynamic DAG generation, robust error handling, and integration with monitoring systems for reliable batch processing.
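Dynamic DAG generation means building the task graph from configuration instead of hand-writing each workflow. The source does not name the orchestrator, so the sketch below uses only the Python standard library. The pipeline names and steps are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline config; names and steps are illustrative only.
PIPELINES = {
    "games": ["extract", "validate", "load"],
    "users": ["extract", "load"],
}


def build_dag(pipelines):
    """Return {task: set of upstream tasks}, chaining each pipeline's steps."""
    dag = {}
    for name, steps in pipelines.items():
        upstream = None
        for step in steps:
            task = f"{name}.{step}"
            dag[task] = {upstream} if upstream else set()
            upstream = task
    return dag


def execution_order(dag):
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())
```

Adding a pipeline then only requires a new config entry, since the tasks and their dependencies are derived from it.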
Production feature store serving real-time and batch features to Chess.com's ML models. Powers player skill rating systems, content recommendation engines, cheat detection algorithms, and engagement prediction models. Integrated with Vertex AI for model training and deployment.
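A feature store typically answers two kinds of reads: the latest value for online inference, and the value as of a past timestamp when building training sets, so that models never see data from the future. The class below is a minimal in-memory sketch of that split. It is not the production system or any vendor's API, and `FeatureStore` and its method names are hypothetical.

```python
import bisect


class FeatureStore:
    """Toy store: timestamped history per (entity, feature) pair."""

    def __init__(self):
        self._history = {}  # (entity, feature) -> sorted list of (ts, value)

    def write(self, entity, feature, ts, value):
        """Record a numeric feature value; out-of-order writes stay sorted."""
        rows = self._history.setdefault((entity, feature), [])
        bisect.insort(rows, (ts, value))

    def get_online(self, entity, feature):
        """Latest known value, as a real-time model would read it."""
        rows = self._history.get((entity, feature), [])
        return rows[-1][1] if rows else None

    def get_as_of(self, entity, feature, ts):
        """Value in effect at time ts, for point-in-time-correct training data."""
        rows = self._history.get((entity, feature), [])
        i = bisect.bisect_right(rows, (ts, float("inf")))
        return rows[i - 1][1] if i else None
```

Reading a rating "as of" the time a game was played avoids leaking later rating changes into the training example.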