Duyet Le

Data & AI Engineer building scalable data platforms and AI-powered systems. 8+ years shipping production workloads across ClickHouse, Kubernetes, and cloud infrastructure. Currently exploring LLM agents and Rust.

Experience

Sr. Data Engineer | Oct 2023 – Present
  • Migrated the data platform from a legacy stack (Spark, Iceberg, Trino) to ClickHouse on Kubernetes, including a 350TB+ Iceberg data lake.
  • With ClickHouse, achieved 300% better data compression and 2x–100x faster queries than Trino + Iceberg.
  • Automated operations with Airflow, including data replication, data processing, and health checks.
  • Built AI agents and AI workflows on top of the ClickHouse data lake and documentation using LangGraph, LlamaIndex, Qdrant, Firecrawl, Cube.js, and Next.js.
Sr. Data Engineer | Oct 2018 – Jul 2023
  • Reduced monthly cloud costs (GCP and AWS) from $45,000 to $20,000.
  • Managed a team of 4 data engineers and 2 data analysts delivering end-to-end analytics solutions to stakeholders. Promoted a data-driven approach to problem-solving across the organization.
  • Designed a next-generation data platform in Rust.
  • Developed data monitoring, data catalog, and self-service analytics tools for internal teams, all deployed on Kubernetes.

FPT Software

Sr. Data Engineer | Jun 2017 – Oct 2018
  • Built AWS data pipelines processing 2TB/day for a recommendation system.
  • Ingested and transformed 1TB+/day into a data lake using Azure and Databricks.

John von Neumann Institute

Data Engineer | Sep 2015 – Jun 2017
  • Developed data pipelines, data-cleaning processes, and visualizations for ad-hoc analyses.
  • Trained and deployed ML models: customer lifetime value, churn prediction, sales optimization, recruitment optimization, etc.

Education

University of Information Technology

Bachelor's degree, Information Systems

Technical Skills

Languages & Frameworks: Python, Rust, TypeScript, SQL, Spark
Data & AI: LlamaIndex, AI SDK, LangGraph, ClickHouse, Kafka, Airflow, BigQuery, AWS
DevOps: CI/CD, Kubernetes, Helm Charts, Cloudflare