Duyet Le
Data & AI Engineer building scalable data platforms and AI-powered systems. 8+ years shipping production workloads across ClickHouse, Kubernetes, and cloud infrastructure. Currently exploring LLM agents and Rust.
Experience
Sr. Data Engineer, Oct 2023 – Present
- Migrated from legacy stack (Spark, Iceberg, Trino) to ClickHouse.
- Migrated 350TB+ Iceberg Data Lake to .
- Achieved 300% better data compression and 2x-100x faster queries with ClickHouse compared to Trino + Iceberg.
- Automated operations with Airflow: data replication, data processing, healthchecks, etc.
- Built AI Agents and AI Workflows on top of ClickHouse Data Lake and Documentation with LangGraph, LlamaIndex, Qdrant, Firecrawl, Cube.js, Next.js.
Sr. Data Engineer, Oct 2018 – Jul 2023
- Optimized monthly costs from $45,000 to $20,000 (GCP and AWS).
- Managed a team of 4 data engineers and 2 data analysts delivering end-to-end analytics solutions to stakeholders, and promoted data-driven problem-solving across the organization.
- Designed next-gen Data Platform in Rust.
- Developed tools for Data Monitoring, Data Catalog, and Self-service Analytics for internal teams with .
FPT Software
Sr. Data Engineer, Jun 2017 – Oct 2018
- Built data pipelines processing 2TB/day with AWS for a Recommendation System.
- Ingested and transformed 1TB+/day into Data Lake using Azure Cloud and Databricks.
John von Neumann Institute
Data Engineer, Sep 2015 – Jun 2017
- Developed data pipelines, data cleaning and visualizations for ad-hoc problems.
- Trained and deployed ML models: customer lifetime value, churn prediction, sales optimization, recruitment optimization, etc.
Education
University of Information Technology
Bachelor's degree, Information System
Technical Skills
Languages & Frameworks: Python, Rust, TypeScript, SQL, Spark
Data & AI: LlamaIndex, AI SDK, LangGraph, ClickHouse, Kafka, Airflow, BigQuery, AWS
DevOps: CI/CD, Kubernetes, Helm Charts, Cloudflare