Organizations

@conda-forge @GrowingInTech

dwsmith1983/README.md

About Me

Engineering Director specializing in SRE, Data Engineering, and MLOps. I build reliable data platforms at scale, lead high-performing teams, and optimize cloud costs. Former Databricks Solutions Architect. Open source contributor.

  • Currently leading SRE for Data & Analytics at Techcombank
  • Based in Ha Noi, Viet Nam
  • LinkedIn
  • Website

Tech Stack

  • Languages: Python, Scala, SQL
  • Data Engineering: Apache Spark, Databricks, Delta Lake, Kafka, Airflow
  • Cloud & Infrastructure: AWS, GCP, Docker, Kubernetes
  • SRE & Observability: Prometheus, Grafana, CloudWatch
  • MLOps: MLflow, TensorFlow, TFX

Certifications

Databricks, GCP

Pinned

  1. spark-bestfit Public

    Efficiently fit ~90 scipy.stats distributions to your data using Spark's parallel processing with optimized Pandas UDFs and broadcast variables.

    Python 1 2

  2. spark-pipeline-framework Public

    A configuration-driven framework for building Spark pipelines with HOCON config files and PureConfig.

    Scala 3

  3. pyspark-pipeline-framework Public

    Configuration-driven PySpark pipeline framework with HOCON configuration, 5 resilience patterns, lifecycle hooks, and streaming support.

    Python 3

  4. interlock Public

    Interlock prevents pipelines from executing when preconditions aren't safe. It applies Leveson's Systems-Theoretic Accident Model to data engineering: pipelines have control structures with traits …

    Go 2