PillTrack: Edge MLOps Pipeline

Overview

PillTrack is an end-to-end MLOps pipeline designed for real-time pill identification on Edge devices.

Unlike a traditional closed-set classifier, this system uses Deep Metric Learning to produce vector embeddings for pills, so new pill types can be identified from a few reference examples without full retraining. To keep inference latency low on edge hardware, Knowledge Distillation compresses a heavy teacher model (ResNet) into a lightweight student model.

graph LR
    subgraph Development_and_DataOps ["1. Development & DataOps"]
        Dev["Developer<br/>(Git Flow)"]
        GitHub["GitHub Actions<br/>(CI/CD)"]
        DVC["DVC<br/>(Data Versioning)"]
        S3["AWS S3<br/>(Remote Storage)"]
    end

    subgraph Training_Pipeline ["2. Knowledge Distillation & Training"]
        direction TB
        Teacher["Teacher (ResNet)"]
        Student["Student (Lightweight)"]
        MLflow["MLflow (Tracking)"]
        Metric["Metric Learning"]
        
        Teacher -->|Distill| Student
        Student -->|Log| MLflow
        Metric -->|Embed| Student
    end

    subgraph Edge_Deployment ["3. Edge Inference"]
        Edge["Edge Device"]
        VectorDB[("Vector Search")]
    end

    %% Interactions
    GitHub -->|Trigger dvc repro| Teacher
    Dev -->|Push Code| GitHub
    Dev -->|Push Data| DVC
    DVC -.->|Store| S3
    Student -->|Deploy| Edge
    Edge <-->|Search| VectorDB

    %% Styling
    style Development_and_DataOps fill:#f9f9f9,stroke:#333
    style Training_Pipeline fill:#e1f5fe,stroke:#01579b
    style Edge_Deployment fill:#fff3e0,stroke:#e65100

MLOps Architecture

The pipeline follows a reproducible Data-centric AI approach using DVC for data versioning and Git for code versioning.

Tech Stack & Engineering Decisions

Data Version Control (DVC):

Decision: Decouples large datasets (.zip) from the codebase while maintaining version history aligned with Git commits. Ensures reproducibility of every experiment.

Deep Metric Learning:

Decision: Used instead of Softmax classification to handle the "open-set" problem (new pills appearing in the future) via Vector Search similarity.
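To make the open-set idea concrete, here is a minimal sketch of similarity-based identification. It assumes embeddings are plain lists of floats and that a gallery maps each pill name to one reference embedding. The function names, the 0.8 threshold, the pill names, and the toy vectors are all illustrative assumptions, not values taken from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.8):
    """Return (label, score) for the closest entry, or (None, score) if below threshold."""
    best_label, best_score = None, -1.0
    for label, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        # Nothing in the gallery is similar enough: treat the pill as unknown.
        return None, best_score
    return best_label, best_score


# Hypothetical 3-dimensional embeddings; real ones would come from the student model.
gallery = {
    "paracetamol_500mg": [1.0, 0.0, 0.0],
    "ibuprofen_200mg": [0.0, 1.0, 0.0],
}

# Enrolling a new pill type only adds a reference vector; no retraining is involved.
gallery["aspirin_81mg"] = [0.0, 0.0, 1.0]

known_label, known_score = identify([0.1, 0.0, 0.9], gallery)
unknown_label, unknown_score = identify([0.6, 0.6, 0.5], gallery)
```

Here the first query sits close to the newly enrolled vector and is matched to it, while the second is roughly equidistant from every entry and falls below the threshold, so it is reported as unknown. A Softmax classifier would instead force the second query into one of its fixed classes.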

Knowledge Distillation:

Decision: Compresses model size by transferring knowledge from a heavy Teacher network to a lightweight Student network, optimizing for Edge latency constraints.
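The source does not state which distillation objective the project uses. As one common possibility, the sketch below shows the classic temperature-softened formulation, where the student is trained to match the teacher's softened output distribution. The function names, the temperature of 4.0, and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0.0)
    return kl * temperature ** 2


# Hypothetical logits for a single sample.
teacher = [5.0, 1.0, -2.0]
loss_far = distillation_loss([0.0, 0.0, 0.0], teacher)    # uninformed student
loss_near = distillation_loss([4.5, 1.2, -1.8], teacher)  # student close to teacher
loss_same = distillation_loss(teacher, teacher)           # identical outputs
```

The loss shrinks as the student's outputs approach the teacher's and reaches zero when they match. In practice this term is usually combined with the task loss (here, presumably the metric learning loss) using a weighting factor.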

GitHub Actions (CI/CD):

Decision: Automates the training pipeline (dvc repro) on Pull Requests, so a change is merged only after the pipeline runs end to end and the model still converges.

Getting Started

Environment Setup

Manage dependencies using Conda to ensure cross-platform compatibility.

Create environment

conda env create -f environment.yaml

Activate environment

conda activate pilltrack-conda

(Optional) Update environment if yaml changes

Note: Uncomment setup in yaml if using macOS Apple Silicon

conda env update --file environment.yaml --prune

Configuration (Secrets)

To run the pipeline locally or in CI/CD, ensure the following environment variables are set (e.g., in .env or GitHub Secrets):

export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export AWS_REGION="ap-southeast-1"
export MLFLOW_TRACKING_URI="your_mlflow_server"
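A missing variable tends to surface late, for example as a failed S3 pull partway through a run. The following is a minimal sketch of a pre-flight check that could run before the pipeline starts. The helper name `missing_variables` is hypothetical and not part of this repository.

```python
import os

# The four variables listed in the configuration section above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "MLFLOW_TRACKING_URI",
)


def missing_variables(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_variables(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to test with a plain dictionary.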

Development Workflow (Git Flow + DVC)

We follow a strict Feature Branch Workflow. Direct pushes to main are prohibited to maintain pipeline integrity.

Step 1: Start a New Feature

Always create a new branch for model experiments or bug fixes.

git checkout main
git pull origin main
git checkout -b feature/improved-resnet-backbone

Step 2: Reproduce Pipeline & Train

Run the DVC pipeline to execute stages (train, convert, enroll) defined in dvc.yaml.

Runs the pipeline locally, updates artifacts and dvc.lock

dvc repro

Step 3: Commit & Push Changes

Case A: Code or Hyperparameters Changed ONLY

If you only modified .py files or params.yaml:

1. Check status (dvc status should report the pipeline as up to date; git status should show dvc.lock as modified)

dvc status

2. Push tracked artifacts to S3

dvc push

3. Git Commit & Push Code

git add .
git commit -m "feat: optimize distillation temperature"
git push -u origin feature/improved-resnet-backbone

Case B: Dataset Changed

If you updated the raw dataset (e.g., data/pills_dataset_resnet.zip):

1. Update DVC tracking

dvc add --force data/pills_dataset_resnet.zip

2. Push data to Remote Storage

dvc push

3. Commit the pointer file (.dvc) to Git

git add data/pills_dataset_resnet.zip.dvc data/.gitignore
git commit -m "chore: update dataset v2 with new pill types"
git push

CI/CD Pipeline

On every git push, the CI pipeline executes:

DVC Pull: Fetches data from AWS S3.

Reproduction: Runs dvc repro to validate the training pipeline.

Reporting: Pushes metrics to MLflow and comments results on the PR.
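The source does not show how the PR comment is produced. As a rough sketch of the reporting step, the snippet below renders a metrics mapping as a Markdown table that a CI job could post. It assumes the metrics file is a flat JSON object of names to numbers; the function name, metric keys, and values are made up for illustration.

```python
import json


def format_metrics_comment(metrics):
    """Render a flat {name: number} mapping as a Markdown table for a PR comment."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name in sorted(metrics):
        lines.append(f"| {name} | {metrics[name]:.4f} |")
    return "\n".join(lines)


# Stand-in for the contents of a metrics file; keys and values are invented.
example = json.loads('{"recall_at_1": 0.9, "latency_ms": 12.5}')
comment = format_metrics_comment(example)
print(comment)
```

Sorting the metric names keeps the table stable between runs, which makes successive PR comments easier to compare.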

👨‍💻 Author

Sitta Boonkaew
AI Engineer Intern @ AI SmartTech


Repository: sitta07/Pilltrack-Edge-mlops
