
Commit 6a9a245

dm4hydai authored and committed

[Example] ggml: add llava-base64-stream example (second-state#107)

* [Example] ggml: add llava-base64-stream example

  Signed-off-by: dm4 <dm4@secondstate.io>

* [CI] llama: merge m1 job into matrix, build llava-base64-stream

  Signed-off-by: dm4 <dm4@secondstate.io>
1 parent 710283f commit 6a9a245

File tree: 5 files changed, +244 −97 lines


.github/workflows/llama.yml

Lines changed: 19 additions & 97 deletions
@@ -24,33 +24,37 @@ jobs:
   build:
     strategy:
       matrix:
-        runner: [ubuntu-20.04, macos-13, macos-14]
+        runner: [ubuntu-20.04, macos-13, macos-14, macos-m1]
         job:
           - name: "Tiny Llama"
             run: |
+              test -f ~/.wasmedge/env && source ~/.wasmedge/env
               cd wasmedge-ggml/llama
               curl -LO https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf
               cargo build --target wasm32-wasi --release
               time wasmedge --dir .:. \
+                --env n_gpu_layers="$NGL" \
                 --nn-preload default:GGML:AUTO:tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf \
                 target/wasm32-wasi/release/wasmedge-ggml-llama.wasm \
                 default \
                 $'<|im_start|>system\nYou are an AI assistant<|im_end|>\n<|im_start|>user\nWhere is the capital of Japan?<|im_end|>\n<|im_start|>assistant'
 
           - name: Gemma 2B
             run: |
+              test -f ~/.wasmedge/env && source ~/.wasmedge/env
               cd wasmedge-ggml/gemma
               curl -LO https://huggingface.co/second-state/Gemma-2b-it-GGUF/resolve/main/gemma-2b-it-Q5_K_M.gguf
               cargo build --target wasm32-wasi --release
               time wasmedge --dir .:. \
-                --env n_gpu_layers=0 \
+                --env n_gpu_layers="$NGL" \
                 --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_K_M.gguf \
                 target/wasm32-wasi/release/wasmedge-ggml-gemma.wasm \
                 default \
                 '<start_of_turn>user Where is the capital of Japan? <end_of_turn><start_of_turn>model'
 
           - name: Llava v1.5 7B
             run: |
+              test -f ~/.wasmedge/env && source ~/.wasmedge/env
               cd wasmedge-ggml/llava
               curl -LO https://huggingface.co/mys/ggml_llava-v1.5-7b/resolve/main/ggml-model-q5_k.gguf
               curl -LO https://huggingface.co/mys/ggml_llava-v1.5-7b/resolve/main/mmproj-model-f16.gguf
@@ -59,14 +63,15 @@ jobs:
               time wasmedge --dir .:. \
                 --env mmproj=mmproj-model-f16.gguf \
                 --env image=monalisa.jpg \
-                --env n_gpu_layers=0 \
+                --env n_gpu_layers="$NGL" \
                 --nn-preload default:GGML:AUTO:ggml-model-q5_k.gguf \
                 target/wasm32-wasi/release/wasmedge-ggml-llava.wasm \
                 default \
                 $'You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe.\nUSER:<image>\nDo you know who drew this painting?\nASSISTANT:'
 
           - name: Llava v1.6 7B
             run: |
+              test -f ~/.wasmedge/env && source ~/.wasmedge/env
               cd wasmedge-ggml/llava
               curl -LO https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/vicuna-7b-q5_k.gguf
               curl -LO https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/mmproj-vicuna7b-f16.gguf
@@ -76,18 +81,20 @@ jobs:
                 --env mmproj=mmproj-vicuna7b-f16.gguf \
                 --env image=monalisa.jpg \
                 --env ctx_size=4096 \
-                --env n_gpu_layers=0 \
+                --env n_gpu_layers="$NGL" \
                 --nn-preload default:GGML:AUTO:vicuna-7b-q5_k.gguf \
                 target/wasm32-wasi/release/wasmedge-ggml-llava.wasm \
                 default \
                 $'You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe.\nUSER:<image>\nDo you know who drew this painting?\nASSISTANT:'
 
           - name: Llama2 7B
             run: |
+              test -f ~/.wasmedge/env && source ~/.wasmedge/env
               cd wasmedge-ggml/llama
               curl -LO https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
               cargo build --target wasm32-wasi --release
               time wasmedge --dir .:. \
+                --env n_gpu_layers="$NGL" \
                 --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
                 target/wasm32-wasi/release/wasmedge-ggml-llama.wasm \
                 default \
@@ -108,102 +115,14 @@ jobs:
               cd wasmedge-ggml/embedding
               cargo build --target wasm32-wasi --release
 
-    name: ${{ matrix.runner }} - ${{ matrix.job.name }}
-    runs-on: ${{ matrix.runner }}
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions-rust-lang/setup-rust-toolchain@v1
-      - name: Install Rust target for wasm
-        run: |
-          rustup target add wasm32-wasi
-
-      - name: Install WasmEdge + WASI-NN + GGML
-        run: |
-          VERSION=0.13.5
-          curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | sudo bash -s -- -v $VERSION --plugins wasi_nn-ggml -p /usr/local
-
-      - name: ${{ matrix.job.name }}
-        run: ${{ matrix.job.run }}
-
-  m1:
-    strategy:
-      matrix:
-        runner: [macos-m1]
-        job:
-          - name: "Tiny Llama"
-            run: |
-              source ~/.wasmedge/env
-              cd wasmedge-ggml/llama
-              curl -LO https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf
-              cargo build --target wasm32-wasi --release
-              time wasmedge --dir .:. \
-                --nn-preload default:GGML:AUTO:tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf \
-                --env n_gpu_layers=100 \
-                target/wasm32-wasi/release/wasmedge-ggml-llama.wasm \
-                default \
-                $'<|im_start|>system\nYou are an AI assistant<|im_end|>\n<|im_start|>user\nWhere is the capital of Japan?<|im_end|>\n<|im_start|>assistant'
-
-          - name: Gemma 2B
-            run: |
-              source ~/.wasmedge/env
-              cd wasmedge-ggml/gemma
-              curl -LO https://huggingface.co/second-state/Gemma-2b-it-GGUF/resolve/main/gemma-2b-it-Q5_K_M.gguf
-              cargo build --target wasm32-wasi --release
-              time wasmedge --dir .:. \
-                --env n_gpu_layers=100 \
-                --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_K_M.gguf \
-                target/wasm32-wasi/release/wasmedge-ggml-gemma.wasm \
-                default \
-                '<start_of_turn>user Where is the capital of Japan? <end_of_turn><start_of_turn>model'
-
-          - name: Llava v1.5 7B
-            run: |
-              source ~/.wasmedge/env
-              cd wasmedge-ggml/llava
-              curl -LO https://huggingface.co/mys/ggml_llava-v1.5-7b/resolve/main/ggml-model-q5_k.gguf
-              curl -LO https://huggingface.co/mys/ggml_llava-v1.5-7b/resolve/main/mmproj-model-f16.gguf
-              curl -LO https://llava-vl.github.io/static/images/monalisa.jpg
-              cargo build --target wasm32-wasi --release
-              time wasmedge --dir .:. \
-                --env mmproj=mmproj-model-f16.gguf \
-                --env image=monalisa.jpg \
-                --env ctx_size=2048 \
-                --env n_gpu_layers=100 \
-                --nn-preload default:GGML:AUTO:ggml-model-q5_k.gguf \
-                target/wasm32-wasi/release/wasmedge-ggml-llava.wasm \
-                default \
-                $'You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe.\nUSER:<image>\nDo you know who drew this painting?\nASSISTANT:'
-
-          - name: Llava v1.6 7B
+          - name: Build llava-base64-stream
             run: |
-              source ~/.wasmedge/env
-              cd wasmedge-ggml/llava
-              curl -LO https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/vicuna-7b-q5_k.gguf
-              curl -LO https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/mmproj-vicuna7b-f16.gguf
-              curl -LO https://llava-vl.github.io/static/images/monalisa.jpg
+              cd wasmedge-ggml/llava-base64-stream
               cargo build --target wasm32-wasi --release
-              time wasmedge --dir .:. \
-                --env mmproj=mmproj-vicuna7b-f16.gguf \
-                --env image=monalisa.jpg \
-                --env ctx_size=4096 \
-                --env n_gpu_layers=100 \
-                --nn-preload default:GGML:AUTO:vicuna-7b-q5_k.gguf \
-                target/wasm32-wasi/release/wasmedge-ggml-llava.wasm \
-                default \
-                $'You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe.\nUSER:<image>\nDo you know who drew this painting?\nASSISTANT:'
 
-          - name: Llama2 7B
-            run: |
-              source ~/.wasmedge/env
-              cd wasmedge-ggml/llama
-              curl -LO https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
-              cargo build --target wasm32-wasi --release
-              time wasmedge --dir .:. \
-                --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
-                --env n_gpu_layers=100 \
-                target/wasm32-wasi/release/wasmedge-ggml-llama.wasm \
-                default \
-                $'[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you do not know the answer to a question, please do not share false information.\n<</SYS>>\nWhat is the capital of Japan?[/INST]'
+        include:
+          - runner: macos-m1
+            ngl: 100
 
     name: ${{ matrix.runner }} - ${{ matrix.job.name }}
     runs-on: ${{ matrix.runner }}
@@ -219,5 +138,8 @@ jobs:
           VERSION=0.13.5
          curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v $VERSION --plugins wasi_nn-ggml
 
+      - name: Set environment variable
+        run: echo "NGL=${{ matrix.ngl || 0 }}" >> $GITHUB_ENV
+
       - name: ${{ matrix.job.name }}
         run: ${{ matrix.job.run }}
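The merged matrix relies on the new "Set environment variable" step: the expression `${{ matrix.ngl || 0 }}` resolves to `100` only for the `macos-m1` include entry and falls back to `0` everywhere else, so every `run` script can pass `--env n_gpu_layers="$NGL"` unconditionally. A minimal shell sketch of that fallback (`MATRIX_NGL` is a hypothetical stand-in for the expression result; on a real runner `$GITHUB_ENV` is provided by Actions):

```shell
# Emulate the "Set environment variable" step locally.
GITHUB_ENV=$(mktemp)   # Actions provides this file path for real jobs
MATRIX_NGL=""          # empty on ubuntu-20.04/macos-13/macos-14; "100" on macos-m1

# KEY=VALUE lines appended to $GITHUB_ENV become env vars for later steps.
echo "NGL=${MATRIX_NGL:-0}" >> "$GITHUB_ENV"

# A later step then sees the value via its environment:
. "$GITHUB_ENV"
echo "n_gpu_layers=$NGL"
```

Keeping the default at 0 preserves the previous CPU-only behavior on the non-M1 runners while letting the M1 runner offload all layers to the GPU.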
wasmedge-ggml/llava-base64-stream/Cargo.toml

Lines changed: 8 additions & 0 deletions (new file)

[package]
name = "wasmedge-ggml-llava-base64-stream"
version = "0.1.0"
edition = "2021"

[dependencies]
serde_json = "1.0"
wasi-nn = { git = "https://github.com/second-state/wasmedge-wasi-nn", branch = "ggml" }
wasmedge-ggml/llava-base64-stream/README.md

Lines changed: 30 additions & 0 deletions (new file)

# Llava Example For WASI-NN with GGML Backend

> [!NOTE]
> Please refer to [wasmedge-ggml/README.md](../README.md) for the general introduction and the setup of the WASI-NN plugin with the GGML backend. This document focuses on the Llava example.
> Refer to [wasmedge-ggml/llava/README.md](../llava/README.md) for downloading Llava models and execution commands.

This example demonstrates Llava model inference with an inline base64-encoded image; the encoded image is hardcoded in the source code.

## Execute

Execute the WASM with `wasmedge`, using the named-model feature to preload a large model:

> [!NOTE]
> You may see warnings stating `key clip.vision.* not found in file.` when using Llava v1.5 models. These are expected and can be ignored.

```console
$ wasmedge --dir .:. \
  --env mmproj=mmproj-model-f16.gguf \
  --nn-preload default:GGML:AUTO:ggml-model-q5_k.gguf \
  wasmedge-ggml-llava-base64-stream.wasm default

USER:
what is in this picture?
ASSISTANT:
The image showcases a bowl filled with an assortment of fresh berries, including several strawberries and blueberries. A person is standing close to the bowl, holding it in their hand or about to grab some fruit from it. The colorful fruit arrangement adds vibrancy to the scene.
USER:
please tell me a kind of fruit that is not in the picture
ASSISTANT:
There are no bananas in the picture.
```

wasmedge-ggml/llava-base64-stream/src/main.rs

Lines changed: 187 additions & 0 deletions
Large diffs are not rendered by default.
Binary file not shown.
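The 187-line `src/main.rs` is not rendered above. Per the commit message and README, it streams a llava-style chat whose image is embedded in the source as base64. The following is only a minimal sketch of the base64-inlining idea, with hypothetical helper names and prompt layout, omitting the actual `wasi-nn` inference calls; the exact image-tag format the GGML plugin expects is an assumption here:

```rust
// Alphabet for standard base64 (RFC 4648).
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode arbitrary bytes as standard base64 with '=' padding.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-padding the tail.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | (b[2] as u32);
        out.push(B64[(n >> 18 & 63) as usize] as char);
        out.push(B64[(n >> 12 & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6 & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: splice the encoded image into a llava-style prompt
/// instead of passing an image file path via `--env image=...`.
fn build_prompt(image: &[u8], question: &str) -> String {
    format!(
        "USER:<img src=\"data:image/jpeg;base64,{}\">\n{}\nASSISTANT:",
        base64_encode(image),
        question
    )
}

fn main() {
    // The real example hardcodes a pre-encoded image string in the source.
    let prompt = build_prompt(b"not really a JPEG", "what is in this picture?");
    println!("{}", prompt);
}
```

Encoding at build or run time as above is one way to produce the hardcoded string; the sketch deliberately stops before the `wasi-nn` graph setup and compute loop that the real file performs.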

0 commit comments
