Add neural speed example #135
Conversation
|
Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.

Overall Summary:

Potential Issues and Errors:

Most Important Findings:

Details

Commit 612c2c396653f0911f3ded717016627f41a9b51a

Key Changes:

Potential Problems:

Overall, the patch introduces new functionality efficiently but could be improved in terms of error handling and documentation.

Commit 14cb1941e1cde1feecdd7b70f5bd4cca6503e125

Key Changes:

Potential Problems:

Overall, it's important to review the impact of replacing 'context.fini_single()' with 'graph.unload()' to ensure that it aligns with the project's design and functionality. The changes should also be properly documented for better understanding by other contributors.

Commit 75d677266bbf80900b0038f65de3261908334cca

Key Changes:

Potential Problems:
|
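For context on the `context.fini_single()` → `graph.unload()` change the bot flags above, here is a minimal sketch of where such a cleanup call would sit in a WASI-NN style inference flow, written in the style of the existing Rust examples. This is not the PR's actual code: the `GraphEncoding::Ggml` placeholder, the "default" model name, the buffer sizes, and the exact signature of `Graph::unload()` are all illustrative assumptions.

```rust
// A minimal, hypothetical sketch of the inference flow, NOT the PR's code.
// Assumptions: the `wasmedge_wasi_nn` crate, a model preloaded under the
// name "default", GraphEncoding::Ggml as a stand-in for the neural speed
// backend, and `Graph::unload()` taking the place of the removed
// `context.fini_single()` cleanup call.
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Load the preloaded model and create an execution context.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::CPU)
        .build_from_cache("default")
        .expect("failed to load graph");
    let mut context = graph
        .init_execution_context()
        .expect("failed to init execution context");

    // Feed the prompt as a byte tensor and run one inference pass.
    let prompt = b"Once upon a time".to_vec();
    context
        .set_input(0, TensorType::U8, &[1], &prompt)
        .expect("failed to set input");
    context.compute().expect("failed to compute");

    // Read the generated output back into a fixed-size buffer.
    let mut output = vec![0u8; 4096];
    let n = context.get_output(0, &mut output).expect("failed to get output");
    println!("{}", String::from_utf8_lossy(&output[..n]));

    // Cleanup: the PR swaps `context.fini_single()` for `graph.unload()`;
    // the exact signature of `unload()` is assumed here.
    let _ = graph.unload();
}
```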
|
Would love to see a performance comparison of the same model with llama.cpp on Intel CPUs. |
|
I ran a simple test on an i7-12700K.
|
|
Does that mean it is actually slower than llama.cpp? |
|
Yes, the current result shows that Neural Speed is slower than llama.cpp. In addition, running Neural Speed directly also cannot match llama.cpp's runtime performance. |
|
@grorge123 |
|
Neural Speed uses all the CPU cores (20). I updated the Neural Speed version to 1.0 and tested on another computer with an i7-10700; the other variables are the same.
In this case, Neural Speed performs better than llama.cpp, but I have no idea why the i7-10700 shows better performance than the i7-12700K. |
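Since the two runtimes may default to very different thread counts (Neural Speed saturating all 20 cores here, while llama.cpp is bounded by its `-t` setting), it helps to record the available core count and measure throughput the same way in both runs. Below is a small, generic Rust timing sketch under that assumption; the `step` closure would wrap whatever per-token call the example uses (e.g. `compute()` or `compute_single()`), and nothing here is from the PR itself.

```rust
// Hypothetical benchmarking helper, not part of the PR.
use std::time::Instant;

/// Time `n_tokens` invocations of a per-token generation step and return
/// tokens per second. `step` is kept generic so this sketch does not depend
/// on the exact wasmedge_wasi_nn types; in practice it would wrap something
/// like `context.compute()`.
fn tokens_per_second<E>(
    n_tokens: usize,
    mut step: impl FnMut() -> Result<(), E>,
) -> Result<f64, E> {
    // Log how many logical cores the host exposes, since Neural Speed and
    // llama.cpp may use different numbers of threads by default.
    if let Ok(cores) = std::thread::available_parallelism() {
        eprintln!("logical cores available: {}", cores);
    }

    let start = Instant::now();
    for _ in 0..n_tokens {
        step()?;
    }
    Ok(n_tokens as f64 / start.elapsed().as_secs_f64())
}
```

Reporting both the core count and tokens/sec for each runtime would make the i7-10700 vs. i7-12700K discrepancy easier to interpret.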