liushz
  • Shanghai

Pinned

  1. open-compass/opencompass Public

    OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

    Python 6.5k 716

  2. open-compass/MathBench Public

    [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

    108 1

  3. open-compass/GPassK Public

    [ACL 2025] Are Your LLMs Capable of Stable Reasoning?

    Python 32 2

  4. open-compass/CompassVerifier Public

    [EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

    Jupyter Notebook 61 2