vision
Moondream vision service implementation.
This module provides integration with the Moondream vision-language model for image analysis and description generation.
- pipecat.services.moondream.vision.detect_device()
Detect the appropriate device to run on.
Detects available hardware acceleration and selects the best device and data type for optimal performance.
- Returns:
A tuple containing (device, dtype), where device is a torch.device and dtype is the recommended torch data type for that device.
- Return type:
tuple
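The selection logic described above can be sketched without importing PyTorch. This is a minimal illustration, not the module's actual code: the function name `pick_device`, its boolean parameters, the priority order (XPU, then CUDA, then MPS, then CPU), and the dtype paired with each device are all assumptions made for the example.

```python
from typing import Tuple


def pick_device(
    xpu_available: bool = False,
    cuda_available: bool = False,
    mps_available: bool = False,
) -> Tuple[str, str]:
    """Hypothetical stand-in for detect_device(); returns names, not torch objects."""
    # Assumed priority order; the real function may rank devices differently.
    if xpu_available:
        return "xpu", "float32"
    if cuda_available:
        return "cuda", "float16"
    if mps_available:
        return "mps", "float16"
    # No accelerator found: fall back to CPU with full precision.
    return "cpu", "float32"


device, dtype = pick_device(cuda_available=True)
print(device, dtype)
```

With PyTorch installed, the flags could be filled from probes such as torch.cuda.is_available() and torch.backends.mps.is_available(), and the returned names mapped to torch.device and torch dtype objects.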
- class pipecat.services.moondream.vision.MoondreamSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>)
Bases: VisionSettings
Settings for the Moondream vision service.
- Parameters:
model – Moondream model identifier.
- class pipecat.services.moondream.vision.MoondreamService(*, model: str | None = None, revision='2025-01-09', use_cpu=False, settings: MoondreamSettings | None = None, **kwargs)
Bases: VisionService
Moondream vision-language model service.
Provides image analysis and description generation using the Moondream vision-language model. Supports various hardware acceleration options including CUDA, MPS, and Intel XPU.
- Settings
alias of MoondreamSettings
- __init__(*, model: str | None = None, revision='2025-01-09', use_cpu=False, settings: MoondreamSettings | None = None, **kwargs)
Initialize the Moondream service.
- Parameters:
model – Hugging Face model identifier for the Moondream model.
Deprecated since version 0.0.105: Use settings=MoondreamService.Settings(model=...) instead.
revision – Specific model revision to use.
use_cpu – Whether to force CPU usage instead of hardware acceleration.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent VisionService.
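The precedence rule for the deprecated model argument can be illustrated in isolation. The sketch below does not import pipecat; `SketchSettings`, `resolve_model`, and `DEFAULT_MODEL` are hypothetical names, and the default value is a placeholder rather than the service's real default.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MODEL = "example/default-model"  # placeholder, not the real default


@dataclass
class SketchSettings:
    """Hypothetical stand-in for MoondreamSettings."""

    model: Optional[str] = None


def resolve_model(
    model: Optional[str] = None,
    settings: Optional[SketchSettings] = None,
) -> str:
    """Resolve the effective model: settings, then deprecated argument, then default."""
    resolved = DEFAULT_MODEL
    # The deprecated keyword argument overrides the default...
    if model is not None:
        resolved = model
    # ...and a value carried by settings overrides the deprecated argument.
    if settings is not None and settings.model is not None:
        resolved = settings.model
    return resolved


print(resolve_model(model="old/model", settings=SketchSettings(model="new/model")))
```

Applying the overrides in order, from weakest to strongest, keeps the rule easy to read: whichever source is checked last wins.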
- async run_vision(frame: UserImageRawFrame) → AsyncGenerator[Frame, None]
Analyze an image and generate a description.
- Parameters:
frame – The image frame to process.
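Because run_vision is an async generator, callers iterate over it with `async for` rather than awaiting a single result. The sketch below shows that consumption pattern only. It does not import pipecat or load a model: `SketchImageFrame`, `SketchTextFrame`, `SketchVisionService`, and `describe` are hypothetical stand-ins, and the fields on the frames are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class SketchImageFrame:
    """Hypothetical stand-in for UserImageRawFrame."""

    image: bytes


@dataclass
class SketchTextFrame:
    """Hypothetical stand-in for a frame carrying a description."""

    text: str


class SketchVisionService:
    """Hypothetical service with the same method shape as run_vision."""

    async def run_vision(
        self, frame: SketchImageFrame
    ) -> AsyncGenerator[SketchTextFrame, None]:
        # A real service would run model inference here; this one fakes it.
        await asyncio.sleep(0)
        yield SketchTextFrame(text=f"image of {len(frame.image)} bytes")


async def describe(service: SketchVisionService, frame: SketchImageFrame) -> List[str]:
    # Collect the text of every frame the generator yields.
    return [out.text async for out in service.run_vision(frame)]


descriptions = asyncio.run(
    describe(SketchVisionService(), SketchImageFrame(image=b"\x00" * 4))
)
print(descriptions)
```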