feat: add workflow implementation and postgres store (#122)
Conversation
Pull request overview
This PR introduces a workflow-based architecture and PostgreSQL storage support for the memory service. It restructures the codebase to use a pipeline system for orchestrating memorize and retrieve operations, consolidates embedding functionality into the LLM client, and adds PostgreSQL/pgvector storage backends with Alembic migrations.
Key changes:
- Implements workflow/pipeline abstraction for memorize and retrieve operations
- Consolidates LLM and embedding clients (removes separate embedding config)
- Adds PostgreSQL storage with pgvector support and Alembic migrations
- Introduces scope-based memory isolation and multi-profile LLM support
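The memorize/retrieve orchestration described above can be sketched roughly as follows. Note that `Step`, `Pipeline`, and the lambda step bodies are illustrative stand-ins, not the PR's actual `src/memu/workflow` API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    """A named unit of work that reads and extends a shared context dict."""
    name: str
    fn: Callable[[dict], dict]

    def run(self, ctx: dict) -> dict:
        ctx.update(self.fn(ctx))
        return ctx

@dataclass
class Pipeline:
    """Runs steps in order, threading one context dict through all of them."""
    steps: list[Step] = field(default_factory=list)

    def run(self, ctx: dict) -> dict:
        for step in self.steps:
            ctx = step.run(ctx)
        return ctx

# e.g. a retrieve pipeline: embed the query, then search for matching items
pipeline = Pipeline([
    Step("embed", lambda ctx: {"vector": [0.1, 0.2]}),
    Step("search", lambda ctx: {"hits": ["item-1", "item-2"]}),
])
result = pipeline.run({"query": "favorite color"})
```

A runner backend (local or otherwise) would wrap `Pipeline.run`, which keeps step logic independent of where the pipeline executes.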
Reviewed changes
Copilot reviewed 27 out of 32 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/memu/workflow/step.py | Defines workflow step abstraction and execution logic |
| src/memu/workflow/runner.py | Implements workflow runner interface and local execution backend |
| src/memu/workflow/pipeline.py | Provides pipeline management with versioning and validation |
| src/memu/workflow/__init__.py | Exports workflow public API |
| src/memu/models.py | Adds dynamic model building utilities for user/memory model composition |
| src/memu/llm/openai_sdk.py | Adds embed method to OpenAI SDK client |
| src/memu/llm/http_client.py | Consolidates embedding support into HTTP LLM client |
| src/memu/database/postgres/schema.py | Defines SQLModel schemas with pgvector support |
| src/memu/database/postgres/postgres.py | Implements PostgreSQL storage backend with vector search |
| src/memu/database/postgres/migrations/versions/0001_create_memory_tables.py | Creates initial database tables with pgvector extension |
| src/memu/database/postgres/migrations/env.py | Configures Alembic migration environment |
| src/memu/database/inmemory/repo.py | Refactors in-memory store to support model customization |
| src/memu/database/factory.py | Factory for initializing storage backends based on config |
| src/memu/app/settings.py | Consolidates embedding into LLM config, adds storage/scope config |
| src/memu/app/service.py | Restructures service to use workflows and scope-based isolation |
| src/memu/app/retrieve.py | Extracts retrieve logic into mixin with workflow-based implementation |
| src/memu/app/memorize.py | Extracts memorize logic into mixin with workflow-based implementation |
| pyproject.toml | Adds PostgreSQL and SQLAlchemy dependencies |
src/memu/database/inmemory/models.py
Outdated
```python
base_model_cls = build_memory_model(user_model, BaseModel)
resource_model = build_memory_model(user_model, Resource)
memory_item_model = build_memory_model(user_model, MemoryItem)
memory_category_model = build_memory_model(user_model, MemoryCategory)
category_item_model = build_memory_model(user_model, CategoryItem)
return base_model_cls, resource_model, memory_item_model, memory_category_model, category_item_model
```
The variable name 'base_model_cls' is ambiguous as it suggests a base class when it's actually a merged model that combines user_model with BaseModel. Consider renaming to 'merged_base_model' or 'combined_base_model' to clarify its purpose.
Suggested change:

```python
merged_base_model = build_memory_model(user_model, BaseModel)
resource_model = build_memory_model(user_model, Resource)
memory_item_model = build_memory_model(user_model, MemoryItem)
memory_category_model = build_memory_model(user_model, MemoryCategory)
category_item_model = build_memory_model(user_model, CategoryItem)
return merged_base_model, resource_model, memory_item_model, memory_category_model, category_item_model
```
```python
*,
dsn: str,
scope_key: str,
ddl_mode: str = "create",  # kept for API compatibility
```
The `ddl_mode` parameter is marked as "kept for API compatibility" but isn't actually used in the implementation. Consider either removing it or documenting why it's preserved if future use is planned.
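If the parameter is kept, one option is to warn when a non-default value is passed so callers learn their intent is ignored. This is a minimal sketch with hypothetical fields, not the PR's actual `PostgresStore`:

```python
import warnings

class PostgresStore:
    def __init__(self, *, dsn: str, scope_key: str, ddl_mode: str = "create"):
        # ddl_mode is accepted for backward compatibility but currently unused;
        # warn so callers notice instead of having their intent silently dropped.
        if ddl_mode != "create":
            warnings.warn(
                "ddl_mode is ignored and kept only for API compatibility",
                DeprecationWarning,
                stacklevel=2,
            )
        self.dsn = dsn
        self.scope_key = scope_key
```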
```python
def _embedding_type(context) -> sa.types.TypeEngine:
    use_vector = context.config.attributes.get("use_vector", True)
```
The default value for `use_vector` is `True`, but the `PostgresStore` initializer derives it from `vector_provider == "pgvector"`. This could cause inconsistency if the default doesn't match the actual configuration. Consider aligning the default with the actual logic or making it required.
Suggested change:

```python
use_vector = bool(context.config.attributes.get("use_vector"))
```
```python
if isinstance(data, dict) and "default" not in data:
    data = dict(data)
    data["default"] = LLMConfig()
```
The `model_validator` mutates the input dict on lines 169-170. This can cause unexpected side effects if the same dict is reused. Consider always creating a new dict: `data = dict(data) if isinstance(data, dict) else {}` followed by `data.setdefault("default", LLMConfig())`.
Suggested change:

```python
if isinstance(data, dict):
    # Always work on a copy to avoid mutating the caller's dict.
    new_data = dict(data)
    new_data.setdefault("default", LLMConfig())
    return new_data
```
```python
) -> list[dict[str, Any]]:
    """Use LLM to rank memory items from relevant categories"""
    if not category_ids:
        print("[LLM Rank Items] No category_ids provided")
```
Using print() for logging instead of the logger instance. This should use logger.debug() or logger.warning() for consistency with the rest of the codebase.
Suggested change:

```python
logger.warning("[LLM Rank Items] No category_ids provided")
```
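A minimal sketch of the logger-based variant; the logger name and the `rank_items` signature are assumptions for illustration, not the PR's actual code:

```python
import logging

logger = logging.getLogger("memu.retrieve")

def rank_items(category_ids: list[str]) -> list[str]:
    if not category_ids:
        # logger.warning respects the service's logging configuration and
        # levels, unlike print(), which always writes to stdout.
        logger.warning("[LLM Rank Items] No category_ids provided")
        return []
    return sorted(category_ids)
```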
Co-authored-by: sairin1202 <952141617@qq.com>
No description provided.