
feat: add workflow implementation and postgres store#122

Merged
ankaisen merged 6 commits into main from feat/workflow
Dec 22, 2025

Conversation

@ankaisen (Collaborator)

No description provided.

Copilot AI review requested due to automatic review settings December 19, 2025 06:02
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a workflow-based architecture and PostgreSQL storage support for the memory service. It restructures the codebase to use a pipeline system for orchestrating memorize and retrieve operations, consolidates embedding functionality into the LLM client, and adds PostgreSQL/pgvector storage backends with Alembic migrations.

Key changes:

  • Implements workflow/pipeline abstraction for memorize and retrieve operations
  • Consolidates LLM and embedding clients (removes separate embedding config)
  • Adds PostgreSQL storage with pgvector support and Alembic migrations
  • Introduces scope-based memory isolation and multi-profile LLM support
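The pipeline abstraction itself is not shown on this page; the following is a minimal sketch of how such a step/pipeline composition might look. The names Step, Pipeline, and the dict-based context are assumptions for illustration, not the PR's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical sketch of a workflow step/pipeline abstraction; the PR's real
# classes live in src/memu/workflow/ and may differ substantially.
@dataclass
class Step:
    name: str
    fn: Callable[[dict[str, Any]], dict[str, Any]]

    def run(self, ctx: dict[str, Any]) -> dict[str, Any]:
        # Each step reads the shared context and returns an updated copy.
        return {**ctx, **self.fn(ctx)}

@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def run(self, ctx: dict[str, Any]) -> dict[str, Any]:
        for step in self.steps:
            ctx = step.run(ctx)
        return ctx

# Toy "retrieve" pipeline: embed the query, then rank candidates.
pipeline = Pipeline([
    Step("embed_query", lambda c: {"embedding": [0.0] * 3}),
    Step("rank", lambda c: {"results": ["item-1"]}),
])
out = pipeline.run({"query": "coffee preferences"})
```

The appeal of this shape is that memorize and retrieve become ordered lists of small, individually testable steps rather than one monolithic method.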

Reviewed changes

Copilot reviewed 27 out of 32 changed files in this pull request and generated 5 comments.

Reviewed files:

src/memu/workflow/step.py: Defines workflow step abstraction and execution logic
src/memu/workflow/runner.py: Implements workflow runner interface and local execution backend
src/memu/workflow/pipeline.py: Provides pipeline management with versioning and validation
src/memu/workflow/__init__.py: Exports the workflow public API
src/memu/models.py: Adds dynamic model-building utilities for user/memory model composition
src/memu/llm/openai_sdk.py: Adds an embed method to the OpenAI SDK client
src/memu/llm/http_client.py: Consolidates embedding support into the HTTP LLM client
src/memu/database/postgres/schema.py: Defines SQLModel schemas with pgvector support
src/memu/database/postgres/postgres.py: Implements the PostgreSQL storage backend with vector search
src/memu/database/postgres/migrations/versions/0001_create_memory_tables.py: Creates the initial database tables with the pgvector extension
src/memu/database/postgres/migrations/env.py: Configures the Alembic migration environment
src/memu/database/inmemory/repo.py: Refactors the in-memory store to support model customization
src/memu/database/factory.py: Factory for initializing storage backends based on config
src/memu/app/settings.py: Consolidates embedding into LLM config, adds storage/scope config
src/memu/app/service.py: Restructures the service to use workflows and scope-based isolation
src/memu/app/retrieve.py: Extracts retrieve logic into a mixin with a workflow-based implementation
src/memu/app/memorize.py: Extracts memorize logic into a mixin with a workflow-based implementation
pyproject.toml: Adds PostgreSQL and SQLAlchemy dependencies


Comment on lines 67 to 72
base_model_cls = build_memory_model(user_model, BaseModel)
resource_model = build_memory_model(user_model, Resource)
memory_item_model = build_memory_model(user_model, MemoryItem)
memory_category_model = build_memory_model(user_model, MemoryCategory)
category_item_model = build_memory_model(user_model, CategoryItem)
return base_model_cls, resource_model, memory_item_model, memory_category_model, category_item_model

Copilot AI Dec 19, 2025


The variable name 'base_model_cls' is ambiguous as it suggests a base class when it's actually a merged model that combines user_model with BaseModel. Consider renaming to 'merged_base_model' or 'combined_base_model' to clarify its purpose.

Suggested change
base_model_cls = build_memory_model(user_model, BaseModel)
resource_model = build_memory_model(user_model, Resource)
memory_item_model = build_memory_model(user_model, MemoryItem)
memory_category_model = build_memory_model(user_model, MemoryCategory)
category_item_model = build_memory_model(user_model, CategoryItem)
return base_model_cls, resource_model, memory_item_model, memory_category_model, category_item_model
merged_base_model = build_memory_model(user_model, BaseModel)
resource_model = build_memory_model(user_model, Resource)
memory_item_model = build_memory_model(user_model, MemoryItem)
memory_category_model = build_memory_model(user_model, MemoryCategory)
category_item_model = build_memory_model(user_model, CategoryItem)
return merged_base_model, resource_model, memory_item_model, memory_category_model, category_item_model
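The build_memory_model helper itself is not shown in this diff; for readers unfamiliar with the pattern, dynamic model composition of this kind can be sketched with pydantic's create_model. The helper body below is a guess at the idea, not the PR's implementation.

```python
from pydantic import BaseModel, create_model

# Hypothetical sketch: merge a user-supplied model's fields into a base memory
# model. The real build_memory_model in src/memu/models.py may differ.
def build_memory_model(user_model: type[BaseModel], base: type[BaseModel]) -> type[BaseModel]:
    extra_fields = {
        name: (f.annotation, ... if f.is_required() else f.default)
        for name, f in user_model.model_fields.items()
    }
    return create_model(f"Merged{base.__name__}", __base__=base, **extra_fields)

# Illustrative models standing in for the PR's user_model and MemoryItem.
class UserScope(BaseModel):
    tenant_id: str

class MemoryItem(BaseModel):
    content: str

MergedItem = build_memory_model(UserScope, MemoryItem)
item = MergedItem(tenant_id="t1", content="prefers tea")
```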
    *,
    dsn: str,
    scope_key: str,
    ddl_mode: str = "create",  # kept for API compatibility

Copilot AI Dec 19, 2025


The ddl_mode parameter is marked as 'kept for API compatibility' but isn't actually used in the implementation. Consider either removing it or documenting why it's preserved if future use is planned.
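If the parameter really is kept only for compatibility, one conventional option is to accept it but warn on non-default use. The make_store name and return value below are illustrative, not the PR's code.

```python
import warnings

# Illustrative: keep the legacy ddl_mode parameter for API compatibility while
# signaling that its value is ignored by the current implementation.
def make_store(*, dsn: str, scope_key: str, ddl_mode: str = "create") -> dict:
    if ddl_mode != "create":
        warnings.warn(
            "ddl_mode is ignored and kept only for API compatibility",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"dsn": dsn, "scope_key": scope_key}

store = make_store(dsn="postgresql://localhost/memu", scope_key="tenant")
```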



def _embedding_type(context) -> sa.types.TypeEngine:
    use_vector = context.config.attributes.get("use_vector", True)

Copilot AI Dec 19, 2025


The default value for 'use_vector' is True, but the PostgresStore initializer determines this from vector_provider=='pgvector'. This could cause inconsistency if the default doesn't match the actual configuration. Consider aligning the default with the actual logic or making it required.

Suggested change
    use_vector = context.config.attributes.get("use_vector", True)
    use_vector = bool(context.config.attributes.get("use_vector"))
Comment on lines +168 to +170
    if isinstance(data, dict) and "default" not in data:
        data = dict(data)
        data["default"] = LLMConfig()

Copilot AI Dec 19, 2025


The model_validator mutates the input dict on lines 169-170. This can cause unexpected side effects if the same dict is reused. Consider always creating a new dict: data = dict(data) if isinstance(data, dict) else {}, followed by data.setdefault("default", LLMConfig()).

Suggested change
    if isinstance(data, dict) and "default" not in data:
        data = dict(data)
        data["default"] = LLMConfig()
    if isinstance(data, dict):
        # Always work on a copy to avoid mutating the caller's dict.
        new_data = dict(data)
        new_data.setdefault("default", LLMConfig())
        return new_data
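The copy-then-setdefault idiom the suggestion recommends can be checked in isolation. Here LLMConfig() is stood in for by a sentinel object and the function name is illustrative.

```python
LLM_CONFIG_SENTINEL = object()  # stand-in for LLMConfig(); illustrative only

def normalize_profiles(data):
    # Copy first, then fill defaults, so the caller's dict is never modified.
    if isinstance(data, dict):
        data = dict(data)
        data.setdefault("default", LLM_CONFIG_SENTINEL)
    return data

caller = {"fast": "profile-a"}
result = normalize_profiles(caller)
assert "default" not in caller            # caller's dict is untouched
assert result["default"] is LLM_CONFIG_SENTINEL
assert result["fast"] == "profile-a"
```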
) -> list[dict[str, Any]]:
    """Use LLM to rank memory items from relevant categories"""
    if not category_ids:
        print("[LLM Rank Items] No category_ids provided")

Copilot AI Dec 19, 2025


Using print() for logging instead of the logger instance. This should use logger.debug() or logger.warning() for consistency with the rest of the codebase.

Suggested change
        print("[LLM Rank Items] No category_ids provided")
        logger.warning("[LLM Rank Items] No category_ids provided")
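Swapping print for a module-level logger, as the suggestion recommends, would typically look like this. The logger name and the surrounding function are illustrative.

```python
import logging

logger = logging.getLogger("memu.app.retrieve")  # illustrative logger name

def rank_items(category_ids):
    # Empty input is worth a warning rather than a bare print: it then flows
    # through the logging config (levels, handlers, formatting) like the rest
    # of the codebase's output.
    if not category_ids:
        logger.warning("[LLM Rank Items] No category_ids provided")
        return []
    return list(category_ids)
```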
@ankaisen changed the title from "feat: init workflow implementation and postgres store" to "feat: add workflow implementation and postgres store" on Dec 22, 2025
@ankaisen ankaisen merged commit a175811 into main Dec 22, 2025
2 checks passed
@ankaisen ankaisen deleted the feat/workflow branch December 22, 2025 18:03
