ACM-AI Documentation

Technology Stack

Technology decisions, dependency list, and rationale for ACM-AI

Core Technologies

| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend | Next.js | 15 (App Router) | React framework with /api proxy |
| Frontend | React | 19 | UI component library |
| Frontend | Tailwind CSS | 4 | Utility-first styling |
| Frontend | Radix UI | Latest | Accessible primitive components |
| Frontend | AG Grid | ^31.x | Spreadsheet component for ACM data |
| Frontend | CopilotKit | Latest | AG-UI protocol chat client |
| Frontend | Zustand | Latest | Client state management |
| Frontend | React Query | Latest | Server state and caching |
| Frontend | Recharts | ^3 | Dashboard charts |
| Backend | Python | 3.11+ | Runtime |
| Backend | FastAPI | Latest | REST API and SSE endpoints |
| Backend | LangGraph | Latest | AI workflow orchestration |
| Backend | LangChain | Latest | LLM abstraction |
| Backend | Pydantic | v2 | Data models and validation |
| Backend | MinerU (magic-pdf) | Latest | Primary table extraction |
| Backend | Docling | Latest | Fallback PDF text/layout extraction |
| Backend | openpyxl | ^3.1 | BAR-compliant Excel export |
| Backend | uv | Latest | Python package management |
| Database | SurrealDB | Latest | Multi-model database with vector support |
| Jobs | surreal-commands | Latest | Background job queue |

Key Technology Decisions

Why SurrealDB?

SurrealDB was chosen as the primary datastore because it combines:

  • Relational queries via SurrealQL, a SQL-like query language (for record filtering and grouping)
  • Graph relationships (source → acm_record → acm_table_section)
  • Native vector fields (for semantic embedding search without a separate vector database)
  • Real-time live queries (used for the agui_events SSE relay pattern)
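Semantic search over the native vector fields amounts to ranking records by embedding similarity. The pure-Python sketch below shows that ranking with cosine similarity so the idea is visible without a database; in practice SurrealDB computes the score in-database (for example with its `vector::similarity::cosine` function). The `embedding` field name and the record shape are assumptions for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], records: list[dict], k: int = 3) -> list[str]:
    """Return ids of the k records whose (assumed) `embedding` is closest to `query`."""
    ranked = sorted(
        records,
        key=lambda rec: cosine_similarity(query, rec["embedding"]),
        reverse=True,
    )
    return [rec["id"] for rec in ranked[:k]]


# Roughly equivalent in-database query (hypothetical field names):
#   SELECT id, vector::similarity::cosine(embedding, $query) AS score
#   FROM acm_record ORDER BY score DESC LIMIT 3;
```

Pushing this ranking into the database is what removes the need for a separate vector store: the same `acm_record` rows carry both the structured fields and the embedding.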

Why AG Grid?

| Alternative | Reason Not Chosen |
|---|---|
| React Table (TanStack) | No built-in row grouping or filtering UI |
| Handsontable | Less performant with large datasets (1,000+ rows) |
| SheetJS | Spreadsheet file parsing/writing library, not a display component |
| Custom implementation | Required feature set (grouping, virtual scrolling, cell renderers) too large to build in-house |

AG Grid provides virtual scrolling (handles 1,000+ records), column filtering, row grouping by Building and Room (an AG Grid Enterprise feature), and a customisable cell renderer API used for the risk badge component.

Why MinerU (magic-pdf)?

MinerU provides superior table extraction compared to Docling for the specific challenges in asbestos register PDFs:

  • Merged cells: colspan and rowspan are correctly parsed into expanded HTML tables
  • Multi-page table continuity: tables spanning multiple pages are automatically stitched
  • Bounding box tracking: coordinates are captured for cell-level PDF provenance

Docling is retained as a fallback for text-based PDFs where MinerU is not optimal.
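To make "expanded" concrete: each merged cell's value is copied into every grid slot it spans, so downstream code always sees a rectangular table. The sketch below uses only the standard library's `html.parser`; it illustrates the idea and is not MinerU's own code. It assumes a single, non-nested `<table>`.

```python
from html.parser import HTMLParser
from typing import Optional


class _CellCollector(HTMLParser):
    """Collect each <td>/<th> as {text, rowspan, colspan}, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[dict]] = []
        self._cell: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = {
                "text": "",
                "rowspan": int(a.get("rowspan") or 1),
                "colspan": int(a.get("colspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self.rows:
                self._cell["text"] = self._cell["text"].strip()
                self.rows[-1].append(self._cell)
            self._cell = None


def expand_table(html: str) -> list[list[str]]:
    """Return a rectangular grid in which every merged cell fills all slots it spans."""
    collector = _CellCollector()
    collector.feed(html)
    grid: dict[tuple[int, int], str] = {}
    for r, cells in enumerate(collector.rows):
        c = 0
        for cell in cells:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = cell["text"]
            c += cell["colspan"]
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

A `rowspan="2"` Building cell, for example, becomes the same Building value on both rows, which is what lets each row map cleanly to one ACM record.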

Why LangGraph?

LangGraph provides a structured graph execution model for the 7-stage pipeline, with:

  • Conditional edges for corrective re-extraction loops (Stage 2.5)
  • Native async execution compatible with FastAPI
  • State management that maps directly to the PipelineRunState SSE event structure
  • First-class support for the AG-UI protocol via the ag-ui-langgraph adapter
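A corrective loop of this kind hinges on a router function attached to a conditional edge. The sketch below writes the router in plain Python and simulates the extract, validate, route cycle so it runs without LangGraph installed; the state fields, node names, and retry budget of three are hypothetical and not taken from `PipelineRunState`.

```python
from typing import Callable, TypedDict

MAX_ATTEMPTS = 3  # hypothetical retry budget for corrective re-extraction


class StageState(TypedDict):
    """Hypothetical subset of the pipeline state relevant to Stage 2.5."""
    attempts: int
    validation_errors: list[str]


def route_after_validation(state: StageState) -> str:
    """Router for a conditional edge: re-extract while errors remain and budget allows."""
    if state["validation_errors"] and state["attempts"] < MAX_ATTEMPTS:
        return "re_extract"
    return "continue"


# With LangGraph, the router would be registered roughly as:
#   graph.add_conditional_edges(
#       "validate", route_after_validation,
#       {"re_extract": "extract", "continue": "next_stage"},
#   )


def simulate(validate: Callable[[int], list[str]]) -> StageState:
    """Drive extract -> validate -> route until the router says to continue."""
    state: StageState = {"attempts": 0, "validation_errors": []}
    while True:
        state["attempts"] += 1  # one extraction pass
        state["validation_errors"] = validate(state["attempts"])
        if route_after_validation(state) == "continue":
            return state
```

Capping attempts in the router keeps the loop bounded: a document that never validates still exits after the budget, with its remaining errors in state for later review.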

Why CopilotKit / AG-UI?

AG-UI (Agent-User Interaction) is an open protocol for agents to communicate state, tool calls, and reasoning to frontend UIs. CopilotKit implements this protocol client-side:

  • Handles SSE event parsing automatically
  • Provides the useCoAgent hook for extraction progress (incremental record streaming)
  • Enables custom tool result renderers (ACM record tables, stats cards)
  • Works with any AG-UI-compatible backend — the same protocol is used for both chat and extraction
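For a sense of what CopilotKit's SSE handling does on the wire, the sketch below encodes AG-UI events as SSE frames (the backend side) and decodes them again (the client side), then assembles streamed text. The event types and the `delta`, `messageId`, `threadId`, and `runId` fields follow the AG-UI event schema as we understand it; check the protocol specification before relying on them.

```python
import json


def encode_sse(event: dict) -> str:
    """Serialise one AG-UI event as an SSE frame: a `data:` line then a blank line."""
    return f"data: {json.dumps(event)}\n\n"


def decode_sse(stream: str) -> list[dict]:
    """Split an SSE stream into frames and JSON-decode each frame's data lines."""
    events = []
    for frame in stream.replace("\r\n", "\n").split("\n\n"):
        data = []
        for line in frame.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # SSE strips exactly one leading space after the colon
                data.append(value[1:] if value.startswith(" ") else value)
        if data:  # comment-only frames (": ping") carry no data
            events.append(json.loads("\n".join(data)))
    return events


def assemble_text(events: list[dict]) -> str:
    """Concatenate streamed text deltas into the final assistant message."""
    return "".join(e["delta"] for e in events if e.get("type") == "TEXT_MESSAGE_CONTENT")
```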

Why the Generic Configurable Parser?

The original design called for separate parser classes per consultant (Prensa, Greencap). This was replaced with a single GenericParser driven by FieldSchemaConfig (a declarative JSON schema derived from the official BAR Excel template).

Benefits:

  • Zero code changes for new consultant formats — only JSON configuration
  • AG Grid column definitions and BAR export column order are derived from the same config, eliminating drift
  • Enum validation uses the same config, ensuring consistency across extraction, UI, and export
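The single-source-of-truth idea can be sketched as one config feeding three consumers. The JSON shape, field keys, headers, and enum values below are hypothetical stand-ins (the real FieldSchemaConfig is derived from the BAR template); the `field`, `headerName`, `rowGroup`, and `hide` keys are standard AG Grid column definition properties.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical FieldSchemaConfig excerpt; real keys and headers come from the BAR template.
CONFIG_JSON = """
[
  {"key": "building", "header": "Building", "group": true},
  {"key": "room", "header": "Room", "group": true},
  {"key": "material", "header": "Material"},
  {"key": "friability", "header": "Friability", "enum": ["Friable", "Non-Friable"]}
]
"""


@dataclass(frozen=True)
class FieldSpec:
    key: str                                 # attribute name on an extracted record
    header: str                              # column header in the BAR Excel template
    enum: Optional[tuple[str, ...]] = None   # allowed values, if constrained
    group: bool = False                      # whether AG Grid groups rows by this column


def load_config(text: str) -> list[FieldSpec]:
    return [
        FieldSpec(f["key"], f["header"],
                  tuple(f["enum"]) if "enum" in f else None, f.get("group", False))
        for f in json.loads(text)
    ]


def ag_grid_column_defs(fields: list[FieldSpec]) -> list[dict]:
    """AG Grid columnDefs; grouped columns are hidden because the group row shows them."""
    return [{"field": f.key, "headerName": f.header, "rowGroup": f.group, "hide": f.group}
            for f in fields]


def bar_export_headers(fields: list[FieldSpec]) -> list[str]:
    """Excel export column order follows config order, so it cannot drift from the grid."""
    return [f.header for f in fields]


def enum_errors(record: dict, fields: list[FieldSpec]) -> list[str]:
    """Validate constrained fields against the same config."""
    return [
        f"{f.key}: {record.get(f.key)!r} is not one of {list(f.enum)}"
        for f in fields
        if f.enum is not None and record.get(f.key) not in f.enum
    ]
```

Adding a consultant format or a BAR column then means editing `CONFIG_JSON` only; the grid, export, and validation pick it up together.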

Design System

VAEA (Victorian Asbestos Eradication Agency) brand tokens are implemented as CSS custom properties in OKLCH colour space, mapped through a 3-tier cascade:

Brand Layer (--vaea-teal, --vaea-coral, --vaea-navy)
  ↓
Semantic Layer (--primary, --accent, --destructive, --background)
  ↓
Component Layer (shadcn/ui variants, AG Grid theme overrides, risk badge tokens)

Brand Colours (OKLCH tokens with approximate hex equivalents):

  • Primary: VAEA Teal — oklch(0.52 0.09 185) (#0D7377)
  • Accent: Coral — oklch(0.65 0.14 15) (#EB787A)
  • Navy: oklch(0.27 0.04 260) (#1B2B4B)
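To check a hex brand colour against its OKLCH token, the conversion is sRGB decoding followed by the OKLab transform and a polar conversion to lightness, chroma, and hue. The sketch below uses Björn Ottosson's published OKLab matrices; the function names are illustrative, not part of the ACM-AI codebase.

```python
import math


def _srgb_to_linear(c8: int) -> float:
    """Decode one 8-bit sRGB channel to linear light."""
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def hex_to_oklch(hex_colour: str) -> tuple[float, float, float]:
    """Convert '#RRGGBB' to (L, C, h) with h in degrees."""
    h = hex_colour.lstrip("#")
    r, g, b = (_srgb_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    # Linear sRGB -> LMS cone response
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    l_, m_, s_ = (math.copysign(abs(v) ** (1 / 3), v) for v in (l, m, s))
    # LMS' -> OKLab
    L = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    bb = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    # OKLab -> OKLCH (polar form)
    return L, math.hypot(a, bb), math.degrees(math.atan2(bb, a)) % 360
```

Run on #0D7377, it gives roughly L 0.51, C 0.08, h 199°, close to but not identical with the teal token, which is why the hex codes are best treated as approximate equivalents.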

Environment Variables

# SurrealDB connection
SURREAL_URL=ws://localhost:8000/rpc
SURREAL_USER=root
SURREAL_PASSWORD=root
SURREAL_NAMESPACE=open_notebook
SURREAL_DATABASE=development

# At least one AI provider required
OPENAI_API_KEY=sk-...
# Optional additional providers
ANTHROPIC_API_KEY=...
OPENROUTER_API_KEY=...  # Enables 6 additional frontier models
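A startup check that fails fast on a missing variable saves debugging a half-configured instance. The sketch below is hypothetical (`check_env` is not an existing ACM-AI function) and assumes that any one of the three provider keys satisfies the "at least one AI provider" rule.

```python
import os
from typing import Mapping

SURREAL_VARS = ("SURREAL_URL", "SURREAL_USER", "SURREAL_PASSWORD",
                "SURREAL_NAMESPACE", "SURREAL_DATABASE")
# Assumption: any of these counts as "an AI provider" for the required-key rule.
PROVIDER_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENROUTER_API_KEY")


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Fail fast on missing settings; return which AI provider keys are configured."""
    missing = [name for name in SURREAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing SurrealDB settings: {', '.join(missing)}")
    providers = [name for name in PROVIDER_VARS if env.get(name)]
    if not providers:
        raise RuntimeError("At least one AI provider API key must be set")
    return providers
```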

New Frontier Models (E17)

When OPENROUTER_API_KEY is set, the following models are auto-provisioned at startup:

| Model | Provider | Notable Capability |
|---|---|---|
| MiniMax M2.1 | MiniMax via OpenRouter | Large context, efficient |
| Kimi K2.5 | Moonshot AI via OpenRouter | Long context |
| DeepSeek V3.2 | DeepSeek via OpenRouter | Strong coding and extraction |
| Claude Sonnet 4.6 | Anthropic via OpenRouter | Extended thinking support |
| GPT 5.2 | OpenAI via OpenRouter | General purpose |
| Gemini 2.5 Pro | Google via OpenRouter | Multimodal, large context |

Models with extended thinking (DeepSeek R1, Claude) stream reasoning tokens visible in the "Agent Thinking" panel during extraction.
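Auto-provisioning at startup is safest when it is idempotent, so a restart does not register the same model twice. A minimal sketch, assuming a hypothetical `models_to_provision` hook that receives the environment and the set of already-registered model names; the OpenRouter model IDs are not given in this document, so only display names appear.

```python
from typing import Mapping

# Display names from the table above; OpenRouter model IDs are deliberately omitted.
FRONTIER_MODELS = (
    "MiniMax M2.1",
    "Kimi K2.5",
    "DeepSeek V3.2",
    "Claude Sonnet 4.6",
    "GPT 5.2",
    "Gemini 2.5 Pro",
)


def models_to_provision(env: Mapping[str, str], registered: set[str]) -> list[str]:
    """Hypothetical startup hook: models still to register, skipping existing ones."""
    if not env.get("OPENROUTER_API_KEY"):
        return []  # without the key, nothing is auto-provisioned
    return [name for name in FRONTIER_MODELS if name not in registered]
```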