Available for consulting, freelance & select full-time opportunities

ChuksForge

AI Systems Engineer & LLM Reliability Specialist

I build production-ready AI systems, from LLM pipelines and multi-agent architectures to scalable SaaS platforms and APIs, with a focus on reliability, structured reasoning, and performance under real-world conditions. I turn raw prompts into fully deployed products that deliver meaningful business impact.

My approach: building AI systems that are measurable, reliable and genuinely useful, not just impressive demos.

10+ AI Projects shipped
3 SaaS products live
5 LLM providers integrated
Python
FastAPI
React / Vite
Node.js
OpenAI API
Anthropic Claude API
Google Gemini API
OpenRouter
Mistral API
Supabase
Railway
Vercel
Tailwind CSS
LangGraph
LangChain
RAG Pipelines
Prompt Engineering
Multi-Agent Systems
Streamlit
Tavily
Pine Script

Engineer. Trader. Builder.

I'm Christian Chuks (ChuksForge) — an AI Systems Engineer focused on designing and deploying production-grade systems. I develop end-to-end LLM-powered applications, from evaluation pipelines to multi-tenant SaaS backends, with an emphasis on performance, scalability, and real-world usability.

My background in systematic trading, spanning price action, forex and crypto, shapes how I approach AI: prioritizing signal over noise, building for robustness under uncertainty, and relying on data-driven decision making.

Current work spans a multi-tenant commerce assistant, a trading-grade data analytics platform, and a suite of multi-agent developer tools designed for real CI/CD workflows. Production standard, not proof-of-concept.

LLM Application Development
🔬 Prompt Engineering & Optimization
🤖 Agentic Systems
📊 RAG & Knowledge Systems
🧠 LLM Evaluation & Benchmarking
⚙️ AI Workflow Orchestration
🏗️ Multi-tenant SaaS Architecture
🚀 AI SaaS Product Development
🔒 API Security & Proxying
📈 Data Analytics Systems

Products & Projects


01
Shipped Python Multi-Agent LangGraph
Code Review Agent — Noise-Reduced LLM Feedback for CI
Filters and prioritises automated code review findings to surface only actionable, high-signal issues.

Built a multi-agent code review system that addresses the core failure mode of LLM-based review tools: noisy, repetitive, low-priority findings that slow down rather than accelerate development. A critic loop re-evaluates initial LLM outputs against static analysis results, deduplicates findings, and produces a ranked, structured report suitable for direct CI/CD consumption.

  • Designed the Planner → Critic → Rewriter → Reconciler agent pipeline using LangGraph
  • Built a critic loop that re-evaluates LLM findings against deterministic static analysis output before surfacing results
  • Implemented confidence-scored deduplication and severity-based prioritisation of findings
  • Applied an open-core repo strategy: public tooling and schemas, private prompt architecture and calibration logic
  • Structured output schema designed for direct ingestion by CI pipeline tooling
Planner scopes review
LLM generates findings
Critic filters + static analysis
Rewriter reformats
Reconciler ranks + outputs
Reduced low-signal findings in test runs
CI-ready structured output
Python LangGraph LLM APIs Static Analysis Structured Outputs
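
A minimal sketch of the confidence-scored deduplication and severity-based prioritisation step listed above. This is an illustration under stated assumptions, not the project's private calibration logic: the `Finding` fields, the `SEVERITY_RANK` levels, and the `min_confidence` cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity levels; lower rank means higher priority.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    rule: str          # normalised issue identifier, e.g. "unused-import"
    severity: str      # one of the SEVERITY_RANK keys
    confidence: float  # 0.0 to 1.0, e.g. assigned by a critic step
    message: str


def dedupe_and_rank(findings, min_confidence=0.5):
    """Drop low-confidence findings, keep the most confident finding
    per (file, line, rule), then sort by severity and confidence."""
    best = {}
    for f in findings:
        if f.confidence < min_confidence:
            continue  # treated as noise in this sketch
        key = (f.file, f.line, f.rule)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_RANK[f.severity], -f.confidence),
    )
```

Keying on `(file, line, rule)` is one simple way to collapse repeated reports of the same issue; a real pipeline would likely need fuzzier matching.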
02
Shipped Python LangGraph RAG
Research Synthesis Agent — Multi-Source Intelligence Pipeline
Autonomous agent that researches, retrieves, evaluates, and synthesises multi-source findings into structured reports.

Built a LangGraph-powered research agent that goes beyond standard RAG by combining live web search with vector-backed document retrieval, then running both through an evaluation layer before synthesis. The agent plans its own research strategy, identifies gaps in retrieved evidence, and iterates until it reaches a confidence threshold, producing reports grounded in both real-time and stored knowledge.

  • Designed the full LangGraph agent graph: Research Planner → Web Search → RAG Retrieval → Evidence Evaluator → Synthesiser
  • Integrated Tavily search for live web retrieval alongside ChromaDB for persistent vector storage
  • Built an evidence evaluation node that scores retrieved content for relevance, recency, and credibility before passing to synthesis
  • Implemented a 25-question evaluation harness to benchmark output quality, citation accuracy, and hallucination rate
  • Shipped a Streamlit UI for interactive research sessions with source attribution display
Planner scopes query
Web search + RAG retrieval
Evidence evaluator scores sources
Gap check, iterate if needed
Synthesiser produces report
25-question eval harness
Live + stored knowledge retrieval
Self-directed gap detection
Citation-accurate outputs
Python LangGraph Tavily Search ChromaDB RAG Pipeline Streamlit
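
The evaluate-then-iterate behaviour described above can be sketched in plain Python. This is a simplified illustration, not the LangGraph graph itself: the scoring weights, the top-`k` confidence rule, and the `retrieve` callable standing in for web search plus RAG retrieval are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evidence:
    source: str
    relevance: float    # each dimension scored 0.0 to 1.0
    recency: float
    credibility: float


# Illustrative weights only; they sum to 1.0.
WEIGHTS = {"relevance": 0.5, "recency": 0.2, "credibility": 0.3}


def score(e: Evidence) -> float:
    """Weighted score across the three evaluation dimensions."""
    return (
        WEIGHTS["relevance"] * e.relevance
        + WEIGHTS["recency"] * e.recency
        + WEIGHTS["credibility"] * e.credibility
    )


def confidence(evidence: List[Evidence], k: int = 3) -> float:
    """Mean score of the k best items; 0.0 until k items exist."""
    top = sorted((score(e) for e in evidence), reverse=True)[:k]
    return sum(top) / k if len(top) == k else 0.0


def research_loop(
    retrieve: Callable[[int], List[Evidence]],
    threshold: float = 0.7,
    max_rounds: int = 4,
) -> Tuple[List[Evidence], int]:
    """Gather evidence round by round until confidence meets the
    threshold or the round budget runs out."""
    gathered: List[Evidence] = []
    rounds_used = 0
    for round_no in range(1, max_rounds + 1):
        rounds_used = round_no
        gathered.extend(retrieve(round_no))
        if confidence(gathered) >= threshold:
            break  # no gap left, hand off to synthesis
    return gathered, rounds_used
```

The `max_rounds` cap matters as much as the threshold: without it, a query with no good sources would loop indefinitely.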
03
Shipped Python Multi-Agent Streamlit
Startup Analyst — Autonomous VC Research Pipeline
Six-agent system that produces structured investment memos from a single company name.

Built a fully autonomous VC research pipeline that takes a company name as input and produces a structured investment memo covering market sizing, competitive landscape, founder signals, financial indicators, and risk factors. Each domain is handled by a specialised agent, and a synthesis agent reconciles their outputs into a final, human-readable report.

  • Designed a six-agent orchestration architecture with clearly scoped agent roles and handoff contracts
  • Built the Market Analyst, Competitive Intelligence, Founder Signal, Financial, Risk, and Synthesis agents independently
  • Implemented inter-agent context passing so each agent builds on prior findings rather than starting cold
  • Integrated live market data via yfinance for real-time financial grounding
  • Produced a Streamlit dashboard for interactive memo review and export
Input: company name
Specialist agents run in parallel
Synthesis agent reconciles
Structured investment memo
Full memo from single input
Parallel agent execution
Live financial data grounding
Python Anthropic SDK yfinance Multi-Agent Orchestration Streamlit
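
A rough sketch of the fan-out and reconcile shape described above, using `asyncio.gather` from the standard library. The agent bodies are stubs: `run_agent`, the `SPECIALISTS` names, and the memo dictionary layout are hypothetical, and a real agent would call an LLM where the placeholder sits.

```python
import asyncio

# Hypothetical specialist roles, mirroring the five domains in the memo.
SPECIALISTS = ["market", "competition", "founders", "financials", "risk"]


async def run_agent(name: str, company: str):
    """Stub specialist: a real agent would call an LLM here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return name, f"{name} findings for {company}"


async def build_memo(company: str) -> dict:
    """Run all specialists concurrently, then reconcile the outputs."""
    results = await asyncio.gather(
        *(run_agent(name, company) for name in SPECIALISTS)
    )
    # gather() returns results in the order the awaitables were passed,
    # so the memo sections keep a stable order.
    sections = dict(results)
    return {"company": company, "sections": sections}
```

Calling `asyncio.run(build_memo("Acme"))` returns a dictionary with one section per specialist.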
04
Shipped Python RAG NLP
LexisAI — Structured Research Synthesis Engine
Transforms multiple raw sources into claim-based, attributed, contradiction-aware insight reports.

Developed a research synthesis system that solves a specific problem with standard RAG pipelines: they retrieve and summarise, but don't reason across sources. LexisAI extracts discrete claims, tags each with source attribution and claim type (fact, inference, general knowledge), cross-compares for contradictions, and ranks outputs by evidential strength, producing reports that are traceable, not just plausible.

  • Built a claim extraction pipeline that segments document content into discrete, attributable assertions
  • Implemented a three-class tagging system: sourced facts, model inferences, and general knowledge, each treated differently in output
  • Designed cross-document contradiction detection with conflict flagging and resolution suggestions
  • Built an evidence-strength ranking system that surfaces the most credible, corroborated claims first
  • Applied open-core repo strategy: public eval harness and schemas, private pipeline and prompt calibration
Ingest documents
Extract + classify claims
Cross-compare for conflicts
Rank by evidence strength
Attributed insight report
Claim-level source attribution
Contradiction detection across sources
Audit-ready structured reports
Python LLM APIs RAG Pipeline Claim Extraction / NLP ChromaDB Streamlit
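
To make the claim-level idea concrete, here is a heavily simplified sketch of corroboration ranking and conflict flagging. It assumes claims have already been extracted and normalised into a `topic` and `value`, which is the hard part and is not shown; the `Claim` fields and the "count distinct sources" ranking rule are illustrative assumptions, not LexisAI's private pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    topic: str   # normalised subject, e.g. "founding_year"
    value: str   # normalised asserted value, e.g. "2015"
    source: str  # document the claim was extracted from
    kind: str    # "fact", "inference", or "general"; kept for reporting


def analyse(claims):
    """Return (ranked, conflicts).

    ranked: (topic, value, n_sources) tuples, most corroborated first.
    conflicts: topics where sources assert different values.
    """
    by_topic = defaultdict(lambda: defaultdict(set))
    for c in claims:
        by_topic[c.topic][c.value].add(c.source)

    ranked, conflicts = [], []
    for topic, values in by_topic.items():
        if len(values) > 1:
            conflicts.append(topic)  # flag, do not silently resolve
        for value, sources in values.items():
            ranked.append((topic, value, len(sources)))

    ranked.sort(key=lambda item: -item[2])
    return ranked, conflicts
```

Flagging a conflict rather than picking a winner keeps the report auditable: the reader sees both values and which sources back each.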
05
Shipped Python Multi-Agent Streamlit
Multi-Agent Autonomous Investment Analyst
Simulated hedge fund with specialised agents that debate, stress-test, and produce investment decisions.

A six-agent system built with the raw Anthropic SDK that replicates a hedge fund research process. Agents specialise in market data, news sentiment, technical signals, and fundamental analysis. Opposing "bull" and "bear" agents stress-test each thesis before a final decision agent synthesises a clear action with confidence level and risk context.

  • Designed agent specialisation and orchestration architecture using raw Anthropic SDK (no LangChain)
  • Built adversarial bull/bear debate layer to stress-test investment theses
  • Integrated live market data via yfinance for real-time analysis
  • Produced a Streamlit dashboard for interactive analysis and output review
Research agents gather data
Bull / Bear debate
Decision agent synthesises
Action + confidence output
Python Anthropic SDK yfinance Streamlit
06
Live FastAPI React Multi-tenant
ARIA — AI Fashion Commerce Assistant
Embeddable AI shopping assistant for fashion merchants with multi-tenant SaaS architecture.

Production-ready SaaS platform with multi-tenant architecture, secure API proxying, embeddable web widget for fashion merchants, and a demo experience for prospects. Deployed on modern cloud infrastructure.

  • Designed multi-tenant architecture
  • Built aria-widget.js — a drop-in embeddable widget deployable on any merchant site
  • Implemented secure API proxying
  • Shipped a live demo page for cold outreach and prospect qualification
Merchant embeds widget
Edge proxy validates client
FastAPI routes to LLM
Response streamed to UI
Python FastAPI React / Vite Vercel Railway Supabase
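
The "edge proxy validates client" step is not detailed here, so the following is only one plausible shape for it: a per-tenant origin allowlist checked before any request reaches the LLM backend. The `TENANTS` registry, the widget key format, and the `Rejected` exception are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical registry: public widget key -> tenant and allowed hosts.
TENANTS = {
    "pk_demo_123": {
        "tenant_id": "merchant-a",
        "origins": {"shop-a.example"},
    },
}


class Rejected(Exception):
    """Raised when a widget request fails validation."""


def validate_client(widget_key: str, origin_header: str) -> str:
    """Return the tenant id if the key is known and the request's
    Origin host is on that tenant's allowlist; otherwise raise."""
    tenant = TENANTS.get(widget_key)
    if tenant is None:
        raise Rejected("unknown widget key")
    host = urlparse(origin_header).hostname  # None if malformed
    if host not in tenant["origins"]:
        raise Rejected("origin not allowed for this tenant")
    return tenant["tenant_id"]
```

The point of proxying is that provider API keys stay server-side; the widget only ever holds a public key that is useless from an unlisted origin. An Origin check alone is not sufficient against non-browser clients, so rate limiting or signed tokens would normally sit alongside it.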
07
Planned Next.js 14 Turborepo Multi-tenant
AI Business Automation Suite
Multi-tenant SaaS platform hosting independently deployable AI workflow and productivity applications.

A production-grade SaaS monorepo that packages multiple AI applications, including an email assistant, RAG chatbot, workflow builder, and multi-agent operator, under a single shared platform layer. Each app is independently deployable but shares auth, billing, job queuing, and observability infrastructure. Designed to demonstrate enterprise-level AI product architecture at scale.

  • Architected a Turborepo monorepo with shared packages for auth, UI components, and API clients
  • Designed per-tenant data isolation and access control using Clerk + Prisma row-level scoping
  • Implemented BullMQ job queue for async AI task processing with retry and failure handling
  • Integrated Langfuse for LLM observability: tracing, cost tracking, and output evaluation across all apps
  • Built the workflow builder as a node-based editor for composing multi-step AI pipelines
Tenant authenticates (Clerk)
App routes to isolated context
BullMQ dispatches AI job
LLM processes + Langfuse traces
Result delivered to tenant
Next.js Turborepo Clerk Prisma / PostgreSQL pgvector BullMQ / Redis Langfuse Vercel
08
Shipped Python Gradio
PivotPro — Career Pivot Coach
Viability assessment and action planning for professionals changing careers.

An LLM-powered career coaching agent that evaluates career pivot viability, maps skills gaps, and generates a structured action plan. Designed to replace shallow advice with grounded, role-specific guidance across any industry combination.

  • Built a multi-step evaluation chain: feasibility → gap analysis → action plan generation
  • Designed prompt architecture for role-specific skill mapping across industries
  • Implemented structured PDF export for shareable coaching outputs
Python LLM APIs Gradio PDF Export
09
Shipped Python Gradio
PriorityPilot — Personal Productivity Agent
Intelligent task orchestration with dynamic prioritisation frameworks.

A task management agent that applies dynamic prioritisation logic, combining urgency, impact, and effort scoring to surface what actually deserves attention. It demonstrates structured tool use and output generation in a practical, non-trivial domain.

  • Designed multi-dimensional scoring logic (urgency × impact × effort) for task ranking
  • Implemented tool-use architecture for context gathering and structured scheduling
  • Built structured output pipeline with actionable daily planning summaries
Python LLM APIs Gradio Tool Use
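
A small sketch of the urgency × impact × effort ranking. Two assumptions are made that the description above does not state: each dimension is scored from 1 to 5, and the effort term is inverted (`6 - effort`) so that cheaper tasks score higher rather than lower. The `Task` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    urgency: int  # 1 (can wait) to 5 (due now)
    impact: int   # 1 (minor) to 5 (major)
    effort: int   # 1 (quick) to 5 (heavy)


def priority(task: Task) -> int:
    """Product of urgency, impact, and an inverted effort factor.
    The inversion is an assumption: low effort raises the score."""
    return task.urgency * task.impact * (6 - task.effort)


def rank(tasks):
    """Highest priority first."""
    return sorted(tasks, key=priority, reverse=True)
```

A multiplicative score means a task that is near-minimal on any one dimension drops sharply, which is usually the intent: an urgent but low-impact task should not outrank a moderately urgent, high-impact one.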
10
Planned Next.js Turborepo
AI Tools Suite
Modular multi-app SaaS platform with shared infrastructure and independently deployable AI tools.

A composable SaaS platform built on a Turborepo monorepo housing a content generation engine and a career development toolkit. Designed with shared auth, billing, and API infrastructure so each tool is independently deployable without duplicating platform logic.

  • Monorepo architecture with Turborepo enabling shared packages and independent app deployments
  • Shared auth and billing layer across all tools (Clerk + Stripe)
  • Content generation engine with multi-format output support
  • Career toolkit integrating PivotPro and PriorityPilot as sub-applications
Next.js 14 Turborepo Clerk Prisma BullMQ Vercel
11
In Progress React Supabase
Trading Journal & Analytics Dashboard
Full-stack platform for tracking, analysing, and improving trading performance.

A professional trading analytics platform with CSV import, exchange API sync, live candlestick charting with entry/exit markers, R-multiple tracking, and multi-pair equity curves. Designed for systematic traders who need data, not opinions.

  • CSV import pipeline for Binance and Bybit trade history
  • Live candlestick charting with trade entry/exit overlays via lightweight-charts
  • R-multiple tracking and tiered risk scaling analysis
  • Supabase Edge Functions for exchange API sync
React Supabase lightweight-charts Tailwind CSS Vercel
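
For readers unfamiliar with R-multiples: a trade's result is expressed as a multiple of the amount initially risked (the distance from entry to stop). The sketch below shows the arithmetic in Python for consistency with the other examples; the platform itself is built on React and Supabase, and the `Trade` type here is hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Trade:
    side: str     # "long" or "short"
    entry: float
    stop: float   # initial stop-loss price
    exit: float


def r_multiple(t: Trade) -> float:
    """Profit or loss per unit, divided by initial risk per unit."""
    risk = abs(t.entry - t.stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    pnl = t.exit - t.entry if t.side == "long" else t.entry - t.exit
    return pnl / risk


def equity_curve_r(trades):
    """Cumulative R across trades, in the order given."""
    return list(accumulate(r_multiple(t) for t in trades))
```

For example, a long from 100 with a stop at 95 that exits at 110 risks 5 to make 10, so it is a +2R trade. Tracking results in R makes trades of different sizes and pairs directly comparable.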
12
Shipped RAG Gradio Capstone
AI Document Intelligence System
End-to-end document processing with RAG-based QA, hallucination detection, and LLM benchmarking.

Capstone project from the prompt-engineering-lab tying together document ingestion, retrieval-augmented QA, hallucination mitigation strategies, and a multi-model benchmarking dashboard. Demonstrates the full stack of applied prompt engineering from raw document to trusted output.

  • Document ingestion pipeline with chunking, embedding, and vector retrieval
  • Hallucination detection with confidence calibration and mitigation techniques
  • Multi-model benchmarking suite with Streamlit results dashboard
  • Gradio interface for live document QA demonstration
Ingest & chunk docs
Embed & index
RAG retrieval + LLM
Hallucination check
Verified answer
Python RAG Pipeline Vector DB Gradio Streamlit Multi-LLM Eval
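
The chunking stage of the ingestion pipeline can be illustrated with a sliding word window. This is a generic sketch, not the capstone's actual chunker: the word-based splitting and the `size` and `overlap` defaults are assumptions, and token-based or sentence-aware splitting are common alternatives.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40):
    """Split text into chunks of up to `size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated embeddings.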

9 experiments.
1 framework.

A structured research portfolio covering the full spectrum of applied prompt engineering — from evaluation benchmarks to hallucination mitigation.

P-01
Summarization Benchmarks
Systematic evaluation of summarization quality across multiple LLMs with scoring rubrics.
P-02
Style Transfer
Controlled experiments in tone, formality, and persona-driven style adaptation.
P-03
Instruction Following
Edge cases, constraint adherence, and format compliance across prompt architectures.
P-04
promptlab Python Library
Reusable prompt templating, chaining, and evaluation utilities packaged as a Python library.
P-05
RAG-based QA
Retrieval-Augmented Generation pipeline with document chunking, embedding, and citation.
P-06
Email Summarization + Gradio Demo
Production-ready email summarizer with a live Gradio interface for demonstration.
P-07
LLM Benchmarking Dashboard
Multi-model benchmarking suite with a Streamlit dashboard and aggregate results analysis.
P-08
Hallucination Detection & Mitigation
Detection strategies, confidence calibration, and mitigation techniques for LLM outputs.
P-09
AI Document Intelligence — Capstone
Full-stack document intelligence system tying together RAG, evaluation, and agentic routing.
View Full Lab on GitHub →

Building in public.

2024 — Present
Founder and Principal Engineer — ChuksForge AI Solutions Ltd.

Founded and lead a registered AI engineering consultancy delivering production-grade LLM systems, multi-agent architectures, and AI SaaS platforms for startups and growth-stage companies. The work sits at the intersection of applied AI research and scalable product engineering, with a focus on reliability and real-world deployment constraints.

  • Architect and deploy multi-tenant LLM systems with secure user isolation, retrieval pipelines, and evaluation frameworks, improving system robustness and response consistency
  • Design agentic infrastructures, embeddable AI components, and production-ready SaaS solutions for diverse product integrations
  • Build and launch a portfolio of scalable AI tools under the ChuksForge brand, translating advanced AI capabilities into commercially viable products
  • Deliver end-to-end AI systems engineering, from research-driven architecture design to deployment, optimization, and long-term operational resilience
Ongoing
Systematic Trader (Independent) — Forex & Crypto

Apply discretionary and rule-based strategies in high-noise, probabilistic environments across forex and crypto markets.

  • Design and test structured trading systems under uncertainty, optimizing for risk-adjusted outcomes
  • Build custom Pine Script strategies and AI-assisted market analysis workflows
  • Transfer trading principles (signal vs noise, probabilistic thinking, risk control) into AI system design
2025
Prompt Engineering Research — 9-Project Lab

Executed a structured research initiative exploring LLM behavior, reliability, and system integration patterns.

  • Built a multi-project prompt engineering lab covering summarization, style transfer, RAG, and hallucination mitigation
  • Developed reusable evaluation patterns, prompt frameworks, and shared utilities
  • Emphasized reproducibility, benchmarking, and production-oriented integration

↳ Let's work together

Ready to build something?

I'm open to freelance projects, consulting engagements, and full-time AI systems engineering roles. Let's talk.

Get In Touch →