ChuksForge — AI Systems Engineer & LLM Architect

01

Shipped · Python · Multi-Agent · LangGraph

Code Review Agent — Noise-Reduced LLM Feedback for CI

Filters and prioritises automated code review findings to surface only actionable, high-signal issues.


Overview

Built a multi-agent code review system that addresses the core failure mode of LLM-based review tools: noisy, repetitive, low-priority findings that slow down rather than accelerate development. A critic loop re-evaluates initial LLM outputs against static analysis results, deduplicates findings, and produces a ranked, structured report suitable for direct CI/CD consumption.

Key Contributions

Designed the Planner → Critic → Rewriter → Reconciler agent pipeline using LangGraph
Built a critic loop that re-evaluates LLM findings against deterministic static analysis output before surfacing results
Implemented confidence-scored deduplication and severity-based prioritisation of findings
Applied an open-core repo strategy: public tooling and schemas, private prompt architecture and calibration logic
Structured output schema designed for direct ingestion by CI pipeline tooling
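Here is a minimal plain-Python sketch of the deduplication and prioritisation step. It is not the project's private calibration logic. The `Finding` fields, the four severity levels and the 0.5 confidence cut-off are all assumptions chosen for illustration. The sketch works in three stages:

1. It drops low-confidence findings as noise.
2. It collapses findings that share a file, line and rule into the single most confident one.
3. It sorts what remains by severity, then by confidence.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the project's real levels are not published.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    rule: str          # static-analysis rule id or LLM-assigned category
    severity: str      # one of the SEVERITY_RANK keys
    confidence: float  # 0.0 to 1.0, assumed to be set by the critic step
    message: str


def dedupe_and_rank(findings: list[Finding], min_confidence: float = 0.5) -> list[Finding]:
    """Drop low-confidence findings, keep the most confident finding per
    (file, line, rule), and sort by severity, then by confidence."""
    best: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        if f.confidence < min_confidence:
            continue  # treated as noise
        key = (f.file, f.line, f.rule)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return sorted(best.values(), key=lambda f: (SEVERITY_RANK[f.severity], -f.confidence))


findings = [
    Finding("app.py", 10, "sql-injection", "critical", 0.9, "Unparameterised SQL query"),
    Finding("app.py", 10, "sql-injection", "critical", 0.7, "Possible SQL injection"),  # duplicate
    Finding("util.py", 3, "naming", "low", 0.3, "Consider renaming variable"),          # noise
    Finding("util.py", 8, "unused-import", "medium", 0.8, "Unused import: os"),
]
for f in dedupe_and_rank(findings):
    print(f.severity, f"{f.file}:{f.line}", f.message)
```

Keying on (file, line, rule) is a deliberately simple definition of "duplicate". A real critic may also need fuzzy matching for cases where the LLM and a static analyser describe the same issue in different words.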

How It Works

Planner scopes review → LLM generates findings → Critic filters + static analysis → Rewriter reformats → Reconciler ranks + outputs

Impact

Reduced low-signal findings in test runs

CI-ready structured output

Tech Stack

Python · LangGraph · LLM APIs · Static Analysis · Structured Outputs

GitHub

02

Shipped · Python · LangGraph · RAG

Research Synthesis Agent — Multi-Source Intelligence Pipeline

Autonomous agent that researches, retrieves, evaluates, and synthesises multi-source findings into structured reports.


Overview

Built a LangGraph-powered research agent that goes beyond standard RAG. It combines live web search with vector-backed document retrieval, then runs both through an evaluation layer before synthesis. The agent plans its own research strategy, identifies gaps in retrieved evidence, and iterates until it reaches a confidence threshold, producing reports grounded in both real-time and stored knowledge.

Key Contributions

Designed the full LangGraph agent graph: Research Planner → Web Search → RAG Retrieval → Evidence Evaluator → Synthesiser
Integrated Tavily search for live web retrieval alongside ChromaDB for persistent vector storage
Built an evidence evaluation node that scores retrieved content for relevance, recency, and credibility before passing to synthesis
Implemented a 25-question evaluation harness to benchmark output quality, citation accuracy, and hallucination rate
Shipped a Streamlit UI for interactive research sessions with source attribution display
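The evaluator and gap-check loop can be sketched as follows. This is an illustrative assumption, not the agent's actual code. Several pieces are hypothetical:

- the weights;
- the one-year recency half-life;
- the top-3 confidence measure;
- the `retrieve` callable, which stands in for the Tavily and ChromaDB calls.

Each source receives a weighted score for relevance, recency and credibility. The loop keeps retrieving until the best few sources clear a threshold or the round budget runs out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Assumed weights; the project's actual scoring is not published.
WEIGHTS = {"relevance": 0.5, "recency": 0.2, "credibility": 0.3}


@dataclass
class Evidence:
    url: str
    relevance: float    # 0-1, e.g. from an LLM judge or embedding similarity
    credibility: float  # 0-1, e.g. from a source-reputation lookup
    published: date


def recency_score(published: date, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: the score halves every half_life_days."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def score(ev: Evidence, today: date) -> float:
    return (
        WEIGHTS["relevance"] * ev.relevance
        + WEIGHTS["recency"] * recency_score(ev.published, today)
        + WEIGHTS["credibility"] * ev.credibility
    )


def research(
    retrieve: Callable[[int], list[Evidence]],  # hypothetical: web search + vector retrieval
    today: date,
    threshold: float = 0.7,
    top_k: int = 3,
    max_rounds: int = 3,
) -> list[Evidence]:
    """Retrieve and score evidence, iterating until the top_k sources are strong enough."""
    pool: list[Evidence] = []
    for round_no in range(max_rounds):
        pool.extend(retrieve(round_no))
        pool.sort(key=lambda e: score(e, today), reverse=True)
        # Divide by top_k, not by the number of sources found: too few sources counts as a gap.
        confidence = sum(score(e, today) for e in pool[:top_k]) / top_k
        if confidence >= threshold:
            break
    return pool[:top_k]
```

The sketch divides by `top_k` even when fewer sources exist, so thin evidence lowers confidence. This is one simple way to let missing evidence trigger another retrieval round.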

How It Works

Planner scopes query → Web search + RAG retrieval → Evidence evaluator scores sources → Gap check, iterate if needed → Synthesiser produces report

Impact

25-question eval harness

Live + stored knowledge retrieval

Self-directed gap detection

Citation-accurate outputs

Tech Stack

Python · LangGraph · Tavily Search · ChromaDB · RAG Pipeline · Streamlit

GitHub

03

Shipped · Python · Multi-Agent · Streamlit

Startup Analyst — Autonomous VC Research Pipeline

Six-agent system that produces structured investment memos from a single company name.


Overview

Built a fully autonomous VC research pipeline that takes a company name as input and produces a structured investment memo. The memo covers market sizing, competitive landscape, founder signals, financial indicators and risk factors. Each domain is handled by a specialised agent, and a synthesis agent reconciles their outputs into a final, human-readable report.

Key Contributions

Designed a six-agent orchestration architecture with clearly scoped agent roles and handoff contracts
Built the Market Analyst, Competitive Intelligence, Founder Signal, Financial, Risk, and Synthesis agents independently
Implemented inter-agent context passing so each agent builds on prior findings rather than starting cold
Integrated live market data via yfinance for real-time financial grounding
Produced a Streamlit dashboard for interactive memo review and export
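The parallel fan-out and synthesis step can be illustrated with `asyncio`. In this sketch, `run_specialist` is a hypothetical stub standing in for an Anthropic SDK call. The synthesis step is plain string assembly; in the real pipeline, both would be LLM-driven. Five specialists run concurrently via `asyncio.gather`, and their sections are then merged into one memo.

```python
import asyncio

SPECIALISTS = ["market", "competitive", "founder", "financial", "risk"]


async def run_specialist(role: str, company: str) -> tuple[str, str]:
    """Hypothetical stub for one specialist agent; a real one would call an LLM."""
    await asyncio.sleep(0.01)  # simulate network latency
    return role, f"{role} findings for {company}"


async def analyse(company: str) -> dict[str, str]:
    # Fan out: all specialists run concurrently on the event loop.
    results = await asyncio.gather(*(run_specialist(r, company) for r in SPECIALISTS))
    sections = dict(results)
    # Fan in: a stand-in synthesis step that assembles the sections into a memo.
    sections["memo"] = "\n\n".join(f"## {r.title()}\n{sections[r]}" for r in SPECIALISTS)
    return sections


if __name__ == "__main__":
    print(asyncio.run(analyse("Acme Ltd"))["memo"])
```

`asyncio.gather` returns results in the order the awaitables were passed. The memo sections therefore come out in a stable order, regardless of which agent finishes first.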

How It Works

Input: company name → Specialist agents run in parallel → Synthesis agent reconciles → Structured investment memo

Impact

Full memo from single input

Parallel agent execution

Live financial data grounding

Tech Stack

Python · Anthropic SDK · yfinance · Multi-Agent Orchestration · Streamlit

GitHub

04

Shipped · Python · RAG · NLP

LexisAI — Structured Research Synthesis Engine

Transforms multiple raw sources into claim-based, attributed, contradiction-aware insight reports.


Overview

Developed a research synthesis system that solves a specific problem with standard RAG pipelines: they retrieve and summarise but don't reason across sources. LexisAI extracts discrete claims and tags each with source attribution and claim type (fact, inference, general knowledge). It then cross-compares the claims for contradictions and ranks outputs by evidential strength, producing reports that are traceable, not just plausible.

Key Contributions

Built a claim extraction pipeline that segments document content into discrete, attributable assertions
Implemented a three-class tagging system: sourced facts, model inferences, and general knowledge, each treated differently in output
Designed cross-document contradiction detection with conflict flagging and resolution suggestions
Built an evidence-strength ranking system that surfaces the most credible, corroborated claims first
Applied an open-core repo strategy: public eval harness and schemas, private pipeline and prompt calibration
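Here is a rough sketch of the claim data model and the contradiction and ranking steps, under stated assumptions:

- Claims have already been extracted and normalised to a hypothetical `subject` key with a `value`. The extraction itself, which the project performs with an LLM, is out of scope here.
- Only sourced facts count as evidence.

A subject is flagged as a conflict when sources assert different values for it. Claims are ranked by how many distinct sources back them.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ClaimType(Enum):
    FACT = "fact"            # sourced fact
    INFERENCE = "inference"  # model inference
    GENERAL = "general"      # general knowledge


@dataclass(frozen=True)
class Claim:
    subject: str  # normalised key such as "acme:revenue_2023" (hypothetical scheme)
    value: str
    source: str
    kind: ClaimType


def find_conflicts(claims: list[Claim]) -> dict[str, set[str]]:
    """Flag subjects where sourced facts assert different values."""
    values: dict[str, set[str]] = defaultdict(set)
    for c in claims:
        if c.kind is ClaimType.FACT:
            values[c.subject].add(c.value)
    return {s: v for s, v in values.items() if len(v) > 1}


def rank_by_support(claims: list[Claim]) -> list[tuple[tuple[str, str], set[str]]]:
    """Rank (subject, value) pairs by the number of distinct sources backing them."""
    support: dict[tuple[str, str], set[str]] = defaultdict(set)
    for c in claims:
        if c.kind is ClaimType.FACT:  # inferences and general knowledge carry no evidential weight here
            support[(c.subject, c.value)].add(c.source)
    return sorted(support.items(), key=lambda item: len(item[1]), reverse=True)


claims = [
    Claim("acme:revenue_2023", "$10M", "report_a.pdf", ClaimType.FACT),
    Claim("acme:revenue_2023", "$10M", "report_b.pdf", ClaimType.FACT),
    Claim("acme:revenue_2023", "$12M", "blog_c.html", ClaimType.FACT),
    Claim("acme:growth", "accelerating", "model", ClaimType.INFERENCE),
]
print(find_conflicts(claims))      # the revenue figure is flagged as contested
print(rank_by_support(claims)[0])  # "$10M" leads with two independent sources
```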

How It Works

Ingest documents → Extract + classify claims → Cross-compare for conflicts → Rank by evidence strength → Attributed insight report

Impact

Claim-level source attribution

Contradiction detection across sources

Audit-ready structured reports

Tech Stack

Python · LLM APIs · RAG Pipeline · Claim Extraction / NLP · ChromaDB · Streamlit

GitHub

05

Shipped · Python · Multi-Agent · Streamlit

Multi-Agent Autonomous Investment Analyst

Simulated hedge fund with specialised agents that debate, stress-test, and produce investment decisions.


Overview

A six-agent system built with the raw Anthropic SDK that replicates a hedge fund research process. Agents specialise in market data, news sentiment, technical signals, and fundamental analysis. Opposing "bull" and "bear" agents stress-test each thesis before a final decision agent synthesises a clear action with confidence level and risk context.

Key Contributions

Designed the agent specialisation and orchestration architecture using the raw Anthropic SDK (no LangChain)
Built adversarial bull/bear debate layer to stress-test investment theses
Integrated live market data via yfinance for real-time analysis
Produced a Streamlit dashboard for interactive analysis and output review

How It Works

Research agents gather data → Bull / Bear debate → Decision agent synthesises → Action + confidence output

Tech Stack

Python · Anthropic SDK · yfinance · Streamlit

GitHub

06

Live · FastAPI · React · Multi-tenant

ARIA — AI Fashion Commerce Assistant

Embeddable AI shopping assistant for fashion merchants with multi-tenant SaaS architecture.


Overview

A production-ready SaaS platform with a multi-tenant architecture, secure API proxying, an embeddable web widget for fashion merchants, and a demo experience for prospects. It runs on Vercel, Railway and Supabase.

Key Contributions

Designed the multi-tenant architecture
Built aria-widget.js: a drop-in embeddable widget deployable on any merchant site
Implemented secure API proxying, with an edge proxy validating each client before requests reach the FastAPI backend
Shipped a live demo page for cold outreach and prospect qualification

How It Works

Merchant embeds widget → Edge proxy validates client → FastAPI routes to LLM → Response streamed to UI

Tech Stack

Python · FastAPI · React / Vite · Vercel · Railway · Supabase

GitHub Live Demo ↗

07

Planned · Next.js 14 · Turborepo · Multi-tenant

AI Business Automation Suite

Multi-tenant SaaS platform hosting independently deployable AI workflow and productivity applications.


Overview

A production-grade SaaS monorepo that packages multiple AI applications, including an email assistant, RAG chatbot, workflow builder, and multi-agent operator, under a single shared platform layer. Each app is independently deployable but shares auth, billing, job queuing, and observability infrastructure. Designed to demonstrate enterprise-level AI product architecture at scale.

Key Contributions

Architected a Turborepo monorepo with shared packages for auth, UI components, and API clients
Designed per-tenant data isolation and access control using Clerk + Prisma row-level scoping
Implemented BullMQ job queue for async AI task processing with retry and failure handling
Integrated Langfuse for LLM observability: tracing, cost tracking, and output evaluation across all apps
Built the workflow builder as a node-based editor for composing multi-step AI pipelines

How It Works

Tenant authenticates (Clerk) → App routes to isolated context → BullMQ dispatches AI job → LLM processes + Langfuse traces → Result delivered to tenant

Tech Stack

Next.js · Turborepo · Clerk · Prisma / PostgreSQL · pgvector · BullMQ / Redis · Langfuse · Vercel

GitHub

08

Shipped · Python · Gradio

PivotPro — Career Pivot Coach

Viability assessment and action planning for professionals changing careers.


Overview

An LLM-powered career coaching agent that evaluates career pivot viability, maps skill gaps and generates a structured action plan. Designed to replace shallow advice with grounded, role-specific guidance across any industry combination.

Key Contributions

Built a multi-step evaluation chain: feasibility → gap analysis → action plan generation
Designed prompt architecture for role-specific skill mapping across industries
Implemented structured PDF export for shareable coaching outputs

Tech Stack

Python · LLM APIs · Gradio · PDF Export

GitHub

09

Shipped · Python · Gradio

PriorityPilot — Personal Productivity Agent

Intelligent task orchestration with dynamic prioritisation frameworks.


Overview

A task management agent that applies dynamic prioritisation logic, combining urgency, impact and effort scores to surface what actually deserves attention. Demonstrates structured tool use and output generation in a practical, non-trivial domain.

Key Contributions

Designed multi-dimensional scoring logic (urgency × impact × effort) for task ranking
Implemented tool-use architecture for context gathering and structured scheduling
Built structured output pipeline with actionable daily planning summaries
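Here is one way to implement the urgency × impact × effort ranking, assuming a 1 to 5 scale for each dimension. Multiplying raw effort would push the hardest tasks to the top. This sketch therefore inverts effort into an "ease" term before taking the product. That inversion is an assumption about the intended semantics, not something the project description states.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    urgency: int  # 1 (can wait) to 5 (due now)
    impact: int   # 1 (minor) to 5 (major)
    effort: int   # 1 (minutes) to 5 (days)

    def __post_init__(self) -> None:
        for field_name in ("urgency", "impact", "effort"):
            if not 1 <= getattr(self, field_name) <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5")

    def score(self) -> int:
        ease = 6 - self.effort  # assumption: invert effort so cheaper tasks rank higher
        return self.urgency * self.impact * ease


def prioritise(tasks: list[Task]) -> list[Task]:
    return sorted(tasks, key=lambda t: t.score(), reverse=True)


tasks = [
    Task("Write quarterly report", urgency=4, impact=5, effort=4),
    Task("Reply to client", urgency=5, impact=3, effort=1),
    Task("Refactor module", urgency=2, impact=4, effort=5),
]
for t in prioritise(tasks):
    print(t.score(), t.name)
```

With these inputs, "Reply to client" (5 × 3 × 5 = 75) outranks "Write quarterly report" (4 × 5 × 2 = 40) and "Refactor module" (2 × 4 × 1 = 8).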

Tech Stack

Python · LLM APIs · Gradio · Tool Use

GitHub

10

Planned · Next.js · Turborepo

AI Tools Suite

Modular multi-app SaaS platform with shared infrastructure and independently deployable AI tools.


Overview

A composable SaaS platform built on a Turborepo monorepo housing a content generation engine and a career development toolkit. Designed with shared auth, billing, and API infrastructure so each tool is independently deployable without duplicating platform logic.

Key Contributions

Monorepo architecture with Turborepo enabling shared packages and independent app deployments
Shared auth and billing layer across all tools (Clerk + Stripe)
Content generation engine with multi-format output support
Career toolkit integrating PivotPro and PriorityPilot as sub-applications

Tech Stack

Next.js 14 · Turborepo · Clerk · Prisma · BullMQ · Vercel

GitHub

11

In Progress · React · Supabase

Trading Journal & Analytics Dashboard

Full-stack platform for tracking, analysing, and improving trading performance.


Overview

A professional trading analytics platform with CSV import, exchange API sync, live candlestick charting with entry/exit markers, R-multiple tracking, and multi-pair equity curves. Designed for systematic traders who need data, not opinions.

Key Contributions

CSV import pipeline for Binance and Bybit trade history
Live candlestick charting with trade entry/exit overlays via lightweight-charts
R-multiple tracking and tiered risk scaling analysis
Supabase Edge Functions for exchange API sync
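R-multiple expresses a trade's profit or loss in units of its initial risk, where 1R is the distance between entry and the initial stop-loss. The sketch below is written in Python for brevity; the project itself uses React and Supabase. The `Trade` fields are assumptions, not the app's actual schema. Averaging R across trades gives the journal's expectancy in R.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    side: str     # "long" or "short"
    entry: float
    stop: float   # initial stop-loss; defines 1R
    exit: float

    def r_multiple(self) -> float:
        if self.side not in ("long", "short"):
            raise ValueError("side must be 'long' or 'short'")
        risk = abs(self.entry - self.stop)
        if risk == 0:
            raise ValueError("stop must differ from entry to define 1R")
        direction = 1 if self.side == "long" else -1
        return direction * (self.exit - self.entry) / risk


def expectancy(trades: list[Trade]) -> float:
    """Mean R per trade; positive means the trades made money on average per unit of risk."""
    if not trades:
        raise ValueError("need at least one trade")
    return sum(t.r_multiple() for t in trades) / len(trades)


journal = [
    Trade("long", entry=100.0, stop=95.0, exit=110.0),  # +10 on 5 risked: +2.0R
    Trade("short", entry=50.0, stop=52.0, exit=53.0),   # -3 on 2 risked: -1.5R
]
print([t.r_multiple() for t in journal], expectancy(journal))
```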

Tech Stack

React · Supabase · lightweight-charts · Tailwind CSS · Vercel

GitHub Live Demo ↗

12

Shipped · RAG · Gradio · Capstone

AI Document Intelligence System

End-to-end document processing with RAG-based QA, hallucination detection, and LLM benchmarking.


Overview

Capstone project from the prompt-engineering-lab tying together document ingestion, retrieval-augmented QA, hallucination mitigation strategies, and a multi-model benchmarking dashboard. Demonstrates the full stack of applied prompt engineering from raw document to trusted output.

Key Contributions

Document ingestion pipeline with chunking, embedding, and vector retrieval
Hallucination detection with confidence calibration and mitigation techniques
Multi-model benchmarking suite with a Streamlit results dashboard
Gradio interface for live document QA demonstration
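The ingest, embed and retrieve stages can be sketched end to end with the standard library alone. The term-count "embedding" and linear-scan index below are toy stand-ins for a real embedding model and vector database. The 500-character chunks with 100-character overlap are arbitrary defaults. The point is the shape of the pipeline: overlapping chunks, one vector per chunk, and top-k retrieval by cosine similarity.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (linear scan as a toy vector index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. The trade-off is some duplicated text in the index.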

How It Works

Ingest & chunk docs → Embed & index → RAG retrieval + LLM → Hallucination check → Verified answer

Tech Stack

Python · RAG Pipeline · Vector DB · Gradio · Streamlit · Multi-LLM Eval

GitHub

Chuks Forge

Engineer. Trader. Builder.

Products & Projects

9 experiments. 1 framework.

Building in public.

Ready to build something?
