Obsidian + Claude Code: My LLM Wiki for Competitive Intel

May 29, 2026 · Shubham Kashyap · 13 min read

RAG rediscovers knowledge every query. Karpathy's LLM Wiki compiles it once. Here is the Obsidian + Claude Code setup I use to track competitors.

In April 2026, Andrej Karpathy published a GitHub gist called LLM Knowledge Bases. It described a pattern he had been running for months: instead of uploading documents to a RAG system and asking questions against retrieved chunks, he pointed an AI agent at a folder of raw sources and told it to build and maintain a wiki.

The wiki was plain markdown. Interlinked concept pages. Entity files for people and companies. Source summaries with backlinks. A living index that grew every time he added a new article, paper, or transcript.

His framing was precise:

"The knowledge is compiled once and then kept current, not re-derived on every query."

That sentence is the whole argument against naive RAG for personal knowledge work. NotebookLM, ChatGPT file uploads, and most vector-database RAG setups all work the same way: you ask a question, the system retrieves relevant chunks, the model synthesizes an answer from scratch. Nothing accumulates. Ask the same subtle question twice and the model rediscovers the same five fragments both times.

The LLM Wiki pattern flips that. The expensive synthesis happens at ingest time. The cheap retrieval happens at query time. And because the output is persistent markdown files with cross-references, every new source strengthens the whole graph instead of sitting in a chunk index waiting to be found.

I have been running a version of this setup for competitive intelligence on the Instagram and WhatsApp automation market. Not as a productivity experiment. As operational infrastructure for deciding what FusionSync should build next, what competitors are claiming, and where the positioning gaps are.

This post is the teardown: why RAG fails for this job, how the three-layer architecture works, why Obsidian is the right vault, and what the setup actually produces when you run it for two weeks on 130+ YouTube transcripts.

Why RAG is the wrong primitive for a second brain

RAG (Retrieval-Augmented Generation) is the default answer when someone says "I want AI on top of my documents." Upload PDFs, embed chunks, ask questions. It works for search. It fails for synthesis.

The failure modes are structural, not implementation bugs.

Nothing compounds

Every query starts from zero. The vector database has chunks. The model has to find the right chunks, read them, and synthesize an answer in one pass. There is no persistent artifact that gets smarter when you add source number 47.

Ask "what hook formats do my top five competitors use?" on day one and the model pieces together an answer from whatever chunks it retrieves. Ask the same question on day 30, after you have ingested 40 more videos, and it does the same work again. The 40 new sources improved the chunk index, but they did not improve any pre-built synthesis. The model still has to rediscover the pattern from fragments.

Chunk boundaries break cross-source reasoning

Production RAG in 2026 still struggles with the chunking problem. Fixed-size splits fragment coherent ideas. Semantic chunking helps but adds ingestion cost. Hierarchical parent-child chunking (retrieve small, generate large) is the current best practice, and it is still a workaround for the fact that embeddings compress paragraphs into single points in vector space.
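
A minimal sketch of the fragmentation problem, assuming a naive character-window chunker. The function name, the window size, and the sample text are illustrative and not taken from any particular RAG library:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of `size` characters (naive chunking)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Illustrative sample: a cause in the first sentence, its consequence in the second.
sample = (
    "Meta changed the rate card in April. "
    "The old WhatsApp Flows pricing page is therefore stale."
)

chunks = fixed_size_chunks(sample, 40)
for n, chunk in enumerate(chunks):
    # Each chunk would be embedded on its own, without the other sentence.
    print(n, repr(chunk))
```

The first window ends partway into the second sentence, so the cause and its consequence land in different chunks and get separate embeddings. Semantic and parent-child chunking reduce how often this happens, but they do not remove it.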

When your question requires synthesizing five documents, RAG has to retrieve five chunks and hope they land in the context window in a useful order. The "lost in the middle" problem is well documented: models tend to underuse information placed in the middle of a long multi-chunk context. Research on context window utilization shows quality degrades well before you hit the advertised token limit.

Bookkeeping does not happen

RAG does not update cross-references when a new source contradicts an old claim. It does not notice that your page on "WhatsApp Flows pricing" is now stale because Meta changed the rate card in April. It does not merge two entity pages that turned out to be about the same company.

Humans abandon wikis for exactly this reason. The maintenance burden grows faster than the value. Karpathy's insight is that LLMs do not get bored, do not forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

Dimension	Naive RAG	LLM Wiki
When knowledge is processed	At query time (every question)	At ingest time (once per source)
Cross-references	Discovered ad-hoc per query	Pre-built and maintained
Contradictions	May go unnoticed	Flagged during ingestion
Knowledge accumulation	None	Compounds with every source
Output format	Ephemeral chat responses	Persistent markdown files
Human role	Upload and query	Curate sources, explore wiki, question

The mental model shift: LLM as compiler and librarian, not chatbot.

The three-layer architecture

Karpathy's pattern has three layers. I run the same shape, with slightly different folder names.

Layer 1: `/raw` (immutable sources)

This is where unstructured information lands. No formatting required. No organization required. You drop files and walk away.

What goes in /raw:

Full transcripts of YouTube videos (competitor content, Meta product updates, operator teardowns)
Web articles clipped from blogs and docs (via Obsidian Web Clipper or manual paste)
Meeting notes and call transcripts
Screenshots of pricing pages (as markdown descriptions, not images)
PDF exports from reports

The rule: never edit files in /raw after ingestion. They are the source of truth. If you need to correct something, add a new file with a correction note and let the agent update the wiki.

My /raw folder currently holds 130+ YouTube transcripts from my own channel and competitors in the Instagram DM automation, WhatsApp qualification, and event-company inbound space. File naming convention: YYYY-MM-DD-source-slug.md.
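
A small sketch of that naming convention, assuming the slug is derived from the source title by lowercasing and hyphenating. The helper names and the slug rule are assumptions; the post only specifies the YYYY-MM-DD-source-slug.md shape:

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def raw_filename(published: date, source_title: str) -> str:
    """Build a YYYY-MM-DD-source-slug.md name for a file dropped into /raw."""
    slug = slugify(source_title)
    if not slug:
        raise ValueError("source title produced an empty slug")
    return f"{published.isoformat()}-{slug}.md"
```

Date-first names sort chronologically in any file listing, which makes it easy to see what landed in /raw since the last ingest run.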

Layer 2: `/wiki` (agent-maintained, never edit manually)

This is the compiled artifact. Structured markdown pages that the agent creates, updates, and cross-references. You read from it. You query against it. You do not edit it by hand.

What the agent creates in /wiki:

Page type	Example	Purpose
Concept	`whatsapp-flows.md`	Evergreen explanation of a product or idea
Entity	`manychat.md`	Company or person profile, updated as new sources arrive
Source summary	`source-2026-04-karpathy-llm-wiki.md`	One-page digest of a single raw file
Comparison	`manychat-vs-fusionsync-positioning.md`	Synthesis across multiple sources
Index	`_index.md`	Master list of all wiki pages with one-line descriptions
Changelog	`_changelog.md`	Log of what changed after each ingest run

Every wiki page uses [[wikilinks]] to reference other pages. When the agent adds a new concept page for "WhatsApp Calling API," it also updates the whatsapp-business-platform.md page to link to it, updates the _index.md, and adds a line to _changelog.md. That is the bookkeeping humans never do consistently.

The agent can touch 15 files in one ingest pass. That is the compounding mechanism.
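
The index bookkeeping can be spot-checked from outside the agent. A minimal sketch, assuming _index.md lists each page as a [[wikilink]] and that files starting with an underscore are bookkeeping files (both are assumptions about the layout, and the function names are hypothetical):

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def linked_pages(markdown: str) -> set[str]:
    """Return the page names targeted by wikilinks in a markdown string."""
    return {target.strip() for target in WIKILINK.findall(markdown)}


def missing_from_index(wiki_dir: Path) -> set[str]:
    """Return wiki pages that exist on disk but are not linked from _index.md."""
    index_text = (wiki_dir / "_index.md").read_text(encoding="utf-8")
    indexed = linked_pages(index_text)
    pages = {p.stem for p in wiki_dir.glob("*.md") if not p.name.startswith("_")}
    return pages - indexed
```

A non-empty result after an ingest run means the agent created a page but skipped the index update.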

Layer 3: `CLAUDE.md` (the manager schema)

This is the rulebook. A markdown file at the vault root (or repo root) that tells the agent how to process raw inputs, format wiki pages, handle contradictions, and maintain cross-references.

Mine includes:

Folder structure and what each folder is for
Page templates for each wiki page type (frontmatter fields, section headings)
Linking rules (when to create a new concept page vs. update an existing one)
Contradiction policy (if a new source disagrees with the wiki, add a "Conflicting claims" section, do not silently overwrite)
Naming conventions for files and slugs
A list of "anchor concepts" that should always exist (FusionSync positioning, Instagram Graph API, WhatsApp Business Platform, GoHighLevel, speed-to-lead)
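
To make the template and contradiction rules concrete, here is a sketch of what a rendered concept page could look like. The frontmatter fields (type, title, updated) and the "Sources" heading are hypothetical; only the "Conflicting claims" section comes from the policy described above:

```python
from datetime import date
from typing import Optional


def render_concept_page(
    title: str,
    summary: str,
    sources: list[str],
    conflicts: Optional[list[str]] = None,
    updated: Optional[date] = None,
) -> str:
    """Render a concept page: frontmatter, summary, source links, optional conflicts."""
    updated = updated or date.today()
    lines = [
        "---",
        "type: concept",                     # hypothetical frontmatter field
        f"title: {title}",
        f"updated: {updated.isoformat()}",   # hypothetical frontmatter field
        "---",
        "",
        f"# {title}",
        "",
        summary,
        "",
        "## Sources",
        *[f"- [[{source}]]" for source in sources],
    ]
    if conflicts:
        # Contradiction policy: record the disagreement, never silently overwrite.
        lines += ["", "## Conflicting claims", *[f"- {claim}" for claim in conflicts]]
    return "\n".join(lines) + "\n"
```

In practice the agent writes these pages itself from the template text in CLAUDE.md. A renderer like this is mainly useful for deciding what the template should contain before handing it to the agent.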

CLAUDE.md is the same pattern as AGENTS.md in my Telegram agent repo. The agent reads it on startup. The intelligence is in the model; the contract is in the markdown.

Why Obsidian, not Notion or plain folders

The tool choice matters because the vault is the agent's working directory.

Notion fails the agent test

Notion stores content in a proprietary block-based format behind an API. To let an agent read or write Notion pages, you have to deal with OAuth or API keys, rate limits, and pagination. The agent cannot grep across your knowledge base. It cannot do a bulk find-and-replace across 200 pages. It cannot see the file system.

Notion also holds your data on Notion's servers. For competitive intelligence that includes client conversations, pricing experiments, and positioning drafts, that is the wrong trust boundary.

Plain folders lose navigation

A folder of markdown files works for agents (Claude Code reads files natively). But without a linking layer, you lose the graph. You cannot see which concepts connect. You cannot spot orphan pages. You cannot open a note and see every other note that references it.

Obsidian gets both

Obsidian stores everything as plain .md files in a local folder (a "vault"). No account required for local use. No telemetry. The files are yours.

Three properties make it the right shell for an LLM Wiki:

Markdown is LLM-native. Claude reads and writes .md without conversion. Wikilinks ([[concept]]), frontmatter, tags, and headings are all plain text the agent already understands.

Graph View shows what the agent built. After an ingest run, you can open Obsidian's graph and see the concept clusters, orphan pages, and dense connection nodes. This is a QA step, not just aesthetics. If the agent created 12 pages but the graph shows three disconnected clusters, the ingest missed cross-references.

The vault is a directory Claude Code can open. Point Claude Code at the vault path and it has full read/write/search access. No API wrapper. No MCP server required (though you can add one). The same agent that edits your codebase can maintain your wiki.
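
The Graph View check can also be approximated from the files themselves. A minimal sketch, assuming pages link with [[wikilinks]] and that underscore-prefixed files such as _index.md are excluded (the index links to everything and would hide orphans). Function names are hypothetical:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_link_graph(wiki_dir: Path) -> dict[str, set[str]]:
    """Build an undirected graph of links between existing wiki pages."""
    pages = {
        p.stem: p.read_text(encoding="utf-8")
        for p in wiki_dir.glob("*.md")
        if not p.name.startswith("_")
    }
    graph: dict[str, set[str]] = {name: set() for name in pages}
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in graph and target != name:  # skip dangling and self links
                graph[name].add(target)
                graph[target].add(name)
    return graph


def orphans(graph: dict[str, set[str]]) -> set[str]:
    """Pages with no inbound or outbound links to other pages."""
    return {name for name, neighbours in graph.items() if not neighbours}


def clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Connected components of the link graph, found by depth-first search."""
    seen: set[str] = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result
```

More than one cluster after an ingest batch is the same signal as disconnected islands in the graph: the pages exist, but the cross-references between them are missing.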

This is the same reason I run the FusionSync writing room as a git repo with markdown sources under scripts/posts/, not as a CMS-only workflow. Files on disk, agent-editable, git-versioned.

The Claude Code implementation

Claude Code is Anthropic's terminal-native agent. It reads files, edits files, runs shell commands, and follows project-level instructions from CLAUDE.md. It is the same category of tool as Cursor Agent CLI, which I already run headlessly on a VPS for the Telegram writing room.

For the LLM Wiki, Claude Code is the compiler.

Setup (15 minutes)

Create an Obsidian vault. Download Obsidian, create a new vault at a path like ~/vaults/competitive-intel. Obsidian creates the folder; you add structure.

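Create the folder skeleton: a raw/ folder for sources, a wiki/ folder for agent-maintained pages, and a CLAUDE.md file at the vault root.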
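
One way to create that skeleton, as a minimal sketch. It assumes the three-layer layout described earlier (raw/, wiki/, and CLAUDE.md at the vault root); the function name is hypothetical:

```python
from pathlib import Path


def create_vault_skeleton(vault: Path) -> list[Path]:
    """Create raw/, wiki/ and an empty CLAUDE.md without touching existing content."""
    created = []
    for folder in ("raw", "wiki"):
        path = vault / folder
        path.mkdir(parents=True, exist_ok=True)  # also creates the vault folder itself
        created.append(path)
    rules = vault / "CLAUDE.md"
    if not rules.exists():  # never clobber a rulebook that is already written
        rules.touch()
    created.append(rules)
    return created
```

Calling create_vault_skeleton(Path.home() / "vaults" / "competitive-intel") would match the example vault path above. The function is safe to re-run because it leaves existing folders and files alone.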

Write CLAUDE.md. Start from Karpathy's gist template or write your own. Mine is about 120 lines. It covers folder rules, page templates, linking policy, and the anchor concept list. This is the highest-leverage file in the setup.

Open the vault in Claude Code: in a terminal, cd into the vault directory and run claude.

Initialize the wiki with a first prompt that points the agent at CLAUDE.md.

Ingesting a web article

Clip the article to /raw using the Obsidian Web Clipper browser extension (cleans HTML into markdown) or paste manually.
Tell Claude Code to ingest the new file.

The agent reads the raw file, creates or updates wiki pages, adds cross-links, updates _index.md, and logs the change in _changelog.md.
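
A sketch of how such an ingest instruction could be assembled for a scripted run. The prompt wording is hypothetical, and the wiki/ and raw/ paths assume the folder layout above. The only CLI detail used is Claude Code's -p flag, which runs a single prompt non-interactively:

```python
from pathlib import Path


def build_ingest_prompt(raw_file: Path) -> str:
    """Compose an ingest instruction for one raw file (wording is illustrative)."""
    return (
        f"Ingest {raw_file.as_posix()} following the rules in CLAUDE.md. "
        "Create or update the relevant wiki pages, add cross-links, "
        "update wiki/_index.md, and log the change in wiki/_changelog.md. "
        "Do not edit anything under raw/."
    )


def claude_command(raw_file: Path) -> list[str]:
    """Argument list for a non-interactive run, e.g. subprocess.run(cmd, cwd=vault)."""
    return ["claude", "-p", build_ingest_prompt(raw_file)]
```

Depending on your permission settings, a non-interactive run may not be allowed to edit files until you grant that explicitly, so check the Claude Code CLI reference before scripting this. In an interactive session, pasting the prompt text is enough.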

Ingesting a YouTube video

This is where MCP matters. Claude Code can connect to tools via the Model Context Protocol, an open standard Anthropic released in November 2024 for connecting AI agents to external data sources.

For YouTube transcripts, you have three options:

Method	How	Trade-off
Manual paste	Copy transcript from YouTube, save to `/raw`	Slow but zero setup
`yt-dlp` via Bash	Claude Code runs `yt-dlp --write-auto-sub` in terminal	Fast, no MCP needed, requires yt-dlp installed
YouTube MCP server	Connect a transcript MCP server to Claude Code	Cleanest UX, extra dependency

I use yt-dlp because Claude Code already has Bash access and I do not want another MCP server to maintain. The prompt gives the agent the video URL and asks it to fetch the transcript into /raw and ingest it.

The agent fetches, saves, reads, synthesizes, and updates the wiki in one pass.
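
A sketch of the fetch-and-clean step behind the yt-dlp route. It assumes English auto-generated captions in WebVTT format and an output name based on the video ID; the function names and the cleaning rules are assumptions, and --write-auto-subs is the documented spelling of the flag named in the table:

```python
import re
from pathlib import Path

# Cue timing lines such as "00:00:00.000 --> 00:00:02.000 align:start".
TIMESTAMP = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3} --> ")
# Inline cue tags such as <c> or <00:00:00.500>.
TAG = re.compile(r"<[^>]+>")


def ytdlp_subtitle_command(url: str, raw_dir: Path) -> list[str]:
    """Argument list that downloads auto-generated English subtitles only."""
    return [
        "yt-dlp",
        "--skip-download",           # subtitles only, no video file
        "--write-auto-subs",
        "--sub-langs", "en",
        "-o", str(raw_dir / "%(id)s.%(ext)s"),
        url,
    ]


def vtt_to_text(vtt: str) -> str:
    """Strip WebVTT headers, timings and tags; drop consecutive repeated lines."""
    lines: list[str] = []
    for line in vtt.splitlines():
        line = TAG.sub("", line).strip()
        if not line or line == "WEBVTT" or TIMESTAMP.match(line):
            continue
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        if lines and lines[-1] == line:  # auto-captions repeat lines across cues
            continue
        lines.append(line)
    return " ".join(lines)
```

Cleaning the transcript before it lands in /raw keeps the immutable source readable and trims tokens from every later ingest pass.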

Querying the wiki

Once sources are compiled, queries are cheap and high-quality.

The agent reads pre-synthesized, cross-referenced pages. It is not retrieving random chunks from 130 transcripts. It is reading a concept page on "hook formats" that already integrates evidence from 40 ingested videos, with links to the source summaries.

That is the quality difference Karpathy describes. The LLM is reading an encyclopedia entry, not re-assembling fragments.

What two weeks of compounding actually produced

I run this vault for competitive intelligence on the Instagram-to-WhatsApp inbound market. After two weeks and roughly 40 ingested sources (competitor videos, Meta docs, operator blog posts, pricing pages), here is what the wiki contained:

Wiki page type	Count	Example
Entity pages	18	ManyChat, Wati, 360dialog, Funaway
Concept pages	24	WhatsApp Flows, CTWA window, quality rating, DM qualification
Source summaries	40	One per ingested raw file
Comparison pages	6	FusionSync vs. ManyChat positioning, RAG vs. LLM Wiki
Index + changelog	2	Master index, ingest log

Queries that would have taken an afternoon of manual transcript reading:

"What topic gaps exist between my content and my top three competitors?" The agent read entity pages, compared topic lists, and returned five gaps with source links.
"Which competitors mention GoHighLevel integration and how do they position it?" Cross-referenced four entity pages and returned a positioning matrix.
"What changed in Meta's WhatsApp pricing between March and May 2026?" The contradiction policy in CLAUDE.md had flagged a rate card change during ingest. The agent surfaced it with both source summaries.

None of these required re-reading 40 raw files. The synthesis was already compiled.

Where this breaks (honest limits)

The LLM Wiki is not a universal replacement for RAG. Karpathy's own framing and subsequent analysis (including Particula's breakdown) draw the boundary clearly.

Scale ceiling

Compiled wikis work well up to roughly 400K words of source material. Beyond that, the wiki itself becomes too large for a single agent pass to maintain coherently. RAG with a proper vector store scales to millions of documents. If you are building enterprise search over 100K PDFs, RAG (or hybrid RAG) is the right tool.
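
To see how close a vault is to that ceiling, a rough word count over /raw is enough. A minimal sketch, assuming sources are .md files and that whitespace-separated tokens are a good enough proxy for words (function names are hypothetical):

```python
from pathlib import Path

WORD_CEILING = 400_000  # the rough ceiling quoted above, not a hard limit


def corpus_word_count(raw_dir: Path) -> int:
    """Count whitespace-separated words across all markdown files under raw_dir."""
    return sum(
        len(path.read_text(encoding="utf-8", errors="ignore").split())
        for path in raw_dir.rglob("*.md")
    )


def ceiling_usage(raw_dir: Path, ceiling: int = WORD_CEILING) -> float:
    """Fraction of the word ceiling already used by the raw corpus."""
    return corpus_word_count(raw_dir) / ceiling
```

A value approaching 1.0 is the point to start thinking about splitting the vault by topic or moving the high-volume material to RAG.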

Ingest cost is real

Every new source triggers a multi-file agent run. A 30-minute YouTube transcript might cost $0.50 to $2.00 in Claude API tokens for a thorough ingest (read transcript, create source summary, update 3 to 5 concept pages, refresh cross-links). That is cheaper than re-deriving on every query, but it is not free. Budget for ingest, not just query.
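
The budgeting arithmetic, as a short sketch. It assumes every source costs about as much as the 30-minute transcript quoted above, which will not hold for much shorter or longer sources; the function name is hypothetical:

```python
def ingest_budget(n_sources: int, low: float = 0.50, high: float = 2.00) -> tuple[float, float]:
    """Return the (low, high) dollar range for ingesting n_sources at a per-source cost range."""
    if n_sources < 0:
        raise ValueError("n_sources must be non-negative")
    return (n_sources * low, n_sources * high)


low, high = ingest_budget(130)
print(f"130 sources: ${low:.2f} to ${high:.2f}")  # 130 sources: $65.00 to $260.00
```

The same calculation for 40 sources gives $20 to $80. The spread comes from how many wiki pages each ingest touches.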

Agent drift

If CLAUDE.md rules are vague, the agent will invent inconsistent page structures, duplicate concepts under different names, and miss cross-references. The manager schema is the quality lever. Spend time on it. Revise it when you notice drift.
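
Duplicate concepts under different names are one drift symptom that is cheap to detect. A minimal sketch that compares page slugs by string similarity; the 0.85 threshold is an arbitrary assumption to tune, and the function name is hypothetical:

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_slugs(slugs: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return slug pairs whose similarity ratio is at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(set(slugs)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

This only catches near-identical spellings such as whatsapp-flow and whatsapp-flows. Synonyms with different names still need a human or an agent pass to merge them.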

Not real-time

This is a batch compilation model. If a competitor publishes a video at 9 AM and you need the insight by 9:15 AM, you ingest manually and wait for the agent pass. For real-time monitoring, you still want alerts and feeds. The wiki is for synthesis, not surveillance.

The hybrid is probably where most teams land

Compiled wiki for stable knowledge (positioning, product docs, competitor profiles, market maps). RAG for volatile, high-volume corpora (support tickets, user feedback, analytics exports). The wiki handles "what do we know?" RAG handles "find the needle."

How this connects to how I run FusionSync

The LLM Wiki is not separate from the production work. It feeds it.

When I decided to write the WhatsApp Flows qualification post, the wiki already had concept pages on WhatsApp Flows, CTWA entry points, and competitor approaches to in-chat forms. The blog post took an afternoon instead of a day of re-research.

When I wrote the WhatsApp bans post, the entity pages for ManyChat and Wati already had notes on their compliance messaging. The quality rating concept page had the Green/Yellow/Red thresholds from Meta docs I had ingested weeks earlier.

The Telegram writing room ships blog posts from the same repo. The LLM Wiki vault informs what to write. The writing room executes the writing. Different tools, same operator.

This is the compounding loop Karpathy described. Knowledge does not reset between tasks. It accumulates in files the agent maintains, and every subsequent task starts smarter.

FAQ

Is this the same as NotebookLM?

No. NotebookLM is RAG with a nice UI. You upload sources, ask questions, get answers. Nothing persists between sessions except the uploaded files themselves. There is no wiki, no cross-references, no compounding. The LLM Wiki produces durable markdown pages that grow over time.

Do I need Claude Code specifically?

No. Any agentic coding tool that can read and write files in a directory works: Claude Code, Cursor Agent CLI, OpenAI Codex, or Windsurf. The pattern is tool-agnostic. Claude Code is what I use because it handles multi-file edits reliably and respects CLAUDE.md project instructions.

Can I use Cursor instead of Claude Code?

Yes. Open the vault folder in Cursor and use Agent mode with the same CLAUDE.md rules. The Cursor Agent CLI (which I run headlessly) uses the same pattern with AGENTS.md instead of CLAUDE.md. Same architecture, different agent wrapper.

How is this different from just asking ChatGPT to summarize files?

ChatGPT summaries are ephemeral. You get an answer in chat, it disappears, and the next session starts over. The LLM Wiki writes persistent files whose cross-links compound over time. The difference is the same as between "asking a librarian a question" and "having the librarian maintain an encyclopedia."

What does the Obsidian Graph View actually show you?

After ingest runs, the graph reveals concept clusters (dense areas = well-connected topics), orphan pages (nodes with no links = concepts the agent failed to integrate), and bridge nodes (pages that connect otherwise separate clusters). I use it as a QA check after large ingest batches.

How much does it cost to run?

Obsidian is free for personal use. Claude Code runs on a Claude Pro ($20/month) or Max (from $100/month) subscription, or on pay-as-you-go API billing. Ingest costs depend on source length; on API billing, a typical YouTube transcript ingest is $0.50 to $2.00 in tokens. Query costs are low because the agent reads pre-compiled wiki pages, not raw sources.

The bottom line

The second brain hype cycle produced a decade of note-taking apps that turned into graveyards. The LLM Wiki pattern works because it removes the maintenance burden that killed every previous attempt. The agent does the bookkeeping. You do the curating and the questioning.

RAG rediscovers knowledge on every query. An LLM Wiki compiles it once and keeps it current.
Three layers: /raw for sources, /wiki for agent-maintained pages, CLAUDE.md for the rules.
Obsidian is the right shell: local markdown, graph view, and a directory the agent can read and write directly.
Claude Code (or any file-native agent) is the compiler. Ingest is the expensive step. Query is cheap.
The pattern breaks at enterprise scale (100K+ docs) and is not real-time. For sub-400K word knowledge bases, it beats RAG on synthesis quality.

If you run an agency or a product company and your competitive research lives in scattered bookmarks, this setup is worth a weekend. Create the vault, write the CLAUDE.md, ingest ten sources, and ask your first compound question.

If you want to see what FusionSync builds with this intelligence, start with a free 7-day production pilot on one campaign, or a free AI audit if you want the leak map before the build.
