Build your knowledge base

The knowledge base (KB) is the shared memory of a workspace. Orkestral scans your repositories, writes structured pages about them, links those pages together, and indexes everything for fast retrieval. Your agents read this KB before they plan or touch code, so the better your KB, the sharper their work. This guide walks you through generating a KB from your sources, watching the analysis run, browsing the result (pages, wikilinks, and the graph), and understanding how agents consume it.

Generated from your repos

A deterministic scan maps files, languages, dependencies, entrypoints, and risks. An AI pass writes deep architecture pages on top.

Hybrid search

Pages are indexed with lexical search (BM25) plus local embeddings, then merged into one ranked result for agents and for you.

Linked and visual

Wikilinks and entity relations connect pages into a navigable graph you can explore in the galaxy view.

Local and free

Scanning and indexing run on your machine. The repository overview can be written by the bundled local model at zero API cost.

How it works

When you analyze a source, Orkestral runs a pipeline. Understanding the phases helps you read the progress and know what to expect.

Walk the repository

Orkestral walks the source folder, skipping noise like node_modules, .git, dist, build, coverage, and other build or vendor directories. It keeps documentation files (.md, .mdx, .markdown), code files (.ts, .tsx, .js, .py, .go, .rs, .java, and more), and key config files (package.json, tsconfig.json, Dockerfile). Files over 256 KB are skipped, and the scan stops at 800 files to keep the UI responsive.
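The walk rules above can be sketched roughly as follows. This is a minimal illustration, not Orkestral's actual scanner: the function and constant names are hypothetical, and the extension lists cover only the examples named in the text (the real lists are longer).

```python
import os
from pathlib import Path

# Values taken from the description above; all names here are illustrative.
SKIP_DIRS = {"node_modules", ".git", "dist", "build", "coverage"}
DOC_EXTS = {".md", ".mdx", ".markdown"}
CODE_EXTS = {".ts", ".tsx", ".js", ".py", ".go", ".rs", ".java"}
CONFIG_NAMES = {"package.json", "tsconfig.json", "Dockerfile"}
MAX_FILE_BYTES = 256 * 1024
MAX_FILES = 800


def is_candidate(name: str, size_bytes: int) -> bool:
    """Return True if a file would be kept under the rules described above."""
    if size_bytes > MAX_FILE_BYTES:
        return False
    suffix = Path(name).suffix.lower()
    return name in CONFIG_NAMES or suffix in DOC_EXTS or suffix in CODE_EXTS


def walk_source(root: str) -> tuple[list[str], bool]:
    """Collect kept files as relative paths and report whether the cap was hit."""
    kept: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if not is_candidate(name, os.path.getsize(full)):
                continue
            if len(kept) >= MAX_FILES:
                return kept, True  # cap reached: stop scanning
            kept.append(os.path.relpath(full, root))
    return kept, False
```

Calling `walk_source("/path/to/repo")` returns the kept files plus a flag that a reading-risks page could use to report that the cap was hit.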

Extract entities

Dependencies from package.json become tech entities (runtime and dev). Imports parsed from a sample of code files add more external libraries as entities. These feed the graph even before any AI runs.
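As a rough sketch of this step, the snippet below turns a package.json manifest into tech entities and pulls external package names out of JavaScript or TypeScript imports. The entity shape (`kind`, `name`, `scope`, `version`) and the regex-based import parsing are assumptions made for illustration; the text does not specify how Orkestral represents entities or parses imports.

```python
import json
import re

# Matches `from "x"` and `require("x")`; a deliberately simple, hypothetical parser.
IMPORT_SPEC = re.compile(r"""(?:from|require\()\s*['"]([^'"]+)['"]""")


def entities_from_package_json(text: str) -> list[dict]:
    """Map dependencies to runtime entities and devDependencies to dev entities."""
    manifest = json.loads(text)
    entities = []
    for field, scope in (("dependencies", "runtime"), ("devDependencies", "dev")):
        for name, version in sorted(manifest.get(field, {}).items()):
            entities.append({"kind": "tech", "name": name, "scope": scope, "version": version})
    return entities


def external_imports(source: str) -> set[str]:
    """Return package names imported by a code file, ignoring relative paths."""
    names = set()
    for spec in IMPORT_SPEC.findall(source):
        if spec.startswith((".", "/")):
            continue  # local file, not an external library
        parts = spec.split("/")
        # Scoped packages keep two segments (@scope/pkg); others keep one.
        names.add("/".join(parts[:2]) if spec.startswith("@") else parts[0])
    return names
```

Both functions are purely deterministic, which matches the point made above: the graph gets these entities before any AI runs.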

Write the base overview

A root page titled Repo: <name> is created. Its overview can be written by the bundled local model (the Forge) from the in-memory inventory of languages, top directories, and docs. If the local route is unavailable or returns nothing, a deterministic summary is used instead.
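The fallback behavior described above might look something like this. The `local_model` callable stands in for the bundled local model, and the inventory keys (`name`, `languages`, `top_dirs`, `docs`) and summary wording are hypothetical.

```python
from typing import Callable, Optional


def deterministic_overview(inventory: dict) -> str:
    """Build a plain summary from the in-memory inventory, with no model involved."""
    languages = ", ".join(inventory.get("languages", [])) or "unknown"
    top_dirs = ", ".join(inventory.get("top_dirs", [])) or "none"
    docs = len(inventory.get("docs", []))
    return (
        f"Repo: {inventory['name']}. Languages: {languages}. "
        f"Top directories: {top_dirs}. Docs found: {docs}."
    )


def write_overview(
    inventory: dict,
    local_model: Optional[Callable[[dict], Optional[str]]] = None,
) -> str:
    """Prefer the local model; fall back if it is missing, fails, or returns nothing."""
    if local_model is not None:
        try:
            text = local_model(inventory)
        except Exception:
            text = None  # treat any local-route failure as "unavailable"
        if text and text.strip():
            return text
    return deterministic_overview(inventory)
```

The design point is that the root page always gets an overview, whichever path produced it.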

Create deterministic coverage pages

Seven coverage pages are created as children of the root: a structural map, dependencies and scripts, entrypoints, a code inventory, contracts and integrations, tests and quality, and reading risks. These exist immediately, even if the AI pass fails later.

Run the deep AI analysis

An orchestrator agent (Claude or Codex) is spawned with your repo as its working directory and the Orkestral MCP tools attached. It reads the important files and writes rich pages: Overview, Architecture, Tech Stack, Dependencies, Directory Structure, Main Flows, Pain Points and Risks, Conventions, and Setup. It links pages with wikilinks as it goes.

Snapshot and index

Chunks are rebuilt, a binary snapshot is written to disk, and an embedding job is queued. Search indexing and embeddings make the pages retrievable.

If the AI pass succeeds and writes a rich tree (at least four of its own pages), the shallow deterministic coverage pages are archived (not deleted) so the KB shows the richer analysis. If the AI pass fails, the deterministic pages stay visible so you always have a usable base.
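The archive rule above reduces to a small decision, sketched below. The `Page` type and its `origin` field are hypothetical; only the threshold of four AI-written pages and the archive-not-delete behavior come from the text.

```python
from dataclasses import dataclass

MIN_AI_PAGES = 4  # "at least four of its own pages"


@dataclass
class Page:
    title: str
    origin: str  # "coverage" or "ai" (illustrative labels)
    archived: bool = False


def finalize(pages: list[Page], ai_succeeded: bool) -> list[Page]:
    """Archive coverage pages only when the AI pass produced a rich enough tree."""
    ai_count = sum(1 for p in pages if p.origin == "ai")
    if ai_succeeded and ai_count >= MIN_AI_PAGES:
        for p in pages:
            if p.origin == "coverage":
                p.archived = True  # hidden from the tree, never deleted
    return pages
```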

Before you start

A source with a valid local path

Analysis reads files from disk, so the source must have a local path that exists. If a source points only to a remote repo, clone it first. See Connect your repos.

An executable agent for the deep pass

The deep AI analysis needs a runnable agent in the workspace (a claude_local or codex_local adapter). Orkestral prefers the orchestrator (CEO). Without one, the scan still produces the deterministic base, but you get no AI-written architecture pages. See Hire your team.

The Forge for a local overview (optional)

When the bundled local model is available and routing allows it, the repository overview is written locally at zero API cost. This is optional: a deterministic overview is used as a fallback.
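If you want to check these prerequisites yourself before starting a job, a small script along these lines can help. It is only an approximation: it confirms that the folder exists and that a `claude` or `codex` executable is on your PATH, but it cannot see whether the workspace actually has a claude_local or codex_local adapter configured. The function name and return shape are hypothetical.

```python
import os
import shutil
from typing import Optional


def preflight(local_path: Optional[str], agent_clis: tuple = ("claude", "codex")) -> dict:
    """Report whether a scan can run and whether the deep AI pass looks possible."""
    path_ok = bool(local_path) and os.path.isdir(local_path)
    # shutil.which returns the executable's path, or None if it is not on PATH.
    found = [cli for cli in agent_clis if shutil.which(cli)]
    return {
        "can_scan": path_ok,
        "can_run_deep_pass": path_ok and bool(found),
        "agent_clis_found": found,
    }
```

A result with `can_scan` true and `can_run_deep_pass` false corresponds to the case described above: you still get the deterministic base, but no AI-written architecture pages.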

Generate the knowledge base

Open the knowledge base

Go to the Knowledge area of your workspace. If you have never run an analysis, you see an empty state inviting you to generate the KB from a source.

Pick a source to analyze

Choose the source (repository or folder) you want to map. Orkestral creates the Repo: <name> root page right away and starts the job in the background, so the UI stays responsive.

Analyze your most important source first. You can analyze more sources later, and each one becomes its own planet in the graph.

Watch the phases

Progress streams live. You see the current phase (walk, coverage-pages, ai-analysis, snapshot) and a running count of pages, entities, and files. During the AI pass, the tool calls the agent makes are surfaced as progress, so you can see it reading and writing pages.

Cancel if needed

You can cancel a running job at any time. The job stops, its status becomes cancelled, and anything already written (the root page, entities, coverage pages) stays in place.

Review the result

When the job completes, the root page and its children are populated. If the AI pass had a problem, the job finishes with a warning and the deterministic base remains, so you still have coverage.

Re-analyzing a source clears the old auto-generated pages for that source (except the root) before writing fresh ones. Pages you wrote by hand are a different kind and are not touched. Edits you made to auto-generated pages, however, may be lost on re-analysis, so keep anything you want to preserve in hand-written pages.
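To make the re-analysis rule concrete, here is a minimal sketch of which pages would be cleared. The page fields (`source_id`, `kind`, `is_root`) are assumptions for illustration, not Orkestral's actual schema.

```python
def pages_to_clear(pages: list[dict], source_id: str) -> list[dict]:
    """Select auto-generated, non-root pages that belong to the re-analyzed source."""
    return [
        p
        for p in pages
        if p["source_id"] == source_id and p["kind"] == "auto" and not p["is_root"]
    ]
```

Hand-written pages and pages from other sources never match the filter, which is why they survive a re-analysis.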

Browse pages, wikilinks, and the graph

Once the KB exists, there are three ways to move through it.

Pages

Pages are organized as a tree. The Repo: <name> page is the root, with children for architecture, stack, flows, risks, and the rest. Open any page to read its markdown. Each page shows its backlinks: the other pages that point to it, so you can trace what references a concept.

Wikilinks

Inside page content, [[Title of another page]] is a wikilink. Orkestral resolves these to real pages automatically, so clicking one jumps you to the target. Agents create wikilinks while writing, which is how the tree connects: an architecture page links to a flow page, a flow page links to a tech entity, and so on. Wikilinks can also cross sources, connecting a frontend page to the backend API page it calls.

Graph

The graph, shown in the galaxy view, draws pages and entities as nodes connected by wikilinks and entity relations. Each analyzed source appears as its own planet.
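Wikilink resolution and backlinks, as described above, can be approximated in a few lines. This sketch assumes titles are matched case-insensitively and that unresolved links are simply ignored; the text does not say how Orkestral handles either case.

```python
import re

WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return the titles referenced by [[...]] links, in order of appearance."""
    return [title.strip() for title in WIKILINK.findall(markdown)]


def build_backlinks(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page title to the titles of pages that link to it.

    `pages` maps a page title to its markdown body.
    """
    by_key = {title.casefold(): title for title in pages}
    backlinks: dict[str, list[str]] = {title: [] for title in pages}
    for title, body in pages.items():
        for target in extract_wikilinks(body):
            resolved = by_key.get(target.casefold())
            if resolved and resolved != title and title not in backlinks[resolved]:
                backlinks[resolved].append(title)
    return backlinks
```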

Reading the graph stats

The graph view includes a heads-up display summarizing the KB. These numbers come straight from the graph snapshot.

Pages, entities, and chunks

The display shows total pages and total entities. The entity count includes orphan entities, such as dependencies, which the graph hides to avoid clutter but still counts as knowledge. Chunks are the indexed segments of your pages, the units that search and embeddings operate on.

Top hubs and constellations

Top hubs are the most connected nodes, the pages and entities everything points at. Constellations are clusters of two or more entities connected by relations, a quick read on how tightly your knowledge links together.
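One way to compute these two stats from a list of relations is sketched below. It treats hubs as the nodes with the highest degree and constellations as connected components of size two or more. That matches the description above, but the exact definitions Orkestral uses are not stated, so read this as an assumption.

```python
from collections import defaultdict


def top_hubs(edges: list[tuple[str, str]], k: int = 3) -> list[tuple[str, int]]:
    """Return the k most connected nodes as (node, degree), ties broken by name."""
    degree: dict[str, int] = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def constellations(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Return clusters of two or more nodes connected by relations."""
    adjacency: dict[str, set] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= 2:
            clusters.append(sorted(component))
    return clusters
```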

Growth and recently added

Weekly growth counts pages created per day over the last seven days, and recently added highlights pages from the past week. Use these to see your KB expanding as you analyze more sources and write more pages.
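The weekly growth series can be derived from page creation dates as in the sketch below. It assumes the seven-day window includes today and that days with no new pages are reported as zero; neither detail is spelled out above.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_growth(created: list[date], today: date) -> list[tuple[date, int]]:
    """Count pages created on each of the last seven days, oldest day first."""
    days = [today - timedelta(days=offset) for offset in range(6, -1, -1)]
    counts = Counter(created)
    return [(day, counts.get(day, 0)) for day in days]
```

Summing the counts gives the number of pages added in the past week, which is the set that recently added would highlight.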

Grow your knowledge base

A KB is not a one-time export. It grows as you add sources, write pages, and re-analyze.

Analyze more sources

Repeat the generation flow for each repository in the workspace. Multiple sources share one graph, and the AI is told about sibling sources so it can link across them (for example, a frontend calling a backend API).

Write and edit pages by hand

Create your own pages for tribal knowledge the scan cannot infer: deployment runbooks, on-call notes, product decisions. Every page you create or update is reindexed for lexical search and queued for embeddings, so your manual notes are searchable alongside the generated ones.

Link as you write

Use [[Page title]] wikilinks in your content to weave new pages into the existing tree. Good linking turns scattered notes into a navigable web and strengthens the graph.

Rebuild snapshots after big changes

After a batch of edits, trigger a rebuild to refresh chunks, the binary snapshot on disk, and the embeddings. This keeps retrieval current with the latest content.

Re-analyze when a repo changes a lot

When a source evolves significantly, run the analysis again. Orkestral refreshes the auto-generated pages and re-queues embeddings so the KB reflects the new state of the code.

How agents use it

The KB is built for your agents first. Here is how they reach into it.

Hybrid retrieval

When an agent (or you) searches, Orkestral runs a lexical BM25 pass and a local-embedding semantic pass, then merges them into one ranked list. Lexical catches exact terms and identifiers; embeddings catch meaning even when the words differ.
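To illustrate the idea, the sketch below ranks documents with a small BM25 implementation, ranks them again by cosine similarity over precomputed vectors, and merges the two lists. The merge uses reciprocal rank fusion, a common technique chosen here as an assumption: the text does not say how Orkestral combines the two passes. The toy vectors stand in for real local embeddings, and all function names are illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank doc ids by BM25 score; docs with no matching term are omitted."""
    if not docs:
        return []
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, t in tokens.items():
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine_rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Rank doc ids by cosine similarity to the query vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion (an assumed merge strategy)."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))
```

A page that ranks well in both passes rises to the top of the merged list, while a page found by only one pass still appears lower down, which is the behavior the paragraph above describes.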

Read before they plan

The orchestrator reads relevant KB pages before planning and delegating, so specialists inherit architecture, conventions, and risks instead of rediscovering them.

Write through MCP tools

During analysis, the agent uses kb_create_page and related MCP tools to materialize pages and links. The same tools let agents extend the KB during normal work.

Binary snapshot for bulk reads

An aggregated snapshot is written to disk so an agent can process the whole base from one ordered file instead of many round trips.

The deterministic coverage pages exist specifically so agents always have a baseline (structure, dependencies, entrypoints, risks) even when the deep AI pass is unavailable. Honest risk pages help agents avoid fragile areas before they edit them.

Troubleshooting

The job finished with an AI warning

The deterministic base was created but the AI pass failed. Confirm you have an executable agent in the workspace and that its CLI (claude or codex) is on your PATH, then re-analyze. The deterministic pages remain usable in the meantime.

The source has no valid local path

Analysis reads files from disk. If the source is not cloned locally, clone it first, then run the analysis again.

The graph looks empty or sparse

Sparse graphs usually mean few wikilinks. Encourage linking in the pages you write, and re-analyze so the AI pass can connect concepts and create entity relations.

Some files were not analyzed

The scan caps at 800 files and skips files over 256 KB. The reading-risks page flags when the cap was hit. For very large repos, focus the KB on the most important sources and folders.

What to do next

Connect more repos

Add the rest of your workspace sources so the graph spans your whole system.

Hire your team

Make sure an orchestrator and specialists are in place to power the deep AI analysis.

Run your first task

Put the KB to work: ask the team to plan and execute against your newly mapped codebase.

Explore the graph

Open the galaxy view to see how your pages and entities connect.
