zoobzio January 18, 2026

Concepts

Chisel is built around a few core abstractions: chunks, kinds, providers, and context. Understanding these helps you reason about what chisel extracts and why.

Chunks

A Chunk is a semantic unit of code. Unlike line-based splitting, chunks follow the natural boundaries of code: where functions start and end, where classes are defined, where documentation lives.

type Chunk struct {
    Content   string   // The actual source code
    Symbol    string   // Name: "Add", "UserService", "Config"
    Kind      Kind     // Category: function, method, class, etc.
    StartLine int      // Where it begins (1-indexed)
    EndLine   int      // Where it ends (1-indexed)
    Context   []string // Parent chain: ["class UserService"]
}

Each chunk is self-contained. The Content field holds the complete source—including comments and documentation—so embeddings capture the full meaning.

See Types Reference for the complete definition.
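As a minimal sketch of how these fields might be consumed, the example below builds the text to hand to an embedding model by placing the parent chain and symbol ahead of the content. `EmbeddingText` is a hypothetical helper, not part of chisel, and `Kind` is assumed to be a string type for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is assumed to be a string type here; the real definition may differ.
type Kind string

// Chunk mirrors the struct shown above so the sketch compiles on its own.
type Chunk struct {
	Content   string
	Symbol    string
	Kind      Kind
	StartLine int
	EndLine   int
	Context   []string
}

// EmbeddingText is a hypothetical helper, not part of chisel. It returns a
// header line (parent chain, then symbol) followed by the chunk content.
func EmbeddingText(c Chunk) string {
	parts := append([]string{}, c.Context...)
	parts = append(parts, c.Symbol)
	return strings.Join(parts, " > ") + "\n" + c.Content
}

func main() {
	c := Chunk{
		Content:   "func Add(a, b int) int { return a + b }",
		Symbol:    "Add",
		Kind:      Kind("function"),
		StartLine: 10,
		EndLine:   10,
	}
	fmt.Println(EmbeddingText(c))
}
```

Whether to add such a header is a pipeline decision; because Content is already self-contained, embedding it unchanged is equally valid.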

Kinds

Kind categorizes what a chunk represents. This lets you filter, group, or weight chunks differently in your pipeline.

| Kind | Description | Example |
| --- | --- | --- |
| function | Standalone function | func Add(a, b int) int |
| method | Function with receiver/self | func (c *Calc) Add(n int) |
| class | Class or struct definition | class UserService {} |
| interface | Interface or trait | interface Reader {} |
| type | Type alias or other type | type ID = string |
| enum | Enumeration | enum Status { Active } |
| constant | Constant declaration | const MaxSize = 100 |
| variable | Variable declaration | var cache = map{} |
| section | Markdown header | ## Installation |
| module | Package/file level | Package documentation |

Not every language uses every kind: Go has no enum declaration, and Python has no interface construct. Chisel maps each language construct to the closest semantic equivalent.

See Types Reference for the complete list.
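Filtering by kind might look like the sketch below. `KindMethod` appears elsewhere on this page; `KindFunction` and `KindClass`, the string values, and the `FilterByKind` helper are assumptions made for illustration, and `Chunk` is trimmed to the two fields the example needs.

```go
package main

import "fmt"

// Kind and its values are assumed; check the Types Reference for the real ones.
type Kind string

const (
	KindFunction Kind = "function"
	KindMethod   Kind = "method"
	KindClass    Kind = "class"
)

// Chunk is trimmed to the fields this sketch uses.
type Chunk struct {
	Symbol string
	Kind   Kind
}

// FilterByKind is a hypothetical helper that keeps only chunks whose kind
// is one of the given kinds, preserving input order.
func FilterByKind(chunks []Chunk, kinds ...Kind) []Chunk {
	want := make(map[Kind]bool, len(kinds))
	for _, k := range kinds {
		want[k] = true
	}
	var out []Chunk
	for _, c := range chunks {
		if want[c.Kind] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	chunks := []Chunk{
		{Symbol: "Add", Kind: KindFunction},
		{Symbol: "getUser", Kind: KindMethod},
		{Symbol: "UserService", Kind: KindClass},
	}
	// Keep only callable units, for example to index them separately.
	for _, c := range FilterByKind(chunks, KindFunction, KindMethod) {
		fmt.Println(c.Kind, c.Symbol)
	}
}
```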

Providers

A Provider parses a specific language into chunks. Each provider understands its language's AST and extracts meaningful units.

type Provider interface {
    Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)
    Language() Language
}

Chisel ships with providers for:

  • Go — Uses stdlib go/parser, zero external dependencies
  • Markdown — Header-based splitting, zero dependencies
  • TypeScript/JavaScript — Tree-sitter parser
  • Python — Tree-sitter parser
  • Rust — Tree-sitter parser

Provider isolation is intentional. If you only need Go support, you don't pull in the tree-sitter dependency. Import only what you use.

See Providers Guide for language-specific behavior.
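To illustrate the shape of the interface, here is a minimal sketch of a custom provider that returns the whole file as a single module-level chunk. `wholeFileProvider`, the `chunkText` convenience wrapper, the `KindModule` constant, and the string-based `Kind` and `Language` types are all hypothetical; a real provider would parse the content and emit one chunk per definition.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
)

// The types below mirror this page; their underlying types are assumed.
type Language string
type Kind string

const KindModule Kind = "module"

type Chunk struct {
	Content   string
	Symbol    string
	Kind      Kind
	StartLine int
	EndLine   int
	Context   []string
}

type Provider interface {
	Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)
	Language() Language
}

// wholeFileProvider is a hypothetical provider that does no parsing.
type wholeFileProvider struct{ lang Language }

// Compile-time check that the sketch satisfies the interface.
var _ Provider = wholeFileProvider{}

func (p wholeFileProvider) Language() Language { return p.lang }

func (p wholeFileProvider) Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before doing any work
	}
	if len(content) == 0 {
		return nil, nil
	}
	// Count lines, including a final line that lacks a trailing newline.
	lines := bytes.Count(content, []byte("\n"))
	if !bytes.HasSuffix(content, []byte("\n")) {
		lines++
	}
	return []Chunk{{
		Content:   string(content),
		Symbol:    filename,
		Kind:      KindModule,
		StartLine: 1,
		EndLine:   lines,
	}}, nil
}

// chunkText is a small convenience wrapper for string input.
func chunkText(p Provider, filename, src string) ([]Chunk, error) {
	return p.Chunk(context.Background(), filename, []byte(src))
}

func main() {
	chunks, err := chunkText(wholeFileProvider{lang: "text"}, "notes.txt", "first\nsecond\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range chunks {
		fmt.Printf("%s %s lines %d-%d\n", c.Kind, c.Symbol, c.StartLine, c.EndLine)
	}
}
```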

Context

Context captures the parent chain for nested definitions. When you chunk a method, the context tells you which class it belongs to.

class UserService {
    private db: Database;

    async getUser(id: string): Promise<User> {
        return this.db.find(id);
    }
}

The getUser chunk will have:

Chunk{
    Symbol:  "getUser",
    Kind:    KindMethod,
    Context: []string{"class UserService"},
}

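Context flows downward: each chunk carries the chain of its enclosing definitions, and the example below lists them outermost first. A method inside a class inside a module might have: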

Context: []string{"module api", "class UserService"}

This enables queries like "find all methods in UserService" or "show me everything in the api module."
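A query such as "find all methods in UserService" can be sketched as a filter over Kind and Context. `MethodsIn` and `hasParent` are hypothetical helpers, the `Kind` values are assumed to be strings, and the context entries are matched exactly as they appear in the example above ("class UserService").

```go
package main

import "fmt"

// Kind values are assumed; KindMethod is the name used on this page.
type Kind string

const (
	KindMethod   Kind = "method"
	KindFunction Kind = "function"
)

// Chunk is trimmed to the fields this sketch uses.
type Chunk struct {
	Symbol  string
	Kind    Kind
	Context []string
}

// hasParent reports whether parent appears anywhere in the chunk's parent chain.
func hasParent(c Chunk, parent string) bool {
	for _, p := range c.Context {
		if p == parent {
			return true
		}
	}
	return false
}

// MethodsIn is a hypothetical query: all method chunks nested under parent.
func MethodsIn(chunks []Chunk, parent string) []Chunk {
	var out []Chunk
	for _, c := range chunks {
		if c.Kind == KindMethod && hasParent(c, parent) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	chunks := []Chunk{
		{Symbol: "getUser", Kind: KindMethod, Context: []string{"module api", "class UserService"}},
		{Symbol: "charge", Kind: KindMethod, Context: []string{"module api", "class Billing"}},
		{Symbol: "main", Kind: KindFunction},
	}
	for _, c := range MethodsIn(chunks, "class UserService") {
		fmt.Println(c.Symbol)
	}
}
```

Matching on any position in the chain, rather than only the last entry, also covers methods of types nested inside the parent.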

Languages

Language identifies which provider handles a file. Use it with the Chunker to route files automatically.

const (
    Go         Language = "go"
    TypeScript Language = "typescript"
    JavaScript Language = "javascript"
    Python     Language = "python"
    Rust       Language = "rust"
    Markdown   Language = "markdown"
)

The Chunker maps languages to providers. If you're processing a single language, you can use the provider directly without the chunker.
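The routing idea can be sketched as two lookups: file extension to Language, then Language to Provider. This does not reproduce the Chunker's actual API; `DetectLanguage`, `Registry`, `stubProvider`, and the extension table are hypothetical names and choices made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"strings"
)

type Language string

const (
	Go         Language = "go"
	TypeScript Language = "typescript"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Rust       Language = "rust"
	Markdown   Language = "markdown"
)

// Chunk is trimmed; only the Provider signature matters here.
type Chunk struct{ Symbol string }

type Provider interface {
	Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)
	Language() Language
}

// extensions is an assumed mapping; the real detection rules may differ.
var extensions = map[string]Language{
	".go": Go, ".ts": TypeScript, ".js": JavaScript,
	".py": Python, ".rs": Rust, ".md": Markdown,
}

// DetectLanguage guesses a Language from the file extension, case-insensitively.
func DetectLanguage(filename string) (Language, bool) {
	lang, ok := extensions[strings.ToLower(filepath.Ext(filename))]
	return lang, ok
}

// Registry is a hypothetical stand-in for the language-to-provider mapping.
type Registry map[Language]Provider

func (r Registry) Register(p Provider) { r[p.Language()] = p }

// For returns the provider registered for the file's detected language.
func (r Registry) For(filename string) (Provider, bool) {
	lang, ok := DetectLanguage(filename)
	if !ok {
		return nil, false
	}
	p, ok := r[lang]
	return p, ok
}

// stubProvider does no parsing; it only exists to populate the registry.
type stubProvider struct{ lang Language }

func (s stubProvider) Language() Language { return s.lang }
func (s stubProvider) Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error) {
	return nil, nil
}

func main() {
	reg := Registry{}
	reg.Register(stubProvider{lang: Go})
	for _, name := range []string{"main.go", "README.md", "notes.txt"} {
		if p, ok := reg.For(name); ok {
			fmt.Printf("%s: %s provider\n", name, p.Language())
		} else {
			fmt.Printf("%s: no provider registered\n", name)
		}
	}
}
```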

Next Steps