Concepts
Chisel is built around a few core abstractions: chunks, kinds, providers, context, and languages. Understanding these helps you reason about what Chisel extracts and why.
Chunks
A Chunk is a semantic unit of code. Unlike line-based splitting, chunks follow the natural boundaries of code: where functions start and end, where classes are defined, where documentation lives.
type Chunk struct {
    Content   string   // The actual source code
    Symbol    string   // Name: "Add", "UserService", "Config"
    Kind      Kind     // Category: function, method, class, etc.
    StartLine int      // Where it begins (1-indexed)
    EndLine   int      // Where it ends (1-indexed)
    Context   []string // Parent chain: ["class UserService"]
}
Each chunk is self-contained. The Content field holds the complete source—including comments and documentation—so embeddings capture the full meaning.
See Types Reference for the complete definition.
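As a rough illustration of how the fields fit together, the sketch below redeclares `Chunk` locally (so it compiles without importing Chisel) and derives two things from it: the number of lines a chunk spans and a text payload for an embedding model. The `Kind` string type, the assumption that `EndLine` is inclusive, and the helpers `LineCount` and `EmbeddingText` are illustrative and not part of Chisel's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is assumed to be a string type; see the Types Reference for the
// real definition.
type Kind string

// Chunk mirrors the struct shown above, redeclared so this file stands alone.
type Chunk struct {
	Content   string
	Symbol    string
	Kind      Kind
	StartLine int
	EndLine   int
	Context   []string
}

// LineCount (hypothetical helper) returns how many source lines the chunk
// spans, assuming both bounds are 1-indexed and inclusive.
func LineCount(c Chunk) int {
	return c.EndLine - c.StartLine + 1
}

// EmbeddingText (hypothetical helper) prefixes the content with the parent
// chain and symbol so the embedded text records where the code lives.
func EmbeddingText(c Chunk) string {
	var b strings.Builder
	if len(c.Context) > 0 {
		b.WriteString(strings.Join(c.Context, " > "))
		b.WriteString(" > ")
	}
	b.WriteString(c.Symbol)
	b.WriteString("\n")
	b.WriteString(c.Content)
	return b.String()
}

func main() {
	c := Chunk{
		Content:   "func Add(a, b int) int {\n\treturn a + b\n}",
		Symbol:    "Add",
		Kind:      "function",
		StartLine: 10,
		EndLine:   12,
	}
	fmt.Println("lines:", LineCount(c))
	fmt.Println(EmbeddingText(c))
}
```

Because `Content` already carries comments and documentation, a header line like this is usually all the extra text an embedding input needs.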
Kinds
Kind categorizes what a chunk represents. This lets you filter, group, or weight chunks differently in your pipeline.
| Kind | Description | Example |
|---|---|---|
| function | Standalone function | func Add(a, b int) int |
| method | Function with receiver/self | func (c *Calc) Add(n int) |
| class | Class or struct definition | class UserService {} |
| interface | Interface or trait | interface Reader {} |
| type | Type alias or other type | type ID = string |
| enum | Enumeration | enum Status { Active } |
| constant | Constant declaration | const MaxSize = 100 |
| variable | Variable declaration | var cache = map{} |
| section | Markdown header | ## Installation |
| module | Package/file level | Package documentation |
Not every language uses every kind. Go has no enums; Python has no interfaces. Chisel maps language constructs to the closest semantic equivalent.
See Types Reference for the complete list.
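Filtering and weighting by kind could look like the sketch below. It assumes `Kind` is a string type and that the constants follow the `KindMethod` naming used elsewhere in this guide; `FilterByKind`, `Weight`, and the weight values are hypothetical choices for a retrieval pipeline, not Chisel functions.

```go
package main

import "fmt"

// Kind is assumed to be a string type; see the Types Reference for the
// real definition.
type Kind string

// KindMethod appears in this guide. The other constant names and all string
// values are assumed by analogy.
const (
	KindFunction Kind = "function"
	KindMethod   Kind = "method"
	KindClass    Kind = "class"
	KindSection  Kind = "section"
)

// Chunk is trimmed to the fields this sketch needs.
type Chunk struct {
	Symbol string
	Kind   Kind
}

// FilterByKind returns the chunks whose kind is one of kinds, in input order.
func FilterByKind(chunks []Chunk, kinds ...Kind) []Chunk {
	allowed := make(map[Kind]bool, len(kinds))
	for _, k := range kinds {
		allowed[k] = true
	}
	var out []Chunk
	for _, c := range chunks {
		if allowed[c.Kind] {
			out = append(out, c)
		}
	}
	return out
}

// kindWeights is a hypothetical ranking table: callable code ranks above
// containers, which rank above prose sections.
var kindWeights = map[Kind]float64{
	KindFunction: 1.0,
	KindMethod:   1.0,
	KindClass:    0.8,
	KindSection:  0.5,
}

// Weight returns the ranking weight for a chunk, falling back to 0.25 for
// kinds that are not in the table.
func Weight(c Chunk) float64 {
	if w, ok := kindWeights[c.Kind]; ok {
		return w
	}
	return 0.25
}

func main() {
	chunks := []Chunk{
		{Symbol: "Add", Kind: KindFunction},
		{Symbol: "UserService", Kind: KindClass},
		{Symbol: "getUser", Kind: KindMethod},
		{Symbol: "Installation", Kind: KindSection},
	}
	for _, c := range FilterByKind(chunks, KindFunction, KindMethod) {
		fmt.Printf("%s (%s) weight=%.1f\n", c.Symbol, c.Kind, Weight(c))
	}
}
```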
Providers
A Provider parses a specific language into chunks. Each provider understands its language's AST and extracts meaningful units.
type Provider interface {
    Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)
    Language() Language
}
Chisel ships with providers for:
- Go: uses the standard library's go/parser, zero external dependencies
- Markdown: header-based splitting, zero dependencies
- TypeScript/JavaScript: tree-sitter parser
- Python: tree-sitter parser
- Rust: tree-sitter parser
The provider isolation is intentional. If you only need Go support, you don't pay for tree-sitter. Import only what you use.
See Providers Guide for language-specific behavior.
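To make the interface concrete, here is a toy provider that satisfies it. It is not Chisel's Markdown provider: `headingProvider` and `chunkMarkdown` are hypothetical names, the types are redeclared locally, and the splitting rule (any line starting with `#` opens a new section) is a deliberate simplification that would misfire on `#` lines inside code fences.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Local copies of the types shown in this guide. Kind and Language are
// assumed to be string types.
type Kind string
type Language string

type Chunk struct {
	Content   string
	Symbol    string
	Kind      Kind
	StartLine int
	EndLine   int
	Context   []string
}

type Provider interface {
	Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)
	Language() Language
}

// headingProvider is a toy provider that starts a new section chunk at every
// line beginning with "#". Text before the first heading is dropped.
type headingProvider struct{}

// Compile-time check that headingProvider implements Provider.
var _ Provider = headingProvider{}

func (headingProvider) Language() Language { return "markdown" }

func (headingProvider) Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error) {
	// Honor cancellation before doing any work.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	lines := strings.Split(string(content), "\n")
	var chunks []Chunk
	start := -1 // index of the current heading line, -1 until one is seen
	flush := func(end int) {
		if start < 0 {
			return
		}
		chunks = append(chunks, Chunk{
			Content:   strings.Join(lines[start:end], "\n"),
			Symbol:    strings.TrimSpace(strings.TrimLeft(lines[start], "#")),
			Kind:      "section",
			StartLine: start + 1, // convert 0-based index to 1-indexed line
			EndLine:   end,       // end is exclusive, so it equals the last line number
		})
	}
	for i, line := range lines {
		if strings.HasPrefix(line, "#") {
			flush(i)
			start = i
		}
	}
	flush(len(lines))
	return chunks, nil
}

// chunkMarkdown is a small convenience wrapper for the usage example.
func chunkMarkdown(src string) ([]Chunk, error) {
	return headingProvider{}.Chunk(context.Background(), "doc.md", []byte(src))
}

func main() {
	chunks, err := chunkMarkdown("# Install\ngo get ...\n## Usage\nCall Chunk.")
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%s: lines %d-%d\n", c.Symbol, c.StartLine, c.EndLine)
	}
}
```

The `var _ Provider = headingProvider{}` line is a common Go idiom: it costs nothing at runtime and fails the build if the type ever stops satisfying the interface.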
Context
Context captures the parent chain for nested definitions. When you chunk a method, the context tells you which class it belongs to.
class UserService {
  private db: Database;
  async getUser(id: string): Promise<User> {
    return this.db.find(id);
  }
}
The getUser chunk will have:
Chunk{
    Symbol:  "getUser",
    Kind:    KindMethod,
    Context: []string{"class UserService"},
}
Context flows downward, listing parents from outermost to innermost. A method inside a class inside a module might have:
Context: []string{"module api", "class UserService"}
This enables queries like "find all methods in UserService" or "show me everything in the api module."
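A query such as "find all methods in UserService" reduces to a scan over each chunk's context chain. The sketch below assumes the context entries are formatted as in the examples above (`"class UserService"`, `"module api"`); `InScope` and `MethodsIn` are hypothetical helpers, and `Kind` is again assumed to be a string type.

```go
package main

import "fmt"

// Kind is assumed to be a string type.
type Kind string

const (
	KindFunction Kind = "function" // name assumed by analogy with KindMethod
	KindMethod   Kind = "method"
)

// Chunk is trimmed to the fields this sketch needs.
type Chunk struct {
	Symbol  string
	Kind    Kind
	Context []string
}

// InScope reports whether parent appears anywhere in the chunk's context
// chain, so it matches direct and indirect parents alike.
func InScope(c Chunk, parent string) bool {
	for _, p := range c.Context {
		if p == parent {
			return true
		}
	}
	return false
}

// MethodsIn returns the symbols of all method chunks nested under parent.
func MethodsIn(chunks []Chunk, parent string) []string {
	var names []string
	for _, c := range chunks {
		if c.Kind == KindMethod && InScope(c, parent) {
			names = append(names, c.Symbol)
		}
	}
	return names
}

func main() {
	chunks := []Chunk{
		{Symbol: "getUser", Kind: KindMethod, Context: []string{"module api", "class UserService"}},
		{Symbol: "deleteUser", Kind: KindMethod, Context: []string{"module api", "class UserService"}},
		{Symbol: "charge", Kind: KindMethod, Context: []string{"module api", "class Billing"}},
		{Symbol: "handle", Kind: KindFunction, Context: []string{"module api"}},
	}
	fmt.Println(MethodsIn(chunks, "class UserService"))
	fmt.Println(MethodsIn(chunks, "module api"))
}
```

Matching on any entry in the chain, instead of only the last one, is what lets the same helper answer both the class-level and the module-level query.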
Languages
Language identifies which provider handles a file. Use it with the Chunker to route files automatically.
const (
    Go         Language = "go"
    TypeScript Language = "typescript"
    JavaScript Language = "javascript"
    Python     Language = "python"
    Rust       Language = "rust"
    Markdown   Language = "markdown"
)
The Chunker maps languages to providers. If you're processing a single language, you can use the provider directly without the chunker.
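The routing idea can be sketched without the Chunker itself. The example below maps file extensions to the `Language` constants listed above and groups a file list by language. The extension table, `DetectLanguage`, and `GroupByLanguage` are assumptions made for illustration; Chisel's own detection rules and Chunker API may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Language and its constants mirror the list shown above.
type Language string

const (
	Go         Language = "go"
	TypeScript Language = "typescript"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Rust       Language = "rust"
	Markdown   Language = "markdown"
)

// extensions is an assumed mapping from file extension to language.
var extensions = map[string]Language{
	".go":  Go,
	".ts":  TypeScript,
	".tsx": TypeScript,
	".js":  JavaScript,
	".py":  Python,
	".rs":  Rust,
	".md":  Markdown,
}

// DetectLanguage (hypothetical) returns the language for filename based on
// its extension, and false when the extension is not recognized.
func DetectLanguage(filename string) (Language, bool) {
	lang, ok := extensions[strings.ToLower(filepath.Ext(filename))]
	return lang, ok
}

// GroupByLanguage (hypothetical) buckets filenames by detected language and
// skips files whose language is unknown.
func GroupByLanguage(files []string) map[Language][]string {
	groups := make(map[Language][]string)
	for _, f := range files {
		if lang, ok := DetectLanguage(f); ok {
			groups[lang] = append(groups[lang], f)
		}
	}
	return groups
}

func main() {
	files := []string{"main.go", "web/app.ts", "README.md", "LICENSE"}
	for _, f := range files {
		if lang, ok := DetectLanguage(f); ok {
			fmt.Printf("%s -> %s\n", f, lang)
		} else {
			fmt.Printf("%s -> (skipped)\n", f)
		}
	}
}
```

Each bucket could then be handed to the provider whose `Language()` matches its key.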
Next Steps
- Architecture — How parsing works internally
- Providers Guide — Language-specific details
- Types Reference — Complete type definitions