AST-Aware Code Chunking for Go
Semantic Boundaries. Not Line Counts.
Parse source code into semantic units — functions, classes, methods, types — with structural context and metadata. Feed embedding models chunks that respect code boundaries, not arbitrary line splits.
Get Started
import "github.com/zoobz-io/chisel"

// Register language providers
chunker := chisel.New(
    chisel.Go(),         // stdlib go/parser, zero C deps
    chisel.TypeScript(), // tree-sitter
    chisel.Python(),     // tree-sitter
    chisel.Rust(),       // tree-sitter
)

// Parse into semantic chunks
chunks := chunker.Chunk(ctx, "go", "service.go", sourceCode)
for _, chunk := range chunks {
    fmt.Printf("%s %s [%d-%d]\n",
        chunk.Kind,   // function, method, class, interface...
        chunk.Symbol, // "Handler.ServeHTTP"
        chunk.StartLine,
        chunk.EndLine,
    )
    fmt.Println(chunk.Context) // ["class UserService"]
    // chunk.Content = full source including comments
}

// Feed to embedding model → vector database
for _, chunk := range chunks {
    embedding := embed(chunk.Content)
    store(chunk.Symbol, chunk.Kind, embedding)
}

Why Chisel?
Code chunks that respect structure — designed for semantic search pipelines.
Semantic Boundaries
Chunks split at function, class, and method definitions — not arbitrary line counts. Every chunk is a complete, meaningful unit.
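To make the idea of splitting at definitions concrete, here is a minimal sketch that uses only the Go standard library (`go/parser`, `go/ast`, `go/token`) to cut a source file at function boundaries. It is not chisel's implementation; `FuncChunk` and `funcChunks` are hypothetical names invented for this illustration, and including the doc comment in the chunk is an assumption about what a useful chunk looks like.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FuncChunk is a hypothetical, simplified chunk record for this sketch.
type FuncChunk struct {
	Name      string
	StartLine int // 1-indexed, includes the doc comment if present
	EndLine   int // 1-indexed, line of the closing brace
}

// funcChunks parses Go source and returns one chunk per top-level
// function or method declaration.
func funcChunks(filename, src string) ([]FuncChunk, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var chunks []FuncChunk
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, and consts
		}
		start := fn.Pos()
		if fn.Doc != nil {
			start = fn.Doc.Pos() // extend the chunk to cover the doc comment
		}
		chunks = append(chunks, FuncChunk{
			Name:      fn.Name.Name,
			StartLine: fset.Position(start).Line,
			EndLine:   fset.Position(fn.End()).Line,
		})
	}
	return chunks, nil
}

const sample = `package demo

// Add returns the sum of a and b.
func Add(a, b int) int {
	return a + b
}

func Sub(a, b int) int {
	return a - b
}
`

func main() {
	chunks, err := funcChunks("demo.go", sample)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%s [%d-%d]\n", c.Name, c.StartLine, c.EndLine)
	}
}
```

Each chunk begins and ends on a declaration boundary, so no function body is ever cut in half the way a fixed line-count window would cut it.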
Hierarchical Context
Methods know their parent class. Nested types preserve the full scope chain. Enables queries like 'find all methods in UserService'.
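For Go, the closest analogue of a "parent class" is a method's receiver type. The sketch below, written against the standard `go/ast` package rather than chisel itself, shows one way a qualified symbol such as `Handler.ServeHTTP` could be derived. The helper names `receiverType` and `qualifiedSymbols` are hypothetical, and the `Type.Method` naming scheme is assumed from the symbol shown in the quick-start snippet.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// receiverType returns the name of a method's receiver type, unwrapping
// pointers ("*Handler" -> "Handler") and generic instantiations
// ("Stack[T]" -> "Stack"). It returns "" for any other expression.
func receiverType(expr ast.Expr) string {
	switch t := expr.(type) {
	case *ast.StarExpr:
		return receiverType(t.X)
	case *ast.IndexExpr:
		return receiverType(t.X)
	case *ast.Ident:
		return t.Name
	}
	return ""
}

// qualifiedSymbols returns "Type.Method" for methods and the bare name
// for plain functions, in source order.
func qualifiedSymbols(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var symbols []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		name := fn.Name.Name
		if fn.Recv != nil && len(fn.Recv.List) > 0 {
			if parent := receiverType(fn.Recv.List[0].Type); parent != "" {
				name = parent + "." + name // record the enclosing scope
			}
		}
		symbols = append(symbols, name)
	}
	return symbols, nil
}

const sample = `package demo

type Handler struct{}

func (h *Handler) ServeHTTP() {}

func NewHandler() *Handler { return &Handler{} }
`

func main() {
	symbols, err := qualifiedSymbols(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(symbols)
}
```

Once the parent is part of each chunk's metadata, a query such as "all methods in Handler" becomes a simple filter on the stored prefix or context field.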
Go Provider at 32 µs
Uses the stdlib go/parser with zero C dependencies. Roughly 10x faster than the tree-sitter providers for typical files.
Universal Kind Mapping
Language-specific constructs normalize to universal kinds. Python class and Go struct both become KindClass — downstream tools treat them uniformly.
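Normalization of this kind can be pictured as a lookup table keyed by language and construct. The following is only a sketch of the idea: `KindClass` is named above, but the `Kind` type's string representation, the other constants, the `KindUnknown` fallback, the construct labels, and the `normalize` function are all assumptions made for illustration and may not match chisel's actual definitions.

```go
package main

import "fmt"

// Kind is a hypothetical universal chunk kind for this sketch.
type Kind string

const (
	KindFunction  Kind = "function"
	KindClass     Kind = "class"
	KindInterface Kind = "interface"
	KindUnknown   Kind = "unknown" // assumed fallback, not taken from the source
)

// construct identifies a language-specific syntax element.
type construct struct {
	Lang string
	Name string
}

// kindTable maps language-specific constructs to universal kinds.
// The construct labels are illustrative, not real parser node names.
var kindTable = map[construct]Kind{
	{"python", "class"}:         KindClass,
	{"go", "struct"}:            KindClass,
	{"go", "interface"}:         KindInterface,
	{"typescript", "interface"}: KindInterface,
	{"python", "def"}:           KindFunction,
	{"go", "func"}:              KindFunction,
}

// normalize returns the universal kind for a language-specific construct.
func normalize(lang, name string) Kind {
	if k, ok := kindTable[construct{lang, name}]; ok {
		return k
	}
	return KindUnknown
}

func main() {
	fmt.Println(normalize("python", "class")) // class
	fmt.Println(normalize("go", "struct"))    // class
	fmt.Println(normalize("rust", "macro"))   // unknown
}
```

Downstream code can then filter on a single kind value without knowing which language produced the chunk.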
Precise Line Mapping
Every chunk carries exact StartLine and EndLine (1-indexed). Seamless navigation back to original source files.
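As a small illustration of navigating back to the source, the sketch below recovers a chunk's text from a file given a 1-indexed line range. It assumes `EndLine` is inclusive, which the `[%d-%d]` output in the quick-start snippet suggests but the text does not state outright; `sliceLines` is a hypothetical helper, not part of chisel.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceLines returns lines start..end of src, where both bounds are
// 1-indexed and inclusive (an assumption about EndLine semantics).
func sliceLines(src string, start, end int) (string, error) {
	lines := strings.Split(src, "\n")
	if start < 1 || end < start || end > len(lines) {
		return "", fmt.Errorf("line range [%d-%d] out of bounds (1-%d)", start, end, len(lines))
	}
	// Convert the 1-indexed inclusive range to a 0-indexed half-open slice.
	return strings.Join(lines[start-1:end], "\n"), nil
}

const sample = "package demo\n\nfunc A() {}\n\nfunc B() {\n\treturn\n}\n"

func main() {
	text, err := sliceLines(sample, 5, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

The only subtlety is the off-by-one conversion: a 1-indexed inclusive range `[start, end]` corresponds to the Go slice `lines[start-1:end]`.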
Five Language Providers
Go (stdlib go/parser); TypeScript, Python, and Rust (tree-sitter); and Markdown. Dependencies are isolated: tree-sitter is imported only for the languages that need it.
Capabilities
AST-aware parsing, context preservation, and language normalization for code intelligence pipelines.
| Feature | Description | Link |
|---|---|---|
| Language Providers | Go, TypeScript, Python, Rust, and Markdown. Each provider extracts language-specific symbols with consistent output format. | Providers |
| Chunk Kinds | Function, method, class, interface, type, enum, constant, variable, section, module — covering all major code constructs. | Concepts |
| Context Preservation | Parent chain tracks enclosing scope. Full source preservation including comments and documentation for embedding models. | Architecture |
| Vicky Integration | Foundational chunking layer for the vicky code search and retrieval service. Chunks feed directly to vector databases. | Vicky |
| Testing | Test chunking output, verify symbol extraction, and validate context chains for custom provider implementations. | Testing |
| Troubleshooting | Common issues with language detection, chunk boundaries, and tree-sitter provider configuration. | Troubleshooting |
Articles
Browse the full chisel documentation.