AST-Aware Code Chunking for Go. Semantic Boundaries. Not Line Counts.

Parse source code into semantic units — functions, classes, methods, types — with structural context and metadata. Feed embedding models chunks that respect code boundaries, not arbitrary line splits.

Get Started
import "github.com/zoobz-io/chisel"

// Register language providers
chunker := chisel.New(
    chisel.Go(),         // stdlib go/parser — zero C deps
    chisel.TypeScript(), // tree-sitter
    chisel.Python(),     // tree-sitter
    chisel.Rust(),       // tree-sitter
)

// Parse into semantic chunks
chunks := chunker.Chunk(ctx, "go", "service.go", sourceCode)

for _, chunk := range chunks {
    fmt.Printf("%s %s [%d-%d]\n",
        chunk.Kind,    // function, method, class, interface...
        chunk.Symbol,  // "Handler.ServeHTTP"
        chunk.StartLine,
        chunk.EndLine,
    )
    fmt.Println(chunk.Context) // ["class UserService"]
    // chunk.Content = full source including comments
}

// Feed to embedding model → vector database
for _, chunk := range chunks {
    embedding := embed(chunk.Content)
    store(chunk.Symbol, chunk.Kind, embedding)
}
91% Test Coverage
A+ Go Report
MIT License
1.24+ Go Version
v0.0.3 Latest Release

Why Chisel?

Code chunks that respect structure — designed for semantic search pipelines.

Semantic Boundaries

Chunks split at function, class, and method definitions — not arbitrary line counts. Every chunk is a complete, meaningful unit.

Hierarchical Context

Methods know their parent class. Nested types preserve the full scope chain. This enables queries such as 'find all methods in UserService'.
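As a rough illustration of such a query, the sketch below filters chunks by their context chain. The `Chunk` struct here is a hypothetical stand-in that mirrors only the fields used in the Get Started example (`Kind`, `Symbol`, `Context`). The string-valued `Kind` and the `methodsIn` helper are assumptions for illustration, not part of chisel's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical stand-in mirroring fields from the Get Started
// example; it is not chisel's actual type definition.
type Chunk struct {
	Kind    string   // assumed to be a plain string such as "method"
	Symbol  string   // e.g. "Handler.ServeHTTP"
	Context []string // enclosing scopes, e.g. ["class UserService"]
}

// methodsIn returns the method chunks whose context chain names parent.
func methodsIn(chunks []Chunk, parent string) []Chunk {
	var out []Chunk
	for _, c := range chunks {
		if c.Kind != "method" {
			continue
		}
		for _, scope := range c.Context {
			// Context entries look like "class UserService"; compare the last word.
			fields := strings.Fields(scope)
			if len(fields) > 0 && fields[len(fields)-1] == parent {
				out = append(out, c)
				break
			}
		}
	}
	return out
}

func main() {
	chunks := []Chunk{
		{Kind: "method", Symbol: "UserService.Create", Context: []string{"class UserService"}},
		{Kind: "method", Symbol: "Handler.ServeHTTP", Context: []string{"class Handler"}},
		{Kind: "function", Symbol: "main"},
	}
	for _, c := range methodsIn(chunks, "UserService") {
		fmt.Println(c.Symbol)
	}
}
```

In a real pipeline the same filter would more likely run as a metadata query in the vector store, with the context chain stored alongside each embedding.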

Go Provider at 32 µs

Stdlib go/parser with zero C dependencies. ~10x faster than tree-sitter providers for typical files.
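To give a sense of what a stdlib-only approach involves, here is a minimal sketch that uses `go/parser`, `go/ast`, and `go/token` to find the line range of each top-level function, doc comment included. It is not chisel's implementation; the `span` type and `funcSpans` helper are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// span is a hypothetical record for one function or method declaration.
type span struct {
	Name       string
	Start, End int // 1-indexed, inclusive
}

// funcSpans parses src and returns the line range of every function declaration.
func funcSpans(filename, src string) ([]span, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var spans []span
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, and consts
		}
		start := fn.Pos()
		if fn.Doc != nil {
			start = fn.Doc.Pos() // keep the doc comment with its function
		}
		spans = append(spans, span{
			Name:  fn.Name.Name,
			Start: fset.Position(start).Line,
			End:   fset.Position(fn.End()).Line,
		})
	}
	return spans, nil
}

const sample = `package demo

// Add returns a + b.
func Add(a, b int) int {
	return a + b
}

func Sub(a, b int) int { return a - b }
`

func main() {
	spans, err := funcSpans("demo.go", sample)
	if err != nil {
		panic(err)
	}
	for _, s := range spans {
		fmt.Printf("%s [%d-%d]\n", s.Name, s.Start, s.End)
	}
}
```

Because the parser ships with the Go toolchain, a provider built this way needs no cgo, which is consistent with the zero C dependencies claim above.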

Universal Kind Mapping

Language-specific constructs normalize to universal kinds. Python class and Go struct both become KindClass — downstream tools treat them uniformly.
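One way to picture this normalization is a per-language lookup table. The sketch below is illustrative only: `KindClass` is the one name taken from the text above, while the `Kind` type, `KindFunction`, `KindUnknown`, the table contents, and the `normalize` helper are assumptions made up for this example.

```go
package main

import "fmt"

// Kind is a hypothetical universal chunk kind.
type Kind string

const (
	KindClass    Kind = "class"    // named in the text above
	KindFunction Kind = "function" // assumed for illustration
	KindUnknown  Kind = "unknown"  // assumed fallback
)

// nativeToKind maps a language's own construct names to universal kinds.
// The entries are examples, not chisel's actual mapping.
var nativeToKind = map[string]map[string]Kind{
	"go":     {"struct": KindClass, "func": KindFunction},
	"python": {"class": KindClass, "def": KindFunction},
}

// normalize looks up the universal kind for a language-specific construct.
func normalize(lang, native string) Kind {
	// Reading from a missing inner map is safe in Go and yields the zero value.
	if k, ok := nativeToKind[lang][native]; ok {
		return k
	}
	return KindUnknown
}

func main() {
	fmt.Println(normalize("python", "class") == normalize("go", "struct"))
	fmt.Println(normalize("go", "struct"))
}
```

Downstream code can then switch on a single `Kind` value instead of branching per language.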

Precise Line Mapping

Every chunk carries exact StartLine and EndLine (1-indexed). Seamless navigation back to original source files.
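Mapping a chunk back to its source then reduces to slicing the file by line. The following sketch assumes only what is stated above, namely that `StartLine` and `EndLine` are 1-indexed; treating `EndLine` as inclusive is an assumption, and `sliceLines` is a hypothetical helper rather than a chisel function.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceLines returns lines startLine..endLine of src.
// Both bounds are 1-indexed; endLine is treated as inclusive (an assumption).
func sliceLines(src string, startLine, endLine int) (string, error) {
	lines := strings.Split(src, "\n")
	if startLine < 1 || endLine < startLine || endLine > len(lines) {
		return "", fmt.Errorf("line range %d-%d out of bounds (1-%d)",
			startLine, endLine, len(lines))
	}
	// Convert the 1-indexed inclusive range to Go's 0-indexed half-open slice.
	return strings.Join(lines[startLine-1:endLine], "\n"), nil
}

func main() {
	src := "package demo\n\nfunc A() {}\n\nfunc B() {\n}\n"
	text, err := sliceLines(src, 5, 6)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```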

Five Language Providers

Go (stdlib); TypeScript, Python, and Rust (tree-sitter); and Markdown. Dependencies are isolated: tree-sitter is imported only for the languages that need it.

Capabilities

AST-aware parsing, context preservation, and language normalization for code intelligence pipelines.

| Feature | Description | Link |
| --- | --- | --- |
| Language Providers | Go, TypeScript, Python, Rust, and Markdown. Each provider extracts language-specific symbols with consistent output format. | Providers |
| Chunk Kinds | Function, method, class, interface, type, enum, constant, variable, section, module, covering all major code constructs. | Concepts |
| Context Preservation | Parent chain tracks enclosing scope. Full source preservation including comments and documentation for embedding models. | Architecture |
| Vicky Integration | Foundational chunking layer for the vicky code search and retrieval service. Chunks feed directly to vector databases. | Vicky |
| Testing | Test chunking output, verify symbol extraction, and validate context chains for custom provider implementations. | Testing |
| Troubleshooting | Common issues with language detection, chunk boundaries, and tree-sitter provider configuration. | Troubleshooting |

Articles

Browse the full chisel documentation.

Learn

Overview: AST-aware code chunking for semantic search and embeddings
Quickstart: Get productive with chisel in minutes
Concepts: Core abstractions in chisel
Architecture: How chisel works internally

Guides

Providers Guide: Language-specific chunking behavior
Testing Guide: Testing code that uses chisel
Troubleshooting: Common issues and solutions

Integrations

Vicky Integration: Using chisel with vicky for code search

Reference

API Reference: Function signatures and behavior
Types Reference: Type definitions and constants