Aruuz Nigar — Scansion Pipeline Overview
This document assumes familiarity with basic arūz concepts and focuses on execution flow.
Purpose of This Document
- Describe how Aruuz Nigar processes poetic text from input to scansion output
- Present the engine pipeline at a conceptual but accurate level
- Provide a mental model consistent with the actual execution flow
- Bridge high-level concepts and detailed internal documentation
Pipeline at a Glance
- Input is processed through a sequence of constrained transformations
- Ambiguity is introduced early and resolved late
- Combination and pruning occur together, not as separate phases
- The pipeline is driven by structural constraints, not guesswork
Stage 1: Input Normalization
- Raw poetic text is cleaned of punctuation and non-printing characters
- Characters are normalized to consistent forms
- Each line is treated as an independent analytical unit
- No rhythmic or metrical assumptions are made at this stage
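A minimal sketch of what Stage 1 might look like; the function name and exact rules are illustrative assumptions, not the engine's actual implementation:

```python
import unicodedata

# Hypothetical sketch of input normalization. The real engine's rules are
# not shown in this document; this only illustrates the general shape.
def normalize_line(raw: str) -> str:
    # Unicode-normalize so visually identical characters share one code point
    text = unicodedata.normalize("NFC", raw)
    # Drop punctuation (category P*) and invisible formatting characters
    # (category Cf, e.g. zero-width non-joiner)
    cleaned = [ch for ch in text
               if not unicodedata.category(ch).startswith("P")
               and unicodedata.category(ch) != "Cf"]
    # Collapse runs of whitespace; each line remains an independent unit
    return " ".join("".join(cleaned).split())
```

Note that no syllable or meter information is computed here; the stage only produces a clean, consistent string.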
Stage 2: Tokenization into Words
- Each normalized line is split into lexical word units
- Word boundaries follow orthographic conventions
- Each word becomes a structured object for later analysis
- Diacritics, when present, are preserved as optional cues
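The word objects produced here might be sketched as follows; the `Word` class and its field names are hypothetical, chosen only to mirror the bullets above:

```python
from dataclasses import dataclass

# Hypothetical word unit; field names are illustrative, not the engine's.
@dataclass
class Word:
    surface: str              # orthographic form as it appears in the line
    diacritics: bool = False  # whether optional diacritic cues were present

# Common Arabic-script diacritics (harakat), U+064B..U+0652
DIACRITIC_MARKS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")

def tokenize(line: str) -> list[Word]:
    # Word boundaries follow orthographic convention: split on whitespace
    words = []
    for token in line.split():
        has_marks = any(ch in DIACRITIC_MARKS for ch in token)
        words.append(Word(surface=token, diacritics=has_marks))
    return words
```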
Stage 3: Word-Level Scansion
- Each word is analyzed in isolation
- Possible syllabic interpretations are inferred
- Long, short, and ambiguous syllables are represented symbolically
- Multiple scansion codes per word are expected and preserved
- Dictionary knowledge is preferred where available; heuristics fill the gaps
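The dictionary-first, heuristic-fallback lookup can be sketched as below. The encoding ('1' = short syllable, '2' = long syllable) and the sample entries are assumptions for illustration; the engine's actual symbol set and heuristics are not specified in this document:

```python
# Illustrative convention (an assumption, not the engine's actual encoding):
# '1' = short syllable, '2' = long syllable; one string per interpretation.
DICTIONARY = {
    "dil": ["2"],          # single long syllable
    "mera": ["22", "12"],  # genuinely ambiguous: both readings are kept
}

def scan_word(surface: str) -> list[str]:
    # Dictionary knowledge is preferred where available
    if surface in DICTIONARY:
        return DICTIONARY[surface]
    # Crude placeholder fallback (real heuristics are phonological):
    # here, assume one long syllable per two letters
    length = max(1, (len(surface) + 1) // 2)
    return ["2" * length]
```

The key property mirrored here is that the function returns a *list*: multiple scansion codes per word are expected and preserved, never collapsed to a single guess.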
Stage 4: Contextual Prosodic Adjustment
- Inter-word prosodic rules are applied
- Pronunciation-dependent effects modify scansion possibilities
- Additional variants may be introduced based on context
- No scansion possibilities are discarded at this stage
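One way to picture this stage: a rule inspects adjacent words and may *add* variants, but removes nothing. The specific rule below (a final long syllable optionally shortening before a vowel-initial word) is a simplified stand-in for the engine's real prosodic rules, which operate on Urdu script:

```python
def apply_context(codes_per_word, surfaces):
    # Sketch of one inter-word effect: if the next word begins with a vowel,
    # a variant with a shortened final syllable may ALSO become possible.
    # Nothing is ever discarded at this stage.
    VOWELS = set("aeiou")  # placeholder; the real rule is script-aware
    adjusted = [list(codes) for codes in codes_per_word]
    for i in range(len(surfaces) - 1):
        if surfaces[i + 1][0] in VOWELS:
            for code in codes_per_word[i]:
                if code.endswith("2"):
                    variant = code[:-1] + "1"
                    if variant not in adjusted[i]:
                        adjusted[i].append(variant)
    return adjusted
```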
Stage 5: CodeTree Construction and Search Space Formation
- Word-level scansion variants are organized into a tree structure
- Each branch represents a complete rhythmic possibility for the line
- The tree encodes the full combinatorial search space implicitly
- No explicit Cartesian product of codes is materialized
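The "implicit search space" idea can be demonstrated with a recursive generator: every complete path through the per-word variants is reachable, yet no product list is ever built in memory. (The real engine uses an explicit CodeTree structure; this generator is only an equivalent-behavior sketch.)

```python
def iter_paths(codes_per_word, prefix=""):
    # Depth-first generator over per-word scansion variants. The full
    # combinatorial space is encoded implicitly; no explicit Cartesian
    # product of codes is materialized.
    if not codes_per_word:
        yield prefix
        return
    for code in codes_per_word[0]:
        yield from iter_paths(codes_per_word[1:], prefix + code)
```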
Stage 6: Tree Traversal and Meter Constraint Application
- The code tree is traversed depth-first
- Partial rhythmic paths are continuously checked against meter constraints
- Incompatible meters are eliminated as soon as constraints are violated
- Ambiguity is preserved where classical rules permit flexibility
- Matching and pruning occur simultaneously during traversal
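The combined match-and-prune traversal might look like the sketch below. The meter table is a stand-in: the names echo real bahr names, but the digit patterns are simplified placeholders, not authoritative definitions:

```python
# Hypothetical meter table; the digit patterns are simplified stand-ins.
METERS = {
    "hazaj (sketch)": "2221" * 2,
    "ramal (sketch)": "2122" * 2,
}

def match_meters(codes_per_word):
    # DFS over the code tree; a partial path survives only while it is
    # still a prefix of at least one meter pattern, so matching and
    # pruning happen in the same step rather than as separate phases.
    results = []

    def dfs(index, prefix):
        alive = [m for m, pat in METERS.items() if pat.startswith(prefix)]
        if not alive:
            return  # prune: every meter is already violated
        if index == len(codes_per_word):
            full = [m for m, pat in METERS.items() if pat == prefix]
            if full:
                results.append((prefix, full))
            return
        for code in codes_per_word[index]:
            dfs(index + 1, prefix + code)

    dfs(0, "")
    return results
```

Because pruning fires on the first violating prefix, incompatible branches cost only as much work as it takes to rule them out.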
Stage 7: Line-Level Scansion Results
- Each surviving traversal path produces a complete line-level result
- Results include:
  - Word-by-word taqti
  - Complete rhythmic pattern
  - One or more matching meters
- Multiple valid results per line may coexist
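A line-level result might be shaped like the record below; the class and field names are hypothetical, chosen to mirror the bullets above:

```python
from dataclasses import dataclass

# Hypothetical result record; field names are illustrative.
@dataclass(frozen=True)
class LineResult:
    taqti: tuple    # word-by-word syllable codes, e.g. ("2221", "2221")
    pattern: str    # complete rhythmic pattern for the line
    meters: tuple   # one or more matching meter names

def make_result(word_codes, meters):
    return LineResult(taqti=tuple(word_codes),
                      pattern="".join(word_codes),
                      meters=tuple(meters))
```

A line with several surviving traversal paths simply yields several `LineResult` values side by side.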
Stage 8: Dominant Meter Resolution
- A classical assumption of meter consistency across related lines is applied
- Candidate meters are scored across all analyzed lines
- Structural consistency is prioritized over local or partial matches
- All non-dominant meters are explicitly discarded
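A minimal version of the cross-line scoring, assuming the simplest possible consistency measure (count how many lines each candidate meter can match); the engine's real scoring is presumably richer, and the tie-break here is an arbitrary placeholder:

```python
from collections import Counter

def dominant_meter(per_line_meters):
    # Each line contributes the set of meters it can match; the meter
    # consistent with the most lines wins, and all others are discarded.
    tally = Counter()
    for meters in per_line_meters:
        for m in set(meters):
            tally[m] += 1
    if not tally:
        return None
    # Ties broken deterministically by name (placeholder choice)
    best, _ = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
    return best
```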
Stage 9: Final Output
- Output consists of structured scansion results
- Each result is traceable to the decisions made during the pipeline
- The engine makes no assumptions about presentation or formatting
- Consumers are free to render, visualize, or post-process results
Key Design Properties of the Pipeline
Deterministic Execution
- The same input always produces the same results
- No probabilistic or stochastic mechanisms are used
Late Commitment
- Uncertainty is preserved until sufficient structural context exists
- Early decisions are avoided wherever possible
Explainability
- Each transformation stage has a clear rationale
- Intermediate representations are inspectable
- Final results can be traced back through the pipeline
Relationship to Other Documents
- This document explains what happens, and when
- It does not describe how individual functions are implemented
- For engine phase-level detail, see Pure Engine Execution Flow
- For function-level tracing, see Scansion Data Flow