Aruuz Nigar — Scansion Pipeline Overview
This document assumes familiarity with basic arūz concepts and focuses on execution flow.
Purpose of This Document
- Describe how Aruuz Nigar processes poetic text from input to scansion output
- Present the engine pipeline at a conceptual but accurate level
- Provide a mental model consistent with the actual execution flow
- Bridge high-level concepts and detailed internal documentation
Pipeline at a Glance
- Input is processed through a sequence of constrained transformations
- Ambiguity is introduced early and resolved late
- Combination and pruning occur together, not as separate phases
- The pipeline is driven by structural constraints, not guesswork
Stage 1: Input Normalization
- Raw poetic text is cleaned of punctuation and non-printing characters
- Characters are normalized to consistent forms
- Each line is treated as an independent analytical unit
- No rhythmic or metrical assumptions are made at this stage
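A minimal sketch of what Stage 1 might look like; the function name and exact rules are illustrative assumptions, not the engine's actual implementation:

```python
import unicodedata

# Hypothetical sketch of input normalization. The real engine's rules are
# not shown in this document; this only illustrates the general shape.
def normalize_line(raw: str) -> str:
    # Unicode-normalize so visually identical characters share one code point
    text = unicodedata.normalize("NFC", raw)
    # Drop punctuation (category P*) and invisible formatting characters
    # (category Cf, e.g. zero-width non-joiner)
    cleaned = [ch for ch in text
               if not unicodedata.category(ch).startswith("P")
               and unicodedata.category(ch) != "Cf"]
    # Collapse runs of whitespace; each line remains an independent unit
    return " ".join("".join(cleaned).split())
```

Note that no syllable or meter information is computed here; the stage only produces a clean, consistent string.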
Stage 2: Tokenization into Words
- Each normalized line is split into lexical word units
- Word boundaries follow orthographic conventions
- Each word becomes a structured object for later analysis
- Diacritics, when present, are preserved as optional cues
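The word objects produced here might be sketched as follows; the `Word` class and its field names are hypothetical, chosen only to mirror the bullets above:

```python
from dataclasses import dataclass

# Hypothetical word unit; field names are illustrative, not the engine's.
@dataclass
class Word:
    surface: str              # orthographic form as it appears in the line
    diacritics: bool = False  # whether optional diacritic cues were present

# Common Arabic-script diacritics (harakat), U+064B..U+0652
DIACRITIC_MARKS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")

def tokenize(line: str) -> list[Word]:
    # Word boundaries follow orthographic convention: split on whitespace
    words = []
    for token in line.split():
        has_marks = any(ch in DIACRITIC_MARKS for ch in token)
        words.append(Word(surface=token, diacritics=has_marks))
    return words
```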
Stage 3: Word-Level Scansion
- Each word is analyzed in isolation
- Possible syllabic interpretations are inferred
- Long, short, and ambiguous syllables are represented symbolically
- Multiple scansion codes per word are expected and preserved
- Dictionary knowledge is preferred where available; heuristics fill the gaps
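The dictionary-first, heuristic-fallback lookup can be sketched as below. The encoding ('1' = short syllable, '2' = long syllable) and the sample entries are assumptions for illustration; the engine's actual symbol set and heuristics are not specified in this document:

```python
# Illustrative convention (an assumption, not the engine's actual encoding):
# '1' = short syllable, '2' = long syllable; one string per interpretation.
DICTIONARY = {
    "dil": ["2"],          # single long syllable
    "mera": ["22", "12"],  # genuinely ambiguous: both readings are kept
}

def scan_word(surface: str) -> list[str]:
    # Dictionary knowledge is preferred where available
    if surface in DICTIONARY:
        return DICTIONARY[surface]
    # Crude placeholder fallback (real heuristics are phonological):
    # here, assume one long syllable per two letters
    length = max(1, (len(surface) + 1) // 2)
    return ["2" * length]
```

The key property mirrored here is that the function returns a *list*: multiple scansion codes per word are expected and preserved, never collapsed to a single guess.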
Stage 4: Contextual Prosodic Adjustment
- Inter-word prosodic rules are applied
- Pronunciation-dependent effects modify scansion possibilities
- Additional variants may be introduced based on context
- No scansion possibilities are discarded at this stage
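One way to picture this stage: a rule inspects adjacent words and may *add* variants, but removes nothing. The specific rule below (a final long syllable optionally shortening before a vowel-initial word) is a simplified stand-in for the engine's real prosodic rules, which operate on Urdu script:

```python
def apply_context(codes_per_word, surfaces):
    # Sketch of one inter-word effect: if the next word begins with a vowel,
    # a variant with a shortened final syllable may ALSO become possible.
    # Nothing is ever discarded at this stage.
    VOWELS = set("aeiou")  # placeholder; the real rule is script-aware
    adjusted = [list(codes) for codes in codes_per_word]
    for i in range(len(surfaces) - 1):
        if surfaces[i + 1][0] in VOWELS:
            for code in codes_per_word[i]:
                if code.endswith("2"):
                    variant = code[:-1] + "1"
                    if variant not in adjusted[i]:
                        adjusted[i].append(variant)
    return adjusted
```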
Stage 5: CodeTree Construction and Search Space Formation
- Word-level scansion variants are organized into a tree structure
- Each branch represents a complete rhythmic possibility for the line
- The tree encodes the full combinatorial search space implicitly
- No explicit Cartesian product of codes is materialized
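The "implicit search space" idea can be demonstrated with a recursive generator: every complete path through the per-word variants is reachable, yet no product list is ever built in memory. (The real engine uses an explicit CodeTree structure; this generator is only an equivalent-behavior sketch.)

```python
def iter_paths(codes_per_word, prefix=""):
    # Depth-first generator over per-word scansion variants. The full
    # combinatorial space is encoded implicitly; no explicit Cartesian
    # product of codes is materialized.
    if not codes_per_word:
        yield prefix
        return
    for code in codes_per_word[0]:
        yield from iter_paths(codes_per_word[1:], prefix + code)
```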
Stage 6: Tree Traversal and Meter Constraint Application
- The code tree is traversed depth-first
- Partial rhythmic paths are continuously checked against meter constraints
- Incompatible meters are eliminated as soon as constraints are violated
- Ambiguity is preserved where classical rules permit flexibility
- Matching and pruning occur simultaneously during traversal
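The combined match-and-prune traversal might look like the sketch below. The meter table is a stand-in: the names echo real bahr names, but the digit patterns are simplified placeholders, not authoritative definitions:

```python
# Hypothetical meter table; the digit patterns are simplified stand-ins.
METERS = {
    "hazaj (sketch)": "2221" * 2,
    "ramal (sketch)": "2122" * 2,
}

def match_meters(codes_per_word):
    # DFS over the code tree; a partial path survives only while it is
    # still a prefix of at least one meter pattern, so matching and
    # pruning happen in the same step rather than as separate phases.
    results = []

    def dfs(index, prefix):
        alive = [m for m, pat in METERS.items() if pat.startswith(prefix)]
        if not alive:
            return  # prune: every meter is already violated
        if index == len(codes_per_word):
            full = [m for m, pat in METERS.items() if pat == prefix]
            if full:
                results.append((prefix, full))
            return
        for code in codes_per_word[index]:
            dfs(index + 1, prefix + code)

    dfs(0, "")
    return results
```

Because pruning fires on the first violating prefix, incompatible branches cost only as much work as it takes to rule them out.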
Stage 7: Line-Level Scansion Results
- Each surviving traversal path produces a complete line-level result
- Results include:
  - Word-by-word taqti
  - Complete rhythmic pattern
  - One or more matching meters
- Multiple valid results per line may coexist
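A line-level result might be shaped like the record below; the class and field names are hypothetical, chosen to mirror the bullets above:

```python
from dataclasses import dataclass

# Hypothetical result record; field names are illustrative.
@dataclass(frozen=True)
class LineResult:
    taqti: tuple    # word-by-word syllable codes, e.g. ("2221", "2221")
    pattern: str    # complete rhythmic pattern for the line
    meters: tuple   # one or more matching meter names

def make_result(word_codes, meters):
    return LineResult(taqti=tuple(word_codes),
                      pattern="".join(word_codes),
                      meters=tuple(meters))
```

A line with several surviving traversal paths simply yields several `LineResult` values side by side.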
Stage 8: Dominant Meter Resolution
- A classical assumption of meter consistency across related lines is applied
- Candidate meters are scored across all analyzed lines
- Structural consistency is prioritized over local or partial matches
- All non-dominant meters are explicitly discarded
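A minimal version of the cross-line scoring, assuming the simplest possible consistency measure (count how many lines each candidate meter can match); the engine's real scoring is presumably richer, and the tie-break here is an arbitrary placeholder:

```python
from collections import Counter

def dominant_meter(per_line_meters):
    # Each line contributes the set of meters it can match; the meter
    # consistent with the most lines wins, and all others are discarded.
    tally = Counter()
    for meters in per_line_meters:
        for m in set(meters):
            tally[m] += 1
    if not tally:
        return None
    # Ties broken deterministically by name (placeholder choice)
    best, _ = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
    return best
```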
Stage 9: Final Output
- Output consists of structured scansion results
- Each result is traceable to the decisions made during the pipeline
- The engine makes no assumptions about presentation or formatting
- Consumers are free to render, visualize, or post-process results
Key Design Properties of the Pipeline
Deterministic Execution
- The same input always produces the same results
- No probabilistic or stochastic mechanisms are used
Late Commitment
- Uncertainty is preserved until sufficient structural context exists
- Early decisions are avoided wherever possible
Explainability
- Each transformation stage has a clear rationale
- Intermediate representations are inspectable
- Final results can be traced back through the pipeline
Relationship to Other Documents
- This document explains what happens, and when
- It does not describe how individual functions are implemented
- For engine phase-level detail, see Pure Engine Execution Flow
- For function-level tracing, see Scansion Data Flow