Seshat — Egyptian goddess of writing, knowledge, and measurement.
Tools for ancient scripts that never claim more than the evidence supports. Deciphered scripts get real tooling; undeciphered ones get honest limit-analysis; nothing is ever "deciphered", "translated", or assigned phonetic values by SESHAT. The contract is enforced in code, not just in prose.
The flagship study is Linear A — a measurement, not a decipherment — and every method is calibrated on Linear B (deciphered Mycenaean Greek): if a technique can't recover what we already know about Linear B, we don't trust it on Linear A. A rigorous negative is a valid headline.
| Phase | Question | Result |
|---|---|---|
| 1: Information limit | How much structure does the corpus hold? | Linear A H(next\|prev) ≈ 3.7 bits, redundancy 31% vs Linear B 63%: far sparser, which quantifies why it resists decipherment. Entropy-rate analysis is reliable only to n=2; n≥3 is undersampled (even for Linear B here) |
| 2: GNN sign embeddings | Does co-occurrence encode phonology? | On Linear B, recovers vowels (+8–11 pts, null-controlled z≈4) but not consonants (null) |
| 3: Linear B → Linear A transfer | Do shared signs carry their Linear B vowels? | No (\|z\|<2): Linear A does not distributionally mirror Linear B, which is consistent with a different language |
| + Typology | Is Linear A statistically like any comparandum? | Size-matched fingerprints (bootstrap): Linear A is robustly nearest Linear B — but that is writing-system kinship (both Aegean syllabaries), not a shared-language claim |
| + Positional | Do signs specialise by position? | Yes (calibrated on Linear B); Linear A shows real positional structure — a measurement, not a reading |
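
The Phase 1 quantities can be illustrated with a minimal sketch. It assumes a plug-in (maximum-likelihood) estimate of H(next|prev) from bigram counts and defines redundancy as 1 − H / log2(V), with V the number of distinct signs. Both the function names and this normalisation are assumptions made for illustration; the figures in the table may come from a different normalisation in `seshat_analysis.info_limit`.

```python
import math
from collections import Counter

def conditional_entropy(sequences):
    """Plug-in estimate of H(next | prev), in bits, from sign sequences."""
    bigrams = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            prev_counts[prev] += 1
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    h = 0.0
    for (prev, _nxt), count in bigrams.items():
        p_joint = count / total             # P(prev, next)
        p_cond = count / prev_counts[prev]  # P(next | prev)
        h -= p_joint * math.log2(p_cond)
    return h

def redundancy(sequences):
    """Assumed definition: 1 - H(next|prev) / log2(V), V = distinct signs."""
    vocab = {sign for seq in sequences for sign in seq}
    if len(vocab) < 2:
        return 1.0  # a single-sign corpus is fully predictable
    return 1.0 - conditional_entropy(sequences) / math.log2(len(vocab))

# Toy sequences with opaque labels; no sign values are implied.
alternating = [["x", "y", "x", "y", "x", "y"]]
mixed = [["x", "x", "y", "y", "x"]]
```

A plug-in estimate like this is biased downward on small corpora, which is the undersampling problem the table reports for n≥3.
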
SESHAT is now a per-script platform (ADR-0003), with the honesty contract modelled on two independent axes — because they genuinely come apart:
- script readable? Are the sign values known?
- language understood?

| script | readable | language | what SESHAT does |
|---|---|---|---|
| Egyptian, cuneiform, Phoenician, Linear B, … | ✓ | ✓ | real sign tooling + analysis |
| Meroitic | ✓ | ✗ | sign tooling, flagged language-undeciphered |
| Linear A | ✗ | ✗ | limit-analysis only — no value assigned to any sign |
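
One way to model the two axes in code is sketched below. This is an illustration only: `ScriptStatus`, `STATUS`, and `check_request` are hypothetical names, not the actual contents of `seshat_analysis.registry`; only the `ContractError` name and the operation names are taken from this README.

```python
from dataclasses import dataclass

class ContractError(Exception):
    """Raised when a request would break the honesty contract."""

# Refused for every script, whatever its status.
FORBIDDEN = {"decipher", "translate", "assign_values"}
# Allowed only when the sign values are known.
NEEDS_READABLE = {"sign_inventory"}

@dataclass(frozen=True)
class ScriptStatus:
    readable: bool             # are the sign values known?
    language_understood: bool  # is the underlying language understood?

# Hypothetical status table mirroring three rows of the table above.
STATUS = {
    "linear_b": ScriptStatus(readable=True, language_understood=True),
    "meroitic": ScriptStatus(readable=True, language_understood=False),
    "linear_a": ScriptStatus(readable=False, language_understood=False),
}

def check_request(script, operation):
    """Return caveat flags for an allowed request, or raise ContractError."""
    status = STATUS[script]
    if operation in FORBIDDEN:
        raise ContractError(f"{operation!r} is never offered, for any script")
    if operation in NEEDS_READABLE and not status.readable:
        raise ContractError(f"{script!r} is not readable; no sign tooling")
    flags = []
    if not status.readable:
        flags.append("script-unreadable")
    if not status.language_understood:
        flags.append("language-undeciphered")
    return flags
```

Keeping the two booleans separate is what lets a case like Meroitic (readable signs, language not understood) get sign tooling with a flag instead of being forced into a single deciphered/undeciphered bucket.
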
16 scripts. Deciphered scripts get sign-inventory tooling straight from the
Unicode standard (unicodedata — nothing hand-typed): Egyptian hieroglyphs
(Gardiner codes), Sumero-Akkadian cuneiform (readings + composites), and a generic
tool covering Linear B syllabary, Anatolian (Luwian) hieroglyphs, Phoenician,
Ugaritic, Old Persian, Old Turkic, Gothic, Old Italic, Imperial Aramaic, Carian,
Lycian, Lydian, and Meroitic. The registry routes every request and refuses to
break the contract (decipher/translate/assign-values raise for every script;
sign tooling is refused for unreadable ones).
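
As an illustration of building an inventory from `unicodedata` with nothing hand-typed, here is a minimal sketch for Egyptian hieroglyphs. The function name `egyptian_inventory` is hypothetical and this is not necessarily how `seshat_analysis.egyptian` works; it relies only on the Egyptian Hieroglyphs block (U+13000 to U+1342F) and on Unicode character names of the form `EGYPTIAN HIEROGLYPH A001`.

```python
import unicodedata

PREFIX = "EGYPTIAN HIEROGLYPH "

def egyptian_inventory(start=0x13000, end=0x1342F):
    """Return (character, code) pairs read from Unicode character names.

    The code is the zero-padded Gardiner-style identifier that follows the
    prefix in the Unicode name, e.g. "A001" for U+13000.
    """
    signs = []
    for cp in range(start, end + 1):
        # Unassigned code points have no name; the "" default skips them.
        name = unicodedata.name(chr(cp), "")
        if name.startswith(PREFIX):
            signs.append((chr(cp), name[len(PREFIX):]))
    return signs

inventory = egyptian_inventory()
first_char, first_code = inventory[0]
```

The number of entries depends on the Unicode version bundled with the Python interpreter, so a count such as the 1071 reported in this README is best checked against `unicodedata.unidata_version`.
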

    from seshat_analysis.registry import route, tooling

    route("linear_a", "decipher")        # -> ContractError (never, for any script)
    route("linear_a", "sign_inventory")  # -> ContractError (Linear A is not readable)
    route("linear_a", "info_limit")      # -> OK (limit-analysis)
    tooling("egyptian")()                # -> 1071 hieroglyphs with Gardiner codes

| component | language | role |
|---|---|---|
| `seshat-analysis/` | Python | info-limit, block-entropy, GNN embeddings, typology, positional, Unicode sign tooling, the registry, null-model controls |
| `seshat-core/` | Rust | corpus parser, sign inventory, bigram matrices |
| `seshat-anneal/` | C++/CUDA | QUBO annealer: a future refinement layer (Phase 4), synthetic-only, not a decipherment claim |
| `seshat-viz/` | Rust/egui | interactive sign tables and heatmaps |

    cd seshat-analysis && pip install -e .
    pytest                                                         # ~78 tests
    python -m seshat_analysis.info_limit --data ../data/corpus     # Phase 1
    python -m seshat_analysis.typology   --data ../data/corpus     # comparative typology
    python -m seshat_analysis.positional --data ../data/corpus     # positional structure
    python -m seshat_analysis.egyptian                             # hieroglyph inventory
    python -m seshat_analysis.registry                             # the multi-script contract

Everything is seeded, deterministic, offline, and CPU-only.
Linear A is not deciphered here; no phonetic value is asserted for any undeciphered sign. Deciphered scripts are ground truth; undeciphered ones are measurement, never announcement. Methods are trusted only after recovering known structure on a deciphered script and surviving a null-model control. The contract is enforced structurally by the registry.
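
The null-model control mentioned above can be sketched as a seeded label-permutation test. This is a generic illustration, not the project's implementation: `null_z_score` and `mean_gap` are hypothetical names, and the statistic is a toy difference of group means standing in for whatever a given phase measures.

```python
import random
import statistics

def mean_gap(values, labels):
    """Toy statistic: mean of values labelled 1 minus mean of those labelled 0."""
    ones = [v for v, lab in zip(values, labels) if lab == 1]
    zeros = [v for v, lab in zip(values, labels) if lab == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)

def null_z_score(statistic, values, labels, n_null=1000, seed=0):
    """z-score of the observed statistic against a label-shuffled null."""
    rng = random.Random(seed)  # fixed seed keeps the run deterministic
    observed = statistic(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_null):
        rng.shuffle(shuffled)  # break any real value/label association
        null.append(statistic(values, shuffled))
    sigma = statistics.pstdev(null)
    if sigma == 0.0:
        return 0.0  # degenerate null: nothing to compare against
    return (observed - statistics.fmean(null)) / sigma

# Toy data with a strong, real association between value and label.
toy_values = [0.0] * 10 + [1.0] * 10
toy_labels = [0] * 10 + [1] * 10
z = null_z_score(mean_gap, toy_values, toy_labels)
```

Under this scheme a result is reported as signal only when the z-score is clearly away from zero; the tables in this README treat |z|<2 as a null result.
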
- Linear A: John Younger's Linear A Database (Univ. of Kansas)
- Linear B: Ventris & Chadwick; Duhoux & Morpurgo Davies; sign values from the authoritative Unicode Linear B Syllabary names (not hand-typed)
- Comparanda: Luwian, Hurrian
- Sign inventories: the Unicode standard, via `unicodedata`
Antonio Zambudio Rodriguez
