QuantumDrizzy/SESHAT

SESHAT — an honest computational-epigraphy platform

Seshat — Egyptian goddess of writing, knowledge, and measurement.

Tools for ancient scripts that never claim more than the evidence supports. Deciphered scripts get real tooling; undeciphered ones get honest limit-analysis; nothing is ever "deciphered", "translated", or assigned phonetic values by SESHAT. The contract is enforced in code, not just in prose.

The flagship study is Linear A — a measurement, not a decipherment — and every method is calibrated on Linear B (deciphered Mycenaean Greek): if a technique can't recover what we already know about Linear B, we don't trust it on Linear A. A rigorous negative is a valid headline.


Flagship result — Linear A, honestly

| Phase | Question | Result |
| --- | --- | --- |
| 1: Information limit | How much structure does the corpus hold? | Linear A H(next\|prev) ≈ 3.7 bits, redundancy 31% vs Linear B 63%: far sparser, quantifying why it resists decipherment. Entropy-rate analysis: reliable only to n=2; n≥3 is undersampling (even for Linear B here) |
| 2: GNN sign embeddings | Does co-occurrence encode phonology? | On Linear B, recovers vowels (+8–11 pts, null-controlled z≈4) but not consonants (null) |
| 3: Linear B → Linear A transfer | Do shared signs carry their Linear B vowels? | No (\|z\| < 2) → Linear A does not distributionally mirror Linear B, consistent with a different language |
| + Typology | Is Linear A statistically like any comparandum? | Size-matched fingerprints (bootstrap): Linear A is robustly nearest Linear B, but that is writing-system kinship (both Aegean syllabaries), not a shared-language claim |
| + Positional | Do signs specialise by position? | Yes (calibrated on Linear B); Linear A shows real positional structure: a measurement, not a reading |
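
To make the Phase 1 quantities concrete, here is a minimal sketch of a bigram conditional entropy H(next|prev) and a redundancy figure on a toy sign sequence. It is not SESHAT's estimator: the function names are hypothetical, the estimate is a plain maximum-likelihood one with no undersampling correction, and redundancy is assumed to mean 1 − H(next|prev) / log2(V) for V distinct signs, which may differ from the definition behind the 31% and 63% figures.

```python
import math
from collections import Counter


def conditional_entropy(seq):
    """H(next | prev) in bits, from maximum-likelihood bigram counts."""
    if len(seq) < 2:
        raise ValueError("need at least two signs to form a bigram")
    bigrams = Counter(zip(seq, seq[1:]))   # counts of (prev, next) pairs
    prev_counts = Counter(seq[:-1])        # how often each sign appears as 'prev'
    total = sum(bigrams.values())
    h = 0.0
    for (prev, _next), n in bigrams.items():
        p_joint = n / total                # P(prev, next)
        p_cond = n / prev_counts[prev]     # P(next | prev)
        h -= p_joint * math.log2(p_cond)
    return h


def redundancy(seq):
    """Assumed definition: 1 - H(next|prev) / log2(number of distinct signs)."""
    v = len(set(seq))
    if v < 2:
        raise ValueError("redundancy is undefined for a one-sign inventory")
    return 1.0 - conditional_entropy(seq) / math.log2(v)


# Toy sequences, not corpus data.
predictable = list("ABABABAB")    # each sign fully determines the next
unstructured = list("AABBAABBA")  # every bigram is equally frequent

print(conditional_entropy(predictable), redundancy(predictable))
print(conditional_entropy(unstructured), redundancy(unstructured))
```

On real corpora the counts behind higher-order versions of this estimate thin out quickly, which is the undersampling the table flags for n≥3.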

[Figure: Signal vs no-signal]


The multi-script platform

SESHAT is now a per-script platform (ADR-0003), with the honesty contract modelled on two independent axes — because they genuinely come apart:

  • script readable? (are the sign values known?)
  • language understood?

| script | readable | language | what SESHAT does |
| --- | --- | --- | --- |
| Egyptian, cuneiform, Phoenician, Linear B, … | yes | yes | real sign tooling + analysis |
| Meroitic | yes | no | sign tooling, flagged language-undeciphered |
| Linear A | no | no | limit-analysis only; no value assigned to any sign |

16 scripts. Deciphered scripts get sign-inventory tooling straight from the Unicode standard (unicodedata — nothing hand-typed): Egyptian hieroglyphs (Gardiner codes), Sumero-Akkadian cuneiform (readings + composites), and a generic tool covering Linear B syllabary, Anatolian (Luwian) hieroglyphs, Phoenician, Ugaritic, Old Persian, Old Turkic, Gothic, Old Italic, Imperial Aramaic, Carian, Lycian, Lydian, and Meroitic. The registry routes every request and refuses to break the contract (decipher/translate/assign-values raise for every script; sign tooling is refused for unreadable ones).

from seshat_analysis.registry import route, tooling
route("linear_a", "decipher")        # -> ContractError (never, for any script)
route("linear_a", "sign_inventory")  # -> ContractError (Linear A is not readable)
route("linear_a", "info_limit")      # -> OK (limit-analysis)
tooling("egyptian")()                # -> 1071 hieroglyphs with Gardiner codes
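
The routing behaviour shown above could be enforced along the following lines. This is a hedged sketch, not the code in `seshat_analysis.registry`: the `READABLE` table, the action sets, and the name `route_sketch` are all hypothetical, and only the three outcomes from the example (always refuse decipherment, refuse sign tooling for unreadable scripts, allow limit-analysis) are taken from the README.

```python
class ContractError(Exception):
    """Raised when a request would break the honesty contract."""


# Hypothetical registry data: is the script readable (sign values known)?
READABLE = {
    "egyptian": True,
    "linear_b": True,
    "meroitic": True,   # readable script, language not understood
    "linear_a": False,
}

# Actions refused for every script, readable or not (assumed names).
FORBIDDEN = {"decipher", "translate", "assign_values"}
# Actions that only make sense when sign values are known (assumed names).
NEEDS_READABLE = {"sign_inventory"}


def route_sketch(script, action):
    """Return a label for an allowed request, or raise ContractError."""
    if script not in READABLE:
        raise KeyError(f"unknown script: {script}")
    if action in FORBIDDEN:
        raise ContractError(f"{action!r} is never allowed, for any script")
    if action in NEEDS_READABLE and not READABLE[script]:
        raise ContractError(f"{script} is not readable; {action!r} refused")
    return f"{script}:{action}"


print(route_sketch("linear_a", "info_limit"))
print(route_sketch("egyptian", "sign_inventory"))
```

Putting the checks in one routing function means no individual analysis module can offer a forbidden action by accident; the refusal is a property of the entry point and does not rely on convention.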

How it works (each language where it fits)

| component | language | role |
| --- | --- | --- |
| seshat-analysis/ | Python | info-limit, block-entropy, GNN embeddings, typology, positional, Unicode sign tooling, the registry, null-model controls |
| seshat-core/ | Rust | corpus parser, sign inventory, bigram matrices |
| seshat-anneal/ | C++/CUDA | QUBO annealer: a future refinement layer (Phase 4), synthetic-only, not a decipherment claim |
| seshat-viz/ | Rust/egui | interactive sign tables and heatmaps |

Reproduce

cd seshat-analysis && pip install -e .
pytest                                              # ~78 tests
python -m seshat_analysis.info_limit       --data ../data/corpus   # Phase 1
python -m seshat_analysis.typology         --data ../data/corpus   # comparative typology
python -m seshat_analysis.positional       --data ../data/corpus   # positional structure
python -m seshat_analysis.egyptian                                 # hieroglyph inventory
python -m seshat_analysis.registry                                 # the multi-script contract

Everything is seeded, deterministic, offline, and CPU-only.

Honesty contract

Linear A is not deciphered here; no phonetic value is asserted for any undeciphered sign. Deciphered scripts are ground truth; undeciphered ones are measurement, never announcement. Methods are trusted only after recovering known structure on a deciphered script and surviving a null-model control. The contract is enforced structurally by the registry.
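
As an illustration of what "surviving a null-model control" can look like, the sketch below computes a permutation z-score: the observed statistic is compared against the same statistic on label-shuffled data. This is one common null model, assumed here for illustration; the README does not say which null models SESHAT uses, and `null_z`, `mean_gap`, and the toy numbers are hypothetical.

```python
import random
import statistics


def mean_gap(features, labels):
    """Toy statistic: mean feature of class 1 minus mean feature of class 0."""
    ones = [f for f, y in zip(features, labels) if y == 1]
    zeros = [f for f, y in zip(features, labels) if y == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)


def null_z(score_fn, features, labels, n_null=1000, seed=0):
    """z-score of the observed statistic against a label-shuffle null."""
    rng = random.Random(seed)          # seeded, so the result is reproducible
    observed = score_fn(features, labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_null):
        rng.shuffle(shuffled)          # break any real feature/label link
        null_scores.append(score_fn(features, shuffled))
    mu = statistics.fmean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("null distribution has zero spread")
    return (observed - mu) / sigma


# Toy data, not corpus data: the feature clearly separates the two classes.
features = [0.9, 0.8, 0.85, 0.95, 0.1, 0.2, 0.15, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(null_z(mean_gap, features, labels))
```

A result is then read the way the flagship table reads it: a large |z| means the structure is unlikely under the null, while |z| < 2 is reported as no signal.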

Data & provenance

  • Linear A: John Younger's Linear A Database (Univ. of Kansas)
  • Linear B: Ventris & Chadwick; Duhoux & Morpurgo Davies; sign values from the authoritative Unicode Linear B Syllabary names (not hand-typed)
  • Comparanda: Luwian, Hurrian
  • Sign inventories: the Unicode standard, via unicodedata
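
The "nothing hand-typed" approach can be sketched with the standard library alone. The example below reads Linear B syllabogram names from `unicodedata` and splits each into its sign number and conventional value; the function name and the returned shape are hypothetical and not SESHAT's API, and the exact set of signs depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def linear_b_syllabograms():
    """Map each Linear B syllabogram to (sign number, value) from its Unicode name."""
    inventory = {}
    # Linear B Syllabary block: U+10000..U+1007F (some code points are unassigned).
    for cp in range(0x10000, 0x10080):
        name = unicodedata.name(chr(cp), "")   # "" for unassigned code points
        if name.startswith("LINEAR B SYLLABLE "):
            parts = name.split()
            # e.g. "LINEAR B SYLLABLE B008 A" -> ("B008", "A")
            inventory[chr(cp)] = (parts[3], parts[-1])
    return inventory


signs = linear_b_syllabograms()
print(len(signs))
print(signs[chr(0x10000)])
```

Because the values come from the character names in the standard, any transcription error would have to be an error in Unicode itself, not in a local lookup table.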

Author

Antonio Zambudio Rodriguez

About

Honest computational-epigraphy platform — Unicode sign tooling, transliteration & glossary across 16 ancient scripts (Egyptian, cuneiform, Linear A/B…), decipherment contract enforced in code. Measurement, never announcement. No cloud.
