The big data layer
Reverse Interlinear: Translation-Specific Alignment
How it is built
A traditional interlinear keeps the original Hebrew or Greek word order and contorts the English around it. A reverse interlinear keeps the English exactly as the translator wrote it and places the matching original-language data beneath each word in natural reading order. It is built separately for each translation, for every supported Bible. Where an authoritative alignment source exists (KJV via STEPBible, the Berean Standard Bible's published interlinear, the unfoldingWord Literal Text), that data is used directly. For other translations, the pipeline runs a computer-assisted alignment engine: it assigns each source token a set of candidate English spans, then scores each candidate using lexical glosses, learned translation probabilities (how likely this translation is to render a given Hebrew or Greek word with a specific English word), part-of-speech and lemma matching, and positional evidence. Reviewed rows become runtime gold data; generated rows stay labelled as approximate until they pass an Alignment Error Rate audit.
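The scoring step described above can be pictured as a weighted combination of evidence per candidate span. The sketch below is illustrative only: the `Candidate` fields, the weights, and the `min_score` threshold are hypothetical names and values chosen for the example, not the pipeline's actual model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One candidate English span for a single source token (hypothetical shape)."""
    span: Tuple[int, int]      # [start, end) indices into the English tokens
    gloss_match: float         # 1.0 if a lexicon gloss matches the span, else 0.0
    translation_prob: float    # learned P(English word | source lemma) for this translation
    pos_lemma_match: float     # 1.0 if part-of-speech and lemma evidence agree
    position_score: float      # 0..1, higher when relative positions agree


# Assumed weights for illustration; a real system would tune or learn these.
WEIGHTS = {
    "gloss_match": 0.30,
    "translation_prob": 0.40,
    "pos_lemma_match": 0.15,
    "position_score": 0.15,
}


def score(candidate: Candidate) -> float:
    """Weighted sum of the four evidence types."""
    return sum(weight * getattr(candidate, name) for name, weight in WEIGHTS.items())


def best_candidate(candidates: Sequence[Candidate],
                   min_score: float = 0.5) -> Optional[Candidate]:
    """Return the top-scoring candidate, or None when nothing is convincing.

    Returning None leaves a visible gap instead of forcing a weak link.
    """
    if not candidates:
        return None
    top = max(candidates, key=score)
    return top if score(top) >= min_score else None


strong = Candidate(span=(0, 1), gloss_match=1.0, translation_prob=0.8,
                   pos_lemma_match=1.0, position_score=0.9)
weak = Candidate(span=(3, 4), gloss_match=0.0, translation_prob=0.1,
                 pos_lemma_match=1.0, position_score=0.2)

chosen = best_candidate([weak, strong])
```

Here `chosen` is the `strong` candidate, while `best_candidate([weak])` returns `None` because the only option falls below the threshold.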
Technical complications
Biblical Hebrew is often Verb-Subject-Object; English is Subject-Verb-Object, so the source word rarely lands directly below its English rendering. The engine models positional displacement (how far a word is likely to drift between source and target) with the same kind of diagonal model used for word alignment in statistical machine translation. The mapping is also not one-to-one. One Hebrew word can become an English phrase and vice versa, and translators supply words that English grammar requires but that have no separate source token (the possessive in a Hebrew construct chain, the copula, the article). The pipeline explicitly models supplied and implied words rather than forcing false one-to-one matches. Disambiguation is harder still: "the LORD God" in Genesis 2 maps to two Hebrew words whose glosses overlap, and the system separates them using learned co-occurrence, morphology, and local positional evidence. Transliterated names that vary across translations (Nebuchadnezzar vs Nabuchodonosor) are matched using edit distance on the orthographic form.
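Two of the mechanisms above are easy to show in miniature. The sketch below assumes a diagonal prior in the spirit of the reparameterised IBM Model 2 used by aligners such as fast_align, and plain Levenshtein distance for name matching. The function names, the `tension` value, and the normalisation are hypothetical choices for illustration, not the engine's actual parameters.

```python
def diagonal_log_prior(i: int, j: int, src_len: int, tgt_len: int,
                       tension: float = 4.0) -> float:
    """Unnormalised log-score that favours links near the diagonal.

    i and j are 0-based positions in the source and target sentences.
    A higher tension penalises displacement more sharply (assumed value).
    """
    return -tension * abs(i / src_len - j / tgt_len)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1, case-insensitive (1.0 means identical)."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# A link on the diagonal scores higher than a displaced one.
on_diagonal = diagonal_log_prior(2, 2, src_len=5, tgt_len=5)
displaced = diagonal_log_prior(2, 4, src_len=5, tgt_len=5)

# Variant spellings of one name score closer than two unrelated names.
same_name = name_similarity("Nebuchadnezzar", "Nabuchodonosor")
other_name = name_similarity("Nebuchadnezzar", "Melchizedek")
```

In this sketch `on_diagonal` is greater than `displaced`, and `same_name` is greater than `other_name`. Because Hebrew word order differs systematically from English, a prior like this can only be one signal among several; it nudges scores without forbidding long-distance links.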
Desired result
The operating principle is "zero wrong anchors": a confident wrong link is worse than a visible gap. Mechanical alignments are evaluated against hand-verified rows using Alignment Error Rate (AER), which combines precision and recall over sure and possible links. Only alignments that pass are promoted to gold. Hard cases (free paraphrase, idiomatic collapses, English idioms where no source word cleanly corresponds) are flagged for careful review rather than guessed. When a reader taps a word in the English text, they see original-language data the system can stand behind.
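For reference, AER in its standard form is 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the sure links, and P the possible links (with S contained in P). The sketch below computes it over sets of (source index, English index) pairs. The sample links are made up for illustration, and any pass threshold for promotion to gold would be a project decision that is not specified here.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source token index, English token index)


def precision(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Share of predicted links that are at least possible."""
    allowed = possible | sure  # sure links count as possible by definition
    return len(predicted & allowed) / len(predicted) if predicted else 0.0


def recall(predicted: Set[Link], sure: Set[Link]) -> float:
    """Share of sure links that were predicted."""
    return len(predicted & sure) / len(sure) if sure else 0.0


def alignment_error_rate(predicted: Set[Link], sure: Set[Link],
                         possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better."""
    allowed = possible | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing expected
    hits = len(predicted & sure) + len(predicted & allowed)
    return 1.0 - hits / denominator


# Made-up example: four predicted links, two sure, one merely possible.
predicted = {(0, 0), (1, 2), (2, 1), (3, 3)}
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}

aer = alignment_error_rate(predicted, sure, possible)
p = precision(predicted, sure, possible)
r = recall(predicted, sure)
```

In this example one predicted link, (3, 3), is neither sure nor possible, so precision is 3/4, recall is 1, and AER is 1/6.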