Our detection pipeline and the criteria that define a Talmudic narrative
This project uses text from Sefaria, the free online library of Jewish texts. Sefaria provides the Hebrew text and English translations aligned segment by segment, which is essential for our analysis.
Without Sefaria's open API and dedication to making Jewish texts accessible, this project wouldn't be possible.
Our current system (v7) decomposes detection into specialized stages. Rather than relying on a single AI call to do everything, we give each stage one task it handles well:
First, we retrieve all pages from Sefaria with Hebrew and English aligned. Then event triage classifies every segment into one of four types:
NARRATIVE_EVENT — physical actions, changes in state
VERBAL_ACT — speech acts (blessings, rulings, questions)
DELIBERATION — legal debate, analysis, reasoning
HABITUAL — repeated customs or practices
Pages with fewer than 2 narrative events are skipped entirely—saving 50-66% of detection calls. This is the single biggest accuracy improvement in v7: by pre-filtering, the detector never sees purely legal pages that used to cause false positives.
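The triage rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `EventType` enum mirrors the four labels from the text, while `MIN_NARRATIVE_EVENTS` and `page_passes_triage` are hypothetical names chosen for this example.

```python
from enum import Enum


class EventType(Enum):
    """The four segment types named in the text."""
    NARRATIVE_EVENT = "NARRATIVE_EVENT"
    VERBAL_ACT = "VERBAL_ACT"
    DELIBERATION = "DELIBERATION"
    HABITUAL = "HABITUAL"


# Pages with fewer than 2 narrative events are skipped (threshold from the text).
MIN_NARRATIVE_EVENTS = 2


def page_passes_triage(labels):
    """Return True if a page has enough narrative events to reach the detector.

    `labels` is the list of EventType values assigned to the page's segments.
    """
    narrative_count = sum(
        1 for label in labels if label is EventType.NARRATIVE_EVENT
    )
    return narrative_count >= MIN_NARRATIVE_EVENTS
```

A page labeled only with DELIBERATION segments returns False here, so it would never be sent to the detection stage.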
On pages that pass triage, Google Gemini analyzes the text with segments pre-annotated with their event types. The AI sees labels like "[NARRATIVE_EVENT] Segment 3: Rabbi Yochanan went to..." which helps it distinguish narrative action from legal discussion.
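As a rough sketch of the pre-annotation step, the function below builds label-prefixed lines in the format quoted above. The function name, the input shape, and the 1-based segment numbering are assumptions made for illustration; the real prompt construction may differ.

```python
def annotate_segments(segments):
    """Prefix each segment with its event type, one segment per line.

    `segments` is a list of (event_type_name, text) pairs in page order.
    Segment numbers start at 1 (an assumption for this sketch).
    """
    lines = []
    for number, (event_type, text) in enumerate(segments, start=1):
        lines.append(f"[{event_type}] Segment {number}: {text}")
    return "\n".join(lines)
```

The resulting string can be placed in the prompt so the model sees the triage label next to every segment.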
The prompt includes curated examples from a Ground Truth database—real cases where our expert validator corrected the AI, organized by error type. The AI evaluates each passage against six criteria and checks for disqualifiers.
Stories often start or end with legal material that isn't part of the narrative. Using the event triage labels, we automatically trim DELIBERATION segments from story edges—so stories begin at the first narrative event and end at the last action.
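One way to picture the edge trimming is the sketch below. It assumes each segment is a (label, text) pair and removes only DELIBERATION segments from the two ends; the function name is hypothetical, and the real pipeline may apply additional rules.

```python
def trim_deliberation_edges(segments):
    """Drop leading and trailing DELIBERATION segments from a story.

    `segments` is a list of (label, text) pairs. Deliberation in the
    middle of the story is kept; only the edges are trimmed.
    """
    start, end = 0, len(segments)
    while start < end and segments[start][0] == "DELIBERATION":
        start += 1
    while end > start and segments[end - 1][0] == "DELIBERATION":
        end -= 1
    return segments[start:end]
```

A passage made up entirely of DELIBERATION segments trims down to an empty list, which a caller could treat as "no story here."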
Hebrew narrative markers help pinpoint where stories begin:
מעשה (ma'aseh—"an incident")
יומא חד (yoma chad—"one day")
פעם אחת (pa'am achat—"one time")
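A simple marker search might look like the following sketch. It assumes the Hebrew segments may carry vowel points, so it strips combining marks before matching; the function names are hypothetical, and the plain substring check is a simplification that can also match a marker embedded in a longer word.

```python
import unicodedata

# The three narrative markers listed above.
NARRATIVE_MARKERS = ("מעשה", "יומא חד", "פעם אחת")


def strip_vowel_points(text):
    """Remove combining marks (vowel points, cantillation) from Hebrew text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def first_marker_index(hebrew_segments):
    """Return the index of the first segment containing a marker, or None."""
    for index, segment in enumerate(hebrew_segments):
        plain = strip_vowel_points(segment)
        if any(marker in plain for marker in NARRATIVE_MARKERS):
            return index
    return None
```

The returned index is only a hint about where a story may open; it would be combined with the event triage labels rather than used alone.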
Talmud page breaks are arbitrary—they shouldn't split a story. We detect stories that span page boundaries by checking for narrative events on both sides of a break, then merge them into single entries.
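The boundary check can be sketched as follows. This example assumes "both sides of a break" means the last segment of one page and the first segment of the next are both labeled NARRATIVE_EVENT, and it groups whole pages rather than individual stories to stay short. The function names and page identifiers are hypothetical.

```python
def spans_page_break(prev_page_labels, next_page_labels):
    """True when narrative events sit directly on both sides of a page break."""
    if not prev_page_labels or not next_page_labels:
        return False
    return (
        prev_page_labels[-1] == "NARRATIVE_EVENT"
        and next_page_labels[0] == "NARRATIVE_EVENT"
    )


def group_pages(pages):
    """Group consecutive pages that appear to share a story across a break.

    `pages` is a list of (page_name, labels) pairs in reading order.
    Returns a list of groups, each a list of page names.
    """
    groups = []
    previous_labels = None
    for name, labels in pages:
        if previous_labels is not None and spans_page_break(previous_labels, labels):
            groups[-1].append(name)  # story continues: join the previous group
        else:
            groups.append([name])    # no continuation: start a new group
        previous_labels = labels
    return groups
```

Each multi-page group marks a place where the entries on either side of the break would be merged into one.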
Each finding is categorized: YES (definite story), HIGH (likely), LOW (borderline—one event with discussion), or rejected. Duplicate stories quoted on multiple pages are flagged so each is counted once.
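Duplicate flagging could work along these lines. The sketch assumes each finding is a dictionary with hypothetical `ref` and `text` keys, and it treats two findings as duplicates only when their text matches after lowercasing and whitespace normalization; a production system would likely need fuzzier matching.

```python
def flag_duplicates(findings):
    """Mark findings whose text repeats an earlier finding.

    Returns new dictionaries with a `duplicate_of` key: None for the first
    occurrence, or the `ref` of the first occurrence for later repeats.
    """
    seen = {}  # normalized text -> ref of first occurrence
    flagged = []
    for finding in findings:
        key = " ".join(finding["text"].lower().split())
        duplicate_of = seen.get(key)
        if duplicate_of is None:
            seen[key] = finding["ref"]
        flagged.append({**finding, "duplicate_of": duplicate_of})
    return flagged


def count_unique(flagged):
    """Count each story once by ignoring flagged duplicates."""
    return sum(1 for finding in flagged if finding["duplicate_of"] is None)
```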
For a passage to be classified as a story, it should meet most of these criteria:
Specific people appear—named rabbis like "Rav Hisda" or anonymous characters like "a certain man" or "a certain woman." Both count equally.
More than one thing happens. A single action isn't a story—we need a sequence of events.
Events connect logically. One thing causes another—not just "A happened, then B happened."
There's a sense of before, during, and after. Time markers like "one day" or "eventually" signal this.
The text describes what did happen, not what should happen or what might happen.
The situation at the end differs from the beginning. There's a transformation, not just a report.
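The six criteria above can be represented as a simple checklist. In this sketch the field names are hypothetical labels for the six criteria, and the threshold of four out of six is an assumed reading of "most"; the text does not state an exact cutoff.

```python
from dataclasses import dataclass, fields


@dataclass
class CriteriaCheck:
    """One boolean per criterion, in the order listed above."""
    specific_characters: bool
    multiple_events: bool
    causal_connection: bool
    temporal_progression: bool
    actual_events: bool
    change_of_state: bool

    def met_count(self):
        """Number of criteria the passage satisfies."""
        return sum(getattr(self, f.name) for f in fields(self))


# Assumption: "most" is read as at least 4 of the 6 criteria.
ASSUMED_MIN_CRITERIA = 4


def meets_most_criteria(check):
    return check.met_count() >= ASSUMED_MIN_CRITERIA
```

A passage where named rabbis appear only to attribute a ruling would satisfy at most the first criterion and fall well below the cutoff.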
These patterns look like stories but aren't—we automatically remove them:
"Rabbi X said it is permitted to..." — stating law, not telling a story
"If someone were to..." — not something that actually happened
"He would always..." — regular practice, not a one-time event
Legal codifications marked with מתני׳ — not narrative
"Rabbi X quotes Rabbi Y as saying..." — citing sources, not characters in action
"He thought about acting" or "experienced difficulty" — mental activity, not narrative events
"One sage sitting before another debating" — physical setting of debate is not a story
"He visited and recited a blessing" — action without transformation
"Rav Reḥumi would study before Rava in Meḥoza. He was accustomed to come home every year on the eve of Yom Kippur. One day he was engrossed in the halakha. His wife was expecting him: Now he is coming, now he is coming. He did not come. She was distressed. A tear fell from her eye. He was sitting on the roof. The roof collapsed under him and he died."
Why it's a story: Named character (Rav Reḥumi), multiple events in sequence, clear causation (engrossed → didn't return → wife distressed → tear → death), time passes ("one day"), describes what happened, and a dramatic transformation occurs.
"Rabbi Shmuel bar Naḥmani quotes Rabbi Yonatan as saying it is permitted to look at the face of a bride."
Why it's not a story: The rabbis appear only to attribute a legal ruling. There are no events, no causation, no transformation. This is legal citation, not narrative.