
How We Find Stories

Our detection pipeline and the criteria that define a Talmudic narrative

Built on Sefaria

This project uses text from Sefaria, the free online library of Jewish texts. Sefaria provides the Hebrew text and its English translation aligned segment by segment, which is essential for our analysis.

Without Sefaria's open API and dedication to making Jewish texts accessible, this project wouldn't be possible.

Visit Sefaria.org →

How It Works: The 4-Stage Pipeline

Our current system (v7) splits detection into specialized stages. Instead of relying on a single AI call to do everything, each stage handles one task:

1. Fetch & Triage: Classify every segment
2. Detect: Event-guided classification
3. Refine: Trim boundaries
4. Merge: Cross-page stories

Stage 1: Fetch All Pages & Event Triage

First, we retrieve all pages from Sefaria with Hebrew and English aligned. Then event triage classifies every segment into one of four types:

NARRATIVE_EVENT — physical actions, changes in state
VERBAL_ACT — speech acts (blessings, rulings, questions)
DELIBERATION — legal debate, analysis, reasoning
HABITUAL — repeated customs or practices

Pages with fewer than 2 narrative events are skipped entirely—saving 50-66% of detection calls. This is the single biggest accuracy improvement in v7: by pre-filtering, the detector never sees purely legal pages that used to cause false positives.
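
The skip rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the names `EventType`, `passes_triage`, and `pages_to_detect` are hypothetical, and it assumes every segment on a page has already been given one of the four labels.

```python
from enum import Enum


class EventType(Enum):
    """The four triage labels listed above."""
    NARRATIVE_EVENT = "NARRATIVE_EVENT"
    VERBAL_ACT = "VERBAL_ACT"
    DELIBERATION = "DELIBERATION"
    HABITUAL = "HABITUAL"


# Threshold taken from the text: pages with fewer than 2 narrative events are skipped.
MIN_NARRATIVE_EVENTS = 2


def passes_triage(labels: list[EventType]) -> bool:
    """Return True when a page has enough narrative events to send to detection."""
    count = sum(1 for label in labels if label is EventType.NARRATIVE_EVENT)
    return count >= MIN_NARRATIVE_EVENTS


def pages_to_detect(triaged: dict[str, list[EventType]]) -> list[str]:
    """Keep only the pages that pass triage (hypothetical helper)."""
    return [page for page, labels in triaged.items() if passes_triage(labels)]
```

Because the filter only counts labels, it runs before any detection call is made, which is where the savings come from.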

Stage 2: Constrained Detection

On pages that pass triage, Google Gemini analyzes the text with segments pre-annotated with their event types. The AI sees labels like "[NARRATIVE_EVENT] Segment 3: Rabbi Yochanan went to..." which helps it distinguish narrative action from legal discussion.
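
The pre-annotation step described above might look roughly like this. It is a sketch under assumptions: the function name `annotate_segments` is hypothetical, segment numbering is assumed to start at 1, and the line format simply mirrors the example label quoted above.

```python
def annotate_segments(segments: list[str], labels: list[str]) -> str:
    """Prefix each segment with its triage label, one segment per line."""
    if len(segments) != len(labels):
        raise ValueError("each segment needs exactly one label")
    lines = [
        f"[{label}] Segment {number}: {text}"
        for number, (label, text) in enumerate(zip(labels, segments), start=1)
    ]
    return "\n".join(lines)
```

The resulting string would then be placed in the detection prompt alongside the curated examples.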

The prompt includes curated examples from a Ground Truth database—real cases where our expert validator corrected the AI, organized by error type. The AI evaluates each passage against six criteria and checks for disqualifiers.

Stage 3: Boundary Refinement

Stories often start or end with legal material that isn't part of the narrative. Using the event triage labels, we automatically trim DELIBERATION segments from story edges—so stories begin at the first narrative event and end at the last action.

Hebrew and Aramaic narrative markers help pinpoint where stories begin:
מעשה (ma'aseh, "an incident")
יומא חד (yoma chad, "one day")
פעם אחת (pa'am achat, "one time")
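
Edge trimming and marker lookup can be sketched as follows. This is an illustrative approximation rather than the project's implementation: the function names are hypothetical, labels are assumed to be plain strings, and matching markers by substring after stripping vowel points is an assumption about how the text is stored.

```python
import unicodedata
from typing import Optional

# The three markers quoted above.
NARRATIVE_MARKERS = ("מעשה", "יומא חד", "פעם אחת")


def strip_marks(text: str) -> str:
    """Drop vowel points and cantillation so markers also match pointed text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trim_deliberation(labels: list[str]) -> Optional[tuple[int, int]]:
    """Return inclusive (start, end) indices after trimming DELIBERATION edges.

    Returns None when nothing but DELIBERATION remains.
    """
    start, end = 0, len(labels) - 1
    while start <= end and labels[start] == "DELIBERATION":
        start += 1
    while end >= start and labels[end] == "DELIBERATION":
        end -= 1
    return (start, end) if start <= end else None


def first_marker_index(segments: list[str]) -> Optional[int]:
    """Index of the first segment containing a narrative marker, if any."""
    for index, segment in enumerate(segments):
        plain = strip_marks(segment)
        if any(marker in plain for marker in NARRATIVE_MARKERS):
            return index
    return None
```

Only the edges are trimmed here, so a DELIBERATION segment that sits between two narrative events stays inside the story.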

Stage 4: Cross-Page Merge & Output

Talmud page breaks are arbitrary—they shouldn't split a story. We detect stories that span page boundaries by checking for narrative events on both sides of a break, then merge them into single entries.
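
One way to picture the cross-page check is below. It is a sketch only: the `Story` type and both function names are hypothetical, and it assumes "both sides of a break" means the last segment of one page and the first segment of the next.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A detected story: the pages it touches and its segment texts."""
    pages: list[str]
    segments: list[str]


def spans_break(prev_labels: list[str], next_labels: list[str]) -> bool:
    """True when narrative events sit directly on both sides of a page break."""
    if not prev_labels or not next_labels:
        return False
    return prev_labels[-1] == "NARRATIVE_EVENT" and next_labels[0] == "NARRATIVE_EVENT"


def merge_across_break(first: Story, second: Story) -> Story:
    """Combine two story fragments into a single entry, keeping page order."""
    return Story(pages=first.pages + second.pages,
                 segments=first.segments + second.segments)
```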

Each finding is categorized: YES (definite story), HIGH (likely), LOW (borderline—one event with discussion), or rejected. Duplicate stories quoted on multiple pages are flagged so each is counted once.
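
Duplicate flagging could be approximated like this. The sketch assumes duplicates are exact matches after whitespace normalization, which is a simplification, since parallel passages often differ in wording. The names `fingerprint` and `flag_duplicates` are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a story's text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def flag_duplicates(stories: list[tuple[str, str]]) -> dict[str, str]:
    """Map each repeated story's reference to the reference seen first.

    `stories` is a list of (reference, text) pairs in reading order.
    """
    first_seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for reference, text in stories:
        key = fingerprint(text)
        if key in first_seen:
            duplicates[reference] = first_seen[key]
        else:
            first_seen[key] = reference
    return duplicates
```

Counting only the references that are absent from the returned mapping gives one count per distinct story.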

The 6 Story Criteria

For a passage to be classified as a story, it should meet most of these criteria:

1. Identifiable Characters

Specific people appear—named rabbis like "Rav Hisda" or anonymous characters like "a certain man" or "a certain woman." Both count equally.

2. Multiple Events

More than one thing happens. A single action isn't a story—we need a sequence of events.

3. Cause and Effect

Events connect logically. One thing causes another—not just "A happened, then B happened."

4. Time Passes

There's a sense of before, during, and after. Time markers like "one day" or "eventually" signal this.

5. Actually Happened

The text describes what did happen, not what should happen or what might happen.

6. Something Changes

The situation at the end differs from the beginning. There's a transformation, not just a report.
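
The six criteria can be recorded as a simple checklist. This is a sketch, not the project's scoring logic: `CriteriaCheck` and `meets_most` are hypothetical names, and reading "most" as at least 4 of 6 is an assumption, since the text does not give an exact cutoff.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CriteriaCheck:
    """One boolean per criterion, in the order listed above."""
    identifiable_characters: bool
    multiple_events: bool
    cause_and_effect: bool
    time_passes: bool
    actually_happened: bool
    something_changes: bool

    def met(self) -> int:
        """Number of criteria satisfied."""
        return sum(getattr(self, field.name) for field in fields(self))


# ASSUMPTION: "most" is read as at least 4 of the 6 criteria.
MIN_CRITERIA = 4


def meets_most(check: CriteriaCheck, minimum: int = MIN_CRITERIA) -> bool:
    """True when enough criteria are satisfied to count as a story candidate."""
    return check.met() >= minimum
```

A passage that satisfies all six would pass under any reasonable cutoff, while one that only names its speakers would not.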

What We Filter Out

These patterns look like stories but aren't—we automatically remove them:

Legal Rulings

"Rabbi X said it is permitted to..." — stating law, not telling a story

Hypothetical Cases

"If someone were to..." — not something that actually happened

Habitual Actions

"He would always..." — regular practice, not a one-time event

Mishnah Sections

Legal codifications marked with מתני׳ — not narrative

Attribution Chains

"Rabbi X quotes Rabbi Y as saying..." — citing sources, not characters in action

Legal Deliberation

"He thought about acting" or "experienced difficulty" — mental activity, not narrative events

Legal Debate Settings

"One sage sitting before another debating" — physical setting of debate is not a story

Simple Reports

"He visited and recited a blessing" — action without transformation

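A few of the patterns above have surface cues that a simple pre-check could catch. The sketch below is only illustrative: the pipeline described here relies on the AI to check disqualifiers, and the cue list, the names, and the idea of a regex pre-check are all assumptions built from the example phrases quoted above.

```python
import re

# Hypothetical cue table, using only phrases quoted in the list above.
DISQUALIFIER_CUES = {
    "hypothetical": re.compile(r"\bif someone were to\b", re.IGNORECASE),
    "habitual": re.compile(r"\bwould always\b", re.IGNORECASE),
    "mishnah": re.compile("מתני׳"),
}


def surface_disqualifiers(text: str) -> list[str]:
    """Return the names of any cue patterns found in the text."""
    return [name for name, pattern in DISQUALIFIER_CUES.items()
            if pattern.search(text)]
```

A check like this can only flag candidates for review. Patterns such as attribution chains or simple reports depend on context that a phrase match cannot capture.
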
Examples

✓ This IS a Story
Ketubot 62b — The Death of Rav Reḥumi

"Rav Reḥumi would study before Rava in Meḥoza. He was accustomed to come home every year on the eve of Yom Kippur. One day he was engrossed in the halakha. His wife was expecting him: Now he is coming, now he is coming. He did not come. She was distressed. A tear fell from her eye. He was sitting on the roof. The roof collapsed under him and he died."

Why it's a story: Named character (Rav Reḥumi), multiple events in sequence, clear causation (engrossed → didn't return → wife distressed → tear → death), time passes ("one day"), describes what happened, and a dramatic transformation occurs.

✗ This is NOT a Story
Ketubot 17a — Legal Opinion

"Rabbi Shmuel bar Naḥmani quotes Rabbi Yonatan as saying it is permitted to look at the face of a bride."

Why it's not a story: The rabbis appear only to attribute a legal ruling. There are no events, no causation, no transformation. This is legal citation, not narrative.