How We Find Stories

How It Works: The 4-Stage Pipeline

Our current system (v7) breaks detection into four specialized stages. Rather than asking a single AI call to do everything, we give each stage one job:

Fetch & Triage: Classify every segment
Detect: Event-guided classification
Refine: Trim boundaries
Merge: Cross-page stories

Fetch All Pages & Event Triage

First, we retrieve all pages from Sefaria with Hebrew and English aligned. Then event triage classifies every segment into one of four types:

NARRATIVE_EVENT — physical actions, changes in state
VERBAL_ACT — speech acts (blessings, rulings, questions)
DELIBERATION — legal debate, analysis, reasoning
HABITUAL — repeated customs or practices
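
As a rough illustration of the data this stage produces, the sketch below pairs Hebrew and English segments by position and leaves room for a triage label. The names `EventType`, `Segment`, and `align_page` are hypothetical, and the assumption that a page arrives as two parallel lists of segment strings is ours, not a description of the production code.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import zip_longest
from typing import Optional


class EventType(Enum):
    """The four triage labels described above."""
    NARRATIVE_EVENT = "NARRATIVE_EVENT"
    VERBAL_ACT = "VERBAL_ACT"
    DELIBERATION = "DELIBERATION"
    HABITUAL = "HABITUAL"


@dataclass
class Segment:
    page: str                                # e.g. "Ketubot 62b"
    index: int                               # 1-based position on the page
    hebrew: str
    english: str
    event_type: Optional[EventType] = None   # filled in later by triage


def align_page(page: str, hebrew: list[str], english: list[str]) -> list[Segment]:
    """Pair segments by position; a missing side is padded with an empty string."""
    pairs = zip_longest(hebrew, english, fillvalue="")
    return [Segment(page, i, he, en) for i, (he, en) in enumerate(pairs, start=1)]
```

Padding the shorter side keeps a misaligned page visible for review instead of silently dropping its last segments.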

Pages with fewer than 2 narrative events are skipped entirely—saving 50-66% of detection calls. This is the single biggest accuracy improvement in v7: by pre-filtering, the detector never sees purely legal pages that used to cause false positives.
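
The skip rule can be sketched in a few lines. This is a minimal illustration that assumes triage has already produced a list of label strings per page; `pages_to_detect` and the page identifiers are hypothetical.

```python
MIN_NARRATIVE_EVENTS = 2  # threshold stated above: fewer than 2 means skip


def pages_to_detect(labels_by_page: dict[str, list[str]]) -> list[str]:
    """Return the pages that have enough narrative events to be worth a detection call."""
    return [
        page
        for page, labels in labels_by_page.items()
        if labels.count("NARRATIVE_EVENT") >= MIN_NARRATIVE_EVENTS
    ]


# Illustrative input only; these labels are made up, not real triage output.
sample = {
    "page-1": ["DELIBERATION", "VERBAL_ACT"],
    "page-2": ["NARRATIVE_EVENT", "DELIBERATION", "NARRATIVE_EVENT"],
    "page-3": ["NARRATIVE_EVENT", "HABITUAL"],
}
kept = pages_to_detect(sample)  # only "page-2" has two narrative events
```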

Constrained Detection

On pages that pass triage, Google Gemini analyzes the text with segments pre-annotated with their event types. The AI sees labels like "[NARRATIVE_EVENT] Segment 3: Rabbi Yochanan went to..." which helps it distinguish narrative action from legal discussion.
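
A minimal sketch of how such pre-annotated text could be assembled before it goes into the prompt. The label format follows the example quoted above; the function name and the (label, text) input shape are assumptions.

```python
def annotate_segments(segments: list[tuple[str, str]]) -> str:
    """Render (label, text) pairs as '[LABEL] Segment N: text', one per line."""
    lines = [
        f"[{label}] Segment {number}: {text}"
        for number, (label, text) in enumerate(segments, start=1)
    ]
    return "\n".join(lines)
```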

The prompt includes curated examples from a Ground Truth database—real cases where our expert validator corrected the AI, organized by error type. The AI evaluates each passage against six criteria and checks for disqualifiers.

Boundary Refinement

Stories often start or end with legal material that isn't part of the narrative. Using the event triage labels, we automatically trim DELIBERATION segments from story edges—so stories begin at the first narrative event and end at the last action.
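
The trimming step might look like the sketch below, which assumes a candidate story is represented by the ordered triage labels of its segments. `trim_deliberation` is a hypothetical name, and the real refinement step may use more signals than the labels alone.

```python
def trim_deliberation(labels: list[str]) -> tuple[int, int]:
    """Return half-open (start, end) bounds after dropping DELIBERATION from both edges.

    DELIBERATION segments in the middle of a story are left untouched.
    """
    start, end = 0, len(labels)
    while start < end and labels[start] == "DELIBERATION":
        start += 1
    while end > start and labels[end - 1] == "DELIBERATION":
        end -= 1
    return start, end
```

If every segment is DELIBERATION the bounds collapse to an empty range, which a caller could treat as "no story here".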

Hebrew narrative markers help pinpoint where stories begin:
מעשה (ma'aseh, "an incident")
יומא חד (yoma chad, "one day")
פעם אחת (pa'am achat, "one time")
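
One simple way to look for these markers is a substring search over the Hebrew segments. The sketch below is an assumption about how this could be done, not the production logic; it strips vowel points and cantillation marks first, because pointed text would otherwise fail to match the unpointed markers.

```python
import unicodedata
from typing import Optional

NARRATIVE_MARKERS = ("מעשה", "יומא חד", "פעם אחת")


def strip_points(text: str) -> str:
    """Remove nonspacing marks (nikud, cantillation) so pointed text matches plain markers."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def first_marker_index(hebrew_segments: list[str]) -> Optional[int]:
    """Return the index of the first segment containing a narrative marker, or None."""
    for i, segment in enumerate(hebrew_segments):
        plain = strip_points(segment)
        if any(marker in plain for marker in NARRATIVE_MARKERS):
            return i
    return None
```

Plain substring matching can also hit these words where they do not open a story, so a match is best read as a hint rather than a decision.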

Cross-Page Merge & Output

Talmud page breaks are arbitrary—they shouldn't split a story. We detect stories that span page boundaries by checking for narrative events on both sides of a break, then merge them into single entries.
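
A sketch of the merge check, under two assumptions of ours: page-level stories arrive in page order, and "narrative events on both sides of a break" means the earlier story ends its page on a NARRATIVE_EVENT while the next story opens its page with one. `PageStory` and both functions are hypothetical names.

```python
from dataclasses import dataclass

NARRATIVE = "NARRATIVE_EVENT"


@dataclass
class PageStory:
    page: str
    labels: list[str]          # triage labels of the story's segments, in order
    touches_page_start: bool   # story includes the first segment of its page
    touches_page_end: bool     # story includes the last segment of its page


def should_merge(earlier: PageStory, later: PageStory) -> bool:
    """True when both stories touch the page break and meet it with narrative events."""
    return (
        earlier.touches_page_end
        and later.touches_page_start
        and bool(earlier.labels)
        and bool(later.labels)
        and earlier.labels[-1] == NARRATIVE
        and later.labels[0] == NARRATIVE
    )


def merge_across_breaks(stories: list[PageStory]) -> list[list[PageStory]]:
    """Group consecutive page-level stories into single cross-page entries."""
    merged: list[list[PageStory]] = []
    for story in stories:
        if merged and should_merge(merged[-1][-1], story):
            merged[-1].append(story)
        else:
            merged.append([story])
    return merged
```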

Each finding is categorized: YES (definite story), HIGH (likely), LOW (borderline—one event with discussion), or rejected. Duplicate stories quoted on multiple pages are flagged so each is counted once.
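
Duplicate flagging can be illustrated with a normalize-and-compare pass. This sketch assumes duplicates are identical once case and whitespace are normalized; a real system may well need fuzzier matching, and `flag_duplicates` is a hypothetical name.

```python
import re


def flag_duplicates(findings: list[tuple[str, str]]) -> dict[str, str]:
    """Map each duplicate's page to the page where the same text first appeared.

    `findings` is a list of (page, story_text) pairs in processing order.
    """
    first_seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for page, text in findings:
        key = re.sub(r"\s+", " ", text).strip().casefold()
        if key in first_seen:
            duplicates[page] = first_seen[key]
        else:
            first_seen[key] = page
    return duplicates
```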

The 6 Story Criteria

For a passage to be classified as a story, it should meet most of these criteria:

Identifiable Characters

Specific people appear—named rabbis like "Rav Hisda" or anonymous characters like "a certain man" or "a certain woman." Both count equally.

Multiple Events

More than one thing happens. A single action isn't a story—we need a sequence of events.

Cause and Effect

Events connect logically. One thing causes another—not just "A happened, then B happened."

Time Passes

There's a sense of before, during, and after. Time markers like "one day" or "eventually" signal this.

Actually Happened

The text describes what did happen, not what should happen or what might happen.

Something Changes

The situation at the end differs from the beginning. There's a transformation, not just a report.
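
The six criteria can be pictured as a checklist. In the sketch below, reading "most" as at least four of six is our assumption, since no exact threshold is given; `CriteriaCheck` and `meets_most` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class CriteriaCheck:
    identifiable_characters: bool
    multiple_events: bool
    cause_and_effect: bool
    time_passes: bool
    actually_happened: bool
    something_changes: bool

    def met(self) -> int:
        """Count how many of the six criteria are satisfied."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))


def meets_most(check: CriteriaCheck, minimum: int = 4) -> bool:
    """Assumed reading of 'most': at least `minimum` of the six criteria hold."""
    return check.met() >= minimum
```

Keeping each criterion as a separate field makes it easy to report which ones a borderline passage failed.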

What We Filter Out

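These patterns look like stories but aren't, so we remove them automatically:

Legal Rulings
Hypothetical Cases
Habitual Actions
Mishnah Sections
Attribution Chains
Legal Deliberation
Legal Debate Settings
Simple Reports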

Examples

✓ This IS a Story

Ketubot 62b — The Death of Rav Reḥumi

"Rav Reḥumi would study before Rava in Meḥoza. He was accustomed to come home every year on the eve of Yom Kippur. One day he was engrossed in the halakha. His wife was expecting him: Now he is coming, now he is coming. He did not come. She was distressed. A tear fell from her eye. He was sitting on the roof. The roof collapsed under him and he died."

Why it's a story: Named character (Rav Reḥumi), multiple events in sequence, clear causation (engrossed → didn't return → wife distressed → tear → death), time passes ("one day"), describes what happened, and a dramatic transformation occurs.

✗ This is NOT a Story

Ketubot 17a — Legal Opinion

"Rabbi Shmuel bar Naḥmani quotes Rabbi Yonatan as saying it is permitted to look at the face of a bride."

Why it's not a story: The rabbis appear only to attribute a legal ruling. There are no events, no causation, no transformation. This is legal citation, not narrative.

Built on Sefaria
