Tutorial: Document Intake Pipeline
Build a pipeline that reads public comments from a CSV, classifies them, extracts structured data, and audits the classifications for bias — in three phases, each building on the last.
What you'll learn:
| Phase | What you build | Heddle concepts |
|---|---|---|
| 1 | A comment classifier you can test immediately | Worker configs, Workshop test bench, eval suites |
| 2 | A four-stage intake pipeline with a custom CSV reader | ProcessingBackend, processor workers, pipelines with mixed worker types |
| 3 | Bias detection with blind and sighted audit paths | Blind workers, parallel pipeline branches, sighted vs. blind analysis |
Prerequisites: Heddle installed and configured (heddle setup completed).
If you haven't done this yet, see the Getting Started guide.
The Problem
Your department receives hundreds of public comments on a development proposal. Each comment needs to be categorized by topic, tagged with a stance (support, oppose, conditional), and scanned for specific requests that require follow-up. Then you need a summary report.
Doing this by hand takes days. Using a single AI prompt loses nuance — a 50-comment batch sent to one prompt produces shallow classification and misses action items buried in long comments.
Worse: how do you know the AI isn't systematically biased? Maybe it's classifying all short comments as "general" regardless of content. Maybe it's more likely to mark comments as "oppose" when they mention certain topics. You can't check unless you explicitly audit the results.
This tutorial builds a pipeline that solves each problem in a separate, testable, auditable step.
How It Maps to Heddle
| What you want | Heddle concept | Why |
|---|---|---|
| "Sort these comments into categories" | Worker with a classification prompt | One worker = one job. The classifier doesn't try to extract or summarize. |
| "Read my CSV file" | Processor worker with a custom backend | Non-LLM work (reading files) gets its own worker type. Same I/O contracts. |
| "Pull out names, dates, and specific requests" | Second worker, different prompt | Extraction is a different task than classification — a separate worker keeps each prompt focused. |
| "Summarize everything" | Third worker in a pipeline | The summarizer gets structured data from prior stages, not raw text. |
| "Is the classifier biased?" | Blind worker (sees classifications but not text) | It can spot statistical patterns without being influenced by the content. |
| "Are similar comments treated consistently?" | Sighted reviewer (sees both text and classifications) | It can catch content-level inconsistencies the blind worker can't. |
| "Did my prompt change make things better?" | Workshop eval suite | Run test cases before and after. Compare scores. |
| "Run the whole thing automatically" | Pipeline config | Define stages in YAML. Heddle handles the data flow. |
Phase 1: Classify a Comment
By the end of this phase, you'll have a working classifier that you can test on any public comment.
What you're building
A single worker — comment_classifier — that takes a text comment and
returns:
- A primary topic from a fixed taxonomy (traffic, environment, housing, schools, economic, infrastructure, historic, health, community, general)
- A stance (support, oppose, conditional support, neutral, off-topic)
- Whether the comment contains actionable requests
- The specific action items, extracted from the text
- A confidence score
Step 1: Set up the example
Copy the example config into your Heddle project:
cp examples/document-intake/phase-1/workers/comment_classifier.yaml \
configs/workers/comment_classifier.yaml
Open the config and read through it. A few things to notice:
The taxonomy is in the prompt, not the schema. Unlike the shipped
classifier worker (which takes categories as input), this one has the
categories built in. This is a design choice — a built-in taxonomy works
better when the categories are stable and domain-specific.
stance uses an enum in the output schema. Heddle will reject any
output where stance isn't one of the five valid values. This catches
cases where the LLM invents a stance like "mostly supportive."
default_model_tier: "local". Classification is pattern-matching —
local models handle it well. This keeps the pipeline fast and free for
the classification stage.
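The enum check described above can be pictured with a few lines of plain Python. This is only a sketch of the idea, not Heddle's validation code: the function name `check_stance` is hypothetical, and the exact stance identifiers are assumed (the authoritative list is the enum in comment_classifier.yaml).

```python
# Assumed stance identifiers; the real enum lives in comment_classifier.yaml.
VALID_STANCES = {"support", "oppose", "conditional_support", "neutral", "off_topic"}


def check_stance(output: dict) -> list[str]:
    """Return a list of problems; an empty list means the stance is acceptable."""
    stance = output.get("stance")
    if stance is None:
        return ["missing required field: stance"]
    if stance not in VALID_STANCES:
        return [f"stance {stance!r} is not one of {sorted(VALID_STANCES)}"]
    return []


good = {"primary_topic": "housing", "stance": "conditional_support"}
bad = {"primary_topic": "housing", "stance": "mostly supportive"}

print(check_stance(good))  # no problems reported
print(check_stance(bad))   # one problem: the invented stance
```

A closed set of values is what makes downstream counting reliable: a stance distribution is only meaningful if every comment lands in one of the known buckets.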
Step 2: Validate and test
Open http://localhost:8080, find comment_classifier, click Test.
Paste this comment from the sample data:
I strongly support the proposed mixed-use development at the former Millbrook factory site. Our neighborhood has needed walkable retail for years, and the inclusion of 40 affordable housing units addresses a real gap. My only concern is parking — the plan shows 180 spaces for 220 residential units plus commercial.
Set context to "Mixed-use development proposal at former Millbrook
factory site" and click Run.
You should get a classification with primary_topic: "traffic" (the
parking concern) or "housing" (the affordable housing angle), stance:
"conditional_support", and actionable: true with an action item about
the parking shortfall (180 spaces for 220 residential units plus commercial).
Try the edge case. Paste this one:
This is just another giveaway to developers. The planning board always approves everything. Wake up people.
You should get primary_topic: "general", stance: "oppose",
actionable: false, and a lower confidence score. If the classifier
invents action items for this comment, your prompt needs tightening.
Step 3: Run the eval suite
The example includes a test suite with six comments covering different topics and stances, including edge cases:
- Go to Workers list → comment_classifier → Eval
- Load examples/document-intake/phase-1/eval/test_suite.json
- Choose field_match scoring
- Click Run Suite
The eval checks three fields per comment: primary_topic, stance, and
actionable. These are the fields where the classifier most needs to be
consistent.
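The tutorial does not spell out how field_match computes a score, so the following is one plausible reading rather than Workshop's actual implementation: compare the three checked fields exactly and report the fraction that match. The function name `field_match_score` is hypothetical.

```python
FIELDS = ("primary_topic", "stance", "actionable")
_MISSING = object()  # sentinel so an absent field never counts as a match


def field_match_score(expected: dict, actual: dict, fields=FIELDS) -> float:
    """Fraction of the checked fields whose values match exactly."""
    hits = sum(1 for name in fields if actual.get(name, _MISSING) == expected[name])
    return hits / len(fields)


expected = {"primary_topic": "general", "stance": "oppose", "actionable": False}
actual = {
    "primary_topic": "general",
    "stance": "oppose",
    "actionable": True,   # the classifier invented an action item
    "confidence": 0.55,   # not checked by the eval
}

print(field_match_score(expected, actual))
```

Fields outside the checked set, such as confidence, do not affect the score in this sketch, which mirrors the tutorial's point that the eval focuses on the fields that must stay consistent.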
Set a golden baseline once you have passing results. This is your regression detector — any future prompt changes are compared against it.
What you have now
A tested classifier with an eval suite and golden baseline. You can paste any comment into Workshop and get structured output. But classifying one comment at a time isn't useful when you have 300 of them in a CSV file. That's Phase 2.
Phase 2: The Intake Pipeline
Phase 1 gave you a classifier for individual comments. Phase 2 chains four workers into a pipeline that processes a CSV of comments end-to-end.
What you're building
csv_reader ──► comment_classifier ──► entity_extractor ──► batch_summarizer
(processor)    (classify each)        (extract entities)   (aggregate report)
The first stage is a processor worker — it runs Python code, not an
LLM. The CSV reader is a custom ProcessingBackend that you write (~40
lines of Python). This demonstrates that Heddle pipelines can mix LLM and
non-LLM work.
Heddle concepts introduced
Processor workers run a Python class instead of calling an LLM. The
worker config says worker_kind: "processor" and points to a class that
extends SyncProcessingBackend. That class implements one method:
process_sync(payload, config) → {"output": dict, "model_used": str}.
Same I/O contract validation as LLM workers — same input_schema,
same output_schema.
Custom ProcessingBackend — the CsvReaderBackend in
processing/csv_reader.py. It reads a CSV file, validates the columns,
and returns all rows as structured records. The key insight: backends
inherit from SyncProcessingBackend, which automatically offloads the
synchronous process_sync call to a thread pool. Write blocking code;
Heddle handles the async.
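The "write blocking code, let the framework handle async" pattern can be sketched with the standard library alone. The class below is a stand-in written for illustration, not Heddle's SyncProcessingBackend; the names `SketchSyncBackend` and `SlowEcho` and the `"model_used": "none"` value are assumptions.

```python
import asyncio
import time


class SketchSyncBackend:
    """Stand-in for the pattern described above; not Heddle's actual base class."""

    def process_sync(self, payload: dict, config: dict) -> dict:
        raise NotImplementedError

    async def process(self, payload: dict, config: dict) -> dict:
        # Run the blocking method in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.process_sync, payload, config)


class SlowEcho(SketchSyncBackend):
    def process_sync(self, payload: dict, config: dict) -> dict:
        time.sleep(0.05)  # blocking work, for example file I/O
        return {"output": {"echo": payload}, "model_used": "none"}


async def main() -> list[dict]:
    backend = SlowEcho()
    # The two calls overlap because each one runs in its own thread.
    return await asyncio.gather(
        backend.process({"n": 1}, {}),
        backend.process({"n": 2}, {}),
    )


results = asyncio.run(main())
print([r["output"]["echo"]["n"] for r in results])
```

The subclass contains no async code at all, which is the point of the design: backend authors only implement process_sync.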
Batch-aware workers — the Phase 2 classifier and extractor take arrays of records instead of single items. In Phase 1 you tested individual comments. In a pipeline, the csv_reader returns all records at once, so downstream workers handle the full batch.
Input mappings — the pipeline config specifies how data flows between
stages using dot-notation paths. records: "read_csv.output.records"
means "take the records field from the read_csv stage's output."
Heddle infers dependencies from these paths — if stage B reads from stage
A's output, A must complete first.
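To make the dot-notation idea concrete, here is a minimal sketch of path resolution and dependency inference. It assumes the first path segment is always a stage name when it matches one; the helper names `resolve_path` and `inferred_dependencies` are hypothetical and Heddle's own resolver may differ.

```python
def resolve_path(path: str, stage_results: dict):
    """Walk a dot-notation path such as 'read_csv.output.records'."""
    value = stage_results
    for key in path.split("."):
        value = value[key]
    return value


def inferred_dependencies(input_mapping: dict, stage_names: set[str]) -> set[str]:
    """Stages whose output this mapping reads, taken from each path's first segment."""
    return {
        path.split(".", 1)[0]
        for path in input_mapping.values()
        if path.split(".", 1)[0] in stage_names
    }


stage_results = {"read_csv": {"output": {"records": [{"id": "comment-001"}]}}}
mapping = {"records": "read_csv.output.records"}

records = resolve_path(mapping["records"], stage_results)
deps = inferred_dependencies(mapping, {"read_csv", "classify"})
print(records, deps)
```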
Step 1: Understand the CSV reader
Open examples/document-intake/processing/csv_reader.py. The entire
backend is about 40 lines of actual logic:
- Read a CSV file from the source_path in the payload
- Validate that the specified text_column exists
- Return all rows as a list of dicts
- Handle encoding issues (UTF-8 with Latin-1 fallback)
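The four steps above can be sketched as a standalone function. This is not the code in csv_reader.py; the function name `read_records` and the `row_count` output key are assumptions made for illustration.

```python
import csv
import io
import tempfile
from pathlib import Path


def read_records(source_path: str, text_column: str) -> dict:
    """Read every row of a CSV file as a dict, validating the text column."""
    raw = Path(source_path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # Latin-1 accepts any byte sequence

    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    columns = reader.fieldnames or []
    if text_column not in columns:
        raise ValueError(f"column {text_column!r} not found; available: {columns}")

    records = list(reader)
    return {"records": records, "row_count": len(records)}  # row_count is hypothetical


# Demo: a file saved as Latin-1, which is not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "comments.csv"
    path.write_bytes("id,text\n1,Caf\xe9 parking is tight\n".encode("latin-1"))
    result = read_records(str(path), "text")

print(result["records"])
```

Wrapping a function like this in a process_sync method that returns it under the "output" key would give it the shape the processor contract describes.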
The worker config that uses this backend is in
phase-2/workers/csv_reader.yaml:
name: "csv_reader"
worker_kind: "processor"
processing_backend: "examples.document_intake.processing.csv_reader.CsvReaderBackend"
Notice: worker_kind: "processor" — no system_prompt, no model tier,
no token limits. But input_schema and output_schema are still
required. The contract validation works exactly the same way.
Step 2: Copy configs and set up
cp examples/document-intake/phase-2/workers/*.yaml configs/workers/
cp examples/document-intake/phase-2/orchestrators/*.yaml configs/orchestrators/
heddle validate configs/workers/csv_reader.yaml
heddle validate configs/workers/entity_extractor.yaml
heddle validate configs/workers/batch_summarizer.yaml
heddle validate configs/orchestrators/document_intake.yaml
For the CsvReaderBackend to be importable, ensure the processing/
directory is on your Python path. If you run from the heddle project root,
add that root to PYTHONPATH so the dotted path in processing_backend resolves.
Step 3: Walk through the pipeline config
Open configs/orchestrators/document_intake.yaml. Four stages:
- read_csv: processor worker, tier local (no LLM). input_mapping pulls source_path and text_column from the goal context (values you provide when submitting the pipeline).
- classify: LLM worker. records: "read_csv.output.records" creates a dependency on read_csv. Heddle waits for read_csv to finish before starting classify.
- extract: LLM worker. Reads from both read_csv.output.records and classify.output.classifications. Depends on both prior stages.
- summarize: LLM worker. Reads from classify, extract, and read_csv. Produces the final report.
The dependency chain is strictly sequential: read → classify → extract → summarize. Each stage must wait for the one before it. Phase 3 adds parallel branches.
Step 4: Test individual workers in Workshop
Before running the full pipeline, test each worker individually in Workshop. This is a key workflow pattern: test the parts, then assemble.
For the batch classifier and extractor, you can construct test inputs by pasting a few records as JSON arrays.
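One way to build such a test input is to assemble it in Python and paste the printed JSON into Workshop. The record fields ("id", "text"), the second record's id, and the payload keys are assumptions here, and the comment texts are shortened from the samples above; check each worker's input_schema for the real field names.

```python
import json

# Hypothetical record fields and payload keys; the worker's input_schema
# is the authority on what the batch workers actually accept.
records = [
    {"id": "comment-001", "text": "I strongly support the proposed mixed-use development."},
    {"id": "comment-002", "text": "This is just another giveaway to developers."},
]
payload = {"records": records, "text_column": "text"}

payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

Two or three records are enough to confirm that the batch worker returns one result per input before you run the full file.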
Step 5: Run the pipeline
With NATS running and workers deployed:
heddle submit "Process comments" \
--context source_path="examples/document-intake/sample-data/public_comments.csv" \
--context text_column="text" \
--context description="Mixed-use development proposal at former Millbrook factory site"
Or test in Workshop's pipeline editor.
The output from the summarize stage is a structured report: executive
summary, topic breakdown, stance distribution, deduplicated action items,
and staff recommendations — all generated from 12 comments processed
through four stages.
What you have now
A working intake pipeline with a custom processing backend, batch classification, entity extraction, and automated summarization. The summary is useful, but you have no way to know if the classifier is fair. That's Phase 3.
Phase 3: Bias Detection
Phases 1 and 2 give you automated intake and summarization. But how do you know the classifier isn't systematically biased? Maybe it's marking all short comments as "general." Maybe comments mentioning certain neighborhoods get classified differently.
Phase 3 adds two audit workers that check the classifications from two independent perspectives.
What you're building
read_csv ──► classify ──┬──► extract ──► summarize
                        │
                        ├──► blind_bias_auditor
                        │
                        └──► fairness_reviewer
Three paths run in parallel after classification:
- The intake path (extract → summarize) produces the summary report just like Phase 2
- The blind audit checks classification patterns without seeing the original text — it catches statistical biases
- The fairness review checks consistency with full context — it catches content-level misclassifications
Heddle concepts introduced
Blind workers — the blind_bias_auditor has knowledge_silos: []
and its input mapping gives it only the classification results, never
the original text. It cannot evaluate whether individual classifications
are correct. But it can spot patterns: "all low-confidence items go to
the same category" or "comments classified as 'oppose' never have action
items."
Sighted vs. blind — the fairness_reviewer gets both the text and
the classifications. It can check whether similar content is treated
consistently, whether classification quality varies by author type, and
whether action items were missed. Together, the blind and sighted
reviewers catch different kinds of problems.
Parallel pipeline branches — after classify completes, three stages
run concurrently: extract, bias_audit, and fairness_review. None of
them depend on each other — they all depend only on classify (and
read_csv). Heddle infers this parallelism automatically from the
input_mapping paths.
Step 1: Create the blind bias auditor
Copy the Phase 3 configs:
cp examples/document-intake/phase-3/workers/blind_bias_auditor.yaml configs/workers/
cp examples/document-intake/phase-3/workers/fairness_reviewer.yaml configs/workers/
cp examples/document-intake/phase-3/orchestrators/document_intake_audited.yaml configs/orchestrators/
Open configs/workers/blind_bias_auditor.yaml. Notice the input schema:
it takes classifications (array) and total_count (integer). No text
field. No records field. This worker literally cannot see what was
classified — only how it was classified.
The prompt tells it to look for specific bias patterns: catch-all categories, confidence clustering, stance imbalance, low-confidence dumping. These are statistical checks that don't require seeing the original data.
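The auditor itself is an LLM worker, but the kind of statistics it reasons about can be computed directly, which is a useful cross-check on its findings. The sketch below works only from classification records, as the blind worker does. The function name, the 0.6 low-confidence threshold, and the output keys are all assumptions.

```python
from collections import Counter


def audit_distribution(classifications: list[dict], catch_all: str = "general") -> dict:
    """Summarize patterns visible without the original text."""
    total = len(classifications)
    topics = Counter(c["primary_topic"] for c in classifications)
    stances = Counter(c["stance"] for c in classifications)
    low_conf = [c for c in classifications if c["confidence"] < 0.6]  # assumed threshold
    low_conf_in_catch_all = sum(1 for c in low_conf if c["primary_topic"] == catch_all)
    return {
        # Share of all items that landed in the catch-all category.
        "catch_all_share": topics[catch_all] / total if total else 0.0,
        # Raw stance counts, for spotting imbalance.
        "stance_counts": dict(stances),
        # Share of low-confidence items that were dumped into the catch-all.
        "low_confidence_dumping": (
            low_conf_in_catch_all / len(low_conf) if low_conf else 0.0
        ),
    }


classifications = [
    {"primary_topic": "general", "stance": "oppose", "confidence": 0.4},
    {"primary_topic": "general", "stance": "oppose", "confidence": 0.5},
    {"primary_topic": "traffic", "stance": "support", "confidence": 0.9},
    {"primary_topic": "housing", "stance": "conditional_support", "confidence": 0.8},
]
report = audit_distribution(classifications)
print(report)
```

In this toy input every low-confidence item sits in "general", which is exactly the low-confidence dumping pattern the prompt asks the auditor to flag.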
Step 2: Understand the fairness reviewer
Open configs/workers/fairness_reviewer.yaml. This one takes records,
text_column, AND classifications. It can compare the text to its
classification and check for inconsistencies:
- Did two similar comments get different topics?
- Did the stance assignment match what the comment actually says?
- Were action items missed in some comments but caught in others?
The fairness reviewer produces misclassification_suspects (specific
comments it thinks were classified wrong) and an overall_fairness_score.
Step 3: Walk through the parallel pipeline
Open configs/orchestrators/document_intake_audited.yaml. The key
difference from Phase 2: after classify, three stages start
concurrently:
# These three all depend on classify but NOT on each other:
- name: "extract"
  input_mapping:
    records: "read_csv.output.records"
    classifications: "classify.output.classifications"
- name: "bias_audit"
  input_mapping:
    classifications: "classify.output.classifications"
    total_count: "classify.output.total_classified"
- name: "fairness_review"
  input_mapping:
    records: "read_csv.output.records"
    classifications: "classify.output.classifications"
Heddle sees that extract, bias_audit, and fairness_review all
reference classify.output.* but none reference each other. It runs
them in parallel automatically.
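How parallelism falls out of the dependency graph can be illustrated with a small grouping routine. This is a sketch of the general technique (layered topological ordering), not Heddle's scheduler; the function name `execution_waves` is hypothetical, and the dependency sets are transcribed from the stage descriptions in this tutorial.

```python
def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into waves; every stage in a wave can run concurrently."""
    remaining = {stage: set(deps) for stage, deps in dependencies.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A stage is ready once everything it reads from has finished.
        ready = sorted(s for s, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for stage in ready:
            del remaining[stage]
    return waves


deps = {
    "read_csv": set(),
    "classify": {"read_csv"},
    "extract": {"read_csv", "classify"},
    "bias_audit": {"classify"},
    "fairness_review": {"read_csv", "classify"},
    "summarize": {"read_csv", "classify", "extract"},
}
waves = execution_waves(deps)
print(waves)
```

A real scheduler would likely start summarize as soon as extract finishes rather than waiting for the whole wave, but the grouping shows why the three post-classify stages are free to overlap.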
Step 4: Compare the audit outputs
Run the pipeline and examine the three output streams:
- summarize gives you the intake report (same as Phase 2)
- bias_audit tells you whether the classification distribution looks suspicious
- fairness_review tells you which specific classifications might be wrong
With 12 sample comments, the blind auditor might flag that the "general" category is underused or that confidence scores cluster suspiciously high. The fairness reviewer might catch that comment-001 (parking concern with overall support) could reasonably be classified as "housing" instead of "traffic."
These disagreements and flags are the most valuable output — they tell you where to focus your human review time.
What you have now
A six-worker pipeline with a custom processing backend and parallel audit branches. The intake path processes comments into a structured summary. The audit path checks whether the processing was fair and consistent. The pipeline output includes both the results and the quality assessment of those results.
What's Next
You now have a complete document intake system with built-in quality assurance. Here are two directions to push it further:
Idea 1: Custom Ingestor for RAG
Write a simple Ingestor subclass that feeds the CSV data into Heddle's
RAG pipeline vector store. This enables semantic search across all
processed comments: "Find all comments that mention traffic concerns
near the proposed site" — returning results ranked by relevance, not
just keyword matching.
This requires implementing the Ingestor ABC from
heddle.contrib.rag.ingestion and registering it as a new ingestor type.
The CSV reader backend you already built handles the file parsing — the
ingestor wraps it with normalization and chunking for vector storage.
Idea 2: Scheduled Monitoring
Set up Heddle's scheduler to watch a folder for new CSV drops and automatically run the intake pipeline. Each morning, planning staff would find a fresh summary report covering any new comments received overnight.
This uses Heddle's scheduler component with a cron-style trigger and file-watching logic in a custom backend. The pipeline itself doesn't change — scheduling is infrastructure, not workflow.
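The file-watching half of this idea is simple enough to sketch without any Heddle APIs. The scheduler's configuration is not covered in this tutorial, so the snippet below shows only the "which files are new since the last run" logic a custom backend might use; the function name `new_csv_files` and the in-memory `seen` set are assumptions (a real deployment would persist that state).

```python
import tempfile
from pathlib import Path


def new_csv_files(folder: Path, seen: set[str]) -> list[Path]:
    """Return CSV files not seen before, and record them as seen."""
    fresh = sorted(p for p in folder.glob("*.csv") if p.name not in seen)
    seen.update(p.name for p in fresh)
    return fresh


# Demo: two "drops" on consecutive runs.
with tempfile.TemporaryDirectory() as tmp:
    drop = Path(tmp)
    seen: set[str] = set()

    (drop / "monday.csv").write_text("id,text\n", encoding="utf-8")
    first = [p.name for p in new_csv_files(drop, seen)]

    (drop / "tuesday.csv").write_text("id,text\n", encoding="utf-8")
    second = [p.name for p in new_csv_files(drop, seen)]

print(first, second)
```

Each path returned would become the source_path for one pipeline submission, leaving the pipeline config itself untouched.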
This tutorial uses the example configs in examples/document-intake/.
Each phase directory contains the complete working configs for that phase
— you can copy them directly or build them step by step following the
walkthrough above.