Predicate
Granite-4.0-micro + a frozen domain LoRA + a small MLP head reading one hidden state.
Pass content and a natural-language criterion; the head reads a calibrated probability in [0, 1] off the model's activations. The criterion is an input, not a trained class, so a single probe handles any criterion with no per-criterion training. The same input returns the same score every call.
// one model, any criterion, no fine-tuning per task
score(
  content = "Comparing you to two competitors. About ready to switch.",
  criterion = "the customer is at risk of churning"
) → 0.97
Apache-2.0 · isotonic-calibrated · deterministic (no sampling) · KV-pop extraction at layer L28 · many criteria scored against one content prefill
Try it
One piece of content, any criteria you write — scored live in a single pass. Notice the score tracks who and how, not just the topic.
A scoring function, not a classifier
You don't pick from a fixed label set — you write an arbitrary natural-language criterion and get back a number you can threshold, route on, monitor, and compose. That shape is the design intent: decompose a complex judgment into orthogonal axes, score each independently, combine the results.
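As a rough illustration of "score each axis independently, then combine", here is a minimal Python sketch. The `Scorer` type, the second criterion, the thresholds, and the stub values are all assumptions made for the example; only the churn criterion and its 0.97 score come from the example above.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: (content, criterion) -> calibrated probability in [0, 1].
Scorer = Callable[[str, str], float]

CHURN = "the customer is at risk of churning"
QUOTED = "the customer is quoting someone else's opinion"  # illustrative criterion


def score_axes(content: str, criteria: List[str], score: Scorer) -> Dict[str, float]:
    """Score each criterion independently against the same content."""
    return {criterion: score(content, criterion) for criterion in criteria}


def should_escalate(scores: Dict[str, float]) -> bool:
    """Example routing rule: high churn risk, stated first-hand.

    The 0.9 and 0.5 cutoffs are assumptions, not recommended values.
    """
    return scores[CHURN] >= 0.9 and scores[QUOTED] < 0.5


def stub_score(content: str, criterion: str) -> float:
    # Stand-in for a real scoring call. 0.97 is the example value from the
    # text; 0.05 is a made-up number for illustration only.
    return {CHURN: 0.97, QUOTED: 0.05}[criterion]


content = "Comparing you to two competitors. About ready to switch."
scores = score_axes(content, [CHURN, QUOTED], stub_score)
print(should_escalate(scores))
```

Because each axis is a plain number, the combining rule stays ordinary code that can be versioned and tested separately from the scorer.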
It reads more than topic: subject (whose attribute is this?), tense (now or resolved?), and framing (asserted, quoted, hypothetical). That's the difference between "is this about debt" and "is the user, right now, in debt." It's a small probe reading an instruction-tuned LLM's hidden state, so it inherits the model's comprehension at prefill speed with a single scalar output.
Why not just…
Three ways to score content against a rule. On plain classification a small encoder is a genuine peer, so the difference isn't raw accuracy — it's the shape of the output and the range of criteria: any criterion in, a stable calibrated number out.
Embeddings · GLiClass
fast, cheap, zero-shot
A genuine peer on plain classification. But you're limited to label strings rather than arbitrary criteria, there's no calibrated, reproducible score to build a control loop on, and context caps at 512–8K tokens.
labels only · short context
LLM-as-judge
flexible, but heavy
Flexible and good at genuine reasoning, but not reproducible: verdicts can flip on prompt-format changes and shift silently on a vendor update. There's no raw score to calibrate, and a large model is costly and slow for a thresholded yes/no at scale.
non-reproducible · ~$/call · ~seconds
Predicate
promptable, calibrated, reproducible
Any natural-language criterion in, a stable calibrated number out, in milliseconds. Same input → same score — safe to cache, threshold, monitor for drift, A/B, and put in a control loop; many criteria in one pass; documents tens of thousands of tokens long.
any criterion · calibrated ✓ · reproducible ✓ · ~ms
It reads the sentence, not the keywords: for the same content, the score moves with the subject and framing of the criterion.
What it reads
Write any rule in English and get back a calibrated 0–1 you can threshold, cache, monitor, and compose — the same score every time. What makes that worth more than a keyword match is that it reads what a text means: not just its topic, but who it's about, when, and how a claim is framed.
- Topic, sentiment, intent, stance
- Subject / attribution — whose attribute is this? (self vs third party)
- Framing — asserted vs quoted vs hypothetical; first-party voice
- Tense & aspect — "used to" vs "still"
- Multi-clause AND / OR, deontic, temporal conditions
- Cross-language content — strong across de / es / zh
- Long documents — reads across tens of thousands of tokens in one pass (encoders cap at 512–8K)
It scores how content reads, not whether a claim is true. For fact-checking, entailment against messy evidence, or arithmetic, pair it with retrieval or explicit logic.
How it works
The probe
An instruction-tuned Apache-2.0 model (Granite-4.0-micro) + a frozen domain LoRA + a small MLP head reading the hidden state at a fixed seed-token position. The base model already comprehends the language; the head reads a calibrated decision off its activations. The criterion is an input, not a trained class.
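To make "a small MLP head reading the hidden state" concrete, here is a toy sketch in plain Python. Everything specific in it is an assumption: the single ReLU hidden layer, the sigmoid output, the tiny dimensions, and the random weights. The real head's architecture and sizes are not described here, and the hidden state would come from the model rather than a random generator.

```python
import math
import random
from typing import List


def mlp_head(hidden: List[float], w1: List[List[float]], b1: List[float],
             w2: List[float], b2: float) -> float:
    """One ReLU hidden layer, then a sigmoid over a single logit."""
    layer = [max(0.0, sum(w * x for w, x in zip(row, hidden)) + b)
             for row, b in zip(w1, b1)]
    logit = sum(w * a for w, a in zip(w2, layer)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)          # fixed seed so the toy example is repeatable
HIDDEN_DIM, HEAD_DIM = 8, 4     # toy sizes, far smaller than a real model

w1 = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)] for _ in range(HEAD_DIM)]
b1 = [0.0] * HEAD_DIM
w2 = [rng.uniform(-0.5, 0.5) for _ in range(HEAD_DIM)]
b2 = 0.0

# Stand-in for the hidden state at the seed-token position.
hidden_state = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]

raw_score = mlp_head(hidden_state, w1, b1, w2, b2)
print(raw_score)
```

The head is a pure function of the hidden state, so a fixed model and fixed weights give the same raw score for the same input. In the described design that raw score would still pass through the calibration step before being returned.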
Multi-criteria, one pass
Encode the content once, then score every criterion against that cached encoding in a single batched pass. Cost scales with the length of the content, not the number of criteria, so asking many orthogonal questions (sentiment, topic, intent, subject, frame…) is close to the cost of one.
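A back-of-the-envelope sketch of why the shared encoding matters. It uses token counts as a crude proxy for compute, which is a simplification: each criterion still attends over the cached content, so the real saving is smaller than this arithmetic suggests. The function names and the example sizes are assumptions.

```python
from typing import List


def tokens_naive(content_tokens: int, criterion_tokens: List[int]) -> int:
    """Re-encode the content for every criterion."""
    return sum(content_tokens + c for c in criterion_tokens)


def tokens_shared(content_tokens: int, criterion_tokens: List[int]) -> int:
    """Encode the content once, then process only the criteria."""
    return content_tokens + sum(criterion_tokens)


content_len = 20_000        # a long document (illustrative size)
criteria_lens = [12] * 5    # five short criteria (illustrative size)

naive = tokens_naive(content_len, criteria_lens)    # 5 * 20_012 = 100_060
shared = tokens_shared(content_len, criteria_lens)  # 20_000 + 60 = 20_060
print(naive, shared)
```

Under this proxy, adding a sixth criterion costs 12 more tokens in the shared case and 20,012 more in the naive case.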
Calibrated & reproducible
Isotonic calibration is baked in, so a score is meant to match an observed frequency: of the items scored near 0.8, roughly 80% should satisfy the criterion. And because the whole probe (base model, LoRA, head, calibration) is one fixed, versioned artifact, reproducible means two concrete things. The same input returns the same score every call (no sampling, no temperature), and the function can't shift underneath you on a vendor model update, because there's no vendor in the loop. That's what lets you cache a score, set a threshold that still means the same thing next month, A/B a change, and monitor for drift.
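For readers unfamiliar with isotonic calibration, here is a minimal pure-Python sketch of the general technique (pool-adjacent-violators), not the calibration code shipped in the artifact. It fits a non-decreasing step function from raw scores to observed label frequencies. The function names and the tiny data set are invented for illustration, and tied raw scores are not pooled as a production implementation would.

```python
from typing import List, Tuple


def fit_isotonic(raw_scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return steps (upper_raw_score, calibrated_probability), non-decreasing."""
    blocks: List[list] = []  # each block: [sum_of_labels, count, max_raw_score]
    for x, y in sorted(zip(raw_scores, labels)):
        blocks.append([float(y), 1, x])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(raw: float, steps: List[Tuple[float, float]]) -> float:
    """Look up the calibrated value for a raw score (step function)."""
    for upper, prob in steps:
        if raw <= upper:
            return prob
    return steps[-1][1]


# Made-up calibration data: raw head outputs and whether the criterion held.
raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
held = [0, 0, 1, 0, 1, 1]

steps = fit_isotonic(raw, held)
print(calibrate(0.35, steps))  # 0.3 and 0.4 were pooled: one of two held
```

The mapping is fitted once and then frozen, so it adds no randomness: the same raw score always maps to the same calibrated score.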
Canary-verified loads
Every artifact ships with reference (criterion, content) pairs and a hash of their extracted hidden states. At load the runtime re-extracts and verifies — a mismatch refuses to serve, catching LoRA / layout / kernel drift before it reaches a response.
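The load-time check can be pictured roughly as follows. This is a sketch under stated assumptions, not the runtime's actual code: the hashing scheme (SHA-256 over values rounded to three decimals), the function names, and the fake extractors are all hypothetical, and the real runtime may compare hidden states differently.

```python
import hashlib
import struct
from typing import Callable, List, Sequence, Tuple

Canary = Tuple[str, str]                        # (criterion, content)
Extractor = Callable[[str, str], List[float]]   # hypothetical extraction hook


def fingerprint(vectors: Sequence[Sequence[float]], decimals: int = 3) -> str:
    """Hash hidden-state vectors after rounding (rounding is an assumption)."""
    digest = hashlib.sha256()
    for vec in vectors:
        for value in vec:
            # "+ 0.0" turns -0.0 into 0.0 so both hash identically.
            digest.update(struct.pack("<d", round(value, decimals) + 0.0))
    return digest.hexdigest()


def verify_canaries(extract: Extractor, canaries: Sequence[Canary], expected: str) -> None:
    """Re-extract the canaries and refuse to serve on any mismatch."""
    actual = fingerprint([extract(criterion, content) for criterion, content in canaries])
    if actual != expected:
        raise RuntimeError("canary mismatch: refusing to serve")


def reference_extract(criterion: str, content: str) -> List[float]:
    # Fake extractor standing in for the model: derives numbers from lengths.
    return [len(criterion) / 100.0, len(content) / 100.0]


def drifted_extract(criterion: str, content: str) -> List[float]:
    # Simulates kernel or layout drift by shifting every value slightly.
    return [v + 0.01 for v in reference_extract(criterion, content)]


canaries = [("the customer is at risk of churning", "About ready to switch.")]
expected = fingerprint([reference_extract(c, t) for c, t in canaries])

verify_canaries(reference_extract, canaries, expected)  # passes silently
try:
    verify_canaries(drifted_extract, canaries, expected)
    drift_detected = False
except RuntimeError:
    drift_detected = True
print(drift_detected)
```

Failing closed at load time means a silently changed kernel or adapter layout shows up as a startup error rather than as subtly different scores in production.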
Open source — Apache 2.0
Predicate is open source under Apache 2.0 (Granite-4.0-micro backbone). Run the Docker image, or load the weights directly.
docker run --gpus all -p 8088:8088 ghcr.io/nope-net/predicate-oss:0.1.1
curl -X POST localhost:8088/classify_multi_criteria \
-H 'content-type: application/json' \
-d '{"content":"…","criteria":["criterion one","criterion two"]}'

Image: ghcr.io/nope-net/predicate-oss · Weights + model card: huggingface.co/nopenet/predicate-oss · Questions: [email protected]
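The same request from Python, using only the standard library. This is a sketch: the endpoint path, port, and payload keys mirror the curl command above, but the helper names and the timeout are assumptions, and the response schema is not documented here, so the parsed JSON is returned as-is without assuming any field names.

```python
import json
import urllib.request
from typing import Any, List


def build_request(content: str, criteria: List[str],
                  base_url: str = "http://localhost:8088") -> urllib.request.Request:
    """Build the POST request: one piece of content, many criteria."""
    payload = json.dumps({"content": content, "criteria": criteria}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/classify_multi_criteria",
        data=payload,
        headers={"content-type": "application/json"},
        method="POST",
    )


def classify(content: str, criteria: List[str],
             base_url: str = "http://localhost:8088", timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON body unchanged."""
    request = build_request(content, criteria, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; calling classify() does.
request = build_request(
    "Comparing you to two competitors. About ready to switch.",
    ["the customer is at risk of churning"],
)
print(request.full_url)
```

With the container from the `docker run` command above listening on port 8088, `classify(content, criteria)` would perform the actual call.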
Predicate is a content-scoring tool, not a predictive, diagnostic, or therapeutic system, and is not a replacement for human judgment.