Trust scores
How Sentinel calculates trust scores — the rubric, weights, score ranges, and what they mean for buyers and developers.
A trust score is a number from 0 to 100 that summarises the security, reliability, and transparency posture of a Sentinel agent. Every score is computed by the same versioned rubric, so scores are comparable across agents.
Score composition
The score is the sum of four stage scores. Each stage has a maximum number of points that reflects its weight in the total:
| Stage | Max points | Weight |
|---|---|---|
| Static analysis | 25 | 25% |
| Supply-chain audit | 25 | 25% |
| Dynamic testing | 30 | 30% |
| Red-team evaluation | 20 | 20% |
| Total | 100 | 100% |
Each stage score is computed independently. A critical finding in any stage sets that stage's score to 0 and prevents badge assignment regardless of the total.
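The composition rule can be sketched in a few lines of Python. This is an illustrative sketch, not Sentinel's implementation: the stage keys, the `StageResult` type, and the `total_score` function are hypothetical names, and clamping each stage to its maximum is an assumption.

```python
from dataclasses import dataclass

# Maximum points per stage, taken from the table above.
# The key names are hypothetical; the rubric may label stages differently.
STAGE_MAX = {
    "static_analysis": 25,
    "supply_chain_audit": 25,
    "dynamic_testing": 30,
    "red_team_evaluation": 20,
}


@dataclass
class StageResult:
    points: float               # points earned before the critical-finding rule
    has_critical: bool = False  # True if the stage has a critical finding


def total_score(stages: dict[str, StageResult]) -> tuple[float, bool]:
    """Return (total, badge_eligible) for a full set of stage results."""
    if set(stages) != set(STAGE_MAX):
        raise ValueError("expected exactly the four rubric stages")
    total = 0.0
    badge_eligible = True
    for name, result in stages.items():
        if result.has_critical:
            # A critical finding zeroes the stage and blocks any badge.
            badge_eligible = False
            continue
        # Assumption: a stage score is clamped to the range [0, max points].
        total += min(max(result.points, 0.0), STAGE_MAX[name])
    return total, badge_eligible


results = {
    "static_analysis": StageResult(22),
    "supply_chain_audit": StageResult(25),
    "dynamic_testing": StageResult(26),
    "red_team_evaluation": StageResult(14),
}
print(total_score(results))  # (87.0, True)
```

Because each stage is scored independently, a critical finding removes only that stage's points from the sum, but it still blocks the badge whatever the remaining total is.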
Score ranges
| Score | Interpretation |
|---|---|
| 90–100 | Excellent — strong security and reliability posture, eligible for Premium badge |
| 75–89 | Good — minor issues present, eligible for Standard badge |
| 50–74 | Acceptable — multiple medium findings, eligible for Basic badge |
| 1–49 | Poor — significant issues, no badge, not recommended for production use |
| 0 | Failed — one or more critical findings |
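As a rough illustration, the table can be expressed as a lookup. The function name `score_band` is hypothetical, the lowercase badge names `premium` and `basic` are assumed by analogy with the `standard` value that appears in API responses, and treating each range's lower bound as the threshold for fractional scores is also an assumption.

```python
from typing import Optional


def score_band(score: float, has_critical: bool = False) -> tuple[str, Optional[str]]:
    """Map a trust score to (interpretation, badge); badge is None when no badge applies."""
    if not 0 <= score <= 100:
        raise ValueError("trust scores range from 0 to 100")
    # Assumption: a critical finding means Failed, whatever the arithmetic total.
    if has_critical or score == 0:
        return ("Failed", None)
    if score >= 90:
        return ("Excellent", "premium")
    if score >= 75:
        return ("Good", "standard")
    if score >= 50:
        return ("Acceptable", "basic")
    return ("Poor", None)


print(score_band(87))  # ('Good', 'standard')
```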
What each point range means for buyers
90–100: The agent has no open critical or high findings, and it passed the full dynamic and red-team suite. It is suitable for production without additional scrutiny, though the report is still worth reading for any low or info findings.
75–89: The agent has no critical findings and at most two medium findings. Review the medium findings: most are remediable and should be cleared by the next re-verification. Suitable for production.
50–74: The agent has multiple open medium findings or one open high finding. Review the full report before deploying in a security-sensitive context. Consider asking the developer for a remediation timeline.
Below 50: Do not use in production. The agent has significant unresolved security or reliability issues.
0: The agent failed verification. It has at least one critical finding. It is not listed on the marketplace and cannot be invoked.
Finding severity and score impact
Findings deduct points from the stage score. The deduction depends on the finding's severity and the stage it belongs to.
| Severity | Typical deduction |
|---|---|
| `critical` | Stage score → 0; pipeline halted |
| `high` | 5–10 points from stage score |
| `medium` | 2–4 points from stage score |
| `low` | 0.5–1 point from stage score |
| `info` | 0 points |
Exact deductions vary by finding category — the rubric document specifies the exact weight for each category. The rubric is versioned and publicly available at /v1/trust/rubric/{version}.
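The following sketch shows how deductions might combine within a single stage. The deduction values are made-up examples chosen from the typical ranges above, since the real per-category weights live in the rubric. The function name `stage_score` and the floor at 0 are assumptions.

```python
KNOWN_SEVERITIES = {"critical", "high", "medium", "low", "info"}


def stage_score(max_points: float, findings: list[tuple[str, float]]) -> float:
    """Apply (severity, deduction) pairs to a stage's maximum points."""
    score = float(max_points)
    for severity, deduction in findings:
        if severity not in KNOWN_SEVERITIES:
            raise ValueError(f"unknown severity: {severity!r}")
        if severity == "critical":
            # A critical finding sets the stage score to 0 outright.
            return 0.0
        score -= deduction
    # Assumption: a stage score cannot go below 0.
    return max(score, 0.0)


# Dynamic testing (max 30) with one medium, one low, and one info finding.
findings = [("medium", 3.0), ("low", 0.5), ("info", 0.0)]
print(stage_score(30, findings))  # 26.5
```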
Score changes over time
Trust scores are not static. They change when:
- A new agent version is published (full re-verification)
- A dependency receives a new CVE advisory (partial re-verification of the supply-chain stage)
- The scheduled re-verification runs (every 90 days)
- A finding is resolved or disputed (score recalculated immediately)
When a score changes, Sentinel:
- Updates the score and badge on the marketplace listing in real time
- Sends a `trust_score.updated` webhook to subscribed endpoints
- Logs the change in the trust score history visible on the agent detail page
Historical scores
Every score change is recorded with a timestamp and the rubric version used to compute it. You can view the score history on the agent detail page or via the API:
GET /v1/agents/{agent_id}/trust-score/history
{
  "history": [
    {
      "score": 87,
      "badge": "standard",
      "rubric_version": "2025.1",
      "computed_at": "2025-05-14T09:32:00Z",
      "trigger": "publish"
    },
    {
      "score": 82,
      "badge": "standard",
      "rubric_version": "2025.1",
      "computed_at": "2025-04-01T09:32:00Z",
      "trigger": "cve_advisory"
    }
  ]
}
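A client might use this response to report the most recent score and how far it moved. The sketch below works on the sample payload above and makes no HTTP request. The helper names `parse_ts` and `latest_change` are hypothetical, and it sorts by `computed_at` explicitly instead of assuming the API returns entries in any particular order.

```python
import json
from datetime import datetime, timezone

SAMPLE = """
{
  "history": [
    {"score": 87, "badge": "standard", "rubric_version": "2025.1",
     "computed_at": "2025-05-14T09:32:00Z", "trigger": "publish"},
    {"score": 82, "badge": "standard", "rubric_version": "2025.1",
     "computed_at": "2025-04-01T09:32:00Z", "trigger": "cve_advisory"}
  ]
}
"""


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp in the form used by the sample response."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_change(payload: dict):
    """Return (latest_entry, delta); delta is None when there is no earlier entry."""
    entries = sorted(
        payload["history"],
        key=lambda entry: parse_ts(entry["computed_at"]),
        reverse=True,
    )
    if not entries:
        raise ValueError("history is empty")
    latest = entries[0]
    delta = latest["score"] - entries[1]["score"] if len(entries) > 1 else None
    return latest, delta


latest, delta = latest_change(json.loads(SAMPLE))
print(latest["score"], latest["trigger"], delta)  # 87 publish 5
```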
Rubric versioning
The scoring rubric is versioned with the format YYYY.N (for example, 2025.1). The rubric version is recorded in every trust report.
When the rubric changes:
- All agents are re-verified against the new rubric within 30 days of the new version's release
- A changelog is published explaining what changed and why
- Weight changes require an RFC and a 30-day public comment period before they take effect
You can retrieve any rubric version via the API:
GET /v1/trust/rubric/2025.1
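When comparing rubric versions in client code, it helps to parse the `YYYY.N` string into numbers first, because plain string comparison would rank `2025.10` before `2025.2`. A minimal sketch, assuming `N` is a plain decimal integer; the helper names `parse_rubric_version` and `rubric_path` are hypothetical:

```python
import re

_RUBRIC_VERSION = re.compile(r"(\d{4})\.(\d+)")


def parse_rubric_version(version: str) -> tuple[int, int]:
    """Split a YYYY.N rubric version into a comparable (year, n) tuple."""
    match = _RUBRIC_VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a YYYY.N rubric version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def rubric_path(version: str) -> str:
    """Build the rubric endpoint path after validating the version string."""
    parse_rubric_version(version)
    return f"/v1/trust/rubric/{version}"


print(parse_rubric_version("2025.10") > parse_rubric_version("2025.2"))  # True
print(rubric_path("2025.1"))  # /v1/trust/rubric/2025.1
```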
Score for buyers: what to look for
When evaluating an agent for production use:
- Check the score — is it above 75?
- Read the open findings. Are any of them `high` severity? What do they affect?
- Check the verified date. Is the score recent (within 90 days)?
- Look at the data handling declarations — does the agent retain data? Where is it processed?
- Check the developer's bond tier — does it match the risk profile of your use case?
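The first three checks are mechanical enough to automate as a pre-screen. The sketch below is one possible shape, with a hypothetical function name and inputs. It treats a score of exactly 75 as passing, which is an assumption, and it leaves the data handling and bond tier checks to human review.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def buyer_concerns(
    score: float,
    open_severities: list[str],
    verified_at: datetime,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the checklist items that need a closer look; empty means none flagged."""
    now = now or datetime.now(timezone.utc)
    concerns = []
    # Assumption: 75 itself passes, matching the lower bound of the Good range.
    if score < 75:
        concerns.append("score below 75")
    if "high" in open_severities:
        concerns.append("open high-severity finding")
    if now - verified_at > timedelta(days=90):
        concerns.append("score older than 90 days")
    return concerns


today = datetime(2025, 6, 1, tzinfo=timezone.utc)
verified = datetime(2025, 5, 14, tzinfo=timezone.utc)
print(buyer_concerns(87, ["medium"], verified, now=today))  # []
```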
Score for developers: how to improve
| Action | Score impact |
|---|---|
| Pin all dependencies to exact versions | +2–5 points (supply-chain) |
| Remove all secrets from source code | +5–10 points (static) |
| Add input validation and sanitisation | +3–8 points (static + dynamic) |
| Fix red-team prompt injection finding | +5–10 points (red-team) |
| Add example inputs to the manifest | Improves dynamic test coverage |
| Upgrade vulnerable dependencies | Removes CVE deductions |