Trust scores

How Sentinel calculates trust scores — the rubric, weights, score ranges, and what they mean for buyers and developers.

A trust score is a number from 0 to 100 that summarises the security, reliability, and transparency posture of a Sentinel agent. Every score is computed by the same versioned rubric, so scores are comparable across agents.

Score composition

The score is the sum of four stage scores. Each stage has a maximum number of points, which is also its weight in the total:

| Stage | Max points | Weight |
| --- | --- | --- |
| Static analysis | 25 | 25% |
| Supply-chain audit | 25 | 25% |
| Dynamic testing | 30 | 30% |
| Red-team evaluation | 20 | 20% |
| Total | 100 | 100% |

Each stage score is computed independently. A critical finding in any stage sets that stage's score to 0 and prevents badge assignment regardless of the total.
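
As a rough illustration of the composition described above, the sketch below sums four stage scores, caps each at its maximum, and treats a critical finding as zeroing its stage and blocking the badge. The names `STAGE_MAX`, `StageResult`, and `total_score` are hypothetical and not part of the Sentinel API, and clamping a stage to the range 0 to its maximum is an assumption; the published rubric remains the authoritative definition.

```python
from dataclasses import dataclass

# Maximum points per stage, taken from the table above.
# The dictionary keys are hypothetical identifiers.
STAGE_MAX = {
    "static_analysis": 25,
    "supply_chain_audit": 25,
    "dynamic_testing": 30,
    "red_team_evaluation": 20,
}


@dataclass
class StageResult:
    points: float               # points earned by the stage
    has_critical: bool = False  # True if the stage produced a critical finding


def total_score(stages: dict[str, StageResult]) -> tuple[float, bool]:
    """Return (total, badge_eligible) for the four stage results."""
    total = 0.0
    badge_eligible = True
    for name, max_points in STAGE_MAX.items():
        result = stages[name]
        if result.has_critical:
            # A critical finding zeroes the stage and blocks the badge.
            badge_eligible = False
            continue
        # Assumption: a stage score is clamped to [0, max_points].
        total += max(0.0, min(result.points, max_points))
    return total, badge_eligible


clean = {
    "static_analysis": StageResult(22),
    "supply_chain_audit": StageResult(21),
    "dynamic_testing": StageResult(26),
    "red_team_evaluation": StageResult(18),
}
print(total_score(clean))  # (87.0, True)
```

Because each stage is scored independently, a critical finding in one stage does not alter the points earned in the others; it only removes that stage's contribution and the badge eligibility.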

Score ranges

| Score | Interpretation |
| --- | --- |
| 90–100 | Excellent: strong security and reliability posture, eligible for Premium badge |
| 75–89 | Good: minor issues present, eligible for Standard badge |
| 50–74 | Acceptable: multiple medium findings, eligible for Basic badge |
| 1–49 | Poor: significant issues, no badge, not recommended for production use |
| 0 | Failed: one or more critical findings |
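
The ranges above amount to a simple lookup. The following is a minimal sketch, assuming integer scores (as in the API sample later on this page) and lowercase badge identifiers; `"standard"` appears in that sample, while `"premium"` and `"basic"` are assumed by analogy. It does not model the rule that a critical finding blocks the badge regardless of the total.

```python
from typing import Optional

# (lower bound, rating, badge) rows from the score ranges, highest first.
BANDS = [
    (90, "Excellent", "premium"),
    (75, "Good", "standard"),
    (50, "Acceptable", "basic"),
    (1, "Poor", None),
    (0, "Failed", None),
]


def interpret(score: int) -> tuple[str, Optional[str]]:
    """Map an integer trust score to (rating, badge); badge is None if no badge applies."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower, rating, badge in BANDS:
        if score >= lower:
            return rating, badge
    raise AssertionError("unreachable: every score from 0 to 100 matches a band")


print(interpret(87))  # ('Good', 'standard')
```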

What each score range means for buyers

90–100: The agent has no open critical or high findings and passed the full dynamic and red-team suite. It is suitable for production use; review the report to understand any low or info findings.

75–89: The agent has no critical findings and at most one or two medium findings. Review the medium findings; most are remediable, and the score should improve at the next re-verification once they are fixed. Suitable for production.

50–74: The agent has multiple open medium findings or one open high finding. Review the full report before deploying in a security-sensitive context. Consider asking the developer for a remediation timeline.

1–49: Do not use in production. The agent has significant unresolved security or reliability issues.

0: The agent failed verification. It has at least one critical finding. It is not listed on the marketplace and cannot be invoked.

Finding severity and score impact

Findings deduct points from the stage score. The deduction depends on the finding's severity and the stage it belongs to.

| Severity | Typical deduction |
| --- | --- |
| critical | Stage score set to 0; pipeline halted |
| high | 5–10 points from stage score |
| medium | 2–4 points from stage score |
| low | 0.5–1 point from stage score |
| info | 0 points |

Deductions vary by finding category; the rubric document specifies the exact value for each one. The rubric is versioned and publicly available at /v1/trust/rubric/{version}.
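
To make the arithmetic concrete, here is a small sketch of how deductions might be applied to one stage. The per-severity values in `ASSUMED_DEDUCTION` are hypothetical numbers picked from inside the typical ranges above, not the rubric's real per-category values. Starting from the stage maximum and flooring the result at 0 are also assumptions.

```python
# Hypothetical deductions chosen from inside the typical ranges above.
# Real values come from the versioned rubric and vary by finding category.
ASSUMED_DEDUCTION = {
    "high": 7.5,
    "medium": 3.0,
    "low": 0.75,
    "info": 0.0,
}


def stage_score(max_points: float, severities: list[str]) -> float:
    """Apply per-finding deductions to a stage that starts at max_points."""
    if "critical" in severities:
        # A critical finding sets the stage score to 0 outright.
        return 0.0
    deducted = sum(ASSUMED_DEDUCTION[sev] for sev in severities)
    # Assumption: a stage score never goes below 0.
    return max(0.0, max_points - deducted)


# A static analysis stage (max 25) with one high, one medium, and one low finding.
print(stage_score(25, ["high", "medium", "low"]))  # 13.75
```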

Score changes over time

Trust scores are not static. They change when:

  • A new agent version is published (full re-verification)
  • A dependency receives a new CVE advisory (partial re-verification of the supply-chain stage)
  • The scheduled re-verification runs (every 90 days)
  • A finding is resolved or disputed (score recalculated immediately)

When a score changes, Sentinel:

  1. Updates the score and badge on the marketplace listing in real time
  2. Sends a trust_score.updated webhook to subscribed endpoints
  3. Logs the change in the trust score history visible on the agent detail page

Historical scores

Every score change is recorded with a timestamp and the rubric version used to compute it. You can view the score history on the agent detail page or via the API:

GET /v1/agents/{agent_id}/trust-score/history

Example response:

{
  "history": [
    {
      "score": 87,
      "badge": "standard",
      "rubric_version": "2025.1",
      "computed_at": "2025-05-14T09:32:00Z",
      "trigger": "publish"
    },
    {
      "score": 82,
      "badge": "standard",
      "rubric_version": "2025.1",
      "computed_at": "2025-04-01T09:32:00Z",
      "trigger": "cve_advisory"
    }
  ]
}
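
One way to consume this response is to turn it into a list of score changes. The sketch below parses the sample body shown above and reports the delta between consecutive entries. The helper names `parse_timestamp` and `score_changes` are illustrative, and the code sorts by `computed_at` itself rather than assuming anything about the order the API returns.

```python
import json
from datetime import datetime

# Response body copied from the example above.
SAMPLE_RESPONSE = """
{
  "history": [
    {"score": 87, "badge": "standard", "rubric_version": "2025.1",
     "computed_at": "2025-05-14T09:32:00Z", "trigger": "publish"},
    {"score": 82, "badge": "standard", "rubric_version": "2025.1",
     "computed_at": "2025-04-01T09:32:00Z", "trigger": "cve_advisory"}
  ]
}
"""


def parse_timestamp(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onwards,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def score_changes(payload: dict) -> list[dict]:
    """Pair each entry with the one before it and report the score delta."""
    entries = sorted(payload["history"], key=lambda e: parse_timestamp(e["computed_at"]))
    changes = []
    for previous, current in zip(entries, entries[1:]):
        changes.append({
            "computed_at": current["computed_at"],
            "trigger": current["trigger"],
            "delta": current["score"] - previous["score"],
            # Deltas across rubric versions are not directly comparable.
            "rubric_changed": current["rubric_version"] != previous["rubric_version"],
        })
    return changes


changes = score_changes(json.loads(SAMPLE_RESPONSE))
print(changes)
```

For the sample, this yields a single change: a gain of 5 points on 2025-05-14, triggered by `publish`, with the rubric version unchanged.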

Rubric versioning

The scoring rubric is versioned with the format YYYY.N (for example, 2025.1). The rubric version is recorded in every trust report.

When the rubric changes:

  • All agents are re-verified against the new rubric within 30 days of the new version's release
  • A changelog is published explaining what changed and why
  • Weight changes require an RFC and a 30-day public comment period before they take effect

You can retrieve any rubric version via the API:

GET /v1/trust/rubric/2025.1
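
When comparing rubric versions in client code, it helps to parse the YYYY.N string rather than compare it as text, since `"2025.10"` sorts before `"2025.2"` as a string. The sketch below assumes that N is a non-negative integer counter and that versions are ordered by year and then by N; neither is stated on this page. `RubricVersion` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class RubricVersion(NamedTuple):
    year: int
    revision: int

    @classmethod
    def parse(cls, text: str) -> "RubricVersion":
        """Parse a 'YYYY.N' string such as '2025.1'."""
        match = re.fullmatch(r"(\d{4})\.(\d+)", text)
        if match is None:
            raise ValueError(f"not a YYYY.N rubric version: {text!r}")
        return cls(int(match.group(1)), int(match.group(2)))

    def path(self) -> str:
        """Path of the rubric endpoint for this version."""
        return f"/v1/trust/rubric/{self.year}.{self.revision}"


# Tuples compare field by field, so ordering is numeric rather than textual.
print(RubricVersion.parse("2025.10") > RubricVersion.parse("2025.2"))  # True
print(RubricVersion.parse("2025.1").path())  # /v1/trust/rubric/2025.1
```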

Score for buyers: what to look for

When evaluating an agent for production use:

  1. Check the score: is it 75 or above (Standard badge or better)?
  2. Read the open findings: are any of them high severity, and what do they affect?
  3. Check the verification date: was the score computed within the last 90 days?
  4. Look at the data handling declarations: does the agent retain data, and where is it processed?
  5. Check the developer's bond tier: does it match the risk profile of your use case?
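
The first three checks lend themselves to automation; the last two need human judgement. Below is a minimal sketch of such a pre-screen, treating 75 (the lower bound of the Standard badge range) as the score threshold. The function name and its parameters are hypothetical; how you obtain the score, the open finding severities, and the verification timestamp depends on your integration.

```python
from datetime import datetime, timedelta, timezone

MIN_SCORE = 75                # lower bound of the Standard badge range
MAX_AGE = timedelta(days=90)  # matches the scheduled re-verification interval


def checklist_flags(score: int, open_severities: list[str],
                    verified_at: datetime, now: datetime) -> list[str]:
    """Return reasons for a closer manual review; an empty list means none were found."""
    flags = []
    if score < MIN_SCORE:
        flags.append("score below 75")
    if any(sev in ("critical", "high") for sev in open_severities):
        flags.append("open high or critical finding")
    if now - verified_at > MAX_AGE:
        flags.append("verification older than 90 days")
    return flags


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
verified = datetime(2025, 5, 14, tzinfo=timezone.utc)
print(checklist_flags(87, ["medium"], verified, now))  # []
```

An empty result only means the automated checks passed; data handling declarations and the bond tier still need to be reviewed against your own requirements.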

Score for developers: how to improve

| Action | Score impact |
| --- | --- |
| Pin all dependencies to exact versions | +2–5 points (supply-chain) |
| Remove all secrets from source code | +5–10 points (static) |
| Add input validation and sanitisation | +3–8 points (static + dynamic) |
| Fix red-team prompt injection finding | +5–10 points (red-team) |
| Add example inputs to the manifest | Improves dynamic test coverage |
| Upgrade vulnerable dependencies | Removes CVE deductions |