Version: v0.25.0 (Latest)

Identities and Identifiers

A party says who or what exists. An identity is how that party is recognised, discovered, and, when it is explicitly permitted, how it signs in. The two are separate layers on purpose, and that separation is what lets one tenant hold contacts that never log in, employees that log in internally, and customers that log in on a public site, without those cases contradicting one another.

Parties hold identities; identities carry identifiers; identifiers drive discovery and, when permitted, authentication

Three Layers, Decoupled

A party is the directory entry, whether a person, an organization, or a service. An identity belongs to a party, and a party can hold several: a work persona and a separately discovered contact record can both sit on the same person. The identity is the unit that carries a per-identity salt and the bindings that decide where, if anywhere, it may authenticate. An identifier in turn belongs to an identity and names it to the outside world, such as an email address, a phone number, a decentralized identifier, a federated subject, or an issuer URL, and an identity can carry several of these.
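The three layers can be pictured as nested records. The following is a minimal sketch only: the class names, field names, and id formats are illustrative assumptions and not the platform's schema, and identifier values are shown raw for readability even though the platform stores them protected.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Identifier:
    type: str   # e.g. "email", "phone", "did", "issuer_url"
    value: str  # raw here for readability only


@dataclass
class Identity:
    id: str
    # Per-identity salt, generated once when the identity is created.
    salt: bytes = field(default_factory=lambda: secrets.token_bytes(16))
    identifiers: list[Identifier] = field(default_factory=list)
    # Applications this identity may sign in to; empty means "never logs in".
    bound_application_ids: set[str] = field(default_factory=set)


@dataclass
class Party:
    id: str
    kind: str  # "person", "organization", or "service"
    identities: list[Identity] = field(default_factory=list)


# One person holding a discovered contact record and a work persona.
contact = Identity(id="idn-contact",
                   identifiers=[Identifier("email", "ada@example.com")])
work = Identity(id="idn-work",
                identifiers=[Identifier("email", "ada@example.com")],
                bound_application_ids={"svc-intranet"})
ada = Party(id="pty-ada", kind="person", identities=[contact, work])
```

Both identities are discoverable through the same email address, but only `work` has anywhere it could authenticate.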

The reason the layers are kept apart is that identity is not the same thing as login. Many parties in a real tenant should be discoverable and relatable but never sign in. An imported contact, a supplier you only invoice, an organization you keep on file: each has an identity and identifiers so you can find and connect it, but no authentication capability. Being able to log in is something you grant explicitly, per application, rather than something an identifier carries on its own.

Identifiers Drive Discovery

The first job of an identifier is discovery, which is resolving a real-world value back to the party it belongs to. Given an email address, a phone number, or a decentralized identifier, the platform finds the matching identity and through it the party, so you can attach a verified contact, deduplicate an inbound record, or link a presentation to a subject you already know. None of this requires that the party ever authenticate, which is what makes discovery useful in its own right.

Identity verification is integrated with this discovery step. A verification flow that establishes a real-world value resolves it through the identifier layer to the right identity, records that the value is verified, and attaches it where it belongs. Discovery and verification therefore stand on their own in situations that are purely about knowing who a party is, independent of any future sign-in.

Protected Identifiers

Identifiers are among the most sensitive data in the system, because an email address or phone number is both personal data and a correlation key, so apart from genuinely public identifier types they are never stored in the clear. Each protected identifier is held as a reversible encrypted value, which an authorised reader can decrypt to display, and where matching is needed it also carries a blind index, a keyed hash you can match against without reading the value. Which protection applies is chosen per identifier type, because different identifiers have different needs.

| Mode | At rest | Searchable | Protects against | Used for |
| --- | --- | --- | --- | --- |
| Plaintext | value and index in the clear | trivially | nothing | genuinely public identifiers, such as an issuer URL |
| Searchable blind index | encrypted value plus a tenant-wide keyed hash | yes, by exact match | reading the value at rest, casual scans | login identifiers such as email, phone, DID |
| Salted blinded | encrypted value plus a per-identity-salted hash | no, verify only | reading the value and correlating one person across roles | sensitive identifiers that must never be a login key or a correlation handle |

The tension this resolves is real. A tenant-wide index lets you look an email address up in a single step, but it also lets anyone with database access see that two rows share that address. A per-identity salt removes that correlation but breaks global lookup. Rather than pick one answer for everything, the platform decides per identifier type: login identifiers accept bounded, in-tenant equality so they can be looked up, while identifiers that must never correlate are salted so they cannot. This is why the same email address can sit on several identities without a database reader being able to prove they belong to the same person, and why the only way to assert that they do is the explicit, governed same_entity relationship.
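The trade-off between the two hashed modes can be seen in a few lines. This is a sketch under stated assumptions: HMAC-SHA256, the way the salt is combined with the value, and the function names are illustrative choices, not the platform's actual construction, and the key is a demo constant rather than KMS-managed material.

```python
import hashlib
import hmac


def searchable_index(tenant_key: bytes, value: str) -> str:
    """Tenant-wide keyed hash: equal values produce equal indexes."""
    return hmac.new(tenant_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def salted_index(tenant_key: bytes, identity_salt: bytes, value: str) -> str:
    """Per-identity-salted keyed hash: equal values produce different indexes."""
    message = identity_salt + value.encode("utf-8")
    return hmac.new(tenant_key, message, hashlib.sha256).hexdigest()


def verify_salted(tenant_key: bytes, identity_salt: bytes,
                  value: str, stored: str) -> bool:
    """Verify-only check: possible once the identity (and its salt) is known."""
    candidate = salted_index(tenant_key, identity_salt, value)
    return hmac.compare_digest(candidate, stored)


tenant_key = b"demo-tenant-key-not-for-production"
email = "ada@example.com"

# Searchable mode: one-step lookup, but two rows visibly share a value.
row_a = searchable_index(tenant_key, email)
row_b = searchable_index(tenant_key, email)

# Salted mode: no global lookup, and no visible correlation between rows.
salted_a = salted_index(tenant_key, b"salt-of-identity-a", email)
salted_b = salted_index(tenant_key, b"salt-of-identity-b", email)
```

Here `row_a` equals `row_b`, which is what makes lookup possible and correlation visible, while `salted_a` and `salted_b` differ and can only be checked one identity at a time.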

Normalization is part of the protection policy rather than a separate step. Each identifier type's resolved policy couples its storage mode with a normalization profile, so a value is canonicalized identically before it is encrypted on a write and before its blind index is computed on a lookup, and the two always agree. The machinery is KMS-backed: the blind index is a keyed HMAC and the reversible value is AES-GCM ciphertext, both produced by the IdentifierProtector in com.sphereon.identity.matching.protection.
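Why normalization has to be coupled to the protection policy is easiest to see with the index computed on both paths. The sketch below is illustrative: the profile rules (NFKC, trim, case-fold for email; digits plus a leading `+` for phone) and the names `PROFILES` and `blind_index` are assumptions, not the platform's actual normalization profiles or the `IdentifierProtector` API.

```python
import hashlib
import hmac
import unicodedata


def normalize_email(raw: str) -> str:
    # Assumed profile: Unicode NFKC, trim whitespace, case-fold.
    return unicodedata.normalize("NFKC", raw).strip().casefold()


def normalize_phone(raw: str) -> str:
    # Assumed profile: keep ASCII digits and a single leading "+".
    digits = "".join(ch for ch in raw if ch in "0123456789")
    return ("+" if raw.strip().startswith("+") else "") + digits


# One normalization profile per identifier type, resolved with the policy.
PROFILES = {"email": normalize_email, "phone": normalize_phone}


def blind_index(key: bytes, identifier_type: str, raw: str) -> str:
    """Canonicalize first, then hash; used on writes and lookups alike."""
    canonical = PROFILES[identifier_type](raw)
    message = f"{identifier_type}:{canonical}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"
stored = blind_index(key, "email", "  Ada@Example.COM ")   # write path
looked_up = blind_index(key, "email", "ada@example.com")   # lookup path
```

Because both paths go through the same profile, `stored` and `looked_up` are identical; hashing the raw strings instead would make the lookup miss.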

What a reader sees depends on the surface. Every surface that turns an identifier back into a claim (OIDC userinfo, the attribute pipeline's identity source, and credential claim mapping) resolves it through the ProtectedClaimResolver. A plaintext or reversible value is revealed and emitted, while a non-reversible blinded value is omitted entirely, because the stored lookup column carries the blind index and a blind index must never leak as a claim value. Verified flags derived from the row, such as email_verified, are kept either way. The administrative API works the same way in the other direction: it accepts raw identifier values and protects them server-side, and its read responses expose only the protection mode and the lookup value, never ciphertext, HMAC material, or key references.
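The emit-or-omit rule can be sketched as a small function. This is not the ProtectedClaimResolver itself: the row shape, the mode names, and the `revealed` field (standing in for a successful decryption, which is not performed here) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredIdentifier:
    claim: str               # claim name, e.g. "email"
    mode: str                # "plaintext", "searchable", or "salted"
    lookup_value: str        # the plaintext value, or a blind index
    revealed: Optional[str]  # decrypted value if reversible, else None
    verified: bool


def resolve_claims(rows: list[StoredIdentifier]) -> dict:
    claims: dict = {}
    for row in rows:
        if row.mode == "plaintext":
            claims[row.claim] = row.lookup_value
        elif row.revealed is not None:
            claims[row.claim] = row.revealed
        # Otherwise the value is omitted: lookup_value holds a blind
        # index, which must never be emitted as a claim value.
        claims[f"{row.claim}_verified"] = row.verified
    return claims


claims = resolve_claims([
    StoredIdentifier("email", "searchable", "blind-index-of-email",
                     "ada@example.com", True),
    StoredIdentifier("phone_number", "salted", "blind-index-of-phone",
                     None, True),
])
```

In this example `claims` carries `email`, `email_verified`, and `phone_number_verified`, but no `phone_number` entry.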

Applications Are Services

A login surface, the place a party signs in, is itself a party: a service with a login configuration attached (ApplicationLoginConfig). That configuration says which authentication methods the surface accepts (allowedMethods), which identifier types count as logins there (loginIdentifierTypes), which identity providers may back it (allowedIdpIds), and whether self-registration is allowed (selfRegistration). An OAuth client id or a public host maps to that service party, so the application a user is signing in to is a participant in the model rather than a separate registry.
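As a rough picture of what such a configuration holds, the sketch below mirrors the four fields named above in Python naming. The field types, the `accepts` helper, and the example method and provider ids are assumptions; the real ApplicationLoginConfig may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationLoginConfig:
    allowed_methods: frozenset         # allowedMethods
    login_identifier_types: frozenset  # loginIdentifierTypes
    allowed_idp_ids: frozenset         # allowedIdpIds
    self_registration: bool            # selfRegistration

    def accepts(self, identifier_type: str, method: str) -> bool:
        """True if this surface takes this identifier type and method."""
        return (identifier_type in self.login_identifier_types
                and method in self.allowed_methods)


# A hypothetical public site: email logins, open registration.
public_site = ApplicationLoginConfig(
    allowed_methods=frozenset({"password", "magic_link"}),
    login_identifier_types=frozenset({"email"}),
    allowed_idp_ids=frozenset({"idp-social"}),
    self_registration=True,
)
```

With this configuration, `public_site.accepts("email", "password")` holds, while a DID presented to the same surface would not count as a login identifier.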

Resolving the application from a request is direct: the OAuth client_id resolves to the service party carrying it as its OAuth client id, and a tenant can name a fallback application with the oauth2.application.default-id configuration key for requests that arrive without a distinct client id. Application-bound login is the platform default, governed by auth.login.require-application (default true); setting it to false permits credential login without an application, a development convenience rather than a production mode.
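The resolution order can be sketched as follows. Only the two configuration keys come from the text above; the function, the exception, the dictionary of clients, and the choice to reject an unrecognised client id even when application-less login is permitted are assumptions of this sketch.

```python
from typing import Optional


class UnknownApplication(Exception):
    """Raised when no application can be resolved for the request."""


def resolve_application(client_id: Optional[str],
                        apps_by_client_id: dict,
                        settings: dict) -> Optional[str]:
    # 1. A client id must map to the service party that carries it.
    if client_id:
        if client_id not in apps_by_client_id:
            raise UnknownApplication(client_id)
        return apps_by_client_id[client_id]
    # 2. Without a client id, fall back to the tenant's default, if named.
    default_id = settings.get("oauth2.application.default-id")
    if default_id:
        return default_id
    # 3. Application-bound login is the default (true when unset).
    if settings.get("auth.login.require-application", True):
        raise UnknownApplication("no client id and no default application")
    # 4. Development convenience: credential login without an application.
    return None


apps = {"web-client": "svc-public-site"}
resolved = resolve_application("web-client", apps, {})
```

Returning `None` in the last step models the application-less mode, which the text describes as a development convenience rather than a production setting.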

An identity authenticates at a service only through an explicit application binding. The binding is the relation between an identity and an application, and it means one thing: this identity may sign in here, using these methods. That is all a binding has to carry. An identity that should sign in to two applications simply has two bindings, and an identity with no binding at all, like a contact-list entry, cannot log in anywhere. A binding can also carry a role label such as employee or customer (its specializationSubtype), for the case where an application wants the login to say which role the person is acting as, but that label is optional and most bindings do not need it. The identity may carry such a subtype of its own as well; when both are set, the binding's value wins for that application. Granting the ability to sign in is therefore an explicit act, scoped to a single application.
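A binding and the subtype precedence rule might look like this in miniature. The class and helper names are hypothetical; only the idea of a per-application binding, its allowed methods, and the optional specializationSubtype that overrides the identity's own comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApplicationBinding:
    identity_id: str
    application_id: str
    allowed_methods: frozenset
    specialization_subtype: Optional[str] = None  # optional role label


def may_sign_in(bindings: list, identity_id: str,
                application_id: str, method: str) -> bool:
    """An identity may sign in only through a matching binding."""
    return any(b.identity_id == identity_id
               and b.application_id == application_id
               and method in b.allowed_methods
               for b in bindings)


def effective_subtype(identity_subtype: Optional[str],
                      binding: ApplicationBinding) -> Optional[str]:
    """When both are set, the binding's value wins for that application."""
    if binding.specialization_subtype is not None:
        return binding.specialization_subtype
    return identity_subtype


bindings = [
    ApplicationBinding("idn-ada", "svc-intranet", frozenset({"password"}), "employee"),
    ApplicationBinding("idn-ada", "svc-shop", frozenset({"password"})),
]
```

The same identity has two bindings here; an identity absent from `bindings` cannot sign in anywhere, whatever identifiers it holds.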

Modelling Identities Across Applications

How a person spreads across several applications is a modelling choice, not a fixed rule, and the model supports three shapes side by side. You pick the one that matches how separate the personas really are, and a single tenant can use all three at once.

The first shape is one identity used everywhere. A person has a single identity, with one set of identifiers and one credential, and a binding to each application it may use. They sign in to every surface with the same email and the same password, and the application they reached decides what they can do there. This needs no roles at all; two bindings to two applications is the whole of it. When an application also wants the role, the binding to that application carries the label, and a person can be an employee at one and a customer at another while still being one identity.

The second shape is a separate identity per application. Sometimes the logins should share nothing. A person can hold a distinct identity for each application, each with its own identifiers and its own credential, and each can even be backed by a different identity provider. The work login and the customer login are then genuinely separate accounts on the same person. You can still tie them together for reporting and reasoning with a same_entity link, but neither shares a credential or an identifier with the other.

The third shape is identifiers per role. Between the two sits the case where a person plays distinct roles that each need their own identifiers or credentials. Here you give the person one identity per role, each scoped to its specialization and carrying the identifiers and credential that belong to that role. The employee identity and the student identity are separate logins on one person, split along the roles rather than along the applications.

None of these is more correct than the others. A contact that never logs in has an identity and no binding. An employee who is also a customer might be one identity with two bindings, or two identities, depending only on whether those logins should share a credential. The choice is yours to make per person, and you can change it as a relationship deepens.

Authentication: Fan Out, Then Filter

Login resolves by identifier and application together, never by identifier alone, and this is what makes the headline case work: one email address shared by an employee, a customer, and a contact in the same tenant.

A login email fans out to three candidate identities and is filtered down to the one with an authenticable binding to this application

Given the identifier value, its type, the application that the OAuth client id or host resolves to, and the method, resolution happens in two moves. First it fans out: the platform computes the blind index for the value and finds every identity in the tenant that holds it, so the shared email correctly surfaces all three. Then it filters down: it keeps only the identities that have an active, in-validity binding to this application granting authenticable and allowing the requested method. The contact has no binding and drops out, the customer is bound to a different surface and drops out, and the employee remains.

The result must be exactly one identity. Zero permitted identities is a clean rejection, and a uniqueness guard prevents more than one. Every way a login can fail is rejected with its own machine-readable reason (LoginRejectionReason in com.sphereon.data.store.party.login). An identifier type whose protection policy is not searchable cannot be a login key at all, and that is rejected as such. An unknown application, an identifier type the surface does not accept, a method the surface does not allow, no authenticable identity, and an ambiguous match are each further distinct outcomes, so a caller can tell a misconfigured surface from a missing binding.
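The two moves and the rejection outcomes fit in one short function. This is a sketch under assumptions: the enum constant names are invented (the source names LoginRejectionReason but not its members), the order of the checks is a guess, the in-memory index and binding list stand in for the store, and the `active` flag stands in for "active and within its validity window".

```python
import hashlib
import hmac
from dataclasses import dataclass
from enum import Enum, auto


class Rejection(Enum):  # hypothetical names
    IDENTIFIER_NOT_SEARCHABLE = auto()
    UNKNOWN_APPLICATION = auto()
    IDENTIFIER_TYPE_NOT_ACCEPTED = auto()
    METHOD_NOT_ALLOWED = auto()
    NO_AUTHENTICABLE_IDENTITY = auto()
    AMBIGUOUS_MATCH = auto()


@dataclass(frozen=True)
class Binding:
    identity_id: str
    application_id: str
    methods: frozenset
    active: bool = True


KEY = b"demo-tenant-key-not-for-production"
SEARCHABLE_TYPES = {"email", "phone", "did"}


def blind_index(id_type: str, value: str) -> str:
    message = f"{id_type}:{value.strip().lower()}".encode("utf-8")
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()


def resolve_login(value, id_type, app_id, method, apps, index, bindings):
    if id_type not in SEARCHABLE_TYPES:
        return Rejection.IDENTIFIER_NOT_SEARCHABLE
    app = apps.get(app_id)
    if app is None:
        return Rejection.UNKNOWN_APPLICATION
    if id_type not in app["loginIdentifierTypes"]:
        return Rejection.IDENTIFIER_TYPE_NOT_ACCEPTED
    if method not in app["allowedMethods"]:
        return Rejection.METHOD_NOT_ALLOWED
    # Fan out: every identity in the tenant holding this value.
    candidates = index.get(blind_index(id_type, value), [])
    # Filter: keep identities with a usable binding to this application.
    permitted = [c for c in candidates
                 if any(b.identity_id == c and b.application_id == app_id
                        and b.active and method in b.methods
                        for b in bindings)]
    if not permitted:
        return Rejection.NO_AUTHENTICABLE_IDENTITY
    if len(permitted) > 1:
        return Rejection.AMBIGUOUS_MATCH
    return permitted[0]


apps = {
    "svc-intranet": {"loginIdentifierTypes": {"email"}, "allowedMethods": {"password"}},
    "svc-shop": {"loginIdentifierTypes": {"email"}, "allowedMethods": {"password"}},
}
# One shared email on three identities: employee, customer, contact.
index = {blind_index("email", "ada@example.com"):
         ["idn-employee", "idn-customer", "idn-contact"]}
bindings = [
    Binding("idn-employee", "svc-intranet", frozenset({"password"})),
    Binding("idn-customer", "svc-shop", frozenset({"password"})),
]
```

Resolving the shared email against `svc-intranet` yields `idn-employee`, against `svc-shop` yields `idn-customer`, and the contact never survives the filter because it has no binding.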

Only once a single identity is resolved does the method-specific check run against it: a password is verified against that identity's credential, a one-time code or magic link is confirmed through a verification flow, a federated login maps the upstream subject back through the same fan-out, and a wallet presentation resolves the presented decentralized identifier the same way. The authenticated user id is the resolved identity's id; the session and the tokens minted from it identify the identity, not the party or the identifier. The effect is that where a party may log in is a property of its binding, not of its identifier, so you can hand the same person a customer login on your public site and an employee login on your internal application, keep a third record as a pure contact, and each one resolves correctly because the application decides which identity is permitted.

Federated Login and Self-Registration

Federated login follows the same resolution. The application's login configuration lists the identity providers that may back login there, and a login through any other provider is refused before the upstream subject is considered. What happens to a subject the tenant has never seen depends on the surface's selfRegistration flag. On a surface with self-registration enabled, the first federated login materializes the identity, attaches the verified identifiers from the provider, and creates the authenticable binding in the same step, so the user is signed in at the end of their first round trip. On a surface with self-registration disabled, the upstream subject must resolve to an existing identity that already holds a binding to this application; an unknown subject is rejected and nothing is created.
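The branching on the selfRegistration flag can be sketched like this. The store, the outcome labels, and the generated id format are hypothetical, identifier attachment is left out, and one case the text does not cover (a known subject that lacks a binding to this application) is treated as a rejection purely as an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Surface:
    app_id: str
    allowed_idp_ids: frozenset
    self_registration: bool


@dataclass
class Store:
    subject_to_identity: dict = field(default_factory=dict)  # (idp, sub) -> id
    bindings: set = field(default_factory=set)               # (identity, app)


def federated_login(store: Store, surface: Surface, idp_id: str, subject: str):
    # Providers not listed on the surface are refused before the
    # upstream subject is even considered.
    if idp_id not in surface.allowed_idp_ids:
        return ("rejected_provider", None)
    identity_id = store.subject_to_identity.get((idp_id, subject))
    if identity_id is None:
        if not surface.self_registration:
            return ("rejected_unknown_subject", None)  # nothing is created
        # First login on an open surface: create identity and binding together.
        identity_id = f"idn-{len(store.subject_to_identity) + 1}"
        store.subject_to_identity[(idp_id, subject)] = identity_id
        store.bindings.add((identity_id, surface.app_id))
        return ("registered", identity_id)
    if (identity_id, surface.app_id) not in store.bindings:
        return ("rejected_no_binding", None)  # assumption, see lead-in
    return ("signed_in", identity_id)
```

On an open surface the first call for a new subject returns `registered` and the second returns `signed_in`; on a closed surface the same unknown subject is rejected and the store is left untouched.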

Identity verification feeds the same machinery. A completed verification materializes its verified identifiers as protected correlation identifiers under the tenant's protection policies. When the verification targeted an application, the resolved or created identity is also bound authenticable to that application, with the allowed methods taken from the application's login configuration. A verification with no application in scope is a contact-only materialization: the identity and its identifiers are persisted, but no binding is created, so verification alone never grants the ability to log in.
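The difference between an application-scoped verification and a contact-only one comes down to a single conditional. In this sketch the store layout and function name are hypothetical, and identifiers are kept raw for brevity where the platform would persist them under the tenant's protection policies.

```python
from typing import Optional


def materialize_verification(store: dict, identity_id: str,
                             verified_identifiers: list,
                             application: Optional[dict] = None) -> None:
    # The identity's verified identifiers are always persisted.
    store["identifiers"].setdefault(identity_id, []).extend(verified_identifiers)
    # A binding is created only when the verification targeted an
    # application; its methods come from that application's login config.
    if application is not None:
        key = (identity_id, application["id"])
        store["bindings"][key] = set(application["allowedMethods"])


store = {"identifiers": {}, "bindings": {}}

# Contact-only: identifiers are stored, no ability to log in is granted.
materialize_verification(store, "idn-contact", [("email", "ada@example.com")])

# Application in scope: the identity is also bound authenticable.
materialize_verification(store, "idn-customer", [("email", "ada@example.com")],
                         {"id": "svc-shop", "allowedMethods": ["password"]})
```

After both calls, each identity has its verified email, but only `idn-customer` has a binding.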

Password Credentials at Rest

A password credential never stores its login handle in the clear. The username is held as a deterministic HMAC blind index, which is the only lookup key the credential store has, plus a reversible ciphertext kept for administrative and recovery surfaces (UsernameProtection in com.sphereon.identity.auth). The derivation inherits the tenant's normalization for email identifiers but always uses a searchable blind-index mode regardless of the tenant's email policy, because the credential lookup that precedes identity resolution has nothing else to match on. The password itself is hashed with Argon2id in PHC string format, hash comparison is constant-time, and repeated failures drive a time-bound lockout that escalates after repeated lockout cycles.
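The escalating lockout can be illustrated with a small state machine. Every number here is an assumption (the failure threshold, the base duration, and doubling on each cycle), as are the names; the text states only that lockouts are time-bound and escalate after repeated cycles.

```python
from dataclasses import dataclass

MAX_FAILURES = 5           # assumed failures before a lockout
BASE_LOCKOUT_SECONDS = 60  # assumed duration of the first lockout
ESCALATION_FACTOR = 2      # assumed growth per completed lockout cycle


@dataclass
class LockoutState:
    failures: int = 0
    lockout_cycles: int = 0
    locked_until: float = 0.0


def is_locked(state: LockoutState, now: float) -> bool:
    return now < state.locked_until


def record_failure(state: LockoutState, now: float) -> None:
    state.failures += 1
    if state.failures >= MAX_FAILURES:
        # Each further cycle locks for longer than the one before.
        duration = BASE_LOCKOUT_SECONDS * ESCALATION_FACTOR ** state.lockout_cycles
        state.locked_until = now + duration
        state.lockout_cycles += 1
        state.failures = 0


def record_success(state: LockoutState) -> None:
    # A successful login clears the counters in this sketch.
    state.failures = 0
    state.lockout_cycles = 0
```

Under these assumed values, five failures lock the credential for 60 seconds, and five more after that lock it for 120.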

Where to Next

The parties that hold these identities are covered across Persons and Organizations and Organization Units. The link that lets you assert two identities are the same human, intentionally and auditably, is on the Relationships page. The login configuration and application binding endpoints are on the Party REST API page.