Entity Resolution

Your company has 2 million customers. Or 1.4 million. Nobody knows.

Kavuka Entity Resolution detects when different records refer to the same real-world entity, builds one golden record per entity, with provenance, and connects the entities in a graph of relationships. Matching that reasons beyond text, trained on Brazilian data, delivered in weeks, not years.


Beyond text: matching that reasons
Golden record: one per entity, with provenance
Native graph: real relationships connected
Batch + real time: on the same engine

It is the technology behind the Kavuka link graph, proven in production where it powers the anti-fraud, AML and risk engines, and now offered as data infrastructure for any large enterprise.

Data that does not know who is who lies in everything it touches.

The customer across five systems

The same customer registered five times, with five IDs and three spellings. Rule-based matching breaks on every exception and the promised 360 view never arrives.

The third-year MDM

The master data project in its third year and still no golden record: rigid schemas, deterministic rules, and the invoices keep coming while the base stays duplicated.

The sanctioned party that slipped through a spelling

Aggregate exposure invisible across systems, credit granted three times to the same group and the sanctioned party approved because the name came with one letter changed.

Cost

Entity resolution is not a data project; it is the prerequisite for every other one. The customer counted three times, credit granted twice to the same economic group, the sanctioned party that slips in through a spelling and the AI initiative built on the sand of unresolved identity: every decision made on data that does not know who is who starts out wrong.

How it works

From dirty sources to a single truth, in one pipeline.

01
Ingest

Sources as they are — disparate schemas, dirty data, legacy systems — ingested once and served to multiple use cases.
02
Resolve

Intelligent matching links what rules cannot: spelling variants, abbreviations, transliterations, incomplete data and even deliberate manipulation.
03
Consolidate

The golden record per real entity: the best values from each source consolidated, with provenance tracked and auditable.
04
Connect and serve

The graph links the relationships (owners, addresses, phones, accounts, devices), and use cases consume the resolved data without duplicating it.
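Matching yields pairwise decisions, but an entity is a cluster: if the CRM record matches the ERP record and the ERP record matches the billing record, all three are one customer even though CRM and billing were never compared directly. A minimal sketch of that step, assuming the pairwise matches have already been accepted, groups records with a union-find (disjoint-set) structure. The record IDs and `cluster_entities` are hypothetical illustrations, not Kavuka's API.

```python
from collections import defaultdict

# Illustrative sketch only: record IDs and helper names are hypothetical.


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        # Walk up to the root, then point every visited node straight at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_entities(record_ids, matched_pairs):
    """Any chain of accepted matches puts records in the same entity."""
    uf = UnionFind(record_ids)
    for a, b in matched_pairs:
        uf.union(a, b)
    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[uf.find(rid)].append(rid)
    return sorted(sorted(members) for members in clusters.values())


records = ["crm:17", "erp:A-9", "billing:553", "crm:88"]
pairs = [("crm:17", "erp:A-9"), ("erp:A-9", "billing:553")]
print(cluster_entities(records, pairs))
# [['billing:553', 'crm:17', 'erp:A-9'], ['crm:88']]
```

Transitivity cuts both ways: a single wrong link merges two real entities, so cluster-level review matters as much as pairwise scores.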

Coverage

The engine that truly resolves identity

A single platform ingests disparate sources, reasons about the real identity behind the records and returns the golden record and the graph — ready to feed any use case.

Intelligent matching

Pairing that reasons beyond text equality

Golden record

One record per entity, with provenance

Entity graph

People, companies, addresses, phones and accounts

Flexible ingestion

Disparate schemas and dirty data, ingested once

Dual operation

Batch and real time on the same engine

Brazilian-native engine

Name spellings, CPF/CNPJ anchor, federal ownership data

Aggregate exposure

Total exposure per economic group

Explainable matches

Fields, weights and confidence, reviewable and auditable
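Aggregate exposure is a graph question: once records are resolved to entities, an economic group can be treated as a connected component of the ownership graph, and exposure is summed per component. A minimal sketch under those assumptions; the entity IDs, the `(owner, owned)` edge format and the contract-level deduplication are illustrative, not Kavuka's data model.

```python
from collections import defaultdict

# Illustrative sketch: edge format, IDs and dedup rule are assumptions.


def economic_groups(ownership_edges):
    """Map each entity in the ownership graph to a group label.

    A group is a connected component of (owner, owned) edges,
    labelled by its smallest entity ID.
    """
    adjacency = defaultdict(set)
    for owner, owned in ownership_edges:
        adjacency[owner].add(owned)
        adjacency[owned].add(owner)
    group_of = {}
    for start in sorted(adjacency):
        if start in group_of:
            continue
        group_of[start] = start
        stack = [start]
        while stack:  # iterative depth-first search over the component
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in group_of:
                    group_of[neighbour] = start
                    stack.append(neighbour)
    return group_of


def exposure_by_group(credit_lines, record_to_entity, ownership_edges):
    """Sum credit per economic group, counting each contract only once."""
    group_of = economic_groups(ownership_edges)
    totals = defaultdict(int)
    seen_contracts = set()
    for contract_id, source_record, amount in credit_lines:
        if contract_id in seen_contracts:
            continue  # the same contract reported by a second system
        seen_contracts.add(contract_id)
        entity = record_to_entity[source_record]  # output of resolution
        group = group_of.get(entity, entity)      # no ownership links: own group
        totals[group] += amount
    return dict(totals)


record_to_entity = {"crm:17": "E1", "erp:A-9": "E1", "loans:77": "E2", "cards:5": "E3"}
ownership_edges = [("E1", "E2")]  # E1 holds a stake in E2
credit_lines = [
    ("C-100", "crm:17", 500_000),
    ("C-100", "erp:A-9", 500_000),  # duplicate view of the same contract
    ("C-200", "loans:77", 300_000),
    ("C-300", "cards:5", 50_000),
]
print(exposure_by_group(credit_lines, record_to_entity, ownership_edges))
# {'E1': 800000, 'E3': 50000}
```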

Segments

Who resolves identity with Kavuka

Risk

Banks & Insurers

The single customer view and aggregate exposure by economic group — credit summed correctly, only once.

Multi-brand

Groups & Retail

The customer recognized across companies, channels and systems — the real customer count of the group, finally known.

Data

MDM programs

The engine that delivers the golden record the master data initiative promised — in weeks, not years.

Anti-fraud · AI

AML & AI readiness

The graph that links accounts, owners and signals, and the base with resolved identity that AI models require.

Legal shield

Unification as a governance upgrade

Resolving identity is not only data efficiency — it is compliance. Under data-protection law, the data subject gets one record, not five, with the provenance of each datum tracked and every match explainable. Unification is a direct governance improvement.

One subject, one record: unification reduces duplication and makes the data subject's rights (access, correction, deletion) easier to honor.
Explainable matches: every pairing carries the evidence behind it — fields, weights and confidence — reviewable and auditable.
Tracked provenance: the golden record stores which source each consolidated value came from.
Full audit trail of the resolution cycle and of match and merge decisions.
Public or legally permitted sources; encryption in transit and at rest; DPA available for enterprise clients.

Already operating this way

We discovered we had 30% fewer customers than the dashboard claimed — and 30% more clarity to invest where it matters.

Chief Data Officer · multi-brand retail group

The golden record MDM promised us for three years came out in weeks. It was the short path that did not exist.

CTO · financial institution

The graph summed up exposure that was scattered across four systems. The sanctioned party that slipped through spelling no longer does.

Risk Director · insurer

How many customers do you really have?

Run the resolution on a sample of your data and discover the real number — in one meeting, not a three-year project.

For businesses only. No purchase commitment.
Data used solely for commercial contact.
Enterprise leads answered within 1 business day.

What entity resolution is and why it comes before everything

Entity resolution is the technology that discovers when different records refer to the same real-world entity — the same person registered three times with different spellings, the same company across five systems with diverging legal names, the same address written ten ways. The result materializes in two layers: the golden record, the single, complete and trustworthy record of each entity — the holy grail that traditional MDM promised for decades — and the graph, the resolved entities connected by their real relationships: owners, addresses, phones, devices and transactions.
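A golden record is commonly built with survivorship rules: per attribute, decide which source's value wins and keep a pointer to where it came from. A minimal sketch, assuming two hypothetical rules (the newest non-empty value wins; for names, the most complete spelling wins); the rules and the `SourceRecord` shape are illustrative, not Kavuka's.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch: record shape and survivorship rules are hypothetical.


@dataclass
class SourceRecord:
    source: str    # system the record came from, e.g. "crm"
    updated: date  # when that system last changed the record
    fields: dict   # attribute name -> value; None or "" means missing


def golden_record(records, prefer_longest=("name",)):
    """Consolidate one entity's records into a golden record with provenance.

    Default rule: the most recently updated non-empty value wins.
    Fields in prefer_longest keep the longest (most complete) value instead.
    """
    golden, provenance = {}, {}
    for rec in sorted(records, key=lambda r: r.updated):  # oldest first
        for name, value in rec.fields.items():
            if value in (None, ""):
                continue
            current = golden.get(name)
            if name in prefer_longest and current and len(str(value)) < len(str(current)):
                continue  # keep the more complete value already chosen
            golden[name] = value  # newer values overwrite older ones
            provenance[name] = (rec.source, rec.updated.isoformat())
    return golden, provenance


records = [
    SourceRecord("erp", date(2023, 7, 15),
                 {"name": "MARIA A SOUZA", "phone": None, "email": "maria@example.com"}),
    SourceRecord("crm", date(2021, 3, 1),
                 {"name": "Maria Aparecida Souza", "phone": "11 98888-0000", "email": ""}),
]
golden, provenance = golden_record(records)
print(golden)
# {'name': 'Maria Aparecida Souza', 'phone': '11 98888-0000', 'email': 'maria@example.com'}
print(provenance)
# {'name': ('crm', '2021-03-01'), 'phone': ('crm', '2021-03-01'), 'email': ('erp', '2023-07-15')}
```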

The difference from traditional MDM is structural. MDM relies on rigid schemas and deterministic match rules — it pairs only what is identical — and therefore takes years to ingest and match and breaks on every exception. Modern entity resolution ingests sources as they are, with disparate schemas and dirty data, and pairs them with models that reason: they recognize spellings, abbreviations, transliterations, incomplete data and even deliberate manipulation, finding incremental matches even when quality is poor. The golden record comes out in weeks, not years, with more connections found — and each pairing stays explainable: the fields, weights and confidence behind it remain reviewable and auditable.
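A weighted field comparison shows what "fields, weights and confidence" can look like for a single pairing. A minimal sketch, assuming hypothetical field names, weights and a 0.8 threshold, with `difflib` string similarity standing in for the learned models described above; the point is that the decision ships with its evidence.

```python
from difflib import SequenceMatcher

# Illustrative sketch: weights, fields and threshold are hypothetical.
WEIGHTS = {"cpf": 0.5, "name": 0.3, "phone": 0.1, "address": 0.1}
IDENTIFIER_FIELDS = {"cpf", "phone"}


def field_similarity(field, a, b):
    """Similarity in [0, 1]: exact on digits for identifiers, fuzzy for text."""
    if field in IDENTIFIER_FIELDS:
        digits_a = "".join(ch for ch in a if ch.isdigit())
        digits_b = "".join(ch for ch in b if ch.isdigit())
        return 1.0 if digits_a == digits_b else 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def explain_match(rec_a, rec_b, threshold=0.8):
    """Score a candidate pair and return the evidence behind the decision."""
    evidence, score, weight_used = [], 0.0, 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value is evidence neither for nor against
        sim = field_similarity(field, a, b)
        evidence.append({"field": field, "weight": weight, "similarity": round(sim, 3)})
        score += weight * sim
        weight_used += weight
    confidence = score / weight_used if weight_used else 0.0
    return {"match": confidence >= threshold,
            "confidence": round(confidence, 3),
            "evidence": evidence}


a = {"cpf": "529.982.247-25", "name": "José da Silva", "phone": "(11) 98888-0000"}
b = {"cpf": "52998224725", "name": "Jose da Silva", "address": "Rua A, 10"}
result = explain_match(a, b)
for item in result["evidence"]:
    print(item)
print(result["match"], result["confidence"])
```

Here the CPF agrees once punctuation is stripped and the names differ only by an accent, so the pair clears the threshold; phone and address are left out of the evidence because one side lacks them.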

One architectural principle defines the category: ingest once, serve multiple use cases without duplication. Instead of each initiative (customer view, risk, AML, anti-fraud, analytics) keeping its own copy of the data and its own match logic, identity is resolved once and served to all. Operation is dual: batch mode resolves the entire base; real-time mode resolves each new record on arrival, so a registration never duplicates again and a query arrives already linked. This is the argument that wins over the CTO, because it eliminates the fragmentation that makes each data project more expensive than the last.
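Running batch and real time "on the same engine" means both modes call the same matching routine against the same index. A minimal sketch, assuming a hypothetical CPF blocking key and a `difflib` name check as the pairwise decision (stand-ins for the real models): the batch pass is simply the real-time call applied to every record.

```python
from difflib import SequenceMatcher

# Illustrative sketch: blocking key, match rule and entity IDs are hypothetical.


def cpf_key(record):
    """Blocking key: the CPF digits, or None when the record has no CPF."""
    digits = "".join(ch for ch in record.get("cpf", "") if ch.isdigit())
    return digits or None


def names_agree(a, b, threshold=0.85):
    """Stand-in pairwise check, run only on candidates sharing a blocking key."""
    ratio = SequenceMatcher(None, a.get("name", "").lower(), b.get("name", "").lower()).ratio()
    return ratio >= threshold


class Resolver:
    """Batch and real-time resolution sharing one index and one match routine."""

    def __init__(self, key_fn=cpf_key, match_fn=names_agree):
        self.key_fn, self.match_fn = key_fn, match_fn
        self.index = {}  # blocking key -> list of (entity_id, first record seen)
        self.next_id = 1

    def resolve_one(self, record):
        """Real-time path: link a record on arrival, or open a new entity."""
        key = self.key_fn(record)
        candidates = self.index.get(key, []) if key else []
        for entity_id, known in candidates:
            if self.match_fn(record, known):
                return entity_id
        entity_id = f"E{self.next_id}"
        self.next_id += 1
        if key:
            self.index.setdefault(key, []).append((entity_id, record))
        return entity_id

    def resolve_batch(self, records):
        """Batch path: the same routine applied to the whole base in order."""
        return [self.resolve_one(r) for r in records]


resolver = Resolver()
print(resolver.resolve_batch([
    {"cpf": "529.982.247-25", "name": "José da Silva"},
    {"cpf": "52998224725", "name": "Jose da Silva"},
    {"cpf": "111.444.777-35", "name": "Maria Souza"},
]))  # ['E1', 'E1', 'E2']
print(resolver.resolve_one({"cpf": "111 444 777 35", "name": "MARIA SOUZA"}))  # E2
```

The blocking index is what keeps the real-time path cheap: a new record is compared only with the few entities that share its key, not with the whole base.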

Kavuka Entity Resolution is native to the pathologies of Brazilian data: the variations of Brazilian names, the CPF/CNPJ as a strong identity anchor, addresses in the national standard and the federal ownership registry as a source of relationships — a structural advantage over generic engines. And it is proven in production: it is the infrastructure of the Kavuka platform's own link graph, the engine that sees the facilitator, the account factory and indirect exposure in anti-fraud, AML and risk cases. Resolving identity, therefore, is no longer one data project among others — it is the prerequisite for all of them. Without knowing who is who, the customer count lies, aggregate exposure vanishes, the sanctioned party slips through a spelling and the AI initiative runs on sand. With identity resolved, the company finally knows how many customers it truly has, sees real exposure per economic group, closes the variant-spelling hideout and gains the clean base that AI models require.
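Two of the Brazilian specifics above can be made concrete. The CPF carries two mod-11 check digits, so many typos can be caught before a CPF is trusted as an anchor, and much of the variation in Brazilian names comes from accents, case and connector particles. A minimal sketch of both, assuming an illustrative particle list; this is preprocessing that feeds matching, not Kavuka's engine.

```python
import re
import unicodedata

# Illustrative list of connector words that vary freely across registrations.
NAME_PARTICLES = {"de", "da", "do", "das", "dos", "e"}


def normalize_name(name):
    """Fold accents and case and drop connector particles."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z]+", no_accents.lower())
    return " ".join(t for t in tokens if t not in NAME_PARTICLES)


def cpf_is_valid(cpf):
    """Check the two CPF verification digits (mod-11 scheme)."""
    digits = [int(ch) for ch in cpf if ch.isdecimal()]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False  # wrong length, or a repeated-digit placeholder like 111.111.111-11
    for position in (9, 10):
        # Weights run from position + 1 down to 2 over the preceding digits.
        total = sum(d * w for d, w in zip(digits[:position], range(position + 1, 1, -1)))
        check = total * 10 % 11 % 10  # a remainder of 10 maps to 0
        if digits[position] != check:
            return False
    return True


print(normalize_name("JOSÉ DA SILVA"))  # jose silva
print(cpf_is_valid("529.982.247-25"))   # True
print(cpf_is_valid("529.982.247-26"))   # False
```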

FAQ

What is entity resolution?

It is the technology that determines when different records refer to the same real entity — person, company, address, account — even with diverging spellings, incomplete data or manipulation. The result comes in two layers: the golden record (the single trustworthy record of each entity) and the graph (the entities connected by their real relationships).

What is the difference from traditional MDM?

Traditional MDM relies on rigid schemas and deterministic match rules — that is why it takes years and breaks on exceptions. Modern entity resolution ingests sources as they are and pairs them with models that reason: the golden record comes out in weeks, with more connections found.

How do you handle Brazilian data?

The engine is native to local pathologies: Brazilian name variations, CPF/CNPJ as a strong identity anchor, addresses in the national standard and the federal ownership registry as a source of relationships — the structural advantage over generic engines.

Are the matches explainable?

Yes. Every pairing carries the evidence behind it — the fields compared, the weights assigned and the confidence level — reviewable and auditable: the transparency the risk team and data-protection law require.

Does it work in real time?

Yes. Operation is dual: batch mode resolves the entire base at once; real-time mode resolves each new record on arrival, so a registration never duplicates again and a query arrives already linked.

Why is entity resolution a prerequisite for AI?

Because AI models learn from the data they receive. If the base counts the same customer three times and does not know who is who, the model learns the lie. With identity resolved, AI runs on a clean base, with each entity represented only once.

How does entity resolution connect to the rest of the Kavuka portfolio?

It is the infrastructure underneath everything: the link graph that powers the anti-fraud, AML and risk engines; the base on which Data Enrichment enriches the golden record; and the Lakehouse foundation where the resolved data lives. Resolving identity first makes every subsequent solution more accurate.

Your next high-impact decision starts with the right data.

Talk to a GUÉP specialist and find where applied intelligence creates the most value in your operation.
