Computer Society Webinar - Large-Scale Entity Resolution: A Practical Blueprint from Noisy Records to Trustworthy Entities
Entity resolution is the backbone of any data platform that aims to present a single, trustworthy view of an organization across noisy, overlapping sources. This talk shares a practical, system-oriented blueprint for company entity resolution that you can adapt to your stack. We’ll begin with upstream data preparation (standardization, canonicalization, and normalization of names, websites, addresses, and phone numbers) to reduce ambiguity before matching. We’ll then cover signature construction (e.g., relaxed/collapsed variants), blocking to avoid the N² explosion of pairwise comparisons, and a match function that combines exact agreement on one core attribute (website, name, or address) with a second fuzzy signal to balance precision and recall. You’ll see how constraints (e.g., unique primary website; unique name+HQ address), attribute scoring/selection, and separation of company vs. location resolution improve quality and explainability. We’ll discuss pre‑merge signals from authoritative linkages, human‑in‑the‑loop controls for edge cases, and governance patterns: provenance (“why this value”), rollovers for stable IDs, and reproducibility. Finally, we’ll outline evaluation and monitoring tactics (drift checks, audits) and deployment considerations for both batch and streaming environments. Attendees will leave with a clear set of building blocks for moving from noisy inputs to reliable, auditable entities.
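To make the pipeline in the abstract concrete, here is a minimal Python sketch of three of its steps: normalization, blocking, and a match rule that requires exact agreement on one core attribute plus a second fuzzy signal. It is an illustration only, not the speaker's implementation. The function names, the legal-suffix list, the use of `difflib.SequenceMatcher` as the fuzzy signal, and the 0.8 threshold are all assumptions chosen for the example.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical suffix list; a real system would use a much larger, curated one.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co",
                  "corporation", "incorporated", "limited"}
CORE_ATTRIBUTES = ("website", "name", "address")


def normalize_text(value):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())


def normalize_name(name):
    """Normalize a company name and strip trailing legal suffixes."""
    tokens = normalize_text(name).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_website(url):
    """Reduce a URL to its bare host: no scheme, path, or leading 'www.'."""
    host = re.sub(r"^[a-z]+://", "", url.strip().lower()).split("/")[0]
    return host[4:] if host.startswith("www.") else host


def prepare(raw):
    """Build the normalized record used for blocking and matching."""
    return {
        "name": normalize_name(raw.get("name", "")),
        "website": normalize_website(raw.get("website", "")),
        "address": normalize_text(raw.get("address", "")),
    }


def blocking_keys(rec):
    """Emit cheap signatures; only records sharing a key are ever compared."""
    keys = []
    if rec["website"]:
        keys.append(("web", rec["website"]))
    if rec["name"]:
        # "Collapsed" name variant: whitespace removed.
        keys.append(("name", rec["name"].replace(" ", "")))
    return keys


def candidate_pairs(records):
    """Group record indices by blocking key and pair up within each block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        for key in blocking_keys(rec):
            blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


def similar(a, b, threshold):
    """Fuzzy signal: both values present and string similarity >= threshold."""
    return bool(a) and bool(b) and SequenceMatcher(None, a, b).ratio() >= threshold


def is_match(a, b, threshold=0.8):
    """Exact agreement on one core attribute plus a fuzzy match on another."""
    for core in CORE_ATTRIBUTES:
        if a[core] and a[core] == b[core]:
            others = (f for f in CORE_ATTRIBUTES if f != core)
            if any(similar(a[f], b[f], threshold) for f in others):
                return True
    return False
```

In this sketch, blocking keeps the number of comparisons proportional to block sizes rather than to all N² pairs, and the two-signal rule means that a shared website or name alone is never enough to merge two records.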
Speakers
Rohit Muthyala
Biography:
Rohit Muthyala is a Principal Software Engineer who architects real‑time data and machine learning platforms and leads large‑scale entity resolution initiatives. His work focuses on streaming data processing, data quality and observability (SLIs/SLOs), and privacy‑by‑design. He has authored papers presented at IEEE venues (Big Data, ICDM, ICSC) and mentors engineers on platform engineering and ML‑in‑production. Rohit’s approach emphasizes simple, explainable patterns—data prep, blocking, robust matching, guardrails, and governance—that translate into trustworthy, repeatable outcomes.
Address: Texas, United States