Benn,

As Barry noted, identifying a record in one system as
belonging to the same physical person as a record in a different
system is difficult and often relies on heuristics and best guesses.
In my experience, SSNs may be mistyped or "borrowed"; names may be
misspelled, changed (e.g., by marriage), or otherwise represented
differently; addresses change; etc. Databases may also contain few
overlapping elements. We were lucky to see an error rate of less

My strong advice is to create a Person Registry as the authoritative
source of basic personal information about every individual of any
relevance to your campus. If a physical person is not in the
registry, s/he must be entered there before appearing in any other
system. Entry can be made by designated staff in any department or
functional unit. Ideally, any system developed in this century would
simply link to this registry for information rather than creating a
copy for itself, but legacy apps could receive periodic downloads from

In this model, the only need for matching is when a person shows up
and claims to be in the registry already, or when a cursory search
produces one or more possible hits. In that case, a human can
interact with the person in question to disambiguate the reference.
Of course, it helps if primary documents and images have been
recorded for each person in the first place.
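
To make the flow concrete, here is a rough Python sketch of the
"search first, then let a human decide" model described above. It is
only an illustration under stated assumptions: the names
PersonRegistry, search, register, PossibleDuplicate and the 0.8
threshold are all hypothetical, and the standard library's difflib
stands in for whatever name-comparison rule a campus would actually
choose.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import count


@dataclass
class Person:
    person_id: int
    name: str
    birth_date: str  # ISO date, e.g. "1985-03-14"


class PossibleDuplicate(Exception):
    """Raised when a new entry resembles people already in the registry."""

    def __init__(self, candidates):
        super().__init__(f"{len(candidates)} possible existing record(s)")
        self.candidates = candidates


def name_similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


class PersonRegistry:
    """Hypothetical in-memory registry; a real one would sit on a database."""

    def __init__(self, threshold=0.8):  # assumed cutoff, to be tuned locally
        self._people = {}
        self._ids = count(1)
        self.threshold = threshold

    def search(self, name):
        """Cursory search: return possible hits, best match first."""
        scored = [(name_similarity(name, p.name), p) for p in self._people.values()]
        hits = [(s, p) for s, p in scored if s >= self.threshold]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [p for _, p in hits]

    def register(self, name, birth_date, confirmed_new=False):
        """Add a person unless a human still needs to rule out possible hits."""
        if not confirmed_new:
            candidates = self.search(name)
            if candidates:
                raise PossibleDuplicate(candidates)
        person = Person(next(self._ids), name, birth_date)
        self._people[person.person_id] = person
        return person
```

The point of the design is that the code never merges or links records
on its own. It only surfaces candidates, and a staff member either
picks an existing entry or calls register again with
confirmed_new=True after talking to the person.
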
At 3:10 PM -0400 on 6/8/06, Benn Oshrin wrote:
>One of the issues we're about to (re)examine here is matching people
>who come from multiple sources.
>
>A typical case is a student who we already know about from the
>student system gets hired as a casual or work study employee, and we
>want to make sure their information from the personnel system gets
>attached to their existing identity rather than have a new (second)
>identity created for them.
>
>We already perform weak matching that catches most cases, but we are
>looking to significantly improve our handling of these individuals.
>We would prefer to get a better idea of what others are doing before
>we make our plans, and so I'd like to throw out a few questions to
>1. Did you write your own matching algorithms or do you use a
> vendor solution?
>2. If you wrote your own, what criteria do you match on?
>3. If you use a vendor solution
> a. Which one do you use?
> b. Is it a full vendor implementation, or do you just call
> hooks from your own existing applications?
> c. Who implemented your solution? (vendor, consultants, staff,
>4. What has been your success rate with your implementation?
>5. What are your procedures for handling close/multiple potential
>6. What are your procedures for recovering incorrect matches?
>7. If not otherwise covered above, what are the interfaces to your
> system? (manual data entry via web, batch feeds, real time api,
>
>I will summarize off-list replies.