Imports Creating Duplicate Records
Problem
Records were not matching on First + Last + DOB during import, even though the incoming data matched existing records exactly. The client was puzzled, the Service Desk was puzzled, ReWorkflow was puzzled! These were high-quality imports from campus systems; there was no reason for thousands of records to simply fail to match. Meanwhile, the client was wasting time cleaning up the resulting duplicates in Consolidate Records.
Cause
Upload Dataset does not compare First + Last + Birthdate directly during the import process. Instead, it computes hashes on the imported rows and compares them to pre-baked hashes in the [hash] table. This is much quicker than direct string comparisons.
The problem is that somehow this particular Slate database had roughly 20k missing hashes. We can only speculate that when Technolutions implemented the hashing system a few years ago and initially populated the hashes, the operation failed to complete on this particular database.
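To make the mechanism concrete, here is a minimal sketch of hash-based matching in Python. This is not Slate's actual algorithm; the normalization scheme, the SHA-1 choice, and the in-memory dictionary standing in for the [hash] table are all assumptions for illustration.

```python
import hashlib

def match_key(first: str, last: str, birthdate: str) -> str:
    """Normalize the matching components and hash them (assumed scheme)."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{birthdate}"
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Pre-baked hashes for existing records, standing in for the [hash] table:
# hash -> record id.
existing_hashes = {
    match_key("Wyatt", "Smith", "2004-05-17"): "person-001",
}

def find_match(row: dict) -> str | None:
    """One O(1) lookup per incoming row, instead of comparing strings
    against every existing person record."""
    key = match_key(row["first"], row["last"], row["birthdate"])
    return existing_hashes.get(key)

incoming = {"first": "Wyatt", "last": "Smith", "birthdate": "2004-05-17"}
print(find_match(incoming))  # "person-001"

# If the pre-baked hash was never populated, the lookup misses even though
# the strings match exactly -- and the import creates a duplicate record.
```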
You can check for this problem in your database with a suitcase query: 4e5a7b48-104c-481d-8857-b5ce1148e9e0:rwf
(Import only the query Persons Missing first_last_birthdate Hash). Theoretically, other hash types could also be missing; checking for those is left as an exercise for the reader.
Solution
The solution was simple, if brute-force: using Upload Dataset, change the first names to something else, then immediately change them back. Slate calculates the missing hashes as soon as the name changes. Changing another component of the hash, such as Last or Birthdate, would likely work as well:
- Original: Wyatt
- New: Wyatt_HASH
- Back to Original: Wyatt
Tip: Change the Field Mappings and use Retroactive Refresh to quickly revert the changed names without a second import. A sketch of preparing the two files follows below.
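For a large batch, the two Upload Dataset files can be generated programmatically. A minimal sketch, assuming you start from a CSV export of the affected records; the file names and the ref/first column headers are hypothetical and should be matched to your own export and Field Mappings.

```python
import csv

SUFFIX = "_HASH"

# Assumed input: a CSV export of the records missing hashes, with a unique
# "ref" identifier column and a "first" name column.
with open("persons_missing_hash.csv", newline="", encoding="utf-8") as src:
    rows = list(csv.DictReader(src))

# Pass 1: temporarily altered first names (e.g. Wyatt -> Wyatt_HASH),
# which forces Slate to recompute the missing hashes on import.
with open("pass1_rename.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["ref", "first"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"ref": row["ref"], "first": row["first"] + SUFFIX})

# Pass 2: the original names, used to revert (or skip this file entirely
# and use Retroactive Refresh with remapped fields, as in the tip above).
with open("pass2_revert.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["ref", "first"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"ref": row["ref"], "first": row["first"]})
```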
This operation will cause the Updated timestamp on these person records to change and also put them in the rules queue. This may have adverse impacts on SIS integration pipelines that depend on timestamps or update queues. If that's the case, creating a Source Format that does not fire rules may help.