Duplicate Checker

Instantly find duplicate records by cross-checking all structured records.

Background of duplicate checker

When importing data from APIs or batch-uploading data, how do you check whether a record already exists? Most developers who are familiar with relational databases will probably say "SELECT * FROM ... WHERE ...", and yes, that may work, but only if you have a small data set with a simple relational model.

At DUGAA we had a serious challenge: incoming data arrived at 1000 objects per import, and we had to check whether duplicates existed in databases that held more than 10 million records. Most importantly, we could not lock the tables or halt the live system while the queries were running.
Now here comes the extra layer of complication: each object contained 10 to 30 sub-components, and we had to check these sub-components against all the sub-components in the databases. You never know when human error will strike until you find out that the serial number of one component has been duplicated into the Batch ID of a completely different component. That is why we created DUGAA Reflex.
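
To illustrate the kind of cross-field check described above, here is a minimal sketch in Python. It is not how DUGAA Reflex is implemented (that is not described on this page); it only shows one common technique, an inverted index from normalized values to the records and fields that contain them, so that a serial number matching a Batch ID in another record can be found with a single lookup. All names (build_index, find_duplicates, the field names and sample values) are hypothetical.

```python
from collections import defaultdict


def normalize(value):
    # Assumption: values match if they are equal after trimming and lowercasing.
    return str(value).strip().lower()


def build_index(records):
    """Map each normalized value to every (record_id, field) that holds it."""
    index = defaultdict(list)
    for record_id, fields in records.items():
        for field, value in fields.items():
            index[normalize(value)].append((record_id, field))
    return index


def find_duplicates(incoming, index):
    """Return (new_id, new_field, existing_id, existing_field) for every hit.

    The field name is ignored during lookup, so a value stored under a
    different field in an existing record is still reported.
    """
    report = []
    for record_id, fields in incoming.items():
        for field, value in fields.items():
            # .get() avoids adding empty entries to the defaultdict.
            for existing_id, existing_field in index.get(normalize(value), []):
                report.append((record_id, field, existing_id, existing_field))
    return report


# Hypothetical sample data: one stored record, one incoming record.
existing = {"A1": {"serial_number": "SN-1001", "batch_id": "B-77"}}
incoming = {"N1": {"serial_number": "b-77 ", "batch_id": "B-90"}}

report = find_duplicates(incoming, build_index(existing))
for new_id, new_field, old_id, old_field in report:
    print(f"{new_id}.{new_field} duplicates {old_id}.{old_field}")
```

In this sketch each lookup is a dictionary access, so the cost of checking an import depends on the number of incoming values rather than on the number of stored records. A production system would also need to keep such an index in sync with the live database, which is outside the scope of this example.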

With DUGAA Reflex, our system not only spots the smallest duplicates within milliseconds, but also generates a fully human-readable report that advises administrators on how to fix the data, plus a direct link to where each duplicate occurred. All of this happens as a real-time, non-blocking operation on the live data.

Some people refer to it as DUGAA Reflex, others may call it magic; both may be right.

At DUGAA, duplicate checking is one of our core features. All our systems come equipped with the duplicate checker out of the box, because data quality matters to us.

When you need to check for duplicates, just click the "Check Duplicates" button. The response is instant and human-readable.

When data is synced automatically, the duplicate check runs automatically under the hood with no human interaction. If the data requires manual correction, the report is made available to the administrator.

Why is a duplicate checker important?

It's all about data consistency and data accuracy. If there are duplicate records, or fragments of the same data are scattered across multiple records in your databases, the system will inevitably run into bugs. These bugs can be very difficult to pinpoint, and they can cause operational damage to the business.

In our experience, many ERP and MES systems do not handle duplicates at all and leave the responsibility entirely to the user. Other enterprise-level systems might offer a duplicate check, but it normally requires a license upgrade and contacting third-party consultants. And even when they do support a duplicate check, querying the databases takes far too long to make it a viable solution for real-time applications.