Recently, with the growth of data in organizations, linking different data sources to discover valuable information has gained enormous interest. Applying data mining methodologies to large collections of data helps organizations discover valuable information, but it introduces a critical concern: preserving the privacy of personal data. Much research has been carried out in search of improved mechanisms for linking data sources while preserving privacy, but it remains unclear whether these mechanisms can provide the level of accuracy and quality required to make the process effective.
Linking records across different data sources is challenging because the database schemas differ and do not provide a unique identifier for the individual records that need to be matched. Record linkage therefore generally relies on the attributes that the databases have in common. This creates three major challenges that need to be addressed:
- Linkage quality
- Scalability
- Privacy and confidentiality
As the first step in the record linkage process, data pre-processing improves the quality of linkage by removing noisy, incomplete and inconsistent data or transforming it into well-defined, consistent forms. Indexing is the second step: it removes record pairs that are unlikely to match, which in turn reduces the number of comparisons that must be performed.
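The two steps can be illustrated with a minimal Python sketch. It is not a method taken from this text: the field names (`surname`, `postcode`), the cleaning rules and the blocking key are all assumptions chosen for illustration, and simple blocking is used as one common form of indexing. The sketch works on plain-text values and applies no privacy protection; it only shows how pre-processing and indexing shrink the comparison space.

```python
import re
from collections import defaultdict
from itertools import product


def normalize(record):
    """Pre-processing: lowercase each field, drop punctuation, collapse spaces."""
    cleaned = {}
    for key, value in record.items():
        text = re.sub(r"[^a-z0-9 ]", "", str(value).lower())
        cleaned[key] = " ".join(text.split())
    return cleaned


def blocking_key(record):
    """Hypothetical blocking key: first letter of surname plus postcode."""
    return record["surname"][:1] + "|" + record["postcode"]


def candidate_pairs(db_a, db_b):
    """Indexing: keep only record pairs that share a blocking key."""
    blocks = defaultdict(lambda: ([], []))
    for i, rec in enumerate(db_a):
        blocks[blocking_key(normalize(rec))][0].append(i)
    for j, rec in enumerate(db_b):
        blocks[blocking_key(normalize(rec))][1].append(j)
    pairs = []
    for ids_a, ids_b in blocks.values():
        pairs.extend(product(ids_a, ids_b))
    return sorted(pairs)


# Made-up example records, for illustration only.
db_a = [
    {"surname": "Smith ", "postcode": "2600"},
    {"surname": "O'Brien", "postcode": "2611"},
    {"surname": "Lee", "postcode": "2600"},
]
db_b = [
    {"surname": "smith", "postcode": "2600"},
    {"surname": "OBrien", "postcode": "2611"},
    {"surname": "Miller", "postcode": "2905"},
]

pairs = candidate_pairs(db_a, db_b)
print(pairs)                               # [(0, 0), (1, 1)]
print(len(pairs), "of", len(db_a) * len(db_b), "possible pairs")
```

Here only 2 of the 9 possible record pairs are passed on for detailed comparison. The choice of blocking key is a trade-off: a stricter key removes more pairs but risks discarding true matches whose key values contain errors.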