Lucene Duplicate Remover


Introduction

A Java class that removes duplicate documents from any Lucene-based index.

Features:
- Removes duplicate documents by searching on any field specified by the user
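
The idea behind field-based duplicate removal can be sketched without Lucene at all. The example below is only an illustration under stated assumptions, not the API of this class: it models each document as a map of field names to values and keeps the first document seen for each value of the chosen field. The names DedupSketch, removeDuplicates and doc, as well as the "url" and "title" fields, are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT the API of the Duplicate Remover class.
public class DedupSketch {

    // Keeps the first document seen for each distinct value of `field`.
    // Documents that do not contain the field are always kept.
    public static List<Map<String, String>> removeDuplicates(
            List<Map<String, String>> docs, String field) {
        Set<String> seen = new HashSet<String>();
        List<Map<String, String>> unique = new ArrayList<Map<String, String>>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            // Set.add returns false when the value was already present.
            if (value == null || seen.add(value)) {
                unique.add(doc);
            }
        }
        return unique;
    }

    // Builds a two-field sample document (field names are assumptions).
    public static Map<String, String> doc(String url, String title) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("url", url);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
        docs.add(doc("http://example.com/a", "First"));
        docs.add(doc("http://example.com/b", "Second"));
        docs.add(doc("http://example.com/a", "First again"));
        // Two documents share the same "url", so one of them is dropped.
        System.out.println(removeDuplicates(docs, "url").size()); // prints 2
    }
}
```

In a real index the same decision (keep one document per field value, delete the rest) would be made against the terms stored for that field rather than against in-memory maps.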

Sites using this class:

1) http://www.edoctors.in/search/ (nicknamed 'MedSea')

Change History

1st June 2006 Version 0.1
	Initial release

Download: ldr-0.1.tar.gz (12 KB) (Link)

Licence & Copyright

Duplicate Remover is released under the GNU Lesser General Public License (LGPL), a copy of which is included in the zip file.

System Requirements

The minimum requirements are:

  • JRE 1.5 or above (link)

  • Lucene JAR file (link)

  • Remove Duplicate JAR file (rem-Dup-1.0.jar)

Installation

- See the Readme file

Credits

- Rahul Singh <rahulrahul2007@yahoo.co.in> (Coding, Debugging & Testing)
- Sudipto Sarkar <sudipmail4u@yahoo.co.in> (Coding, Debugging & Testing)
- Vinay Yadav <vinay@vinayras.com> (Concept & Debugging)


Contribute

Please download and test the application, report bugs, and enhance it.
Email us your experiences at bugs@vinayras.com


NOTE: Please do NOT send index files as attachments.

Tested With

- JRE 1.5+

- Lucene 1.4.x