Irudiko is a C++ library that implements Locality-Sensitive Hashing (LSH) techniques to generate sketches from any kind of textual document, notably HTML pages. It can also optimize a document according to the language in which it was written; English and Italian are currently supported.
The name comes from the Basque word "Irudiko", which means "similar". The library was first written as a project for a Master's course in Web Mining at the University of Pisa, and was later expanded for a subsequent dissertation.
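
To give an idea of what a locality-sensitive sketch of a text document can look like, here is a minimal MinHash sketch over word shingles. This is only an illustration under stated assumptions: the text above does not say which LSH scheme Irudiko uses, and the function names (word_shingles, minhash_sketch, estimated_similarity), the sketch size of 64 and the shingle width of 3 are all hypothetical and are not part of Irudiko's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// NOTE: illustrative code only, not Irudiko's API. All names are hypothetical.

// Split text on whitespace, lowercase each token, and return every run of
// w consecutive words ("shingle") joined by single spaces.
std::vector<std::string> word_shingles(const std::string& text, std::size_t w) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string tok; in >> tok;) {
        std::transform(tok.begin(), tok.end(), tok.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        words.push_back(tok);
    }
    std::vector<std::string> shingles;
    if (w == 0 || words.size() < w) return shingles;
    for (std::size_t i = 0; i + w <= words.size(); ++i) {
        std::string s = words[i];
        for (std::size_t j = 1; j < w; ++j) s += ' ' + words[i + j];
        shingles.push_back(s);
    }
    return shingles;
}

// 64-bit FNV-1a hash of a string.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// SplitMix64 finalizer, used to derive one hash function per sketch slot.
std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27; x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

// MinHash sketch: slot i keeps the minimum of the i-th hash function over
// all shingles. A text with fewer than w words yields an all-max sketch.
std::vector<std::uint64_t> minhash_sketch(const std::string& text,
                                          std::size_t k = 64,
                                          std::size_t w = 3) {
    std::vector<std::uint64_t> sketch(k, std::numeric_limits<std::uint64_t>::max());
    for (const std::string& sh : word_shingles(text, w)) {
        const std::uint64_t base = fnv1a(sh);
        for (std::size_t i = 0; i < k; ++i) {
            const std::uint64_t seed =
                static_cast<std::uint64_t>(i + 1) * 0x9E3779B97F4A7C15ULL;
            sketch[i] = std::min(sketch[i], mix(base + seed));
        }
    }
    return sketch;
}

// Fraction of slots on which two sketches agree; for MinHash this
// estimates the Jaccard similarity of the two shingle sets.
double estimated_similarity(const std::vector<std::uint64_t>& a,
                            const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t equal = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) ++equal;
    }
    return static_cast<double>(equal) / static_cast<double>(a.size());
}
```

Under these assumptions, two documents are compared by calling minhash_sketch on each and passing the results to estimated_similarity. Only the fixed-size sketches need to be stored, which is what makes this kind of technique attractive for large collections of pages.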