Lukáš Civín, Miroslav Pich, Jaroslav Tykal, Petra Vaníčková, Jiří Vitinger
RNDr. František Mráz, CSc, RNDr. Iveta Mrázová, CSc
The main purpose of this document is to introduce an application called Knocker. Knocker was developed as an open data mining tool with a focus on visualizing selected data mining methods.
Data mining is the process of analyzing data in order to find relationships within it.
The amount of available data grows every day, and understanding its meaning correctly becomes an ever bigger problem. Data mining utilities such as Knocker are a suitable answer to this problem.
Comparison of Knocker with other existing data mining applications:
“Data mining is the future” is something you can hear throughout the database world. It is therefore not surprising that many data mining solutions, systems and applications are available. Most of them, including the best ones, are commercial and very expensive tools. Probably the best known are SAS Enterprise Miner, SPSS with Clementine, and the progressive Statistica Data Miner, along with others such as the special mining modules for Oracle, IBM DB2, and Microsoft SQL Server. None of these solutions is free. Free tools do exist, but they usually have some disadvantages:
We consider the following to be the most interesting features of Knocker:
Knocker offers two main kinds of functionality:
First, users can implement their own data mining method and use the application as a data transporter. The benefit of this approach is that Knocker's data interface is simple to use and provides many ready-made data processing features.
Data are stored in so-called versions, which can be reused by multiple methods and which keep both information about the data and a record of how the version was created. Data mining methods can store their results in versions too. A session mechanism lets you work with selected versions in a project-like manner. Each session keeps information about its versions and can be saved, reloaded and deleted.
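The relationship between versions and sessions can be illustrated with a minimal Python sketch. It is not Knocker's actual implementation: the class names `Version` and `Session`, their fields, and the JSON format are all assumptions chosen for illustration. The sketch only shows the idea that a version carries its data together with a note on how it was created, and that a session groups versions and can be serialized and restored.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Version:
    """Hypothetical stand-in for a version: data plus a record of its origin."""
    name: str
    columns: list
    rows: list
    created_by: str = "import"      # how the version was created (assumed field)
    parent: Optional[str] = None    # name of the source version, if any


@dataclass
class Session:
    """Hypothetical session: a named group of versions that can be saved and reloaded."""
    name: str
    versions: dict = field(default_factory=dict)

    def add(self, version):
        self.versions[version.name] = version

    def to_json(self):
        # asdict() converts the nested Version objects to plain dictionaries.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        session = cls(raw["name"])
        for stored in raw["versions"].values():
            session.add(Version(**stored))
        return session
```

Writing the string returned by `to_json` to a file and reading it back would correspond to saving and reloading a session; deleting a session would simply mean removing that file.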
Second, users can work with four implemented methods: Decision Trees, Neural Networks, Kohonen Maps and Market Basket Analysis.
All of these methods not only execute the algorithm but also let the user explore how the algorithm actually works.
An advantage of this project is that all methods share a similar control system, so the user can compare the results of one method with those of another. Data obtained from one method can be reused by the others.
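The idea that methods share a similar control system and can reuse each other's output can be sketched as a common interface. The Python below is a hypothetical illustration only: `MiningMethod`, `ValueCounts`, `MinCount` and `run_chain` are invented names, and the two toy methods are placeholders, not any of the four methods implemented in Knocker.

```python
from abc import ABC, abstractmethod
from collections import Counter


class MiningMethod(ABC):
    """Hypothetical common interface: every method maps a table to a table."""

    @abstractmethod
    def run(self, rows):
        """Take a list of dict rows and return a list of dict rows."""


class ValueCounts(MiningMethod):
    """Toy method: count how often each value of one column occurs."""

    def __init__(self, column):
        self.column = column

    def run(self, rows):
        counts = Counter(row[self.column] for row in rows)
        return [{"value": v, "count": c} for v, c in counts.most_common()]


class MinCount(MiningMethod):
    """Toy method: keep only rows whose 'count' reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def run(self, rows):
        return [row for row in rows if row["count"] >= self.threshold]


def run_chain(rows, methods):
    """Feed the output of each method into the next one."""
    for method in methods:
        rows = method.run(rows)
    return rows
```

Because every method accepts and returns the same row format, `run_chain(data, [ValueCounts("item"), MinCount(2)])` passes the output of the first method straight into the second, which is the kind of reuse described above.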
This part serves as an entry point to the other documentation files.
There is one base application:
Users can add data mining methods to the main application as modules (.dll libraries). Some modules have already been created:
There are two approaches to Knocker programmer documentation:
The philosophical documentation supplements the exported comments and helps the reader find their way around the large number of classes and methods.
Both kinds of documentation are compiled together into one HTML file (index.html). The diagrams describe the main features of the program parts; at the low level, the classes themselves are commented.
Knocker solution fulfilled main tasks for
There are still advanced features that could become part of Knocker, such as more method variations, efficiency optimizations, and a wider range of supported data inputs. The core parts, however, have already been implemented.
Knocker is an application prepared to serve as a base for future projects. Many universities around the world have their own data mining tools, which have been improved over many years. Knocker could be the first step toward a data mining tool for MFF UK.
It was interesting to code the data mining methods, but in the end it was even more enjoyable to work with them and mine knowledge from data. We tried working with larger amounts of data (hundreds of thousands of rows) and Knocker handled them without problems. We are satisfied with our work.
Knocker can be successfully used both as a mining tool and as a teaching utility.