1 Introduction

The main purpose of this document is to introduce an application called Knocker. Knocker was developed as an open data mining tool with a focus on visualizing selected data mining methods.

Data mining is the process of analyzing data in search of relationships within it.

Every day, the amount of available data grows, and correctly understanding its meaning becomes a bigger problem. Data mining utilities such as Knocker are a suitable solution to this problem.

Comparison of Knocker with other existing data mining applications:

“Data mining is the future” is something you can hear throughout the database world. It is therefore not surprising that many data mining solutions, systems, and applications are available. Most of them (including the best ones) are commercial and very expensive. Probably the best data mining tools are SAS Enterprise Miner, SPSS with Clementine, and the progressive Statistica Data Miner, along with others (e.g. special mining modules for Oracle, IBM DB2, and Microsoft SQL Server). None of these solutions is free. Free tools exist, but they usually have some disadvantages:

WEKA – a well-known, large solution written in Java. It needs a special data format and offers no other way to exchange data, so it is very difficult to work further with the results and findings. It is sometimes unfriendly, because it crashes when it runs out of memory, and it has no free documentation, which makes it difficult for beginners to use.
Lisp-Miner – a Czech solution. Only the GUHA method is implemented, which covers only one part of data mining.
Tanagra – very similar to Knocker (it was released after the start of the Knocker project). It builds on Sipina, an earlier “light” version of the mining application. It also requires its own special file format.

We consider these to be the most interesting features of Knocker:

Simple visualization of basic methods, especially for students of data mining.
Data mining is usually done on huge amounts of data, and the data should be in a database. Knocker does not use any special data format; it works with databases through ODBC drivers or with common CSV files, so it is easy to apply all methods to any data.
Resulting findings and data can be saved into the database.
Documentation is freely accessible, both the user documentation and the programmer documentation.
Anyone can create their own module and connect it to the main Knocker application; it is not a closed application but is designed so that new modules are simple to add.

Knocker offers two main kinds of functionality:

First, users can implement their own data mining method and use the application as a data transporter. This approach is beneficial because Knocker's data interface is simple to use and provides many ready-made data processing features.

Data are stored in so-called versions, which can be reused by multiple methods and which keep information about both the data and the way the version was created. Data mining methods can store their results in versions too. You can also use the session mechanism, which lets you work with selected versions in a “project manner”. Each session keeps information about its versions and can be saved, reloaded, and deleted.

Second, users can use the four implemented methods: Decision Trees, Neural Networks, Kohonen Maps, and Market Basket Analysis.

All of these methods not only execute the algorithm but also let you explore how the algorithm actually works.

An advantage of this project is that all methods share a similar control system, so the user can compare the results of one method with those of another. Data obtained from one method can be reused by the others.
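
The versions and sessions described above can be pictured with a small sketch. It is written in Python purely for illustration and is not Knocker's actual code; the names Version, Session, created_by, parent and history are hypothetical and do not come from the Knocker sources. It only shows the idea that a version carries its data together with a record of how it was created, and that a session groups selected versions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """A named data set plus a record of how it was produced (hypothetical)."""
    name: str
    columns: list[str]
    rows: list[list]
    created_by: str = "import"        # step that produced this version
    parent: Optional[Version] = None  # version this one was derived from

    def history(self) -> list[str]:
        """Return the creation steps from the original import to this version."""
        steps = []
        node = self
        while node is not None:
            steps.append(f"{node.name} <- {node.created_by}")
            node = node.parent
        return list(reversed(steps))


@dataclass
class Session:
    """A project-like container that tracks a set of versions (hypothetical)."""
    name: str
    versions: dict[str, Version] = field(default_factory=dict)

    def add(self, version: Version) -> None:
        self.versions[version.name] = version

    def remove(self, name: str) -> None:
        self.versions.pop(name, None)


raw = Version("customers", ["age", "income"], [[25, 1200], [41, 3100], [33, 2000]])

# A derived version keeps a link to its source and the step that created it.
adults = Version(
    "customers_over_30",
    raw.columns,
    [row for row in raw.rows if row[0] > 30],
    created_by="filter age > 30",
    parent=raw,
)

session = Session("demo")
session.add(raw)
session.add(adults)
```

Calling adults.history() lists the import step followed by the filter step, which is the kind of creation record a version is described as keeping.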

2 Documentation

This part serves as a portal to the other documentation files.

2.1 User documentation

There is one base application:

Knocker.exe – user documentation

Users can add data mining methods to the main application via modules (.dll libraries). Some modules have already been created:

Market Basket Analysis (MBA.dll) – user documentation
Decision trees (DecisionTree.dll) – user documentation
Self-Organizing Maps (SOM.dll) – user documentation
Back-Propagation network / multi-layer perceptron (DM_BPnetwork.dll) – user documentation
Simple transformations (SimpleTransforms.dll) – user documentation
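
The module idea can be illustrated with a short sketch. Knocker's real modules are .dll libraries, and its actual module interface is described in the programmer documentation; the Python code below is only an assumed, simplified picture of a host application that registers modules behind a common contract. MiningModule, ColumnMeans and Host are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class MiningModule(ABC):
    """Hypothetical contract that every pluggable module fulfils."""

    name: str

    @abstractmethod
    def run(self, rows: list) -> dict:
        """Process input rows and return a result the host can store or show."""


class ColumnMeans(MiningModule):
    """Toy module standing in for a real method: per-column mean."""

    name = "column-means"

    def run(self, rows: list) -> dict:
        count = len(rows)
        # zip(*rows) turns the list of rows into an iterable of columns.
        return {"means": [sum(col) / count for col in zip(*rows)]}


class Host:
    """Stand-in for the main application: keeps a registry of loaded modules."""

    def __init__(self) -> None:
        self._modules = {}

    def register(self, module: MiningModule) -> None:
        self._modules[module.name] = module

    def available(self) -> list:
        return sorted(self._modules)

    def run(self, name: str, rows: list) -> dict:
        return self._modules[name].run(rows)


host = Host()
host.register(ColumnMeans())
result = host.run("column-means", [[1, 10], [3, 30]])
```

The host never needs to know which method a module implements; it only relies on the shared contract, which is what makes adding a new module simple.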

2.2 Programmer documentation

There are two approaches to the Knocker programmer documentation:

comments exported from the C# code – producing class diagrams with commented variables and methods
philosophical documentation – describing logical processes and data structures

The philosophical documentation supplements the exported comments and helps the reader find their way around the large number of classes and methods.

Both documentation approaches are compiled together into one HTML file (index.html). The diagrams describe the main features of the program parts; at the lowest level are the commented classes.

3 Conclusion

The Knocker solution has fulfilled its main goals:

Modularity
Simple extensibility of the application
Universal access to different data types
Implementation and presentation of basic data mining methods

There are still advanced features that could become part of Knocker, e.g. more method variations, efficiency optimizations, and a wider range of possible data inputs. The core parts, however, have already been implemented.

Knocker is an application designed to be a base for future projects. Many universities around the world have their own data mining tools, which have been improved over many years. Knocker could be the first step towards an MFF UK data mining tool.

It was interesting to code the data mining methods, but in the end it was even more enjoyable to work with them and mine knowledge from data. We tried working with larger amounts of data (hundreds of thousands of rows), and Knocker handled them without problems. We are satisfied with our work.

Knocker can be successfully used both as a mining tool and as a teaching utility.

Knocker

…a data-mining tool for selected Data Mining Methods visualization
