Evaluation of High-End Data Mining Tools for Fraud Detection
Data mining tools are used widely to solve real-world problems in engineering, science, and business. As the number of data mining software vendors increases, however, it has become more challenging to assess which of their rapidly updated tools are most effective for a given application. Such judgement is particularly useful for the high-end products due to the investment (money and time) required to become proficient in their use.
Reviews by objective testers are very useful in the selection process, but most published to date have provided somewhat limited critiques and have not uncovered the critical benefits and shortcomings that can probably be discovered only after using a tool for an extended period on real data. Here, five of the most highly acclaimed data mining tools are compared in this way on a fraud detection application, with descriptions of their distinctive strengths and weaknesses and the lessons the authors learned while evaluating the products.
(October 1998) Dean W. Abbott, I. Philip Matkovsky, and John F. Elder IV, from the 1998 IEEE International Conference on Systems, Man, and Cybernetics, San Diego, CA, October 12-14, 1998.
Download Paper:
Abbott, Matkovsky, and Elder (60 KB PDF).
Note: The five tools evaluated were the best of more than 40 candidate data mining tools under a specific set of user criteria. The most current version of each tool available when the paper was written (July 1998) was used. The five tools were (in alphabetical order of vendor name):
Evaluation of Fourteen Desktop Data Mining Tools
Fourteen desktop data mining tools (or tool modules) were evaluated. The tools employed Decision Trees, Rule Induction, Neural Networks, or Polynomial Networks to solve two binary classification problems, a multi-class classification problem, and a noiseless estimation problem.
Twenty evaluation criteria and a standardized procedure for assessing tool qualities were developed and applied. The traits were collected in five categories: Capability, Learnability/Usability, Interoperability, Flexibility, and Accuracy. Performance in each of these categories was rated on a six-point ordinal scale to summarize the tools' relative strengths and weaknesses. This information should be useful to analysts selecting data mining tools to employ, as well as to developers aiming to produce better data mining products.
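As a rough illustration of how ratings of this kind might be recorded, the following sketch stores one ordinal score per category for each tool and ranks tools within a single category. It is not the authors' procedure: the integer encoding 1 to 6, the names `ToolRating` and `rank_within_category`, and the choice to rank rather than average are all assumptions made for illustration. Only the five category names and the six-point scale come from the text above.

```python
from dataclasses import dataclass

# The five categories named in the abstract.
CATEGORIES = (
    "Capability",
    "Learnability/Usability",
    "Interoperability",
    "Flexibility",
    "Accuracy",
)

# Assumed encoding of the six-point ordinal scale: 1 (lowest) to 6 (highest).
SCALE = range(1, 7)


@dataclass(frozen=True)
class ToolRating:
    """Hypothetical record: one ordinal score per category for one tool."""
    tool: str
    scores: dict  # category name -> ordinal score

    def __post_init__(self):
        # Every category must be rated, and only known categories are allowed.
        if set(self.scores) != set(CATEGORIES):
            raise ValueError(f"{self.tool}: scores must cover exactly {CATEGORIES}")
        for category, score in self.scores.items():
            if score not in SCALE:
                raise ValueError(f"{self.tool}: {category} score {score} is off the scale")


def rank_within_category(ratings, category):
    """Return tool names ordered best-first on one category.

    Tools with equal scores keep their input order, because Python's sort
    is stable even when reverse=True.
    """
    ordered = sorted(ratings, key=lambda r: r.scores[category], reverse=True)
    return [r.tool for r in ordered]
```

Ranking within a category, rather than averaging scores across categories, respects the ordinal nature of the scale: the distance between adjacent rating levels is not assumed to be equal.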
The tools evaluated were (by category):
| Tree | Rule | Neural Net | Poly Net |
|---|---|---|---|
| CART | | NeuroShell 2 | MQ Expert |
| Scenario | | PeOLPARS | NeuroShell 2 |
| See5 | | PRW | Gnosis |
| S-Plus | | | K'Miner |
(October 1998) Michel A. King, John F. Elder IV, et al., from the 1998 IEEE International Conference on Systems, Man, and Cybernetics, San Diego, CA, October 12-14, 1998.
Download Paper:
King and Elder (60 KB PDF).
A Comparison of Leading Data Mining Tools
Tools Evaluated
- Clementine
- Darwin
- Data Cruncher
- Enterprise Miner
- GainSmarts
- Intelligent Miner
- MineSet
- Model 1
- ModelQuest
- PRW
- CART
- NeuroShell
- OLPARS
- Scenario
- See5
- S-Plus
- WizWhy
(August 1998) John F. Elder IV and Dean W. Abbott, from the Fourth Annual Conference on Knowledge Discovery & Data Mining, New York, New York, August 28, 1998.
Download KDD Presentation (updated 10/19/98):
Elder/Abbott (300 KB PDF), no screen captures
Elder/Abbott (3.7 MB PKzip)
Black and White versions for clearer printing:
Elder/Abbott (215 KB PDF), no screen captures
Elder/Abbott (1.8 MB PKzip)
Machine Learning, Neural and Statistical Classification
(Mar. 1996) John F. Elder, A review of Machine Learning, Neural and Statistical Classification (eds. Michie, Spiegelhalter & Taylor; Ellis Horwood, 1994), Journal of the American Statistical Assoc. 91, no. 433: 436-437.
Download Review:
Elder (125 KB PDF)