Recent Federal Government initiatives in high performance computing led to the establishment of the Australian Partnership for Advanced Computing (APAC) and its Victorian arm (VPAC). Monash University is a founding member of, and a substantial contributor to, VPAC. APAC/VPAC aims not only to provide hardware support but also to provide education and training to the scientific community, and to raise the profile of high performance computing within the Australian business community.

Monash now proposes to position itself as a major provider of education and training for the APAC project and the wider scientific and technical community in Victoria. This proposed unit forms an integral part of a Graduate Certificate in Computational Science to be offered jointly by the Faculty of Science and the Faculty of Computing and Information Technology.

The growing prevalence of high-speed computers and large data sets has seen the important fields of machine learning, statistics and econometrics gradually converge on a common field of study within the computational sciences. This common area is popularly known as data mining.

This unit builds on CSE5310, which introduces the student to advanced computing, parallel programming paradigms and the associated programming tools. Based on those tools, this unit provides an introduction to statistical and probabilistic methods for mining information from very large data sets and databases.

The unit covers the following major areas of data mining and associated statistical methods:
- Bayesian Nets and Causal Nets,
- Clustering Methods (using for example Snob),
- Decision Trees (using for example C5 and DtreeProg),
- Support Vector Machines (using for example SVM-light), and
- Neural Networks (using for example Matlab)
Evaluation will be based on:
- Artificial and real-world data
- Training and test data
- "Right"/"wrong" prediction and probabilistic prediction
- Kullback-Leibler distance
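
To illustrate the last two evaluation criteria, the following is a minimal Python sketch and is not taken from the unit materials. It contrasts "right"/"wrong" scoring with a probabilistic score and computes the Kullback-Leibler distance between two discrete distributions. The function names, the toy numbers and the choice of base-2 logarithms (so that results are in bits) are assumptions made for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q), in bits, for discrete distributions.

    p is the reference ("true") distribution, q is the model's estimate.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # q rules out an outcome that p allows
        total += pi * math.log2(pi / qi)
    return total


def accuracy(true_labels, predicted_probs):
    """ "Right"/"wrong" score: fraction of cases whose most probable class is correct."""
    hits = 0
    for y, probs in zip(true_labels, predicted_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        hits += int(best == y)
    return hits / len(true_labels)


def mean_log_loss(true_labels, predicted_probs):
    """Probabilistic score: mean of -log2 of the probability given to the true class."""
    total = 0.0
    for y, probs in zip(true_labels, predicted_probs):
        if probs[y] == 0.0:
            return math.inf  # the true class was declared impossible
        total -= math.log2(probs[y])
    return total / len(true_labels)


if __name__ == "__main__":
    # Hypothetical test-set labels and predicted class probabilities.
    labels = [0, 1, 1]
    probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
    print("accuracy:", accuracy(labels, probs))
    print("mean log loss (bits):", mean_log_loss(labels, probs))
    print("KL distance (bits):", kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

The "right"/"wrong" score only checks whether the most probable class is correct, while the log-loss score also penalises a model for being confidently wrong. The Kullback-Leibler distance requires a known reference distribution, which is one reason artificial data can be useful alongside real-world data.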