The task is to build a machine learning system that categorises one of the UCI digit tasks. You should develop the system on your own, from scratch. You should then run a two-fold test and report your results.
The data is from the University of California at Irvine's Machine Learning Repository: the Optical Recognition of Handwritten Digits Data Set. The repository provides two data sets, a training set and a test set. I've converted them into two data sets, data set1 and data set2, which your system should use.
You should write all of your code yourself. If you use an existing algorithm, you should reference that algorithm in your code and in your report. The code should be written in Java, and should run in Eclipse.
You should write a brief (1-2 page) report on your system. This should describe the algorithm you used and why you chose it. It should also show the results of a two-fold test using the provided data; a brief discussion of data usage would be useful.
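One common reading of a two-fold test with two data sets is: train on the first and test on the second, then swap the roles and report both accuracies along with their mean. The sketch below illustrates that reading only. The class name `TwoFoldSketch`, the `Classifier` interface, and the row layout (features first, class label in the last column) are all assumptions made for illustration, not part of the provided data or a required design.

```java
/** Minimal two-fold evaluation sketch; all names here are hypothetical. */
final class TwoFoldSketch {

    /** Any classifier that predicts a label for one row, given training rows. */
    interface Classifier {
        int classify(int[][] train, int[] query);
    }

    /**
     * Fraction of test rows whose predicted label matches the true label.
     * Assumption: the true label is stored in the last column of each row.
     */
    static double accuracy(Classifier c, int[][] train, int[][] test) {
        if (test.length == 0) {
            return 0.0; // nothing to score
        }
        int correct = 0;
        for (int[] row : test) {
            int trueLabel = row[row.length - 1];
            if (c.classify(train, row) == trueLabel) {
                correct++;
            }
        }
        return (double) correct / test.length;
    }

    /**
     * Runs both folds and returns
     * { accuracy(train=set1, test=set2), accuracy(train=set2, test=set1), mean }.
     */
    static double[] twoFold(Classifier c, int[][] set1, int[][] set2) {
        double foldA = accuracy(c, set1, set2);
        double foldB = accuracy(c, set2, set1);
        return new double[] { foldA, foldB, (foldA + foldB) / 2.0 };
    }
}
```

Reporting the two fold accuracies separately as well as the mean makes it easier to see whether one direction is noticeably harder than the other, which is worth a sentence in the data usage discussion.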
The quality of both the code and the algorithm is important for good marks. The code should be well commented and structured. Selection of a good algorithm is also important. Simple algorithms may be effective, but a relatively complex algorithm may get you more points just for effort.
Note for scraping by: the baseline reported on the UCI website is nearest neighbour using Euclidean distance. You should be able to implement this quite easily (and might want to start with this). This should be enough to pass (10 report, 20 running, 10 code, and 5 results).
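To make the baseline concrete, here is a minimal sketch of a 1-nearest-neighbour classifier using Euclidean distance. It is an illustration under assumptions, not a model answer: the class name `NearestNeighbourSketch` is hypothetical, and it assumes each row holds integer features followed by the class label in the last column. Check the actual layout of the provided data sets before relying on that.

```java
/** Minimal 1-nearest-neighbour sketch; names and row layout are assumptions. */
final class NearestNeighbourSketch {

    /**
     * Squared Euclidean distance over the first featureCount values of a and b.
     * The square root is skipped because it does not change which row is nearest.
     */
    static double squaredDistance(int[] a, int[] b, int featureCount) {
        double sum = 0.0;
        for (int i = 0; i < featureCount; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Returns the label of the training row closest to the query row.
     * Assumption: the last column of every row is the label, so it is
     * excluded from the distance. Returns -1 if train is empty.
     */
    static int classify(int[][] train, int[] query) {
        int featureCount = query.length - 1;
        int bestLabel = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int[] row : train) {
            double d = squaredDistance(row, query, featureCount);
            if (d < bestDistance) { // ties keep the earliest row
                bestDistance = d;
                bestLabel = row[row.length - 1];
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Tiny made-up rows (two features plus a label), not real digit data.
        int[][] train = { {0, 0, 3}, {10, 10, 7} };
        int[] query = {1, 1, -1}; // label slot is ignored by classify
        System.out.println("Predicted label: " + classify(train, query));
    }
}
```

This compares every query against every training row, which is simple but slow on large sets; that trade-off is a reasonable point to raise in the report.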