05s1: COMP9417 Machine Learning and Data Mining
Course Introduction
March 3, 2005

Aims

As a result of successfully completing this course, students will be able to set up well-defined learning problems, apply effective algorithms to such problems, and use the relevant theory to interpret and evaluate the results.

By the end of the subject, students should be able to:

• set up a well-defined learning problem for a given task
• select and define a representation for data to be used as input to a machine learning algorithm
• select and define a representation for the model to be output by a machine learning algorithm
• compare algorithms according to the properties of their inputs and outputs
• describe and develop algorithms in terms of the computational methods used
• relate different algorithms in terms of similarities and differences in the computational methods used
• express key concepts from the foundations of computational and statistical learning theory and demonstrate their applicability
• use algorithms in applications to real-world data sets and collect results to enable evaluation and comparison of their performance

Assumed knowledge/prerequisites

Note: change from previous years.

Old: COMP3411 or COMP9414 Artificial Intelligence
New: COMP9024 Data Structures and Algorithms or COMP2011 Data Organisation

Waivers granted where applicable.

In practical terms, some knowledge of basic statistics and logic will be helpful, but not essential. Ability to program in some language, preferably Java, is assumed.

Course Web Pages

http://www.cse.unsw.edu.au/~cs9417/

Staff

Name        Role                         Email                  Extension
Mike Bain   Lecturer & Course Convenor   mike@cse.unsw.edu.au   56935

Plus one or two exciting guest lecturers, to be arranged.

Syllabus

A summary of the topics to be covered. More details will be available on the course web pages as the course progresses.

Module 1: Fundamentals of machine learning and data mining. Weeks 1-5.
Introduction to and overview of machine learning and data mining. Decision tree learning. Rule learning. Numerical prediction. Instance-based learning. Genetic algorithms. Reinforcement learning.

Module 2: Computational and statistical foundations of machine learning. Weeks 6-8.
Computational learning theory. Probabilistic foundations and methods. Evaluating hypotheses.

Module 3: Advanced machine learning techniques. Weeks 9-14.
SVMs and ensemble methods. Hidden Markov models. Bayes classifiers. Unsupervised learning and clustering. Logical and relational learning.

Lectures and Labs

Lecture timetable

Day        Time    Location
Thursday   6-9pm   Matthews B

A problem with the enrolment system means you had to enrol for laboratories. However, labs will not run every week. Practical work is designed to be done in your own time. To provide help with the assignments, labs will be arranged on an "as needed" basis before the assignments are due.

Assignments

Assignments will involve applying and modifying or implementing machine learning software, using the tools and techniques described in lectures. The first assignment will involve Weka, while the second assignment will be a more open-ended machine learning application project involving implementation of machine learning methods.

Assignment   Description    Due
1            Weka toolkit   Week 5
2            Project        Week 13

In keeping with requirements of the Academic Board regarding post-graduate courses, post-graduates will complete a version of assignment 2 which will have additional requirements.

Plagiarism

All work submitted for assessment must be your own work. Assignments must be completed individually. We regard copying of assignments, in whole or part, as a very serious offence. We use sophisticated plagiarism detection software to search for unreasonable similarities in submitted work.

• Submission of work derived from another person, without their consent, will result in automatic failure for the course with a mark of zero.
• Submission of work derived from another person with their knowledge, or jointly written with someone else, will result in zero marks for the submission.
• Allowing another student to copy from you will, at the very least, result in a reduction in the mark awarded for your own assignment or lab exercises. Do not provide your work to any other person, even people who are not UNSW students. You will be held responsible for the actions of anyone you provide your work to.
• Severe or second offences constitute academic misconduct, and will result in automatic failure, or exclusion from the University.

Exams

There will be a one-hour mid-term exam and a two-and-a-half-hour final exam, both closed-book written exams. The written exams contribute 55% of the overall mark for the course.

Exam timetable

Exam       Date
Mid-term   Week 7
Final      Exam period

In keeping with requirements of the Academic Board regarding post-graduate courses, post-graduates will be required to obtain a pass in both exams to pass the course.

Trial of extended version

This year we are trialling an extended version of the course.

Learning objective: to introduce wider and deeper coverage of topics in the area, making it available to students who can cover the full course content relatively quickly because of previous exposure to core concepts and other aspects of the material.
Extended students will study more material and may earn bonus marks for it. Bonus marks will be available only in assignments 1 and 2.

Students will self-select for the extended version. This is how it will work:

• lectures will be accompanied by extra papers or links for downloading
• everyone is welcome to download and read these materials
• there will be additional questions available for bonus marks in assignment 1
• there will be additional project topics available for bonus marks in assignment 2
• these topics will be more demanding, and should be based on the extended reading materials or equivalent alternatives

So it is quite simple: extended students are those who nominate to do the extended versions of the assignments!

Assessment

Assessment               Marks
assignment 1             15
mid-term exam            15
assignment 2 / project   30
final exam               40

NOTE: the course mark is the total of the component marks!

Reference Books

Textbook:

Machine Learning, Tom Mitchell (1997), McGraw-Hill

Reference books:

Data Mining*, Ian Witten and Eibe Frank (2000), Morgan Kaufmann
Classification and Regression Trees, Breiman, Friedman, Olshen and Stone (1984), Kluwer
C4.5: Programs for Machine Learning, J. R. Quinlan (1993), Morgan Kaufmann
Pattern Classification (2nd ed.), Duda, Hart and Stork (2001), Wiley
Elements of Statistical Learning, Hastie, Tibshirani and Friedman (2001), Springer
Pattern Recognition and Neural Networks, Brian Ripley (1996), Cambridge

Software

* WEKA machine learning toolkit in Java
http://www.cs.waikato.ac.nz/ml/weka/
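
As a quick check on the Assessment table above, the short sketch below adds up the component weights and the share carried by the two written exams. The weights are copied from the table. The function name `course_mark` and the example student's marks are hypothetical, chosen only for illustration, and bonus marks for the extended version are not modelled.

```python
# Component weights copied from the Assessment table (marks out of 100).
WEIGHTS = {
    "assignment 1": 15,
    "mid-term exam": 15,
    "assignment 2 / project": 30,
    "final exam": 40,
}

# The two written exams named on the Exams slide.
WRITTEN_EXAMS = ("mid-term exam", "final exam")


def course_mark(component_marks):
    """Return the course mark as the plain total of the component marks.

    component_marks maps every component name in WEIGHTS to the mark
    earned, already expressed out of that component's weight.
    """
    missing = set(WEIGHTS) - set(component_marks)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(component_marks[name] for name in WEIGHTS)


total_available = sum(WEIGHTS.values())
exam_share = sum(WEIGHTS[name] for name in WRITTEN_EXAMS)

# Hypothetical student, for illustration only.
example_marks = {
    "assignment 1": 12,
    "mid-term exam": 10,
    "assignment 2 / project": 24,
    "final exam": 30,
}

print("Total marks available:", total_available)
print("Written exam share:", exam_share)
print("Example course mark:", course_mark(example_marks))
```

The totals agree with the slides: the components sum to 100, and the mid-term and final together account for the 55% attributed to the written exams.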