DIFFUSE - Machine Learning

Centre closure: As part of a broader organisational restructure, data networking research at Swinburne University of Technology has moved from the Centre for Advanced Internet Architecture (CAIA) to the Internet For Things (I4T) Research Lab. Although CAIA no longer exists, this website reflects CAIA's activities and outputs between March 2002 and February 2017, and is being maintained as a service to the broader data networking research community.

Introduction

Machine Learning (ML) usually refers to systems that perform tasks associated with Artificial Intelligence (AI), such as recognition, diagnosis, planning and prediction. [Mitchell 1997] defines Machine Learning as follows: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". [Witten and Frank 2000] state: "things learn when they change their behavior in a way that makes them perform better in the future". Machine Learning can be viewed as a general inductive process that automatically builds a model of a dataset by learning its inherent structure from the characteristics of the data instances. Over the past decade, Machine Learning has evolved from a field of laboratory demonstrations to a field of significant commercial value.
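Mitchell's definition can be illustrated with a minimal sketch. The task, performance measure, data and the learner itself are all hypothetical (they are not part of DIFFUSE): task T is classifying a number as "low" or "high", performance P is accuracy on a fixed test set, and experience E is a growing set of labelled training examples.

```python
# Hypothetical illustration of Mitchell's T/P/E definition.
def train_threshold(examples):
    """Learn a decision threshold as the midpoint between the mean
    'low' value and the mean 'high' value seen so far."""
    lows = [x for x, label in examples if label == "low"]
    highs = [x for x, label in examples if label == "high"]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2

def accuracy(threshold, test_set):
    """Performance measure P: fraction of test instances classified correctly."""
    correct = sum(
        1 for x, label in test_set
        if ("high" if x > threshold else "low") == label
    )
    return correct / len(test_set)

# True class boundary is 50: 0..49 are "low", 50..99 are "high".
test_set = [(x, "low") for x in range(0, 50)] + \
           [(x, "high") for x in range(50, 100)]

# With little experience E the learned threshold is poor ...
small_e = [(0, "low"), (60, "high")]
# ... with more experience it moves toward the true boundary of 50.
large_e = small_e + [(40, "low"), (48, "low"), (52, "high"), (58, "high")]

print(accuracy(train_threshold(small_e), test_set))  # 0.81
print(accuracy(train_threshold(large_e), test_set))  # higher: performance improves with experience
```

Performance at task T, as measured by P, improves as experience E grows, which is exactly the sense in which the program "learns".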
Machine Learning techniques have been very successful in areas such as data mining, speech and voice recognition, text recognition and face recognition. A good model should be descriptive (describe the training data), predictive (generalise well to unseen test data) and explanatory (provide a plausible description).

Terminology

The following terms are widely used in the field of Machine Learning:

Instance: An example in the training data. An instance is described by a number of attributes. One attribute can be a class label.

Attribute/Feature: An aspect of an instance (e.g. temperature, humidity). Attributes are often called features in Machine Learning. A special attribute is the class label, which defines the class an instance belongs to (required for supervised learning).

Classification: A classifier categorises/classifies given instances (test data) according to pre-determined/learned classification rules.

Training/Learning: A classifier learns the classification rules from a given set of instances (training data). Note that some algorithms have no training phase and instead do all the work when classifying an instance (so-called lazy learning algorithms).

Clustering: The process of dividing an unlabelled data set (no class information given) into clusters that contain similar instances.

Supervised vs. Unsupervised Learning

The basic notion of supervised learning is that of the classifier. A teacher helps to construct a classification model by defining classes and providing positive and negative examples of instances belonging to these classes. The system then determines the commonalities and differences between the various classes in order to generate classification rules for unknown instances. The resulting rules assign a class label to an instance based on the values of the instance's attributes. In unsupervised learning there are no classes defined a priori.
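Supervised classification, and the lazy learning mentioned in the terminology above, can be sketched with a 1-nearest-neighbour classifier. The attribute names and data below are made up for illustration and are not part of DIFFUSE: the classifier simply stores the training instances verbatim and defers all work to classification time.

```python
import math

# Each instance is a list of numeric attributes plus a class label.
# Hypothetical toy data: [temperature, humidity] -> play yes/no.
training_data = [
    ([85, 85], "no"),
    ([64, 65], "yes"),
    ([72, 95], "no"),
    ([69, 70], "yes"),
]

def classify_1nn(instance, training):
    """Lazy learner: there is no training phase; at classification time,
    scan all stored instances and return the nearest neighbour's label."""
    _, label = min(training, key=lambda t: math.dist(t[0], instance))
    return label

print(classify_1nn([70, 72], training_data))  # nearest is [69, 70] -> "yes"
```

Because all computation happens at classification time, such algorithms trade cheap "training" for expensive classification, the opposite of eager learners such as decision trees.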
The algorithm itself divides the instances into different classes and finds descriptions for these classes. This process is often also referred to as clustering. The resulting rules summarise some properties of the instances in the database: which classes are present and what differentiates them. However, this will only be what the algorithm has found; there may be many other ways of dividing the instances into classes and of describing each class. Semi-supervised learning is a combination of supervised and unsupervised learning. In this approach a user-defined amount of supervision is imposed on the algorithm. The advantage is that only part of the data must be classified a priori, requiring less teaching effort, while the classification accuracy is usually higher than for purely unsupervised approaches.

Feature Selection

Feature selection involves selecting a subset of the available feature set to be used in learning and classification. The effectiveness of a system depends on the number and the types of features. It is important to minimise the number of features to improve the performance of learning and future classifications, in terms of processing time and memory required. It is also important to remove irrelevant features to increase the classification accuracy, as some Machine Learning algorithms cannot cope well with irrelevant features.

Example

An example that is often used to illustrate Machine Learning is to determine whether an unspecified sport can be played based on the current weather conditions. Consider the following data (taken from Weka 3: Data Mining Software in Java), which includes the outlook, the temperature, the humidity, whether it is windy or not, and whether playing is possible (the class attribute).
Table 1: Weather data

Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  no
sunny     80           90        true   no
overcast  83           86        false  yes
rainy     70           96        false  yes
rainy     68           80        false  yes
rainy     65           70        true   no
overcast  64           65        true   yes
sunny     72           95        false  no
sunny     69           70        false  yes
rainy     75           80        false  yes
sunny     75           70        true   yes
overcast  72           90        true   yes
overcast  81           75        false  yes
rainy     71           91        true   no

The task for the Machine Learning algorithm is to find the rules inherent in this data set. Applying the C4.5 tree induction algorithm (named J48 in Weka) yields the following decision tree. This decision tree is a perfect classifier for the training data in the table: no data instance is misclassified, so the accuracy of the classifier on the training data is 100%.

Figure 1: Decision tree for weather data

Each node of the tree represents an attribute/feature. The branches below each node divide the attribute value space using the test associated with each branch. The leaves of the tree contain the class and the number of instances. Tree-based algorithms such as C4.5 already perform some kind of feature selection. The order in which the attributes appear in the tree (from top to bottom) depends on their ability to divide the feature space most quickly into sets of instances that can be accurately mapped to a single class. C4.5 uses a metric called information gain to determine the best sequence of attributes and the optimal splits of the attribute value space. Attributes with low information gain appear at the bottom of the tree or are not part of the tree at all. For instance, the temperature attribute is not present in the above tree because it does not correlate strongly with the class.
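Information gain can be computed by hand for the data in Table 1. The sketch below is not the Weka/C4.5 implementation; it only handles the two nominal attributes (C4.5 additionally handles the numeric temperature and humidity attributes via threshold splits), but it shows why outlook ends up at the root of the tree while windy appears lower down.

```python
import math
from collections import Counter

# The 14 instances from Table 1 (nominal attributes only).
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
windy   = [False, True, False, False, False, True, True,
           False, False, False, True, True, False, True]
play    = ["no", "no", "yes", "yes", "yes", "no", "yes",
           "no", "yes", "yes", "yes", "yes", "yes", "no"]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute, labels):
    """Entropy reduction achieved by splitting the data on the attribute."""
    total = len(labels)
    split_entropy = 0.0
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        split_entropy += len(subset) / total * entropy(subset)
    return entropy(labels) - split_entropy

print(round(information_gain(outlook, play), 3))  # 0.247
print(round(information_gain(windy, play), 3))    # 0.048
```

Outlook reduces the class entropy (0.940 bits for 9 yes / 5 no) far more than windy does, so C4.5 selects it for the first split.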
The decision tree can easily be translated into the following set of classification rules:

if outlook=sunny and humidity<=75 then play=yes
if outlook=sunny and humidity>75 then play=no
if outlook=overcast then play=yes
if outlook=rainy and windy=true then play=no
if outlook=rainy and windy=false then play=yes

References

The following references provide good introductions to the topic of Machine Learning:

Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)", Morgan Kaufmann, June 2005.

Tom M. Mitchell, "Machine Learning", McGraw-Hill Education (ISE Editions), December 1997.

Nils J. Nilsson, "Introduction to Machine Learning", 1996.

© Swinburne | CRICOS number 00111D
Last Updated: Tuesday 26-Jul-2011 08:28:34 AEST | Maintained by: Sebastian Zander (szander@swin.edu.au) | Authorised by: Grenville Armitage (garmitage@swin.edu.au)