AI Workshop

About

Artificial Intelligence (AI) is already part of our daily lives. For example, you may be assisted by AI when shopping, travelling or finding information online. As digital technologies improve and AI becomes more advanced, AI is likely to play an even bigger part in our daily lives. It is important that we help students understand what AI is and how it is used, and that we give them the opportunity to create their own AI.

This workshop aims to be a practical and accessible introduction to Machine Learning, which is one of the major areas within AI. Machine Learning refers to the ways that computers ‘learn’ from data. In this workshop, we will use the free Machine Learning for Kids website, which does not require any prior knowledge of Machine Learning or AI.

In this workshop you will learn:

You can see the schedule for the workshop on the Sessions page.

Essential Steps

In this workshop we have identified five essential steps in the process of developing Machine Learning solutions. Other authors have created similar lists of steps, such as Yufeng Guo’s 7 Steps of Machine Learning and Aurélien Géron’s Machine Learning Project Checklist.

These steps should generally be followed in order. However, you may find yourself going back and forth between the steps when creating a Machine Learning solution. For example, suppose you create an app that tells you whether a photo contains a cat or a dog. The app may incorrectly identify a photo of a small dog as a cat. You could fix this problem by retraining the app's model with more photos of small dogs so that it stops making that mistake.

The essential steps of developing Machine Learning solutions are listed below, along with notes on things you will probably want to be aware of during each step.

1. Identifying a Problem

First, we identify a problem that we can solve with Machine Learning. The problem does not have to be something that is broken; it could be an opportunity to improve an existing system. For example, some shopping centres use ticket machines at the entrances and exits of their car parks. Other shopping centres have replaced those ticket machines with cameras and use Machine Learning to read and record each car's licence plate on entry and exit.

Note: Machine Learning is not always the best way to solve a problem. For example, traditional weather prediction systems use physics-based models of the atmosphere and work well without Machine Learning.

2. Modelling the Problem

Once we have identified a problem, we need to work out how to model it. Creating a model for a Machine Learning solution involves identifying the relevant factors (or attributes) that affect what we are predicting and deciding which type of Machine Learning to use (supervised, unsupervised or reinforcement learning). For example, if we were using Machine Learning to predict people's life expectancy, we would identify factors that have some relationship with life expectancy (for example, whether someone is a smoker). We would then choose a supervised learning algorithm, because the health records used to train the model would include the known outcome for each person, which serves as the label the model learns to predict.
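To make the idea of attributes and labels more concrete, here is a minimal sketch of how one training example for the life expectancy problem might be represented. Only the smoker attribute comes from the text; the `HealthRecord` class, the `exercise_hours_per_week` attribute and the helper functions are hypothetical names chosen for illustration, and the values are made up.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical attributes for the life expectancy example; a real model
# would use factors chosen by researching the problem.
@dataclass
class HealthRecord:
    smoker: bool                     # attribute (input) mentioned in the text
    exercise_hours_per_week: float   # hypothetical extra attribute (input)
    age_at_death: float              # label: the known outcome to predict

def to_features(record: HealthRecord) -> list[float]:
    """Turn a record's attributes into a list of numbers a model can use."""
    return [1.0 if record.smoker else 0.0, record.exercise_hours_per_week]

def to_training_example(record: HealthRecord) -> tuple[list[float], float]:
    """Pair the attributes with the label, as supervised learning requires."""
    return to_features(record), record.age_at_death

# Made-up record, for illustration only.
example = HealthRecord(smoker=True, exercise_hours_per_week=2.0, age_at_death=71.0)
print(to_training_example(example))  # ([1.0, 2.0], 71.0)
```

Keeping the label separate from the attributes mirrors the modelling decision: the attributes are what the model sees, and the label is what it is trained to predict.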

Note: Modelling a problem can be challenging because real-world problems are complex. However, researching the different factors relevant to the problem can help. The statistician George Box famously said: "All models are wrong, but some are useful". When we model a problem, we do our best to identify which factors are important and which ones explain the particular outcomes we are interested in.

3. Collecting the Data

After we have created a model of the problem, we know which factors are important and therefore which variables we need to collect in the data. We can collect data in many different ways, including surveys and sensors (such as cameras or microphones). The low cost of storage and processing power makes it possible to collect huge amounts of data. In 2012, Facebook reported that it was processing more than 500 terabytes of data each day.

Note: Collecting lots of data can help our Machine Learning model make accurate classifications and decisions, but the data should also cover a wide range of examples, including 'unusual' cases. For example, if we are creating a Machine Learning model that identifies whether a photo contains a dog or a cat, we should include photos of small dogs (not just big dogs), because otherwise the model could mistakenly identify small dogs as cats.
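One simple way to check whether a dataset covers the cases we care about is to count how many examples we have in each group and flag the groups with too few. The sketch below assumes each photo has already been described by its animal and size; the `find_gaps` function, the group names and the `min_count` threshold are hypothetical choices for illustration.

```python
from collections import Counter

def find_gaps(examples, groups, min_count):
    """Return each (animal, size) group that has fewer than min_count examples."""
    counts = Counter((ex["animal"], ex["size"]) for ex in examples)
    # Counter returns 0 for groups that never appear, so missing groups are flagged too.
    return [group for group in groups if counts[group] < min_count]

# Hypothetical photo descriptions; a real dataset would also hold the images.
photos = [
    {"animal": "dog", "size": "big"},
    {"animal": "dog", "size": "big"},
    {"animal": "dog", "size": "small"},
    {"animal": "cat", "size": "small"},
    {"animal": "cat", "size": "small"},
]
groups = [("dog", "big"), ("dog", "small"), ("cat", "small")]
print(find_gaps(photos, groups, min_count=2))  # [('dog', 'small')]
```

Here the check shows that small dogs are under-represented, which is exactly the kind of gap that could lead the model to mistake small dogs for cats.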

4. Training the Model

We then use the data that we have collected to train our Machine Learning model. Before training, we may need to clean the data if it is 'messy', or pre-process it to extract attributes. If we are using a supervised learning algorithm, we may also have to label the data, for example by sorting the examples into different buckets, one for each label. Unsupervised learning and reinforcement learning algorithms are trained using different approaches.
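Sorting examples into buckets is a way of attaching a label to each one. The sketch below, which assumes the buckets are stored as a dictionary from label to a list of examples, shows how they could be turned into the (example, label) pairs that a supervised learning algorithm trains on. The file names and the `to_labelled_pairs` function are hypothetical.

```python
# Hypothetical buckets: each key is a label, each list holds the examples
# that a person has sorted into that bucket.
buckets = {
    "cat": ["photo_01.jpg", "photo_04.jpg"],
    "dog": ["photo_02.jpg", "photo_03.jpg", "photo_05.jpg"],
}

def to_labelled_pairs(buckets):
    """Flatten {label: [examples]} into (example, label) pairs for training."""
    return [(example, label)
            for label, examples in buckets.items()
            for example in examples]

pairs = to_labelled_pairs(buckets)
print(pairs[0])  # ('photo_01.jpg', 'cat')
```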

Note: When training the model, we may find that the data is not in a suitable format, or that some observations are incomplete and reduce the model's accuracy. For example, someone responding to a survey may have given only silly answers, in which case their response could be removed from the training set. During the training step, we clean the data to put it in an appropriate format and to remove nonsensical observations. Similarly, if we are training a model with images and there are problems with the images that reduce the model's accuracy (such as noise), we would find ways to reduce those problems (for example, by applying filtering). It is important to have clean data in a suitable format for training a model because the data we use affects how accurate the model is.
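As an illustration of the cleaning described above, here is a minimal sketch that removes incomplete survey responses and responses with impossible answers. It assumes each response is a dictionary with an `age` answer; the `clean_responses` function, the field names and the 0 to 120 age range are hypothetical choices, not a standard rule.

```python
def clean_responses(responses, min_age=0, max_age=120):
    """Keep only complete responses whose answers are plausible."""
    cleaned = []
    for r in responses:
        # Drop incomplete observations (any missing answer).
        if any(value is None for value in r.values()):
            continue
        # Drop nonsensical answers, such as an impossible age.
        if not (min_age <= r["age"] <= max_age):
            continue
        cleaned.append(r)
    return cleaned

# Made-up survey responses for illustration.
responses = [
    {"age": 34, "smoker": False},
    {"age": None, "smoker": True},   # incomplete
    {"age": 999, "smoker": False},   # silly answer
    {"age": 58, "smoker": True},
]
print(clean_responses(responses))
# [{'age': 34, 'smoker': False}, {'age': 58, 'smoker': True}]
```

The completeness check runs first so that a missing age is never compared against the valid range.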

5. Evaluating the Model

It is important to evaluate the Machine Learning model that we have trained before putting it into 'production' (for example, releasing an app that includes the model). It is common to test the accuracy of the model and retrain it by changing or adding training data. In supervised learning, the collected data is usually split into two sets: a training set and a validation set. We use the training set to train the model and then run the model on the validation set to test its accuracy. Because the model has not seen the validation set during training, this gives a fairer estimate of how it will perform on new data.

Models can be tweaked if the results are inaccurate or could be improved. For example, K-Means clustering is an unsupervised learning algorithm that splits the data into groups (clusters) of examples with similar features. When using this algorithm, you specify the number of clusters that you would like the model to create. You might start with four clusters and then find that it is more useful to increase the number of clusters to six.
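The training set and validation set approach for supervised learning can be sketched in a few lines. To keep the example self-contained, it uses a very simple nearest-centroid classifier (each label is represented by the average of its training values) in place of whatever algorithm a real project would use. The single made-up feature, the 25% validation fraction and all function names are assumptions for illustration.

```python
import random

def train_validation_split(data, validation_fraction=0.25, seed=0):
    """Shuffle the data and split it into a training set and a validation set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def train_nearest_centroid(training_set):
    """'Train' by averaging the feature value for each label."""
    sums, counts = {}, {}
    for x, label in training_set:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose average is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def accuracy(centroids, validation_set):
    """Fraction of validation examples the model labels correctly."""
    correct = sum(predict(centroids, x) == label for x, label in validation_set)
    return correct / len(validation_set)

# Made-up data: one feature (for example, weight in kg) and a label.
data = [(3.5, "cat"), (4.0, "cat"), (4.5, "cat"), (5.0, "cat"),
        (20.0, "dog"), (25.0, "dog"), (30.0, "dog"), (6.0, "dog")]
train, validation = train_validation_split(data)
model = train_nearest_centroid(train)
print(f"validation accuracy: {accuracy(model, validation):.2f}")
```

The fixed `seed` makes the split repeatable, so changes in accuracy come from changes to the model or data rather than from a different random split.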

Note: Models can always be improved and updated. It is important to keep evaluating the models that we have created over time and to retrain them if they become less accurate than they were when originally trained. Models can become less accurate over time because the data given to them changes. For example, a model that identifies animals in photos may have worked very well in the past, but its accuracy could decrease as photo quality improves and new photos look different from the ones it was trained on.
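Ongoing evaluation can be as simple as comparing the model's accuracy on recent, correctly labelled examples with its accuracy when it was first trained. The sketch below assumes we record whether each recent prediction was correct; the `needs_retraining` function and the 5% `tolerance` are hypothetical choices, and a real system would pick its own threshold.

```python
def needs_retraining(original_accuracy, recent_results, tolerance=0.05):
    """Return True if accuracy on recent labelled examples has dropped by more
    than `tolerance` compared with accuracy when the model was first trained.

    recent_results is a list of booleans: True where the model's prediction
    matched the correct label."""
    if not recent_results:
        return False  # nothing to compare against yet
    recent_accuracy = sum(recent_results) / len(recent_results)
    return original_accuracy - recent_accuracy > tolerance

# Made-up example: 90% accurate at launch, 7 of 10 correct on recent photos.
recent = [True] * 7 + [False] * 3
print(needs_retraining(0.90, recent))  # True
```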