Syllabus: Machine Learning for Managers
This picture explains the difference between machine learning and classical programming. Instead of figuring out the underlying mechanism of a phenomenon (such as what causes cancer and what makes it show up in an X-ray), the machine-learning approach just requires us to label the data (which X-rays came from cancer patients, and which ones came from healthy people). In this picture, this labeling is what is called "Answers". Computers are very good at number-crunching. If we provide sufficient data, they can detect all kinds of correlations and patterns (i.e. "Rules"). Those rules can then be used, in the classical programming way, to look at unlabeled data and make a fairly accurate prediction.
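To make the "data plus answers in, rules out" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the readings and labels are made-up numbers, and `learn_threshold` and `predict` are hypothetical names, not part of any tool mentioned in this syllabus. The "rule" it learns is just a single cut-off value.

```python
# Made-up toy data: one measurement per patient plus a label (the "Answers").
# 1 = sick, 0 = healthy. These numbers are invented for illustration only.
readings = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def learn_threshold(xs, ys):
    """Try every midpoint between neighbouring readings and keep the one
    that classifies the most labeled examples correctly."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_threshold, best_correct = None, -1
    for t in candidates:
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def predict(x, threshold):
    """The learned "Rule", used like ordinary hand-written code."""
    return 1 if x > threshold else 0


threshold = learn_threshold(readings, labels)
print("learned rule: sick if reading >", threshold)
print("prediction for an unlabeled reading of 5.5:", predict(5.5, threshold))
```

Nobody typed the cut-off in by hand: it came out of the labeled examples. Real models learn far richer rules, but the division of labor is the same.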
We'll start with training basic regression models in Excel. Did you know that most of us suck at Excel? This is almost an hour-long video, but it is worth a watch for anybody who spends a significant amount of time in Excel.
Let's train our first machine-learning model, using nothing but Excel. Here's a good article that touches only a little bit of math.
That's right! Linear regression is an ML technique! If you'd like to go deeper into it, there are some resources on our Linear Regression topic.
Linear regression is a simple model: it only works in situations where the output variable changes linearly with each input variable taken separately, that is, by a fixed amount per unit change in that input. No squares, or sines, or even reciprocals. This is why its utility is quite limited. So, if you have an output variable y that you suspect depends on 3 input variables, x1, x2, and x3, the model you are training is y = a*x1 + b*x2 + c*x3 + d. Excel will help you find the values of a, b, c, and d that represent the straight-line relationship that best fits the given data.
As long as the relationship is linear, the same technique will work - even for hundreds, thousands or even millions of variables. You'll just need a more powerful machine to do the regression, though.
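For readers curious about what "finding the best-fitting values" involves, here is a rough sketch of a least-squares fit for the three-input model in plain Python. It is not how Excel is implemented; it only illustrates the same idea. The function name `fit_linear` is hypothetical, and the data is made up: it was generated from y = 2*x1 - 1*x2 + 0.5*x3 + 3 with no noise, so the fit should recover roughly those values.

```python
def fit_linear(rows, ys):
    """Least-squares fit of y = w1*x1 + ... + wn*xn + d.
    Returns the weights followed by the intercept d."""
    # Append a constant 1 to each row so the intercept is just one more weight.
    X = [list(r) + [1.0] for r in rows]
    n = len(X[0])
    # Build the normal equations (X^T X) w = X^T y as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(n)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up inputs (x1, x2, x3) and outputs generated from
# y = 2*x1 - 1*x2 + 0.5*x3 + 3 with no noise.
rows = [(1, 2, 3), (2, 0, 1), (0, 1, 4), (3, 3, 0), (4, 1, 2), (1, 5, 1)]
ys = [4.5, 7.5, 4.0, 6.0, 11.0, 0.5]

weights = fit_linear(rows, ys)
a, b, c, d = weights
print("a, b, c, d =", [round(w, 3) for w in weights])
```

The function does not care how many input columns each row has, which is the point made above: more variables only means more arithmetic.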
If you've done your first regression model in Excel, you should download RapidMiner Studio.
This is a free tool that will let you visually build data pipelines for reading, cleaning, and re-structuring data, and for training and testing your models. The free version has a limit of 10,000 rows, but we're recommending it because our goal is to understand ML, not get lost in the whole programming rabbit hole.
You should now try to create a "Decision Tree" model in RapidMiner. They have a catalog of video lessons here.
By the time you get through the "Create a model -> Apply a model -> Test a model -> Validate a model" steps, you will have finished your first non-trivial machine learning problem.
This article will give you a very nice overview of machine learning that is high-level but not vague.
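If you'd like to see those four steps outside of a visual tool, here is a small sketch in plain Python. It is not what RapidMiner does internally. It uses the simplest possible decision tree (a single split, often called a "stump"), the data is invented, and all function names are hypothetical. It also assumes that "validate" means something like cross-validation, where the train-and-test cycle is repeated on different slices of the data.

```python
# Made-up data: (hours_studied, hours_slept) -> passed the exam (1) or not (0).
data = [
    ((1.0, 8.0), 0), ((2.0, 6.0), 0), ((2.5, 7.0), 0), ((3.0, 5.0), 0),
    ((3.5, 9.0), 0), ((5.0, 6.0), 1), ((5.5, 8.0), 1), ((6.0, 5.0), 1),
    ((7.0, 7.0), 1), ((8.0, 6.0), 1), ((1.5, 5.0), 0), ((6.5, 9.0), 1),
]


def train_stump(rows):
    """Create a model: pick the single feature and threshold that make
    the fewest mistakes on the training rows."""
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((x[f] > t) != bool(y) for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, feature, threshold = best
    return feature, threshold


def apply_model(model, x):
    """Apply a model: predict 1 if the chosen feature exceeds the threshold."""
    feature, threshold = model
    return 1 if x[feature] > threshold else 0


def accuracy(model, rows):
    """Test a model: fraction of rows it labels correctly."""
    return sum(apply_model(model, x) == y for x, y in rows) / len(rows)


def cross_validate(rows, folds=3):
    """Validate a model: repeat train/test with a different held-out slice
    each time and average the test accuracy."""
    scores = []
    for k in range(folds):
        test = rows[k::folds]
        train = [r for i, r in enumerate(rows) if i % folds != k]
        scores.append(accuracy(train_stump(train), test))
    return sum(scores) / folds


# Hold out every third row for testing; train on the rest.
test_rows = data[2::3]
train_rows = [r for i, r in enumerate(data) if i % 3 != 2]

model = train_stump(train_rows)
test_accuracy = accuracy(model, test_rows)
cv_accuracy = cross_validate(data)
print("split on feature", model[0], "at threshold", model[1])
print("test accuracy:", test_accuracy, "cross-validated accuracy:", cv_accuracy)
```

The important habit is in the last few lines: the model is scored on rows it never saw during training. Scoring it on its own training data would make almost any model look better than it is.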
Once you've completed the Decision Tree model in RapidMiner, you can try the other built-in sample exercises it has, which cover a wide range of techniques. MadeWithML has good introductions to many of these topics.
Here are some other interesting datasets to play with.
Now you're ready to be introduced to neural networks, which are all the rage these days. Here's an interactive playground to start with.
Added by: eshnil