Machine Learning and AI Curriculum

I am nearly finished with my coursework in machine learning and artificial intelligence. As I wrap up, I thought I would recommend a curriculum for others who are just beginning a similar journey, drawing on the benefit of hindsight. My guiding philosophy is to build a strong fundamental understanding. This leads to intuition and, ultimately, the ability to solve new problems creatively in multiple areas.

I completed about 25 MOOCs. The courses listed below are the best of the best. I left off survey courses that didn’t promote deep understanding. Also, there are a lot of duplicate courses out there because this is a hot area right now. While it can be useful to see the same material presented from different perspectives, I chose the best presentation below.

I’d recommend taking the courses in the order I’ve listed them. In some cases, the order is to avoid struggles due to missing prerequisites. In other cases, the ordering will give you a better perspective on what is to come. I also avoided front-loading all the math courses - it is important to have variety.

Important Note: It takes a high level of motivation and discipline to learn this material. I worked many extra problems with pencil and paper for a fuller understanding. So, dig in when you don’t understand something. Don’t guess-and-check your way through the multiple-choice problems and move on. Your goal is not to earn a bunch of MOOC certificates; your goal is to learn these subjects.

Machine Learning (Stanford / Coursera)

This is the famous Andrew Ng course. In fact, if you’re reading this, it is quite likely you’ve already completed this one. This is an excellent introduction to Machine Learning, but it is not a university-level course like Stanford’s on-campus CS229. Nevertheless, I still think this course is a great place to start.

The lectures are fantastic, with plenty of practical advice and a bit of theory. The programming assignments all use MATLAB or Octave. MATLAB tutorials are provided at the start, with no prior experience required.

In the assignments, you build the classic machine learning algorithms almost from scratch using basic matrix operations. The course covers all the greatest hits: linear regression, logistic regression, gradient descent, regularization, neural networks, support vector machines, bias vs. variance and so on.
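To give a feel for those assignments, here is a minimal sketch of batch gradient descent for linear regression, written in plain Python rather than the course's Octave so it runs without extra libraries. The function name, the default learning rate and iteration count, and the optional `lam` regularization parameter are my own illustrative choices, not the course's starter code.

```python
def gradient_descent(X, y, alpha=0.05, num_iters=5000, lam=0.0):
    """Batch gradient descent for linear regression.

    X is a list of rows, each starting with a 1.0 bias term; y is a list of targets.
    lam is an optional L2 regularization strength (the bias term is not penalized).
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(num_iters):
        # Residuals h_theta(x_i) - y_i, where h_theta(x) = theta . x
        residuals = [sum(t * xj for t, xj in zip(theta, row)) - yi for row, yi in zip(X, y)]
        # Gradient of J(theta) = (1/2m) sum(residual^2) + (lam/2m) sum(theta_j^2 for j >= 1)
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / m for j in range(n)]
        for j in range(1, n):
            grad[j] += lam * theta[j] / m
        # Simultaneous update of every parameter
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Noise-free data generated from y = 1 + 2x
X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
y = [1.0 + 2.0 * row[1] for row in X]
theta = gradient_descent(X, y)
print(theta)  # approximately [1.0, 2.0]
```

The learning rate has to stay below a limit set by the data's scale (here roughly 0.3); larger values make the updates diverge, which is exactly the kind of behavior the course teaches you to diagnose by plotting the cost.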

This course has no serious math or programming prerequisites. You will leave with incredible practical advice on applying machine learning. And you will gain skills that you can apply right away in your current job or studies. This course lays a great foundation, but you will need to take a more intense machine learning course later in the curriculum.

Computer Science and Programming Using Python (MIT / EdX)

Python is the language for AI and Machine Learning. This isn’t a Python course, but you will learn Python well enough to succeed in the upcoming courses. The course develops a basic foundation in algorithmic thinking. You will finish with a solid understanding of object-oriented programming.
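As an example of the algorithmic thinking the course builds, here is a sketch of bisection search for a square root, the kind of small exercise used early in the course. The function name, tolerance, and iteration cap are my own assumptions; the cap keeps floating-point precision from stalling the loop for very large inputs.

```python
def bisection_sqrt(x, tol=1e-12, max_iter=200):
    """Approximate sqrt(x) for x >= 0 by repeatedly halving an interval that contains the answer."""
    if x < 0:
        raise ValueError("x must be non-negative")
    low, high = 0.0, max(1.0, x)  # sqrt(x) always lies in [0, max(1, x)]
    for _ in range(max_iter):
        mid = (low + high) / 2
        if mid * mid < x:
            low = mid   # the root is in the upper half
        else:
            high = mid  # the root is in the lower half
        if high - low <= tol:
            break
    return (low + high) / 2


print(bisection_sqrt(2.0))  # approximately 1.4142135623
```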

This is a high-quality MOOC. It uses the EdX platform beautifully. The lectures are excellent, and the programming assignments drive understanding. After taking this course you will be ahead of many of your classmates in the MOOCs to come.

Calculus 1A: Differentiation, and 1B: Integration (MIT / EdX)

If you are just a couple of years out of college, you can probably skip these two courses. If not, and your calculus is rusty, you will need to take these. Otherwise, you will struggle with parts of probability and other courses that follow. Several courses take for granted a level of mathematical maturity. So, getting comfortable manipulating equations again will pay off. These two courses will sharpen the skills you will need.

Like most of the MIT EdX courses, these two are excellent. Each course is one-third of MIT’s first-semester calculus. These are original MIT lectures that have been sliced into the EdX format. Professor David Jerison is an engaging lecturer. I enjoy learning from someone so at ease with the material, who teaches with nothing but their voice and a piece of chalk.

Introduction to Probability (MIT / EdX)

Until the 1980s, Artificial Intelligence was hampered by its reliance on Boolean rule-based systems. There have been many breakthroughs in AI, but the introduction of probabilistic reasoning may be one of the most important. This article on Professor Judea Pearl’s work and his ACM Turing Award does a nice job explaining the importance of introducing uncertainty to AI.

You are going to see a lot of probability in the coming courses. It takes time, but you will need to become very comfortable with random variables, variance, expectations, Bayesian inference, Markov chains and so on. This one is not easy, but it will pay big dividends. John Tsitsiklis’ lectures are incredible, and you should own a copy of the textbook.
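Two of those ideas fit in a few lines of Python. The sketch below applies Bayes' rule to a hypothetical diagnostic test and approximates a Markov chain's stationary distribution by repeatedly multiplying by the transition matrix (power iteration). The numbers and function names are illustrative assumptions, not examples taken from the course.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis H given evidence E: returns P(H | E)."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence


def stationary_distribution(P, num_iters=1000):
    """Approximate pi with pi = pi P by power iteration.

    P[i][j] is the probability of moving from state i to state j; rows sum to 1.
    Assumes the chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(num_iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# A rare condition (1 in 1000) and a test with 95% sensitivity and a 5% false-positive rate
print(posterior(0.001, 0.95, 0.05))  # about 0.019: a positive result is still probably a false alarm

# Two-state chain whose stationary distribution is (5/6, 1/6)
print(stationary_distribution([[0.9, 0.1], [0.5, 0.5]]))
```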

Artificial Intelligence (UC Berkeley CS188x)

Not long after completing this course, I decided to change careers and focus on machine learning and AI. Suffice it to say, I thought it was incredible. I have linked to the version that was taught by Professor Dan Klein and Professor Pieter Abbeel. Although this course was once offered on the EdX platform, it is now only available as self-study. While the deadlines and structure of a MOOC are nice, if you made it this far, you have the discipline required to complete this course.

I remember watching in awe the result of wrapping a reinforcement learning engine around a simulated one-armed robot. This robot flopped its arm around and, every once in a while, got lucky and dragged itself forward. The robot earned a reward whenever it stumbled forward. It was like watching a child learn how to crawl. Eventually, this block robot, with a single hinged arm, was dragging itself almost gracefully across my computer screen.
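Learning from rewards like this is commonly done with Q-learning, which the course covers. As a much smaller sketch of the same update rule, the code below trains a tabular Q-learning agent on a hypothetical six-state corridor. The environment, function name, and hyperparameters are my own assumptions; this is not the course's crawler or Pacman code.

```python
import random


def q_learning_corridor(n_states=6, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    The agent starts in state 0 and may step left (action 0) or right (action 1).
    Reaching the last state ends the episode with reward +1; every other step earns 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s_next = min(max(s + moves[a], 0), goal)  # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # Q-learning target: reward plus discounted value of the best next action
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning_corridor()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(policy)
```

Random tie-breaking matters here: with all Q-values starting at zero, always picking the first action would keep the agent pinned against the left wall for a very long time.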

The true highlights of the course are the Pacman-themed programming assignments. As the course goes on, you endow your Pacman agent with increasing levels of intelligence. These labs are tough but rewarding. I usually lost track of time while completing them, often staying up late into the night.

Linear Algebra (MIT / OCW)

Linear algebra is an essential tool for machine learning. Again, this is not a MOOC with deadlines, discussion forums and quizzes every 10 minutes. You need to watch the lectures, do the same problem sets as the MIT undergraduates, and take the exams. But the resources are available to make this very doable. Professor Strang’s lectures are great; I enjoyed them more and more as the course progressed. And Strang’s textbook is excellent, with fully worked solutions available for all the problems.

No longer will you think of matrix multiplication as a series of repeated dot products. You will start to think in terms of row and column spaces. You will understand the beauty and importance of eigenvalues and eigenvectors. Projections and their relationship to linear regression will make perfect sense. The course closes with a beautiful treatment of singular value decomposition (SVD).
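The link between projections and linear regression shows up in a small example. The sketch below fits a line C + Dt by solving the normal equations A^T A x = A^T b for the two-column matrix A = [1, t], then checks that the error vector is orthogonal to both columns. The function name and data are illustrative, and plain Python is used so it runs without NumPy.

```python
def least_squares_line(ts, bs):
    """Best-fit line C + D*t: project b onto the column space of A = [1, t].

    Solves the 2x2 normal equations A^T A [C, D]^T = A^T b in closed form.
    """
    n = len(ts)
    st, stt = sum(ts), sum(t * t for t in ts)
    sb, stb = sum(bs), sum(t * b for t, b in zip(ts, bs))
    det = n * stt - st * st  # nonzero as long as the t values are not all equal
    C = (stt * sb - st * stb) / det
    D = (n * stb - st * sb) / det
    return C, D


ts, bs = [0.0, 1.0, 2.0], [6.0, 0.0, 0.0]
C, D = least_squares_line(ts, bs)
p = [C + D * t for t in ts]           # projection of b onto the column space of A
e = [b - p_i for b, p_i in zip(bs, p)]  # error vector e = b - p
print(C, D)                           # 5.0 -3.0
print(sum(e), sum(t * e_i for t, e_i in zip(ts, e)))  # both entries of A^T e are zero
```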

Algorithms (Stanford / Coursera)

To bring your ideas to life, you need a solid understanding of algorithms. I chose the Stanford course taught by Professor Tim Roughgarden and was pleased with my choice. The professor had a contagious enthusiasm for the subject. The course was rigorous, with most of the important algorithms proved correct and analyzed for performance (running time and space) in lecture. The programming assignments were well thought out, and the Coursera forums were fairly active in the offering I took.

By the end, you will have a solid ability to implement most of the algorithm “Greatest Hits” (a phrase Professor Roughgarden used frequently).
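As a taste of the divide-and-conquer style the course emphasizes, here is a sketch of merge sort extended to count inversions (pairs i < j with a[i] > a[j]) in O(n log n) time. The function name is my own, and this is not the course's assignment code.

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a) using merge sort."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, left_inv = sort_and_count(a[:mid])
    right, right_inv = sort_and_count(a[mid:])
    merged, i, j, split_inv = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left:
            # each of those pairs is a "split" inversion
            merged.append(right[j])
            j += 1
            split_inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_inv + right_inv + split_inv


print(sort_and_count([1, 3, 5, 2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 3)
```

Counting the split inversions during the merge is what brings the cost down from the O(n^2) of checking every pair.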

* I also completed the first half of MIT’s 6.006 algorithms course on OCW. I thought this was very good too, with terrific lectures and thoughtful assignments. So, 6.006 is a good alternative if the Stanford course doesn’t suit you.

Learning from Data (Caltech / EdX)

Back to machine learning. I mentioned you would eventually need to take a more rigorous, theoretical course on machine learning. This course is the same as the on-campus Caltech course taught by Professor Abu-Mostafa. From the first lecture, it is obvious how much care and thought the professor has put into choosing exactly what to teach and how to teach it.

The course begins, appropriately, by answering the question “Is Learning Feasible?” This question is fundamental to what we are trying to accomplish with machine learning, and it is important at some point to address it head-on. Then, to quote from the course text, “From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know.” The course doesn’t try to cover every learning algorithm or recipe. Rather, Professor Abu-Mostafa has carefully chosen what to teach with a clear purpose. When the course ends, you are prepared to go off in many directions with a solid foundation.
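The course's answer starts from Hoeffding's inequality, which bounds how far an in-sample frequency ν can stray from the true probability μ for a single fixed hypothesis: P[|ν - μ| > ε] <= 2 exp(-2 ε^2 N). The sketch below compares that bound with a simulation; the function names, parameters, and trial count are my own assumptions.

```python
import math
import random


def hoeffding_bound(epsilon, n):
    """Right-hand side of Hoeffding's inequality: 2 exp(-2 epsilon^2 n)."""
    return 2.0 * math.exp(-2.0 * epsilon ** 2 * n)


def deviation_rate(mu, n, epsilon, trials=10_000, seed=0):
    """Fraction of simulated samples of size n whose frequency nu is more than epsilon from mu."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(n)) / n
        if abs(nu - mu) > epsilon:
            bad += 1
    return bad / trials


print(hoeffding_bound(0.1, 100))      # about 0.271
print(deviation_rate(0.5, 100, 0.1))  # observed rate, well under the bound
```

The bound is loose for any one case, but it holds without knowing μ; the course then shows what changes when you pick the best of many hypotheses instead of a single fixed one.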

The course covers linear and logistic regression, support vector machines (SVM), neural networks, and clustering in depth. Take the time to read the e-Chapters from the book; they are all very important. Complete the exercises, and take the time to understand the derivation of the EM algorithm.
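EM is easiest to see on a mixture of Gaussians, a common clustering model. The sketch below runs EM on one-dimensional data: the E-step computes each component's responsibility for each point, and the M-step re-estimates the means, variances, and weights from those responsibilities. The function names, synthetic data, and initial values are illustrative assumptions, not the book's notation.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, means, variances, weights, iters=100):
    """EM for a one-dimensional Gaussian mixture; returns updated (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(xs)
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component for each point
        resp = []
        for x in xs:
            w = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: responsibility-weighted maximum-likelihood estimates
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            weights[j] = nj / n
    return means, variances, weights


rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
means, variances, weights = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(means)  # close to the true means -2 and 3
```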

* I will mention a second option. Professor Ng’s full CS229 course is available through Stanford Engineering Everywhere. This includes all the assignments and actual on-campus lectures. I have no doubt this course is excellent.

Information Theory, Inference and Learning Algorithms (University of Cambridge)

This course by Professor David MacKay is a gem. I was saddened to find out that Professor MacKay has passed away - much too early. The course consists of the recorded lectures and the accompanying text. It is an advanced course and builds nicely on everything you have learned so far. With this course, you begin to set yourself apart from the crowd.

Professor MacKay takes a unique and inspired approach to teaching machine learning. He begins with a review of probability, entropy, inference and information theory. Chapter 4 closes with a beautiful treatment of Shannon’s Source Coding Theorem. Work every problem presented by the cartoon rat! This will cement several topics you have learned so far.
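Entropy is the limit the source coding theorem describes: on average, a source cannot be compressed losslessly below H(X) bits per symbol. The sketch below computes the entropy of a small distribution and checks a prefix-free code whose expected length meets that limit exactly, which is possible here because every probability is a power of 1/2. The distribution, code, and function names are my own illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p log2 p (terms with p = 0 contribute nothing)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another, so a bit stream decodes unambiguously."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codewords)
        for j, b in enumerate(codewords)
    )


def expected_length(probs, codewords):
    return sum(p * len(c) for p, c in zip(probs, codewords))


probs = [0.5, 0.25, 0.125, 0.125]
code = ["0", "10", "110", "111"]
print(entropy(probs))                # 1.75 bits per symbol
print(is_prefix_free(code))          # True
print(expected_length(probs, code))  # 1.75: this code meets the entropy limit exactly
```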

After Chapter 4, you can proceed directly to Chapters 19-46. These are short, beautifully written chapters. Each is only 12-14 pages including the exercises and solutions. Do all the exercises that have solutions. Note: Not all of these chapters have corresponding lecture videos.

Here are a few things you will understand deeply after completing this course:

  • Bayesian model comparison and the Occam factor
  • Variational methods
  • Monte Carlo methods:
    • Metropolis method, Gibbs sampling, Hamiltonian Monte Carlo, Overrelaxation
  • Hopfield Networks and Boltzmann Machines
  • Addressing high-dimensionality
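For a feel of the Monte Carlo material, here is a minimal sketch of the Metropolis method on a one-dimensional target known only up to a normalizing constant. It uses a symmetric Gaussian random-walk proposal; the function name, step size, and sample count are my own choices, and a real application would also discard burn-in samples and check how well the chain mixes.

```python
import math
import random


def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D density given by its unnormalized log, log_p."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        lp_new = log_p(x_new)
        # Accept with probability min(1, p(x_new) / p(x))
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


# Target: standard normal, specified only up to its normalizing constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000, step=2.5)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 0 and 1
```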

The return on your effort in this course is very high.

Neural Networks for Machine Learning (University of Toronto / Coursera)

This is an advanced course on Neural Networks taught by Professor Geoff Hinton. There are only a few programming assignments, and by now these will be very easy for you. So, to get the most out of this course you will need to invent a few side projects. Here are some ideas:

  • Implement a Restricted Boltzmann machine
  • Implement an RNN
    • With both gated (e.g. LSTM or GRU) and non-gated cells
  • Experiment with dropout
  • Build a deep net and pre-train layers
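For the dropout side project, a minimal starting point is a dropout mask applied to one layer's activations. The sketch below uses inverted dropout, a common variant that rescales the surviving units during training instead of scaling the outgoing weights at test time as described in Hinton's lectures; the two agree in expectation. The function signature is a hypothetical choice of my own.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale the survivors.

    Dividing by (1 - p_drop) at training time keeps each unit's expected value unchanged,
    so nothing needs to be rescaled at test time.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
h = [1.0] * 10_000
dropped = dropout(h, 0.5, rng)
print(sum(1 for a in dropped if a == 0.0) / len(h))  # roughly half the units are dropped
print(sum(dropped) / len(h))                         # the mean stays roughly 1.0
```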

I found the book Deep Learning to be an excellent companion to this course. I recommend that you read this book cover-to-cover. Also, read the papers that Professor Hinton has attached to the course material.

Probabilistic Graphical Models (Stanford / Coursera)

This is the only course I was uncertain about recommending. It is an important topic, and this is one of only a few graduate-level MOOCs available. The textbook for this course is difficult to read, and the lectures can leave gaps in understanding. There are very few students participating in the forums and no official course TAs or mentors.

All that said, I am very glad I completed it. It forced me to better understand several topics: Markov Networks, variational methods, EM algorithm, and energy-based probability models. You should be better prepared for this course than I was: you will have completed David MacKay’s course.

To make the course worthwhile, you must take it with the “Honors” option. The honors programming assignments are critical to learning the material. They are challenging, with at least one taking me 15 hours to complete.

Your Foundation is Built. What Next?

At this point, you have built the foundation you need to head off in many different directions and excel. From here, you may want to choose a specialization and take a couple of additional courses. Are you more interested in machine learning or AI? The distinction between the two is sometimes fuzzy. Here are some suggestions:

Most importantly, put your knowledge into practice. This is where real learning takes place: solving problems where a professor hasn’t carefully planned your path. Better yet, find a job where you can work with experts in the field. While you are looking for a job, do some challenging projects to highlight your abilities. Document your work and post your code.

This surely seems intimidating, but make forward progress each day and you’ll be there before you know it. I hope you enjoy your journey as much as I have enjoyed mine.