Machine Learning Systems
- New to machine learning? Not sure how ML works in production? You're welcome to follow and learn from this graduate-level course.
When we talk about Machine Learning (ML) or Artificial Intelligence (AI), we typically mean a technique or an algorithm that gives computer systems the ability to learn from and reason with data. However, there is a lot more to ML/AI than implementing an algorithm or a technique. In this course, we will learn the fundamental differences between ML/AI as a technique and ML/AI as a system in production. An ML/AI system involves a significant number of components, and it is important that they remain responsive in the face of failures and changes in load. This course covers several strategies for keeping ML/AI systems responsive, resilient, and elastic. ML/AI systems differ from other computer systems in how they are built, tested, deployed, delivered, and evolved. They also pose unique challenges when we need to change the architecture or behavior of the system. It is therefore essential to learn how to deal with the challenges that arise only when building real-world, production-ready ML/AI systems (e.g., performance issues, memory leaks, communication issues, and multi-GPU issues). The course focuses primarily on deep learning systems, but the principles remain similar across all ML/AI systems.