Register for the workshop!

A three-week boot camp of in-depth, hands-on training

Call us at 1.855.LEARN.AI for more information.

START DATE Monday, March 20th, 2023
DURATION Three weeks
LAB SESSION DAYS Every Monday, Tuesday, Wednesday, and Thursday
LAB SESSION TIMING 10 AM – 5 PM PST (includes a one-hour lunch break)
CLINIC/HELP SESSIONS (optional, if you need help) Sunday mornings, PST

Workshop Overview

The Spring Session Starts Monday Morning, March 20th, 2023

This workshop is a gentle but in-depth and comprehensive introduction to the field of Machine Learning. It places equal emphasis on understanding the theoretical foundations and on gaining hands-on experience with real-world data analyses on the Google Cloud Platform. The workshop comprises over 100 hours of in-person training, delivered in 12 all-day sessions spread across three weeks. It includes in-depth lecture/theory sessions, guided labs for building AI models, quizzes, projects on real-world datasets, guided readings of influential research papers, and discussion groups.

The teaching faculty for this workshop comprises the instructor, a supportive staff of teaching assistants, and a workshop coordinator. Together they facilitate learning through close guidance and one-on-one sessions when needed.

You can attend the workshop in person or remotely. State-of-the-art facilities and instructional equipment ensure that the learning experience is the same either way. Of course, you can also mix the two modes: attend in person when you can, and remotely when you cannot. All sessions are live-streamed, as well as recorded and made available on the workshop portal.

NOTE: This comprehensive workshop merges the two previously offered data science workshops: ML100 (Introduction to ML) and ML200 (Intermediate ML).

Lectures

10 lecture sessions covering the foundational concepts.

Labs

15 guided programming & AI modeling labs.

Quizzes

20 quizzes covering the theory and the lab material.

Projects

10 hands-on AI modeling data projects.

Papers

Weekly guided research paper readings.

Overview of weekly activities

Each week will focus on one outcome: mastering a specific topic in data science. To achieve this outcome, we will cover the relevant theory accessibly but in depth. We will follow it with hands-on labs. There will also be a guided reading of an important research paper on the topic.

Finally, we will assess our progress with a quiz that covers the topic, as well as a hands-on data science project, applying what we have learned to various real-world datasets.

To summarize, every lab day will include:

  • 10 AM to 1 PM Theory/lecture session (3 hours)
  •  1 PM to 2 PM Lunch break
  •  2 PM to 5 PM Hands-on guided lab (3 hours)

In addition, each week will include:

  • SATURDAY, 10 AM Weekly project released: a hands-on AI project based on the topic of the week, with technical help and support from the teaching assistants
  • THURSDAY MORNING A quiz released online, reviewing the theory and the practical material learned
  • SUNDAY NOON (Optional) Guided reading of an important research paper in the field
  • SUNDAY, 4 PM (Optional) Review of the quiz; help with the data science projects

Capstone Project

Each participant starts on a capstone project that spans the three-week duration of the workshop. The teaching staff will work closely with you, providing guidance as you pursue the project's milestones.

A capstone project can be done in groups of no more than 4 participants. The capstone project must fulfill the following criteria:

  • Original Work It must present either a machine-learning model of a new dataset, or a new model or approach to an existing dataset. It may contain the group’s own code implementation of an algorithm or approach suggested in a recent research paper.
  • Blog Optionally, a blog describing the work and the experience of the project.
  • Presentation An end-of-workshop technical presentation to the batch of participants.

Target Audience

This workshop is specifically targeted toward those aspiring to be expert data scientists. Data science has rapidly emerged as the new essential literacy, and every sphere of 21st-century life is being deeply shaped by data science and machine learning.
Therefore, this workshop is deliberately crafted to address a diverse audience. The conceptual foundations are highly relevant to managers and aspiring data scientists alike, while the hands-on data labs build in-depth expertise in those wanting to pursue data science as a career.

Theory/Lecture Topics

Covariance, Correlation & Causation

We study the covariance between two variables and its geometrical intuition. Next, we learn about feature standardization, Pearson correlation, and its relationship to linear regression with a single predictor. We also study the phenomenon of regression towards the mean. Correlation does not imply causation, though conflating the two is a common fallacy; we will delve more deeply into this distinction.
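To make these ideas concrete, here is a minimal Python sketch (on hypothetical synthetic data) showing that, for standardized variables, the least-squares slope of a single-predictor fit equals the Pearson correlation:

    import numpy as np

    # Hypothetical synthetic data: y linearly related to x, plus noise.
    rng = np.random.default_rng(42)
    x = rng.normal(size=500)
    y = 2.0 * x + rng.normal(scale=1.5, size=500)

    cov_xy = np.cov(x, y)[0, 1]        # sample covariance
    r = np.corrcoef(x, y)[0, 1]        # Pearson correlation

    # Standardize both variables to zero mean and unit variance.
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()

    # The least-squares slope on the standardized data equals r.
    slope = np.polyfit(xs, ys, deg=1)[0]
    print(f"cov={cov_xy:.3f}  r={r:.3f}  standardized slope={slope:.3f}")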

Regression

We will study linear regression, the concept of least squares, and gradient descent to minimize the sum of squared errors. Then we will study ordinary least squares linear regression, polynomial regression and the Runge phenomenon, nonlinear least squares, and Box-Cox transformations. We will also learn about residual analysis and other model diagnostic techniques, and get introduced to alternative loss functions.
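As a taste of the lab work, here is a minimal scikit-learn sketch on synthetic data (the polynomial degree is an illustrative assumption) contrasting ordinary least squares with polynomial regression:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Synthetic data with a nonlinear ground truth.
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

    linear = LinearRegression().fit(X, y)
    poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

    for name, model in [("linear", linear), ("degree-5 polynomial", poly)]:
        print(name, "training MSE:", round(mean_squared_error(y, model.predict(X)), 4))

    # For a good fit, the residuals y - model.predict(X) should look structureless.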

Regularization

Regularization reduces overfitting and high variance. We look at an additional penalty term added to the regression loss function, whose weight is a regularization hyperparameter. Additionally, we see a geometric interpretation of this term along with the Minkowski distance. Lasso (L1), Ridge (L2), and Elastic-Net regularizations are covered in the data-science lab exercises.
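A minimal sketch of the idea, using scikit-learn on synthetic data where only two of twenty features matter (the alpha values are illustrative assumptions to be tuned):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    # Synthetic data: only the first two of twenty features matter.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 20))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)  # L2: shrinks all coefficients toward zero
    lasso = Lasso(alpha=0.1).fit(X, y)  # L1: drives irrelevant coefficients to exactly zero

    print("ridge nonzero coefficients:", int(np.sum(np.abs(ridge.coef_) > 1e-6)))
    print("lasso nonzero coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))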

Classification

We will study classification and the learning of decision boundaries in the predictor space. In particular, we will study the Logistic Regression classifier and the Linear and Quadratic Discriminant Analysis classifiers. We will study goodness-of-fit diagnostic measures such as the confusion matrix, precision, recall, accuracy, the ROC curve, and the area under the ROC curve.
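For a concrete preview, here is a minimal scikit-learn sketch (on synthetic data) that fits a logistic-regression classifier and computes the diagnostic measures named above:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_score, recall_score, roc_auc_score)
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    y_prob = clf.predict_proba(X_te)[:, 1]   # scores used for the ROC curve

    print(confusion_matrix(y_te, y_pred))
    print("precision:", precision_score(y_te, y_pred))
    print("recall:   ", recall_score(y_te, y_pred))
    print("accuracy: ", accuracy_score(y_te, y_pred))
    print("ROC AUC:  ", roc_auc_score(y_te, y_prob))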

Clustering

We study three different approaches to clustering data in the feature space:

  • K-Means clustering and its close variants; selecting the optimal number of clusters through scree plots
  • Agglomerative (hierarchical) clustering, dendrograms, and various linkage functions
  • Density-based clustering techniques such as DBSCAN, OPTICS, and DENCLUE
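A minimal scikit-learn sketch (on synthetic blobs; eps and min_samples are illustrative assumptions) touching these families, including the inertia values one would plot for a scree/elbow chart:

    from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=7)

    # K-Means: inspect inertia over a range of k to locate the scree "elbow".
    for k in range(2, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
        print(f"k={k}: inertia={km.inertia_:.1f}")

    agglo = AgglomerativeClustering(n_clusters=4, linkage="ward").fit(X)
    print("agglomerative clusters:", agglo.n_clusters_)

    dense = DBSCAN(eps=0.7, min_samples=5).fit(X)   # eps, min_samples: assumptions
    print("DBSCAN clusters found:", len(set(dense.labels_) - {-1}))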

Dimensionality Reduction

We will study a few techniques for dimensionality reduction. Primarily we will focus on Principal Component Analysis and its geometrical interpretation. We will relate it to the covariance matrix, and discuss the class of datasets that PCA works best with. We will also cover simpler approaches such as backward and forward selection, and Lasso.
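A minimal scikit-learn sketch of PCA on the classic Iris dataset, showing the explained-variance ratios of the leading components:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Standardize first: the PCA directions are eigenvectors of the
    # covariance matrix of the standardized data.
    X = StandardScaler().fit_transform(load_iris().data)

    pca = PCA(n_components=2).fit(X)
    X_2d = pca.transform(X)   # the reduced, two-dimensional representation

    print("reduced shape:", X_2d.shape)
    print("explained variance ratios:", pca.explained_variance_ratio_)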

Art of Feature Engineering

The predictive efficacy of machine-learning algorithms is greatly amplified by engineering features from the raw input feature space. A meticulous process of exploratory data analysis (EDA), in search of clues for meaningful feature extraction, can often transform a simple algorithm that performs poorly on the original feature space into an extraordinarily effective one in the new feature space where these extracted features are present.
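A small synthetic illustration of this point: a linear classifier fails on concentric classes until a simple engineered feature (the radius) makes them linearly separable:

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Two concentric classes: not linearly separable in the raw feature space.
    X, y = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)
    raw_score = cross_val_score(LogisticRegression(), X, y, cv=5).mean()

    # EDA would reveal the classes differ in distance from the origin,
    # suggesting an engineered radius feature.
    radius = np.hypot(X[:, 0], X[:, 1]).reshape(-1, 1)
    X_eng = np.hstack([X, radius])
    eng_score = cross_val_score(LogisticRegression(), X_eng, y, cv=5).mean()

    print(f"accuracy raw={raw_score:.2f}  with radius feature={eng_score:.2f}")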

Approximate & Kernel k-NN

Approximate & Kernel k-NN is a lazy-learning, non-parametric method that proves remarkably effective in certain situations. We will learn about the interesting notions of distance and similarity that underlie the search for nearest neighbors. We will learn about the Curse of Dimensionality, its origin, and methods to deal with it. We will study how the choice of k governs the bias-variance tradeoff. Finally, we will learn about a rich collection of distance kernels that mitigate the need to tune the hyperparameter k.
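A minimal scikit-learn sketch (on synthetic data) showing how the choice of k and distance-weighted voting trade off bias and variance:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_moons(n_samples=400, noise=0.3, random_state=0)

    # Small k: low bias, high variance; large k: the reverse.
    # Distance weighting softens the dependence on any single hard k.
    for k in (1, 5, 25):
        for weights in ("uniform", "distance"):
            knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
            score = cross_val_score(knn, X, y, cv=5).mean()
            print(f"k={k:2d}  weights={weights:8s}  accuracy={score:.3f}")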

Decision Trees

Decision trees provide a versatile means of adapting to non-linearities in the feature space, and are employed for classification as well as regression. Their strength resides in a very intuitive representation: a tree structure that iteratively partitions the feature space. They are thus valued for their interpretability, while at the same time being remarkably powerful.
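A minimal scikit-learn sketch that fits a shallow tree to the classic Iris dataset and prints its learned partition as human-readable rules:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

    # The learned, axis-aligned partition of the feature space, as if/else rules.
    print(export_text(tree, feature_names=list(iris.feature_names)))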

Ensemble Methods: Bagging, Boosting & Stacking

Among the most powerful concepts to have emerged in machine learning is that of ensemble learning. Instead of using only one learner, and making it as powerful as possible so that it comes up with a good hypothesis, the approach here is different: a crowd of weak learners is used, each forming its own hypothesis. These hypotheses are then methodically synthesized to generate predictions with much greater overall predictive power.
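A minimal scikit-learn sketch of the idea (on synthetic data): compare a single shallow tree against bagged and boosted ensembles (by default, scikit-learn bags full decision trees and boosts depth-1 stumps):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    models = [
        ("single tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("bagging (of trees)", BaggingClassifier(n_estimators=200, random_state=0)),
        ("boosting (of stumps)", AdaBoostClassifier(n_estimators=200, random_state=0)),
    ]
    for name, model in models:
        print(name, "cv accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))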

Gradient Boosting Methods

Gradient Boosting is a powerful ensemble method that has proven remarkably effective in recent years. As a result, there is a great deal of research activity aimed at creating ever better-performing and more predictive implementations.
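A minimal sketch using scikit-learn's HistGradientBoostingClassifier, one such modern implementation (XGBoost, LightGBM, and CatBoost are popular alternatives); the hyperparameters here are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Each boosting iteration fits a small tree to the gradient of the loss.
    gbm = HistGradientBoostingClassifier(learning_rate=0.1, max_iter=200, random_state=0)
    print("cv accuracy:", cross_val_score(gbm, X, y, cv=5).mean().round(3))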

Support Vector Machines

Support Vector Machines provide a systematic way to linearize the decision boundary by transforming the original feature space into another where the various classes are separated by a maximal-margin linear classifier.
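A minimal scikit-learn sketch (on synthetic data) contrasting a linear-kernel SVM with an RBF-kernel SVM; C and gamma are illustrative assumptions to be tuned:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

    # The RBF kernel implicitly maps the data into a space where a
    # maximal-margin linear separator exists.
    for name, model in [("linear kernel", SVC(kernel="linear", C=1.0)),
                        ("RBF kernel", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
        print(name, "cv accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))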

No Free Lunch Theorems

The No Free Lunch theorems -- despite their frivolous-sounding name -- provide a fundamental insight: no one algorithm is more "powerful" than any other on average. Different algorithms make different underlying assumptions about the ground truth and are therefore well suited to different datasets. Averaged over all possible datasets, they all perform equivalently.

In summary, the theory/lecture sessions will cover:

  1. Google Cloud Platform basics for data-science exploratory notebooks
  2. Linear, Polynomial, and Non-Linear Regression, along with power-transforms and basis changes to linearize datasets, Principal Components Regression
  3. Classifiers: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis
  4. Sampling, Bootstrapping, and Cross-Validation
  5. Regularization: Ridge and Lasso for Regression
  6. Clustering: Agglomerative, the K-Means family of clusterers, Density-based clusterers (DBSCAN, OPTICS, DENCLUE), and Expectation Maximization
  7. Dimensionality Reduction with Principal Components Analysis, Matrix Factorization, t-SNE, and UMAP
  8. Geometrical background and intuition behind the major algorithms
  9. Bias-Variance Trade-off, model vs. data complexity, and hyperparameter tuning through grid search
  10. Getting started with Deep Neural Networks
  11. Kernel Methods and Support Vector Machines
  12. Ensemble Methods: Decision Trees, RandomForest, and XGBoost
  13. Interpretability of machine-learning models
  14. Hyperparameter optimizations and an introduction to automated machine learning

Guided Labs and Projects

The Basics

  1. Setting up the AI development environment in the Google Cloud (Jupyter notebooks, Colab, Kubernetes)
  2. Introduction to Pandas and Scikit-learn for data manipulation and model diagnostics, respectively
  3. Creating interactive data-science applications with Streamlit
  4. Data visualization techniques
  5. Kubeflow: Model development life cycle
  6. Models as a service

Google Cloud AI Platform

  1. GKE (Google Kubernetes Engine)
  2. Selecting the right compute-instances and containers for deep learning
  3. Colab and Notebooks in GCP
  4. Going to production in GCP
  5. Recommendations AI (if time permits)

Core Topics

  1. Exploring Numpy and SciPy
  2. Linear regression with Scikit-Learn
  3. Model diagnostics with Scikit-Learn and Yellowbrick
  4. Residual analysis
  5. Power transforms (Box-Cox, etc.)
  6. Polynomial regression
  7. Regularization methods
  8. Classification with Logistic Regression
  9. LDA and QDA
  10. Dimensionality reduction
  11. Clustering algorithms (k-Means, EM, hierarchical, and density-based methods)

Explainable AI

  1. Interpretability of AI models
  2. LIME (Locally Interpretable Model Explanations)
  3. Shapley additive models
  4. Partial dependency plots
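As a taste of these interpretability labs, here is a minimal sketch using permutation importance from scikit-learn (the labs themselves use LIME, SHAP, and partial-dependence plots); the dataset is a synthetic stand-in:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    # How much does shuffling each feature degrade held-out performance?
    result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
    for i, imp in enumerate(result.importances_mean):
        print(f"feature {i}: importance = {imp:.3f}")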

Ensemble Methods

  1. Decision trees and pruning
  2. Bagging and Boosting
  3. RandomForest and its variants
  4. Gradient Boosting, XGBoost, CatBoost, etc.

Hyperparameter Optimization

  1. Grid search
  2. Randomized search
  3. Basic introduction to Bayesian optimization
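A minimal scikit-learn sketch of the first two approaches (the parameter grids and distributions are illustrative assumptions):

    from scipy.stats import loguniform
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)

    # Grid search: exhaustively evaluate a small, fixed grid.
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    grid.fit(X, y)
    print("grid search best:", grid.best_params_, round(float(grid.best_score_), 3))

    # Randomized search: sample from continuous distributions instead.
    rand = RandomizedSearchCV(SVC(),
                              {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
                              n_iter=20, cv=5, random_state=0)
    rand.fit(X, y)
    print("random search best:", rand.best_params_, round(float(rand.best_score_), 3))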

AI Recommendation Systems

(Using Surprise, etc.)

  1. Memory based recommenders
  2. Model based recommenders
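A minimal sketch with the Surprise library, contrasting a memory-based recommender (k-NN) with a model-based one (SVD) on the built-in MovieLens ml-100k dataset (Surprise prompts to download it on first use):

    from surprise import SVD, Dataset, KNNBasic
    from surprise.model_selection import cross_validate

    # The classic MovieLens ml-100k ratings dataset.
    data = Dataset.load_builtin("ml-100k")

    for name, algo in [("memory-based (k-NN)", KNNBasic()),
                       ("model-based (SVD)", SVD())]:
        results = cross_validate(algo, data, measures=["RMSE"], cv=3, verbose=False)
        print(name, "mean RMSE:", results["test_rmse"].mean().round(3))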

Teaching Faculty

Asif Qamar

[Univ. of Illinois at Urbana-Champaign (UIUC)]

About the instructor

Background

Over more than two decades, Asif’s career has spanned two parallel tracks: as a deeply technical architect, and as a passionate educator. While he primarily spends his time technically leading research and development efforts, he finds expression for his love of teaching in the workshops he offers over the weekends. Through these, he aims to mentor and cultivate the next generation of great technical craftsmen.

Educator

He has also been an educator, teaching various subjects in programming, machine learning, and physics for the last 29 years. He has taught at the University of California, Berkeley Extension, the University of Illinois, Urbana-Champaign (UIUC), and Syracuse University. Besides this, he has given a large number of workshops, seminars, and talks at technical workplaces.

He has been honored with various excellence-in-teaching awards at universities and technical workplaces.

Teaching assistants

A staff of teaching assistants will help and guide you with the labs, as well as help you understand the concepts. They will monitor the discussion groups, and many will be available on campus to answer your questions. You can also reach out to them individually.

Kate Amon

[Univ. of California, Berkeley]

Kunal Lall

[Univ. of Illinois, Chicago]

Harini Datla

[Indian Statistical Institute]

Dennis Shen

[UC, Santa Barbara]

Shefali Qamar

[UC, Santa Cruz]

Schedule

The workshop starts on Monday, March 20th, 2023 at 10 AM Pacific Time.

Lab-session attendance (Monday through Thursday) is essential; Sunday activities are optional. The schedule for each lab day:

  • Theory: Morning, 10 AM to 1 PM
  • Guided Lab: Afternoon, 2 PM to 5 PM
  • (Optional attendance) Paper reading, quiz review, and project presentations: Sunday noon onwards

For in-person participation

Venue

SupportVectors Big-Data AI Lab
46540 Fremont Blvd, Suite 506
Fremont, CA 94538

Prerequisites

It would help if you have basic fluency with Python. If you do not have the necessary Python background, you should attend the (free and optional) Python programming sessions at SupportVectors before this workshop starts. We will use Python as the primary programming language, and offer optional R-language labs for those who would like to master data science in both languages.

No other programming or mathematical background is required, though the latter can give you a better appreciation for some of the things you will learn in the workshop.

Financial Aid and Tuition Discounts

Financial aid is available to three students as a work-study program. Reach out to the teaching staff if you are interested, and to see if you qualify.

  • Participants with a disability: 25% to 100% discount, based on the disability.
  • Living in a developing nation: $500 discount for participants living in any developing nation, such as India, Sri Lanka, Bangladesh, China, African nations, etc.
  • Veterans or currently serving members of the US military: $500 discount.

In-Person vs Remote Participation

Participants local to Silicon Valley are encouraged to attend in person at SupportVectors so that they can form strong learning teams and participate in the dynamic learning community. However, it is equally possible to attend all the sessions remotely. The workshops are designed to ensure that remote participants are equally a part of the learning experience.
Each session is live over Zoom and very interactive. A group of teaching assistants monitors the stream of questions from students and moderates their participation in the live discussions with other students and the professor during the sessions. Each session is also broadcast live over a dedicated private YouTube link, so that participants may choose to watch live on their TV.
Through the dedicated course portal, participants will find a wealth of educational resources associated with the subject and the workshop, including recordings of all the sessions, solutions to the labs and quizzes, and more.

What workshop participants have to say…

The instructor is exceptionally well versed in topics and has the best didactic approach of any teacher/instructor I have had in 30+ years of post-graduate studies . . . reminds me of Richard Feynman and his great books on physics. Asif’s geometric approach is profoundly … illuminating and prods me to learn more.

Dan Cashman, Researcher in Medicine
…explained from a geometric perspective that is not easily found in books…It is a big facility that can seat about 50 people. Breakfast, lunch, and snacks are provided. I think the greatest part of this (or any other class that Asif offers for that matter) is that Asif makes all complex math behind algorithms look extremely intuitive by going into the geometry… I would recommend this course or any course by Asif to anyone.
Subhash Gaur, Senior Director of Engineering, BI @ Oracle
…offers a very thorough, intensive, and yet remarkably beginner-friendly way for students of varying expertise to study Machine Learning.

Asif Qamar, with his decades of experience in both Machine Learning and mathematics, masterfully teaches difficult and intricate topics in a way that students can easily understand complex real-world applications. Asif achieves this through his inextinguishable passion for both the field and academia. He has created resources that are excellent for reference far beyond the scope of the class. There is very little doubt that I would recommend Asif to a friend, and I will definitely be continuing on to the higher course.

Abel John, Platform Data Engineer @ CrowdStrike

Asif is very good at teaching, and his explanations linking geometry to the math were very impressive. I wish I could have his guidance for all engineering math teachers in India.

Kedar Erande, Engineer @ Bizmatics

Chapter-wise, the content covered a broad spectrum, which was good. It gave me a strong background in Machine Learning, making my fundamentals strong.

The facility is amazing hands down.

Note to Instructor: You have so much knowledge about every field that gives me a lot of motivation and inspiration. Machine Learning was covered in good depth. I loved the way of teaching.

I am definitely coming back. Yes, I will recommend it to friends.

Chaitanya Sarda, Engineer @ Google
Excellent facility, unlimited access, very neat and clean and quiet training room, and labs env. Course material: very comprehensive and excellent mix and coverage of theory and labs.

Instructor: very knowledgeable and passionate about teaching and makes sure that one really understands the concept … very good analogies and examples to explain the subject. Overall the best instructor I have ever experienced.

Sudhir Nagesh, Project Manager, Wells-Fargo