Introduction to Python
  • Functions, if-statements, loops
  • Lists, sets, tuples, dictionaries
  • Math, string functions
  • Reading files in Python
  • Iterating over files in Python
  • Creating arrays & matrices
  • Intro to linspace
  • Series & dataframes overview
  • Filtering & joining dataframes
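A few of the topics above (arrays and matrices, linspace, dataframes, filtering and joining) can be sketched as follows. This is only an illustration, assuming NumPy and pandas are installed; the column names and values are made up and do not come from the course material.

```python
import numpy as np
import pandas as pd

# Arrays and matrices
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2], [3, 4]])

# linspace: 5 evenly spaced points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)

# Hypothetical example dataframes (illustrative data only)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [25, 41, 33]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [10.0, 25.0, 7.5]})

# Filtering: keep only the rows where the condition is true
over_30 = customers[customers["age"] > 30]

# Joining: inner join on the shared key column
joined = customers.merge(orders, on="customer_id", how="inner")

print(points)
print(over_30)
print(joined)
```

An inner join keeps only customers that have at least one order; passing how="left" instead would keep every customer.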
Machine Learning Overview

Machine learning uses statistical techniques to give computer systems the ability to ‘learn’ rather than being explicitly programmed. By learning from historical inputs, we’re able to achieve far greater accuracy in our predictions and constantly refine the model with new data.

Supervised learning is where we provide the model with the actual outputs from the data. This lets it build a picture of the data and form links between the historic parameters (or features) that have influenced the output. As a formula, supervised learning can be written as Y = f(X), where Y is the predicted output produced by the model and X is the input data. So, by executing a function against X, we can predict Y.
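To make the idea of learning a function from labelled examples concrete, here is a minimal sketch in plain Python. It uses a 1-nearest-neighbour rule, chosen here only because it is short; it is not necessarily the method the full article uses. The data (hours studied and exam result) is hypothetical.

```python
def fit(X, Y):
    """'Training' for 1-nearest-neighbour simply stores the labelled examples."""
    return list(zip(X, Y))


def predict(model, x):
    """Return the label (Y) of the stored example whose input is closest to x."""
    nearest_x, nearest_y = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_y


# Hypothetical historical data: inputs X with their actual outputs Y
X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
Y = ["fail", "fail", "fail", "pass", "pass", "pass"]

model = fit(X, Y)            # learn from the labelled history
print(predict(model, 6.5))   # apply the learned function to a new input
```

The important part is the shape of the workflow: the model sees both X and Y during fitting, then predicts Y for inputs it has not seen.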

A decision tree builds a model in the form of a tree structure, almost like a flow chart. To calculate the expected outcome, it uses decision points and, based on the results of those decisions, buckets each input. In this article, we’ll talk about classification and regression decision trees, along with random forests.
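A single decision point can be sketched in plain Python as below. This is a simplified illustration of a classification tree with one split, scored with Gini impurity (one common criterion, assumed here); real libraries build many such splits recursively. The spend and churn values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: monthly spend and whether the customer churned
spend = [10, 15, 20, 50, 60, 70]
churned = ["yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_split(spend, churned)

# Each bucket predicts its majority class (both buckets are non-empty for this data)
left = [l for v, l in zip(spend, churned) if v <= threshold]
right = [l for v, l in zip(spend, churned) if v > threshold]
left_label = max(set(left), key=left.count)
right_label = max(set(right), key=right.count)


def predict(value):
    """A one-decision tree: bucket the input by comparing it to the threshold."""
    return left_label if value <= threshold else right_label


print(threshold, impurity, predict(12), predict(65))
```

A random forest repeats this idea across many trees, each trained on a different sample of the data, and combines their predictions.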

In the table above, A is the constant (the Y intercept), also known as B0. X is the X multiplier (the slope), also known as B1. So in our equation Y = B0 + B1(X), we can substitute the values in the coefficient column of the table for B0 and B1.
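For a single input variable, B0 and B1 can be estimated with ordinary least squares. The sketch below is plain Python on made-up numbers; it is not the table from the article, and fit_line is a hypothetical helper name used only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b1 = covariance / variance
    # Intercept: the fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative data that lies exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 5   # Y = B0 + B1(X) with X = 5
print(b0, b1, predicted)
```

Once B0 and B1 are known, predicting Y for a new X is just the substitution described above.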

Machine Learning Practicals
  • Data standardisation and encoding
  • High-level data exploration
  • Scikit-learn machine learning library
  • Seaborn plotting library
  • Matplotlib plotting library
  • Pandas get_dummies data prep (mon, tue... become separate 0/1 indicator columns)
  • Feature importances
  • Creating a Spark dataframe with schema definition
  • Converting to Pandas & running Python script
  • Cross validation scores
  • Hyperparameter tuning
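Several of the practical topics above (get_dummies, cross validation scores, hyperparameter tuning, feature importances) fit together in one short workflow. The sketch below assumes pandas and scikit-learn are installed; the toy data, the choice of a random forest, and the parameter grid are illustrative assumptions rather than the settings used in the practicals.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical toy data: day of week, number of visits, and whether a purchase was made
df = pd.DataFrame({
    "day": ["mon", "tue", "wed", "mon", "tue", "wed"] * 5,
    "visits": [1, 5, 9, 2, 6, 8] * 5,
    "bought": [0, 0, 1, 0, 1, 1] * 5,
})

# get_dummies: turn the categorical "day" column into one indicator column per value
X = pd.get_dummies(df[["day", "visits"]], columns=["day"])
y = df["bought"]

# Cross validation scores: one accuracy score per fold
model = RandomForestClassifier(n_estimators=20, random_state=0)
scores = cross_val_score(model, X, y, cv=3)

# Hyperparameter tuning: try every combination in a small, assumed grid
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 20]}
grid = GridSearchCV(model, param_grid=param_grid, cv=3)
grid.fit(X, y)

# Feature importances from the best model found by the search
importances = dict(zip(X.columns, grid.best_estimator_.feature_importances_))

print(list(X.columns))
print(scores)
print(grid.best_params_)
print(importances)
```

Fixing random_state keeps the runs repeatable, which makes it easier to compare scores before and after tuning.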
End to End Machine Learning Projects

This data set consists of 100 variables and approximately 100 thousand records. The variables describe attributes of the telecom industry and various factors considered important when dealing with its customers. The target variable is churn, which indicates whether the customer will churn or not. We can use this data set to predict which customers will churn, based on the available variables.

This template runs through multiple models and tunes them.