ANALYTICS PATH.

Introduction to Python

Python Basics

• Functions, if-statements, loops
• Lists, sets, tuples, dictionaries
• Math, string functions

Working with files in Python

• Iterating over files in Python

Introduction to Numpy

• Creating arrays & matrices
• Intro to linspace
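
As a rough illustration of the two topics above, here is a minimal sketch, assuming NumPy is installed. The variable names are ours, chosen only for this example.

```python
import numpy as np

# A one-dimensional array built from a Python list.
vector = np.array([1, 2, 3, 4])

# A 2 x 3 matrix built from a nested list (one inner list per row).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# linspace returns evenly spaced values and includes both endpoints by default.
grid = np.linspace(0.0, 1.0, num=5)

print(vector.shape)   # (4,)
print(matrix.shape)   # (2, 3)
print(grid.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```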

Introduction to Pandas

• Series & dataframes overview
• Filtering & joining dataframes
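
To give a feel for filtering and joining, here is a small sketch, assuming pandas is installed. The `customers` and `orders` tables and their columns are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [20.0, 35.0, 15.0, 50.0],
})

# A Series is a single labelled column taken from a dataframe.
amounts = orders["amount"]

# Filtering: keep only the rows where the boolean mask is True.
large_orders = orders[orders["amount"] > 18.0]

# Joining: an inner merge keeps only customer_ids present in both frames.
joined = customers.merge(orders, on="customer_id", how="inner")

print(large_orders)
print(joined)
```

Here customer 2 has no orders and customer 4 has no customer record, so neither appears in the inner join.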

Machine Learning Overview

How does machine learning work?

Machine learning uses statistical techniques to give computer systems the ability to ‘learn’ from data rather than being explicitly programmed. By learning from historical inputs, we can improve the accuracy of our predictions and constantly refine the model with new data.

Types of machine learning

Supervised learning is where we provide the model with the actual outputs from the data. This lets it build a picture of the data and form links between the historic parameters (or features) and the outputs they have influenced. As a formula, supervised learning is Y = f(X), where Y is the predicted output produced by the model and X is the input data. So, by executing a function against X, we can predict Y.

What are Decision Trees & Random Forests?

A decision tree builds a model in the form of a tree structure, almost like a flow chart. To calculate the expected outcome, it uses decision points and, based on the results of those decisions, buckets each input. In this article, we’ll talk about classification and regression decision trees, along with random forests.
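
To make the flow chart idea concrete, here is a tiny hand-written tree. It is only a sketch: the features (`monthly_charge`, `tenure_months`), the thresholds and the labels are all hypothetical, and a real decision tree would learn its decision points from data rather than have them typed in.

```python
def classify_customer(monthly_charge, tenure_months):
    """Walk a tiny hand-written decision tree and return a bucket label."""
    # Decision point 1: split on how long the customer has been with us.
    if tenure_months < 12:
        # Decision point 2: split newer customers on their monthly charge.
        if monthly_charge > 70:
            return "likely to churn"
        return "uncertain"
    # Long-tenure customers all fall into a single leaf.
    return "likely to stay"


print(classify_customer(monthly_charge=80, tenure_months=3))
print(classify_customer(monthly_charge=40, tenure_months=30))
```

A random forest builds many such trees on random subsets of the data and combines their answers, for example by majority vote for classification.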

Understanding Regression Tables

In the table above, A is the constant (the Y intercept), also known as B0, and X is the X multiplier (the slope), also known as B1. So in our equation Y = B0 + B1(X), we can substitute the values in the coefficient column of the table for B0 and B1.
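
As a sketch of where B0 and B1 come from, the snippet below fits Y = B0 + B1(X) by ordinary least squares in plain Python. The function names and the sample points are ours, made up for illustration. A regression table from a statistics package would report the same two numbers in its coefficient column.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Substitute the coefficients into Y = B0 + B1(X)."""
    return b0 + b1 * x


# Made-up points that lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_simple_regression(xs, ys)
print(b0, b1)               # 1.0 2.0
print(predict(b0, b1, 5.0))  # 11.0
```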

Machine Learning Practicals

ML Data Preparation

• Data standardisation and encoding
• High-level data exploration
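
As a rough sketch of what standardisation and encoding mean in practice, the snippet below rescales a numeric column to zero mean and unit variance and maps text labels to integer codes. It assumes NumPy is installed. The values and labels are invented, and libraries such as scikit-learn offer ready-made tools for both steps.

```python
import numpy as np

# Made-up numeric column.
values = np.array([10.0, 20.0, 30.0, 40.0])

# Standardisation: subtract the mean, divide by the standard deviation.
standardised = (values - values.mean()) / values.std()

# Made-up categorical column.
labels = ["red", "green", "red", "blue"]

# Encoding: give each distinct label an integer code (alphabetical order).
codes = {label: i for i, label in enumerate(sorted(set(labels)))}
encoded = [codes[label] for label in labels]

print(standardised)
print(encoded)  # [2, 1, 2, 0]
```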

Implementing Linear Regression

• scikit-learn machine learning library
• Seaborn plotting library

Implementing K-Means Clustering

• scikit-learn machine learning library
• Seaborn plotting library

Implementing K-Nearest Neighbors

• scikit-learn machine learning library
• Matplotlib plotting library

Implementing a random forest model

• Pandas get_dummies data prep (mon, tue... become separate 0/1 indicator columns)
• Feature importances
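
The two steps above can be sketched as follows, assuming pandas and scikit-learn are installed. The toy data, its column names and the model settings are hypothetical, chosen only to keep the example small.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data, invented for illustration only.
data = pd.DataFrame({
    "day": ["mon", "tue", "mon", "wed", "tue", "wed", "mon", "tue"],
    "visits": [3, 10, 4, 12, 11, 2, 5, 9],
    "purchased": [0, 1, 0, 1, 1, 0, 0, 1],
})

# get_dummies replaces the text column with one indicator column per category.
features = pd.get_dummies(data[["day", "visits"]], columns=["day"])
target = data["purchased"]

# Fix random_state so repeated runs build the same forest.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, target)

# feature_importances_ has one value per input column.
importances = pd.Series(model.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False))
```

With this little data the ranking is not meaningful. The point is only the shape of the workflow: encode, fit, then read off the importances.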

Combining Pandas & Spark for scale

• Creating spark dataframe with schema definition
• Converting to Pandas & running Python script

Validating and tuning our model

• Cross validation scores
• Hyperparameter tuning
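
Here is a minimal sketch of both ideas, assuming scikit-learn is installed. The data is synthetic (generated by `make_classification`), and the parameter grid is an arbitrary example rather than a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data, generated only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Cross validation: fit and score the model on 5 different train/test splits.
model = RandomForestClassifier(n_estimators=25, random_state=0)
scores = cross_val_score(model, X, y, cv=5)

# Hyperparameter tuning: try every combination in the grid, each scored
# with 5-fold cross validation, and keep the best one.
param_grid = {"max_depth": [2, 4, None], "n_estimators": [10, 25]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(scores.mean())
print(search.best_params_)
```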

End to End Machine Learning Projects

Churn Prediction: Logistic Regression & Random Forest

This data set consists of 100 variables and approximately 100 thousand records. The variables describe customer attributes and other factors considered important when dealing with customers in the telecom industry. The target variable is churn, which indicates whether the customer will churn or not. We can use this data set to predict which customers will churn based on the available variables.