Stats & Math path.

Module: Statistics

When you’re conducting any kind of analysis, you need data. That data could be the results of a survey, for example. If you wanted to survey every student at a UK university, you would need to distribute, collect and analyse more than 15,000 surveys. That is time-consuming, costly and quite impractical.

First, we need to state our hypothesis. For example, we may say ‘Air pollution causes asthma in kids in urban settings’. This is the hypothesis that our statistical study will test, to find out whether the evidence supports it.

In this article, we’re going to cover the basic measures of central tendency. You’re probably familiar with the mean, median and mode already, but they are worth revisiting before continuing through our statistics articles.
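As a quick refresher, the sketch below computes all three measures with Python's standard `statistics` module. The scores are made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical exam scores, used only for illustration
scores = [2, 3, 3, 5, 7, 10]

avg = mean(scores)    # sum / count = 30 / 6
mid = median(scores)  # even count, so the average of the two middle values (3 and 5)
most = mode(scores)   # the most frequent value

print(avg, mid, most)
```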

Variance is a key statistical topic. It tells us how tightly the data are clustered around the mean. Consider two classes that both have a mean exam score of 40%: does that mean all the students in both classes performed equally well (or badly)?
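To make the two-classes question concrete, here is a small sketch with two hypothetical classes that share a mean of 40 but have very different spreads. It uses the population variance (`pvariance`), assuming each list is the whole class; for a sample you would normally use `variance`, which divides by n - 1.

```python
from statistics import mean, pvariance, pstdev

# Two hypothetical classes with the same mean score (40%)
class_a = [38, 39, 40, 41, 42]  # scores tightly clustered around the mean
class_b = [10, 20, 40, 60, 70]  # scores widely spread out

for name, scores in [("A", class_a), ("B", class_b)]:
    # pvariance: average squared distance from the mean (population variance)
    # pstdev: its square root, back in the original units
    print(name, mean(scores), pvariance(scores), pstdev(scores))
```

Both classes average 40%, but class B's variance is far larger, so its students performed much less uniformly.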

Percentiles are the 99 values that divide a ranked set of observations into 100 equal groups. A percentile can be anywhere from the 1st to the 99th: the Xth percentile is the value below which roughly X% of the observations fall. For example, if you’re in the 60th percentile, your value is greater than about 60% of all the observations.
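The sketch below shows both ideas: `statistics.quantiles(..., n=100)` returns the 99 percentile cut points, and a small hypothetical helper, `percentile_rank`, reports what percentage of observations fall strictly below a given value. The data are simply the numbers 1 to 100, chosen so the arithmetic is easy to follow.

```python
from statistics import quantiles

scores = list(range(1, 101))  # hypothetical data: 100 scores, 1..100

def percentile_rank(value, data):
    """Percentage of observations strictly below `value` (hypothetical helper)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

cuts = quantiles(scores, n=100)  # 99 cut points dividing the data into 100 groups
print(len(cuts))
print(percentile_rank(61, scores))  # 60 of the 100 scores are below 61
```

Note that there are several conventions for interpolating percentiles; `quantiles` uses its default `"exclusive"` method here.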

TechTarget defines correlation as: “a statistical measure that indicates the extent to which two or more variables fluctuate together”. There are two major components to correlation: its strength and its type (for example, positive or negative).

Linear regression gives us a rule for predicting Y from X. Effectively, it fits a line of best fit to a scatter plot, choosing the line for which the sum of the squared distances between the data points and the line is as small as possible.
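The least-squares line has a closed-form solution: the slope is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and the intercept makes the line pass through (x̄, ȳ). The sketch below implements that directly; `least_squares` and the data points are hypothetical, for illustration only.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the line minimising the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [1, 2, 3, 4, 5]              # hypothetical X values
ys = [3.1, 4.9, 7.2, 8.8, 11.0]   # hypothetical Y values, roughly y = 2x + 1

slope, intercept = least_squares(xs, ys)

def predict(x):
    """Predict Y for a new X using the fitted line."""
    return slope * x + intercept

print(slope, intercept, predict(6))
```

On Python 3.10+, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.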

A normal distribution is a symmetrical, bell-shaped curve in which the mean, median and mode sit on top of one another at the peak.
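These properties can be checked numerically with `statistics.NormalDist`. The sketch assumes a hypothetical distribution of heights with mean 170 and standard deviation 10: the density is identical at equal distances either side of the mean, it peaks at the mean, and exactly half the area lies below the mean.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical normal distribution

# Symmetry: points equally far either side of the mean have equal density
left, right = heights.pdf(160), heights.pdf(180)
# The curve peaks at the mean (mode = mean)...
peak = heights.pdf(170)
# ...and half the area lies below the mean (median = mean)
below_mean = heights.cdf(170)
print(left, right, peak, below_mean)
```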

We’ve discussed normal distributions in previous articles; today, we’re going to talk about standard normal distributions. By standardising distributions (converting each value to a z-score), we can compare two different distributions directly. Remember, a normal distribution has no skew: it is symmetrical and has the mean at its highest point.
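Standardising uses z = (x - μ) / σ, which measures how many standard deviations a value lies from its mean. The sketch below compares two hypothetical test results from distributions with different means and spreads; `z_score` is an illustrative helper.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical results from two tests with different means and spreads
z_a = z_score(70, mu=60, sigma=5)   # Test A: 70 marks
z_b = z_score(80, mu=75, sigma=10)  # Test B: 80 marks
print(z_a, z_b)
```

Although 80 is the higher raw mark, 70 on Test A is two standard deviations above its mean, while 80 on Test B is only half a standard deviation above its mean, so the Test A result is relatively stronger.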

A standard normal distribution has a mean of 0 and a standard deviation of 1. As with any normal distribution, the mean, median and mode coincide at the highest point, and the horizontal axis is marked in z-scores. Converting two normal distributions that have different means and standard deviations to this standard form lets us compare them directly.
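Once values are expressed as z-scores, the standard normal distribution tells us what proportion of observations lie below any point. A short sketch using `statistics.NormalDist()`, whose defaults are a mean of 0 and a standard deviation of 1:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Proportion of the distribution below a given z-score
below_zero = std_normal.cdf(0)     # half the area is below the mean
below_196 = std_normal.cdf(1.96)   # roughly 97.5%

# Proportion within one standard deviation of the mean (roughly 68%)
within_one = std_normal.cdf(1) - std_normal.cdf(-1)
print(below_zero, below_196, within_one)
```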

A confidence interval defines a range of values within which we are fairly sure the true (population) value lies. Confidence intervals help us account for sampling error: the sample mean is rarely equal to the true mean, and each sample gives a different mean.
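A minimal sketch of a confidence interval for a mean, assuming the normal approximation: sample mean ± z × (sample standard deviation / √n), with z ≈ 1.96 for 95%. For small samples like this hypothetical one, a t-distribution critical value would usually be more appropriate; the z version is used here only to keep the example to the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean.

    Assumption: uses a z critical value; small samples usually call for t instead.
    """
    n = len(sample)
    x_bar = mean(sample)
    std_error = stdev(sample) / sqrt(n)        # spread of the sample mean
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for 95%
    margin = z * std_error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 10 measurements
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = confidence_interval(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Asking for more confidence (for example `level=0.99`) produces a wider interval.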

Hypothesis testing lets us use sample data to check whether a given hypothesis is supported. For example, let’s say that we have an engine component designed for a Formula One car.
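As an illustration of the mechanics, here is a two-sided one-sample z-test. All the figures are hypothetical (they are not the article's Formula One data), and the test assumes the population standard deviation is known; `one_sample_z_test` is an illustrative helper. The null hypothesis H0 is that the component's true mean lifetime equals its specified value.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu_0, sigma, n):
    """Two-sided z-test of H0: population mean == mu_0 (sigma assumed known)."""
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result this extreme under H0
    return z, p_value

# Hypothetical figures: a component specified to last 50 hours on average,
# a sample of 25 parts averaging 52 hours, known standard deviation of 5 hours
z, p = one_sample_z_test(sample_mean=52, mu_0=50, sigma=5, n=25)
alpha = 0.05  # significance level
print(z, p, "reject H0" if p < alpha else "fail to reject H0")
```

Here z = 2 and p is about 0.046, just below the 5% significance level, so H0 would be rejected.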

Module: Linear Algebra

A vector is effectively a single-column table (an array). In data science, that column typically stores one row of a traditional table: for example, all the details about one customer. We also use a specific notation to show that X is a vector.
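To illustrate the idea of a table row becoming a vector, the sketch below takes a hypothetical customer table and turns one row into an ordered list of numbers. The column names and values are invented, and `row_to_vector` is an illustrative helper; the important point is that every vector uses the same fixed order of components.

```python
# Hypothetical customer table: each row describes one customer
customers = [
    {"age": 34, "income": 42000, "purchases": 12},
    {"age": 51, "income": 58000, "purchases": 3},
]
features = ["age", "income", "purchases"]  # fixed order of vector components

def row_to_vector(row):
    """Turn one table row into a vector (an ordered list of numbers)."""
    return [row[f] for f in features]

x = row_to_vector(customers[0])
print(x)
```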

Vectors have two defining qualities: length (magnitude) and direction. We can use vectors to describe almost anything. For example, if I were to kick a soccer ball, the length of the arrow would show the magnitude (how hard or how far I kicked it) and the arrow would point in the direction I kicked it. We can also use vectors to describe natural phenomena such as wind, which has a strength or speed (magnitude) and a direction.
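For a 2D vector (x, y), the magnitude is √(x² + y²) and the direction can be given as an angle from the positive x-axis. A short sketch, assuming a hypothetical kick described as 3 units east and 4 units north:

```python
import math

def magnitude(v):
    """Length of a 2D vector (Pythagoras)."""
    return math.hypot(v[0], v[1])

def direction_degrees(v):
    """Angle of a 2D vector, measured anticlockwise from the positive x-axis."""
    return math.degrees(math.atan2(v[1], v[0]))

kick = (3, 4)  # hypothetical kick: 3 units east, 4 units north
print(magnitude(kick), direction_degrees(kick))
```

Doubling the kick to (6, 8) doubles the magnitude to 10 but leaves the angle unchanged.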

The unit vectors i = (1, 0) and j = (0, 1) each have a length of 1 and point along the x- and y-axes. We can express any 2D vector by multiplying i and j by that vector’s components and adding the results. In our case, we have the vector (2, 3), so it can be written as 2i + 3j.
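A minimal sketch of building (2, 3) from the unit vectors, using plain tuples and two small illustrative helpers for scalar multiplication and vector addition:

```python
# Standard unit vectors along the x- and y-axes
i = (1, 0)
j = (0, 1)

def scale(c, v):
    """Multiply the 2D vector v by the scalar c."""
    return (c * v[0], c * v[1])

def add(u, v):
    """Add two 2D vectors component by component."""
    return (u[0] + v[0], u[1] + v[1])

v = add(scale(2, i), scale(3, j))  # 2i + 3j
print(v)
```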


Take the vector (2, 1). As we know, multiplying a vector by a scalar changes its magnitude (length) but keeps it on the same line through the origin (a negative scalar reverses its direction along that line). That is useful, but we want to define a ‘set’ covering the entire line produced by multiplying our vector by every possible number. This set of all scalar multiples of a vector is called its span.
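The sketch below generates a few scalar multiples of (2, 1) and uses a hypothetical helper, `in_span`, to test whether a point lies on that line. It relies on the fact that two 2D vectors lie on the same line through the origin exactly when x₁y₂ - y₁x₂ = 0. The exact `== 0` comparison assumes integer or otherwise exact inputs; with general floating-point data you would compare against a small tolerance instead.

```python
def in_span(point, v):
    """True if point is a scalar multiple of the non-zero 2D vector v.

    Two 2D vectors lie on the same line through the origin exactly when
    their 2D cross product (determinant) is zero.
    """
    return point[0] * v[1] - point[1] * v[0] == 0

v = (2, 1)
# A few points on the line: c * (2, 1) for several scalars c
multiples = [(c * v[0], c * v[1]) for c in (-2, -1, 0, 0.5, 3)]
print(multiples)
print(in_span((4, 2), v), in_span((-6, -3), v), in_span((3, 1), v))
```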