2 Introduction

In the era of large-scale data collection, we are trying to make meaningful interpretations of data.

There are two broad ways to meaningfully interpret data:

  1. Mechanistic or mathematical modeling based
  2. Descriptive or data driven

We are here to discuss the latter approach, using machine learning (ML) methods.

2.1 What is machine learning?

We use computers, or more precisely algorithms, to find patterns and learn concepts from data without being explicitly programmed.

For example:

  1. Google ranking web pages
  2. Facebook or Gmail classifying spam
  3. The biological research projects that we are doing, where we use ML approaches to interpret the effects of mutations in noncoding regions

We are given a set of

  1. Predictors
  2. Features or
  3. Inputs

that we call ‘Explanatory Variables’

and we ask different statistical methods, such as

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks

to formulate a hypothesis, i.e.

  1. Describe associations
  2. Search for patterns
  3. Make predictions

for the Outcome Variables.
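
As a concrete illustration, here is a minimal sketch of this workflow in Python, assuming NumPy and scikit-learn are available; the data and variable names are made up for illustration. The explanatory variables X are given to linear regression, which formulates a hypothesis and predicts the outcome variable y.

    # Made-up explanatory variables X and outcome variable y.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # 100 observations, 3 explanatory variables
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)   # outcome variable

    model = LinearRegression().fit(X, y)          # formulate the hypothesis
    print(model.coef_, model.intercept_)          # the learned associations
    print(model.predict(X[:5]))                   # predictions for the first 5 observations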

A bit of background: ML grew out of artificial intelligence (AI) and neural network research.

2.2 Aspects of ML

There are two aspects of ML:

  1. Unsupervised learning
  2. Supervised learning

Unsupervised learning: when we ask an algorithm to find patterns or structure in the data without any specific outcome variables, e.g. clustering. We have little or no idea what the results should look like.
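
A minimal sketch of unsupervised learning, under the same assumptions (scikit-learn, NumPy, made-up data): k-means is handed only the inputs X, with no outcome variable, and is asked to propose a grouping on its own.

    # Unlabelled data: two made-up groups of points, no outcome variable.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=0.0, size=(50, 2)),
                   rng.normal(loc=5.0, size=(50, 2))])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
    print(kmeans.labels_[:10])        # cluster assignments proposed by the algorithm
    print(kmeans.cluster_centers_)    # the structure it found in the data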

Supervised learning: when we provide both input and outcome variables and ask the algorithm to formulate a hypothesis that closely captures the relationship between them.
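
By contrast, a minimal sketch of supervised learning (same assumptions, made-up data): both the inputs X and the outcome labels y are supplied, and logistic regression formulates a hypothesis linking the two.

    # Labelled data: inputs X together with a made-up binary outcome y.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # outcome variable given to the algorithm

    clf = LogisticRegression().fit(X, y)       # learn the input-outcome relationship
    print(clf.predict(X[:5]))                  # predicted outcomes
    print(clf.predict_proba(X[:5]))            # predicted class probabilities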

2.3 What actually happens under the hood

The algorithm learns from a subset of the observations, called the training data, and is then evaluated on a different subset, called the test data.

The error between the predicted outcome variable and the actual data is evaluated as the test error. The objective of the algorithm is to minimise this error by tuning the parameters of the hypothesis.
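
A minimal sketch of this train/test workflow (again assuming scikit-learn and NumPy, with synthetic data): the model is fitted on the training subset and the test error is computed on the held-out test subset.

    # Split synthetic data into training and test subsets, fit, then evaluate.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(3)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.3, size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)

    model = LinearRegression().fit(X_train, y_train)         # tune parameters on training data
    test_error = mean_squared_error(y_test, model.predict(X_test))
    print(test_error)                                        # error on unseen (test) data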

Models that successfully capture the desired outcomes are further evaluated for bias and variance (underfitting and overfitting, respectively).
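
As a rough illustration of these two failure modes (same assumptions, synthetic data), the sketch below fits polynomials of increasing degree to a noisy curve: a straight line underfits (high bias), while a very flexible polynomial chases the training noise and does worse on the test data (high variance).

    # Compare training and test errors for models of increasing flexibility.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(4)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=60)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=4)

    for degree in (1, 3, 15):                  # too simple, about right, too flexible
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        train_err = mean_squared_error(y_train, model.predict(X_train))
        test_err = mean_squared_error(y_test, model.predict(X_test))
        print(degree, round(train_err, 3), round(test_err, 3))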

All the above concepts will be discussed in detail in the following lectures.