


WHAT WE DO

Objectives
Provide the student with the correct intuition behind data science problems and some of the algorithms used to solve them, including:
The geometric interpretation
Both theoretical and practical limitations
Comparison with other algorithms
Provide the student with the necessary language to translate fluently:
Catho's data science problems into the mathematical language used in machine learning.
The algorithms presented in the literature, whether in scientific articles or textbooks, to the specific problems at hand.
Syllabus
Block one is focused on two main objectives:
Using three algorithms (the perceptron, linear regression, and logistic regression), introduce the student to the methods and language of Data Science.
Make an accurate diagnosis of the student's level in order to better plan the rest of the blocks.
1. Perceptron (Classification)
Statement of a binary classification problem
Stages of a learning problem
Geometric interpretation of linear classification
Algebraic formulation of linear classification
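
As a first taste of the algebraic formulation above, here is a minimal perceptron sketch in Python with NumPy; the toy data, function name, and hyperparameters are our own illustration, not course material:

import numpy as np

def perceptron(X, y, epochs=100):
    """Train a perceptron for binary labels y in {-1, +1}."""
    # Append a bias column so the separating hyperplane
    # need not pass through the origin.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # the classic perceptron update
                errors += 1
        if errors == 0:              # every point correctly classified
            break
    return w

# Toy linearly separable data: two Gaussian clusters in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
print("learned weights (w1, w2, bias):", perceptron(X, y))

The single update w += y_i x_i is the entire learning rule; convergence is guaranteed only when the data are linearly separable, which motivates the geometric interpretation above.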
2. Linear regression (Forecasting)
Statement of a regression problem
Linear regression
Correlation
Exact solution and matrix algebra
Approximate solution using the gradient method
Stochastic noise
Polynomial regression
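
To connect the exact matrix solution with the gradient method listed above, a minimal sketch (the toy data and learning rate are our own illustrative choices):

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, (100, 1))
Xb = np.hstack([np.ones((100, 1)), X])            # design matrix with intercept
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1, 100)   # a line plus stochastic noise

# Exact solution via the normal equations: w = (X^T X)^{-1} X^T y.
w_exact = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# The same problem via full-batch gradient descent on the squared error.
w = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
    w -= lr * grad

print("normal equations:", w_exact)
print("gradient descent:", w)   # the two should agree to a few decimals

Both routes minimize the same squared error: the normal equations solve it in one linear-algebra step, while gradient descent trades exactness for scalability.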
3. Logistic regression (Bayesian inference)
Binary classification using logistic regression
Bayes' theorem
Sigmoid function and interpretation
Likelihood Maximization
Approximation algorithms
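
As a preview of the items above, a minimal sketch of likelihood maximization by gradient ascent, with the sigmoid modelling P(y = 1 | x); the data and names are our own illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximize the log-likelihood of labels y in {0, 1} by gradient ascent.

    The gradient of the log-likelihood is X^T (y - sigmoid(X w)).
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                 # P(y = 1 | x) under current w
        w += lr * Xb.T @ (y - p) / len(y)   # ascend the log-likelihood
    return w

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print("weights:", fit_logistic(X, y))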
Block two
The main objective is to build on the algorithms studied in block one, as well as to introduce the first non-parametric and unsupervised algorithms.
On the one hand, decision trees generalize the perceptron by allowing non-linear classification, and with them we will begin the study of non-parametric algorithms.
The PCA method will be the first example of an unsupervised algorithm that we study; it also reinforces the idea of correlation from the previous block.
Finally, we will begin the study of proximity algorithms, which, besides being our second unsupervised and non-parametric example, will allow us to introduce the idea of clustering.
Syllabus
1. Decision trees
What is a decision tree?
Geometric interpretation
ID3
Entropy and the Gini function
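
As a preview of the impurity measures just listed, a short sketch of the entropy and Gini functions applied to a label vector (the example labels are our own):

import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity: the probability that two random draws disagree."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # a 3-vs-5 split
print("entropy:", entropy(y))   # about 0.954 bits
print("gini:   ", gini(y))      # about 0.469

ID3 and its relatives choose, at each node, the split that most reduces one of these impurity measures.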
2. Principal component analysis (PCA)
Interpretation in terms of variance
Interpretation in terms of distance
Relationship to linear algebra
Eigenvalues
Singular value decomposition
QR-decomposition
Common algorithms
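
A minimal sketch of PCA via the singular value decomposition (the toy data are our own), showing both the variance interpretation and the linear-algebra connection listed above:

import numpy as np

rng = np.random.default_rng(3)
# Correlated 2-D data: most of the variance lies along one direction.
X = rng.normal(0, 1, (200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)          # PCA operates on centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_variance = S ** 2 / (len(Xc) - 1)   # eigenvalues of the covariance
components = Vt                               # rows are principal directions
scores = Xc @ Vt.T                            # the data in the principal basis

print("explained variance:", explained_variance)
print("first principal direction:", components[0])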
3. Proximity and clustering algorithms
Euclidean distances and other metrics
K-nearest neighbors
1-NN
General algorithm
The curse of dimensionality
K-means
Clustering
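
As a preview of clustering, a minimal k-means sketch (Lloyd's algorithm) with the Euclidean distance; the toy data and names are our own illustration:

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-3, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
centers, labels = kmeans(X, k=2)
print("centers:\n", centers)

The result depends on the random initialization, one of the practical limitations we will discuss.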
Block three
Block three has three objectives:
Firstly, we seek to introduce the concept of regularization in machine learning, which is essential for comparing algorithms through their generalization capacity.
The second objective is to expand the palette of algorithms the student understands by means of two fundamental techniques for classification and forecasting: neural networks and time series.
Finally, we begin the presentation and analysis of another family of algorithms useful and common in machine learning, the so-called stochastic algorithms; we will focus on their relationship with neural networks, linear regression, and decision trees. We will complement this block with an invitation to boosting.
Syllabus
1. Regularization in Machine Learning
Fitting vs overfitting
In linear regression
Ridge
Lasso
Elastic net
In decision trees: pruning
In the perceptron: support vector machines
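
To make the shrinkage effect of ridge concrete, a minimal closed-form sketch (our own toy data; lasso and elastic net need iterative solvers and are left for the course itself):

import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha I)^{-1} X^T y.

    For brevity this sketch penalizes all coefficients equally; in practice
    the intercept is conventionally left unpenalized.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 50)   # only 2 relevant features

for alpha in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:6.1f}  ||w|| = {np.linalg.norm(w):.3f}")  # norm shrinks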
2. Invitation to deep learning
Activation functions
Back-propagation algorithm
Neural network architectures
Convolution and its interpretation: CNN
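
A minimal back-propagation sketch on XOR, the classic problem a single perceptron cannot solve; the architecture, seed, and hyperparameters are our own illustrative choices:

import numpy as np

rng = np.random.default_rng(6)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

# One hidden layer of 4 tanh units, then a sigmoid output unit.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule). With cross-entropy loss and a sigmoid
    # output, the gradient at the output pre-activation is simply p - y.
    d2 = p - y
    dW2, db2 = h.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d1, d1.sum(axis=0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad / len(X)        # in-place gradient step

print(np.round(p.ravel(), 2))   # should approach [0, 1, 1, 0]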
3. Stochastic algorithms
Stochastic gradient descent (regression and neural networks)
Random forests (decision trees)
Boosting
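
A minimal sketch of stochastic gradient descent on a linear regression, to contrast one-sample updates with the full-batch gradient used in block one (toy data and rate are our own choices):

import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(0, 1, (1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, 1000)

w = np.zeros(3)
lr = 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):       # visit samples in random order
        xi, yi = X[i], y[i]
        grad = 2 * (xi @ w - yi) * xi       # gradient from ONE sample only
        w -= lr * grad                      # noisy but very cheap update

print("estimated weights:", np.round(w, 2))   # should be near [1, -2, 0.5]

The per-sample noise is the "stochastic" part: each step is a cheap, unbiased estimate of the full gradient, which is what makes the method scale to large datasets and neural networks.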
4. Invitation to time series
Components of a time series
White noise
Moving-average models
ARIMA
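
A minimal sketch decomposing a toy series into trend, seasonality, and white noise, then smoothing it with a simple moving average (the series and window are our own illustration; ARIMA proper requires a statistics library):

import numpy as np

rng = np.random.default_rng(8)
t = np.arange(200)
trend = 0.05 * t
season = np.sin(2 * np.pi * t / 25)       # period-25 seasonal component
noise = rng.normal(0, 0.3, len(t))        # white noise component
series = trend + season + noise

def moving_average(x, window):
    """Centered moving average computed by convolution."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# A window of one full season averages the seasonal component out,
# leaving (mostly) the trend.
smooth = moving_average(series, window=25)
print("raw std:     ", np.round(series.std(), 3))
print("smoothed std:", np.round(smooth.std(), 3))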