## Data Science

## WHAT WE DO

Objectives

Provide the student with the correct intuition behind data science problems and some of the algorithms to solve them, including:

The geometric interpretation

Both theoretical and practical limitations

Comparison with other algorithms

Provide the student with the necessary language to translate fluently:

The problems of data science into the mathematical language used in machine learning.

The algorithms presented in the literature, whether in scientific articles or textbooks, into solutions for specific problems.

Syllabus

Block one is focused on two main objectives:

Introduce the student to the methods and language of data science through three algorithms: the perceptron, linear regression, and logistic regression.

Accurately assess the student's level in order to better plan the remaining blocks.

1. Perceptron (Classification)

Statement of a binary classification problem.

Stages of a learning problem.

Geometric interpretation of linear classification

Algebraic formulation of linear classification
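As an illustration of the algebraic formulation, the perceptron update rule can be sketched in plain Python. This is a minimal sketch rather than the course's own implementation: the toy data, the function names `train_perceptron` and `predict`, and the epoch cap are assumptions made for this example.

```python
def train_perceptron(points, labels, max_epochs=100):
    """Learn weights w and bias b so that sign(w.x + b) matches each label (+1 or -1)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                # Move the separating hyperplane towards the misclassified point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data (an assumption for this sketch).
points = [(2.0, 1.0), (3.0, 4.0), (-1.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
```

Geometrically, `w` is the normal vector of the separating line and `b` shifts it away from the origin. The loop is only guaranteed to stop early when the data are linearly separable, which is why the sketch caps the number of epochs.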

2. Linear regressions (Forecasting)

Statement of a regression problem

Linear regressions

Correlation

Exact solution and matrix algebra

Approximate solution using the gradient method

Stochastic noise

Polynomial regressions
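The contrast between the exact solution and the gradient method can be sketched for a single feature. The data, the function names, the learning rate, and the step count below are assumptions chosen for illustration; the closed form used here is the one-feature special case of the matrix solution.

```python
def fit_exact(xs, ys):
    """Closed-form least squares for y = a*x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def fit_gradient(xs, ys, learning_rate=0.05, steps=5000):
    """Gradient descent on the mean squared error of y = a*x + b."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical noisy samples of a roughly linear relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a_exact, b_exact = fit_exact(xs, ys)
a_grad, b_grad = fit_gradient(xs, ys)
```

On a problem this small both approaches should agree to many decimal places. The iterative version becomes interesting when the exact solution is too expensive to compute or when the model is no longer linear in its parameters.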

3. Logistic regression (Bayesian inference)

Binary classification using logistic regression

Bayes' theorem

Sigmoid function and interpretation

Likelihood maximization

Approximation algorithms
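A minimal sketch of likelihood maximization for a one-feature logistic regression follows, using plain gradient ascent on the log-likelihood. The toy data, step size, and function names are assumptions for this example, and gradient ascent is only one of several possible approximation algorithms.

```python
import math


def sigmoid(z):
    """Map a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of labels ys (0 or 1) under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.05, steps=5000):
    """Gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += learning_rate * sum(e * x for e, x in zip(errors, xs))
        b += learning_rate * sum(errors)
    return w, b


# Hypothetical data: larger x makes class 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

The fitted `sigmoid(w * x + b)` can be read as an estimate of the probability that a point with feature `x` belongs to class 1. The overlap in the toy data is deliberate: with perfectly separable classes the maximum-likelihood weights grow without bound.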

Block two

The main objective is to build on the algorithms studied in block one and to introduce the first non-parametric and unsupervised algorithms.

Decision trees generalize the perceptron by allowing non-linear classification, and with them we will begin the study of non-parametric algorithms.

PCA will be the first unsupervised algorithm we study, and it reinforces the idea of correlation introduced in the previous block.

Finally, we will begin the study of proximity algorithms, which, in addition to being the second unsupervised and non-parametric example, will allow us to introduce the idea of clustering.

Syllabus

1. Decision trees

What is (and is not) a decision tree?

Geometric interpretation

ID3

Entropy and Gini function
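The two impurity measures can be sketched in a few lines of Python. The function names and the example labels are assumptions for illustration; `information_gain` is the entropy-based splitting criterion used by ID3.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with two balanced classes and a split that separates them.
parent = ["a", "a", "b", "b"]
split = [["a", "a"], ["b", "b"]]
gain = information_gain(parent, split)
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, so a tree-growing algorithm prefers the split that reduces them the most.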

2. Principal component analysis (PCA)

Interpretation in terms of variance

Interpretation in terms of distance

Relationship to linear algebra

Eigenvalues

Singular value decomposition

QR-decomposition

Usual algorithms
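The link between variance and eigenvalues can be illustrated with power iteration on a 2×2 sample covariance matrix. This is only a sketch: the toy points and helper names are assumptions, and power iteration is one standard way to find the leading eigenvector, not necessarily the algorithm the course presents.

```python
import math


def covariance_matrix(points):
    """Sample covariance matrix of 2-D points (divides by n - 1)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def first_principal_component(points, iterations=200):
    """Return (variance along the first component, unit direction vector)."""
    cov = covariance_matrix(points)
    v = [1.0, 0.0]  # arbitrary starting direction
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    cv = mat_vec(cov, v)
    eigenvalue = v[0] * cv[0] + v[1] * cv[1]  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical points scattered around the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
variance, direction = first_principal_component(points)
```

The returned direction is the axis along which the projected data have the largest variance, and the eigenvalue is that variance. For these points the direction should have a slope close to 2.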

3. Proximity and clustering algorithms

Euclidean distances and other metrics

K-nearest neighbors

1-NN

General algorithm

The curse of dimensionality

K-means

Clustering
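A minimal k-nearest-neighbors classifier with the Euclidean distance can be sketched as follows. The training points, labels, and the name `knn_predict` are assumptions for this example; setting `k=1` gives the 1-NN special case.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_first_group = knn_predict(train_points, train_labels, (1.5, 1.5))
near_second_group = knn_predict(train_points, train_labels, (6.5, 6.5))
one_nn = knn_predict(train_points, train_labels, (2.5, 2.0), k=1)
```

There is no training step here: the model is the stored data itself, which is what makes the method non-parametric. Swapping `math.dist` for another metric changes which points count as neighbors.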

Block three

Block three has three objectives:

Firstly, we seek to introduce the concept of regularization in machine learning, which is essential to compare algorithms through their generalization capacity.

The second objective is to expand the palette of algorithms that the student understands by means of two fundamental techniques for classification and forecasting: neural networks and time series.

Finally, we begin the presentation and analysis of another family of useful and common algorithms in machine learning, the so-called stochastic algorithms. We will focus on their relationship with neural networks, linear regressions and decision trees, and we will complement this block with an introduction to boosting.

Syllabus

1. Regularization in Machine Learning

Fitting vs overfitting

In linear regressions

Ridge

Lasso

Elastic net

In decision trees: pruning

Perceptron: support vector machines

2. Invitation to deep learning

Activation functions

Back-propagation algorithm

Neural network architectures

Convolution and its interpretation: CNN

3. Stochastic algorithms

Stochastic gradient descent (regressions and neural networks)

Random forests (decision trees)

Boosting

4. Invitation to time series

Components of a time series

White stochastic noise

Moving average

ARIMA
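As a small illustration for this topic, a simple moving-average smoother can be sketched as below. The function name and the sample series are assumptions for this example. Note that this smoother is a different object from the MA(q) component of an ARIMA model, which is a regression on past noise terms.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in `series`."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical series with a steady upward trend.
series = [1, 2, 3, 4, 5]
smoothed = moving_average(series, 3)
```

The output is shorter than the input by `window - 1` values, because only complete windows are averaged. A wider window suppresses more noise but also reacts more slowly to changes in the trend.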