Statistics for Developers & Engineers

According to our research, since 1982 engineering (particularly computer engineering) has been taught in our universities with a serious deficiency in Mathematics and Statistics. With the rise (and subsequent irrational exuberance) of Data Science and Machine Learning, it is common to see cases where the lack of this rigorous foundation results in failed projects, spurious conclusions that cannot be acted upon, and unethical recommendations with a negative impact on society. The Mexican Society of Data Science, together with The Data Pub, has launched this course for Computer Engineering graduates who are already in the workforce, so that they can acquire the language and mental structures needed to get started in Data Analysis and collaborate effectively with Mathematicians, Physicists and Statisticians to scale their products to millions of records.

Objectives

1. Encourage students who are already proficient in programming to use statistical methods.
2. Familiarize the student with the mathematical language used in Data Science methods that involve statistics.
3. Show the student the theoretical constraints and practical benefits that mathematical formalization implies.

Syllabus

1. Statistical thinking

The engineer or developer will acquire the foundations of the mental structures associated with statistical thinking and the scientific method. Statistical thinking is not simply doubting and questioning without method or argument; it means knowing how to observe and read the evidence, use measures of central tendency, and connect them to the state of reality.

2. Descriptive Statistics

The engineer or developer will learn how to use arithmetic tools to describe a population, then move on to graphing tools, ending with tools for calculating probabilities of single events and related events.
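As a minimal sketch of the arithmetic and probability stages, the example below uses Python's standard `statistics` and `fractions` modules. Python is an assumption (the course does not name a language), and the order counts are made-up illustrative data; the graphing stage is omitted to keep the example dependency-free.

```python
import statistics
from fractions import Fraction

# Hypothetical sample: daily order counts for a small shop (illustrative data only)
orders = [12, 15, 11, 15, 18, 22, 15, 9, 14, 19]

# Arithmetic descriptors of the sample
mean = statistics.mean(orders)      # arithmetic mean
median = statistics.median(orders)  # middle value of the sorted data
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # sample standard deviation (n - 1 denominator)
print(mean, median, mode, round(stdev, 3))

# Single event: drawing a heart from a standard 52-card deck
p_heart = Fraction(13, 52)
# Independent events: a heart AND heads on a separate coin flip (multiply probabilities)
p_heart_and_heads = p_heart * Fraction(1, 2)
# Dependent events: two hearts in a row without replacement (conditional probability)
p_two_hearts = Fraction(13, 52) * Fraction(12, 51)
print(p_heart, p_heart_and_heads, p_two_hearts)  # 1/4 1/8 1/17
```

Exact fractions are used here only so that the multiplication rules for independent and dependent events are visible without rounding.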

3. Distributions as a tool to describe reality

The engineer or developer will move from histograms to distributions, whose mathematical form makes it possible to express even more aspects of the population and of different phenomena. You will learn the implications of assuming that the axioms behind certain distributions hold, and review the concept of random numbers.
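To connect histograms, distributions and random numbers, here is a small sketch assuming Python's standard `random` module; the `histogram` helper is a hypothetical illustration, not part of any library. It draws seeded pseudo-random numbers from a uniform distribution and shows that the relative frequencies in each bin approach the flat density the distribution predicts.

```python
import random
from collections import Counter

def histogram(samples, lo, hi, bins):
    """Hypothetical helper: relative frequency of samples in each of `bins` equal-width bins on [lo, hi)."""
    width = (hi - lo) / bins
    # min(...) guards against floating-point rounding pushing a value into a nonexistent bin
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in samples if lo <= x < hi)
    n = len(samples)
    return [counts[i] / n for i in range(bins)]

rng = random.Random(42)  # seeded pseudo-random generator: same seed, same sequence
uniform = [rng.random() for _ in range(100_000)]  # draws from the uniform distribution on [0, 1)
freqs = histogram(uniform, 0.0, 1.0, 10)
print([round(f, 3) for f in freqs])  # each bin should be close to 0.10
```

The seed makes the "random" numbers reproducible, which is a reminder that computers usually produce pseudo-random sequences rather than truly random ones.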

4. Continuous distributions

The engineer or developer will learn to model phenomena with different distributions from the exponential family, such as natural processes, manufacturing, logistics and retail movements, and financial and economic phenomena, and how these distributions can be used to simulate phenomena of the same kind.
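The exponential distribution, a member of the exponential family often used for waiting times in logistics and retail, illustrates both modeling and simulation. This sketch assumes Python's standard `random` module and a made-up arrival rate of 4 customers per hour; it simulates waiting times and checks them against the model's closed-form mean and tail probability.

```python
import math
import random

rng = random.Random(7)  # seeded so the simulation is reproducible
rate = 4.0              # hypothetical: on average 4 customer arrivals per hour

# Simulate 200,000 waiting times (in hours) between consecutive arrivals
waits = [rng.expovariate(rate) for _ in range(200_000)]

# Compare simulated quantities with the exponential model's closed-form values
sample_mean = sum(waits) / len(waits)                    # model: 1 / rate = 0.25 hours
t = 0.5
empirical_tail = sum(w > t for w in waits) / len(waits)  # share of waits longer than 30 minutes
theoretical_tail = math.exp(-rate * t)                   # model: P(X > t) = exp(-rate * t)
print(round(sample_mean, 3), round(empirical_tail, 3), round(theoretical_tail, 3))
```

If the simulated and theoretical values agree, the model can then be used to simulate scenarios (for example, a different arrival rate) without collecting new data.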

5. Probability

The engineer or developer will examine some seemingly random phenomena while rigorously quantifying both the uncertainty and the possible outcomes, ending with Bayes' theorem, which underlies many sophisticated topics in current data science.
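Bayes' theorem states that $P(H \mid E) = P(E \mid H)\,P(H) / P(E)$. A classic illustration is a diagnostic test; the prevalence, sensitivity and false-positive rate below are hypothetical numbers, and Python with exact fractions is an assumed choice for the sketch.

```python
from fractions import Fraction

def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of total probability: P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate
p = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(p, float(p))  # 1/6 0.16666666666666666
```

Even with a highly sensitive test, a positive result here means only a 1 in 6 chance of having the condition, because the condition is rare; this is the kind of counterintuitive result that rigorous probability exposes.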

6. Operations on distributions

The engineer or developer will go from fitting distributions to observed phenomena to modifying a distribution in order to explore different alternatives and paths of the same phenomenon, or adjusting the distribution, and therefore the model of reality, to improve the results of the analysis.
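One kind of operation on distributions is combining and rescaling them to explore alternative scenarios. The sketch below assumes Python's `statistics.NormalDist`, which supports adding independent normal distributions and multiplying by constants; the delivery-time components and the 20% speed-up are hypothetical.

```python
from statistics import NormalDist

# Hypothetical components of a delivery time, in minutes, assumed normal and independent
picking = NormalDist(mu=30, sigma=5)
transit = NormalDist(mu=120, sigma=12)

# Sum of independent normals: means add and variances add (sigma = sqrt(5**2 + 12**2) = 13)
total = picking + transit

# Alternative scenario: transit becomes 20% faster (scaling multiplies mu and sigma by 0.8)
total_faster = picking + transit * 0.8

# Compare the probability of delivering within 160 minutes under each scenario
p_now = total.cdf(160)
p_faster = total_faster.cdf(160)
print(round(total.mean, 2), round(total.stdev, 2))  # 150.0 13.0
print(round(p_now, 3), round(p_faster, 3))
```

The independence assumption matters: if picking and transit times were correlated, the variances would not simply add.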

7. Hypothesis testing

The engineer or developer will be introduced to one of the hardest topics of the course, one that leads 8 out of 10 students in the Johns Hopkins Data Science Specialization on Coursera to fail the subject and retake it 3 or 4 times. Hypothesis testing is the cornerstone of the scientific method. Do you remember hearing that something is not "statistically significant"? This is where we learn how to determine that "statistical significance".
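A minimal example of a hypothesis test is checking whether a coin is fair. The sketch assumes Python and uses an exact two-sided binomial test, where the p-value sums the probabilities of all outcomes at least as unlikely as the observed one (one common convention); the 15-heads-in-20-flips experiment is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_test_two_sided(successes, n):
    """Exact two-sided binomial test of H0: p = 0.5 (for example, a fair coin).

    The p-value is the total probability, under H0, of every outcome that is
    at least as unlikely as the observed one.
    """
    probs = [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]
    observed = probs[successes]
    return sum(p for p in probs if p <= observed)

# Hypothetical experiment: 15 heads in 20 flips
p_value = binomial_test_two_sided(15, 20)
alpha = 0.05  # significance level chosen before looking at the data
print(round(float(p_value), 4), "reject H0" if p_value < alpha else "fail to reject H0")  # 0.0414 reject H0
```

"Statistically significant" here means only that, if the coin were fair, a result this extreme would occur less than 5% of the time; it does not measure how unfair the coin is.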

8. Estimation

The engineer or developer will approach the task that comes immediately before prediction. You will learn that estimation is not just getting close to a value, but also quantifying variance and trying to explain it. You will learn different estimation techniques and the difference between concepts such as truth, fact and plausibility. The course will also introduce Bayesian estimation, which privileges uncertainty over point values.
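The contrast between a point value and its uncertainty can be sketched in a few lines, assuming Python's standard `statistics` module and made-up data. The first part reports a mean together with an approximate 95% confidence interval; the second is a Bayesian estimate of a rate using a Beta prior, which yields a whole posterior distribution rather than a single number.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical measurements of a part's length in cm (illustrative data only)
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9]

# Point estimate plus its uncertainty: the standard error of the mean
point = mean(sample)
se = stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval using the normal quantile (about 1.96).
# With n = 10 a Student t quantile (about 2.26) would be more appropriate; the
# standard library has no t distribution, so this is a simplification.
z = NormalDist().inv_cdf(0.975)
ci = (point - z * se, point + z * se)

# Bayesian estimate of a rate: 30 conversions in 100 trials with a uniform Beta(1, 1) prior.
# The Beta prior is conjugate to the binomial, so the posterior is Beta(1 + 30, 1 + 70).
a, b = 1 + 30, 1 + 70
post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(round(point, 3), tuple(round(v, 3) for v in ci))
print(round(post_mean, 3), round(post_sd, 3))
```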

9. Correlation

The engineer or developer will learn to measure and graph relationships between variables of different types, and to interpret what those relationships imply about reality. The course will also cover the fundamentals of linear regression, which is usually the first machine learning algorithm covered in the literature.
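Pearson correlation and simple linear regression share the same building blocks: deviations from the mean and their cross-products. The sketch below computes both by hand in Python on hypothetical ad-spend and sales data; Python 3.10 and later also provide `statistics.correlation` and `statistics.linear_regression` for the same calculations.

```python
import math

# Hypothetical paired observations: weekly ad spend (x) and units sold (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient: strength and direction of the linear relationship
r = s_xy / math.sqrt(s_xx * s_yy)

# Ordinary least squares line: y = intercept + slope * x
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
print(round(r, 4), round(slope, 4), round(intercept, 4))
```

A correlation near 0.77 indicates a fairly strong positive linear association in this toy data, but by itself it says nothing about whether ad spend causes sales.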