2019-BigData_master
The complete set of essays produced for each subject of the master's program
Project maintained by santibreo
Hosted on GitHub Pages — Theme by mattgraham
This site is partly just an excuse to learn how to host a website with GitHub Pages.
Here I have organized and presented the work I did during a Big Data master's course at the Universidad Complutense de Madrid.
Warning: All the essays available here are written in Spanish!
Big Data & Business Analytics
The master's course consisted of 10 hours per week of in-person classes spread over more than 20 modules, each lasting between 1 and 6 weeks. Not every module required a final essay, but most did, and I present all of those essays here. The following modules are not shown:
- Introduction - Business Intelligence
- Python
- Data Science workflow
- Text Mining
- Open Data
- Distributed Computation - Spark
- DevOps
SQL Databases:
- Entity-relationship diagrams, from scratch.
- Running SQL Server on your own machine.
- Creating your own SQL database.
- From SELECT * to user-defined functions.
- Normalization and normal forms.
During this module I was still writing my documents in LaTeX and compiling them to PDF.
DOWNLOAD FINAL ESSAY
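To illustrate the "from SELECT * to user-defined functions" progression of the SQL module above, here is a minimal sketch in Python. It swaps in SQLite (through the standard `sqlite3` module) for SQL Server, so it runs without installing a server. SQLite has no `CREATE FUNCTION` statement, so the UDF is registered from the host language with `Connection.create_function`. The `students` table and the `full_name` function are hypothetical examples, not taken from the essay.

```python
import sqlite3

def full_name(first, last):
    # Hypothetical scalar UDF: formats a name as "LAST, First".
    return f"{last.upper()}, {first}"

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY,"
    " first TEXT NOT NULL,"
    " last TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO students (first, last) VALUES (?, ?)",
    [("Ana", "Lopez"), ("Luis", "Perez")],
)

# Register the Python function so SQL queries can call it by name (2 arguments).
conn.create_function("full_name", 2, full_name)

# Plain SELECT * returns every column of every row.
everything = conn.execute("SELECT * FROM students").fetchall()

# The UDF is used like any built-in SQL function.
formatted = conn.execute(
    "SELECT full_name(first, last) FROM students ORDER BY id"
).fetchall()
print(everything)
print(formatted)
```

In SQL Server the equivalent would be written in T-SQL with `CREATE FUNCTION` and live inside the database itself; the registration approach shown here is specific to SQLite.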
NoSQL Databases:
- Document-based databases as an alternative.
- Database theory. CAP diagram.
- MongoDB: how does it work?
- Running a MongoDB server, in the cloud and locally.
- MongoDB syntax.
During this module I was still writing my documents in LaTeX and compiling them to PDF.
DOWNLOAD FINAL ESSAY
Google Cloud Computing:
- Cloud-based computing. SaaS, PaaS and IaaS.
- Google Cloud main services.
- Google Cloud Big Data services.
- Big Data codelabs.
- Real life examples.
During this module I was still writing my documents in LaTeX and compiling them to PDF.
DOWNLOAD FINAL ESSAY
R:
- Introduction. R as a statistical language.
- Data structures. How to manipulate them.
- Libraries. How to load and understand them.
- Functions: using and defining them. Plotting tools.
- Further steps.
From this module on, I started using R Markdown to generate my documents and compiling them to HTML whenever that was allowed.
VISIT FINAL ESSAY
Statistics:
- Descriptive Statistics. Types of variables.
- Centrality, dispersion and shape measures.
- Bivariate distributions. Correlation.
- Usual probability distributions. Confidence intervals.
- Hypothesis testing.
I used R Markdown and compiled the essay to HTML.
VISIT FINAL ESSAY
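The centrality, dispersion and shape measures and the correlation listed in the Statistics module can be sketched in a few lines. This is a minimal Python illustration using only the standard library (the essay itself used R); the sample data are made up. Skewness here is the simple moment-based coefficient, which is one of several definitions in use.

```python
import math
import statistics

def describe(xs):
    """Centrality, dispersion and shape measures for a numeric sample."""
    n = len(xs)
    mean = statistics.fmean(xs)
    # Central moments used for the shape measure.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),  # sample variance, n - 1 denominator
        "range": max(xs) - min(xs),
        "skewness": m3 / m2 ** 1.5,           # moment-based skewness (> 0: right tail)
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
r = pearson(sample, [2 * x + 1 for x in sample])  # exact linear relation, so r is 1
print(summary, r)
```

For this sample the mean (5) is above the median (4.5) and the skewness is positive, both signalling a longer right tail.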
Linear and Logistic Regression:
- Data understanding.
- Test for independence.
- Linear regression.
- Logistic regression.
- Variable selection.
I used R Markdown and compiled the essay to HTML.
VISIT FINAL ESSAY
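As a rough illustration of the two models in the regression module, the sketch below fits a simple (one-predictor) linear regression with the closed-form least-squares formulas and computes its R². For logistic regression it only shows how fitted coefficients turn into a probability through the logistic function; estimating those coefficients requires maximum likelihood, which is not shown. The data and the coefficients passed to `predict_proba` are hypothetical, and the essay itself worked in R.

```python
import math

def fit_simple_ols(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Share of the variance of y explained by the fitted line."""
    my = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

def predict_proba(b0, b1, x):
    """Logistic regression prediction: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, r_squared(xs, ys, b0, b1))
```

Here the line is roughly y = 2.2 + 0.6x with R² of about 0.6. With several predictors the same idea is solved in matrix form, which is what R's `lm` does.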
Principal Component Analysis:
- Multivariate data techniques.
- Theory.
- Application with R.
- Visualization with R.
- Interpretation.
I used R Markdown and compiled the essay to HTML.
VISIT FINAL ESSAY
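The core of PCA is finding the directions of maximum variance, i.e. the eigenvectors of the covariance matrix. The pure-Python sketch below builds a sample covariance matrix and extracts only the first principal component by power iteration; it is an illustration, not how R's `prcomp` works internally (that uses a singular value decomposition). It assumes the covariance matrix has a single dominant eigenvalue and that the arbitrary start vector is not orthogonal to it.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]

def first_principal_component(cov, iters=200):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    k = len(cov)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary start vector (an assumption)
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance of the data along direction v.
    cv = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
    eigenvalue = sum(x * y for x, y in zip(v, cv))
    return v, eigenvalue

def explained_variance_ratio(cov, eigenvalue):
    """Fraction of total variance (trace of cov) captured by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))

cov = [[2.0, 1.0], [1.0, 2.0]]
pc1, lam = first_principal_component(cov)
print(pc1, lam, explained_variance_ratio(cov, lam))
```

For this covariance matrix the first component points along (1, 1)/√2 and explains 75% of the variance. When variables are measured on different scales they are usually standardized first, which amounts to running PCA on the correlation matrix.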
Time Series:
- Initial descriptive analysis. Time series decomposition.
- Correlation vs causality. Cointegration test.
- Univariate Dynamic Models: ARIMA.
- Dynamic Models with static regressors: ARIMAX.
- Multivariate Dynamic Linear Models: VAR.
I used R Markdown and compiled the essay to HTML.
VISIT FINAL ESSAY
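Two building blocks behind the time series topics above can be shown in a few lines: classical decomposition estimates the trend with a centered moving average, and the "I" in ARIMA(p, d, q) stands for differencing the series d times. The Python sketch below covers only these two steps under the stated assumption of an odd window; it does not fit ARIMA, ARIMAX or VAR models, and the series are made up.

```python
def centered_moving_average(series, window=3):
    """Trend estimate with an odd window; the first/last window // 2 points get no value."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    return [
        sum(series[i - half : i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]

def difference(series, lag=1):
    """Lag-`lag` differences y_t - y_(t-lag); lag=1 is one order of ARIMA's 'I' step."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

trend = centered_moving_average([1, 2, 3, 4, 5])
quadratic = [1, 4, 9, 16, 25]
once = difference(quadratic)   # still trending, not stationary
twice = difference(once)       # constant, as with d = 2
print(trend, once, twice)
```

Seasonal data are usually handled with a seasonal difference, e.g. `difference(series, lag=12)` for monthly observations. Even seasonal periods need a 2×m moving average in classical decomposition, which this sketch deliberately rejects.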
Machine Learning:
- Artificial Neural Networks.
- Tree-based models. Decision trees - bagging - random forest - gradient boosting.
- Support Vector Machines.
- Ensemble techniques.
- Deep Learning.
I used R Markdown and compiled the essay to PDF.
DOWNLOAD FINAL ESSAY
Competition:
In this module we took part in this competition in teams of 5 or 6 students, applying an agile workflow to build the best model we could.
I used R Markdown and compiled the essay to HTML.
VISIT FINAL ESSAY
Graph Theory:
- Application fields.
- Degree distribution and categorization.
- Distance analysis.
- Centrality measures: closeness and betweenness.
- Nearest nodes and shortest paths. Dijkstra's algorithm.
I used R Markdown and compiled the essay to PDF.
DOWNLOAD FINAL ESSAY
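Dijkstra's algorithm, listed in the Graph Theory module, finds shortest paths from one node in a graph with non-negative edge weights by always expanding the closest unvisited node. Below is a minimal Python sketch using a binary heap from the standard `heapq` module; the adjacency-list format and the example graph are assumptions for illustration only (the essay worked in R).

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> list of (neighbour, weight)."""
    dist = {source: 0}
    prev = {}                # predecessor of each node on its shortest path
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:     # stale heap entry, node already settled
            continue
        visited.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def shortest_path(prev, source, target):
    """Rebuild the path by walking predecessors back from target (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
dist, prev = dijkstra(graph, "A")
print(dist, shortest_path(prev, "A", "D"))
```

The direct edge A to C costs 4, but going through B costs 3, so the algorithm updates C and the best route to D becomes A, B, C, D with total weight 4. The same distances are the input to closeness centrality.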
Final Project:
The last module was a collaboration with San Carlos Hospital to build 3 ML models predicting readmissions, length of hospital stay, and type of hospital discharge. We worked in groups of 6 students.
I used R Markdown and compiled the essay to HTML. I have omitted some content.
VISIT FINAL ESSAY