Original work

These projects comprise either extended analyses of published data or completely independent work.

2018

This report was my deliverable for the technical test set by the data analytics company Datrik Intelligence as part of the application for their Junior Data Scientist position. Its CEO is a well-known data scientist who has ranked #1 worldwide on Kaggle. The test consisted of an Exploratory Data Analysis (EDA) coupled with the development of two predictive models.

What I learned:

  • Multiple Correspondence Analysis (MCA) in FactoMineR
  • Tree-based modelling in scikit-learn

(I) Compilation of a free, open-source (in both cost and license) pipeline for the analysis and quantification of DDA proteomics datasets in a GNU/Linux environment. It makes extensive use of the tools developed by the CompOmics group at Ghent University (Belgium).

(II) Development of a statistical model for the probabilistic quantification of protein ratios between samples using the PyMC3 framework, focusing on the assessment of uncertainty. Carried out at Advanced Analytics, Novozymes.

What I learned:

  • Probabilistic programming in PyMC3
  • DDA Proteomics analysis
  • Pipelining in bash
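
To illustrate what quantifying a protein ratio with its uncertainty means, here is a minimal sketch that estimates the mean log2 ratio between two samples and a percentile bootstrap interval around it. It is not the PyMC3 model from the project: the bootstrap is swapped in as a dependency-free stand-in, and the peptide intensities, function names and parameters are all hypothetical.

```python
import math
import random
from statistics import mean


def log2_ratios(sample_a, sample_b):
    """Per-peptide log2 ratio of paired intensities (sample A over sample B)."""
    return [math.log2(a / b) for a, b in zip(sample_a, sample_b)]


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical peptide intensities for one protein in two samples
intensities_a = [820.0, 1010.0, 950.0, 1200.0, 880.0]
intensities_b = [400.0, 520.0, 490.0, 560.0, 450.0]

ratios = log2_ratios(intensities_a, intensities_b)
point_estimate = mean(ratios)
low, high = bootstrap_interval(ratios)
```

A Bayesian model would report a posterior distribution instead of a resampling interval, but the goal is the same: a range for the ratio rather than a single number.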

What I learned:

  • Data preprocessing in Python
  • Vanilla NNs with TensorFlow
  • Classifier evaluation with scikit-learn

2017

2016

  • Bioinformatics and Genomic Analysis final exam, consisting of two parts: (1) extended ChIP-seq analysis for the prediction of PIF1 and PIF4 interaction in A. thaliana; (2) de novo RNA-seq analysis of early and late embryonic stages in X. tropicalis using edgeR.

  • Bachelor Thesis, entitled “Caracterización de la proteína All1873” (in Spanish).

Replicates of previously published scientific works

The projects below are replicates of published computational analyses carried out by other scientists. Thus, although there may be minor differences, the bulk of the work is not my own original creation. The scientific articles the projects are based on can be found in the references section.

Microarray data analysis

Data for 5 conditions in mouse lung tissue

What I learned:

  • Bioconductor and R packages: limma, affy
  • Quality control of samples
  • Graphics for results interpretation: scatter and volcano plots, Venn diagrams, heatmaps
  • Microarray tech pros and cons
  • GEO database
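
As a rough illustration of how a volcano plot separates genes, the sketch below labels each gene from its log2 fold change and adjusted p-value, and computes the plotted coordinates. It is plain Python rather than the limma workflow used in the project; the gene names, values and cutoffs (|log2FC| >= 1, p < 0.05) are hypothetical.

```python
import math


def volcano_label(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Classify a gene the way a volcano plot usually colours its points."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"  # not significant, or effect size too small


def volcano_coordinates(log2_fc, p_value):
    """Return (x, y) for the plot: x = log2 fold change, y = -log10(p)."""
    return log2_fc, -math.log10(p_value)


# Hypothetical results table: gene -> (log2 fold change, adjusted p-value)
results = {
    "geneA": (2.3, 0.001),
    "geneB": (-1.7, 0.02),
    "geneC": (0.4, 0.0005),
    "geneD": (3.0, 0.2),
}
labels = {gene: volcano_label(fc, p) for gene, (fc, p) in results.items()}
```

Requiring both cutoffs is a common convention: geneC is significant but barely changes, and geneD changes a lot but is not significant, so neither is flagged.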

RNA-seq data analysis

Data for 3 watering conditions in A. thaliana

What I learned:

  • Tuxedo protocol: bowtie, cufflinks, cuffmerge, cuffdiff and cummeRbund
  • Working on a Sun Grid Engine cluster through the command line
  • Gene set enrichment analysis (GSEA)
  • Gene ontology (GO)
  • SRA database

Gene coexpression networks

Studying correlations in wine yeast gene expression

What I learned:

  • Network theory: scale-free and small-world networks
  • Biological network motifs: AR, DFL, IFFL and CFFL
  • Cytoscape software and igraph package
  • Correlation analysis
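
A coexpression network can be sketched as follows: compute the Pearson correlation between every pair of expression profiles and keep an edge when its absolute value passes a threshold. This is only a minimal illustration in plain Python, not the igraph or Cytoscape workflow from the project; the gene names, expression values and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four conditions
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(expression)

# Node degree, the quantity whose distribution is inspected for scale-free behaviour
degree = {}
for g1, g2, _ in edges:
    degree[g1] = degree.get(g1, 0) + 1
    degree[g2] = degree.get(g2, 0) + 1
```

Using the absolute value keeps strongly anti-correlated genes (geneA and geneC here) connected as well; whether that is desirable depends on the biological question.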

Cell physiology simulations

Simulation of spontaneous Ca2+ peaks

What I learned:

  • Implementing biological networks and motifs
  • COPASI and CellDesigner software
  • Effect of network motifs on a system's dynamics

RNA-seq and ChIP-seq integrative analysis

Development of a workflow for integrated data analysis. Gene Set Analysis was carried out on subsets of interesting genes identified by RNA-seq and ChIP-seq.

What I learned:

  • Shell scripting and blackboard messaging
  • ChIP-seq analysis software: MACS and PeakAnalyzer
  • Motif finders: HOMER
  • KEGG Database
  • ggplot2 and other "Hadley verse" R packages

Synteny study in Python

Analysis of M. pneumoniae and M. genitalium

What I learned:

  • Python scripting and Biopython module
  • The time Linux utility
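
The idea behind a synteny study can be sketched as a search for runs of genes that sit next to each other, in the same order, in both genomes. The code below is a simplified illustration and not the script from the project: it assumes the ortholog pairs are already known, ignores inversions, and uses hypothetical gene identifiers.

```python
def syntenic_blocks(order_a, order_b, orthologs, min_len=2):
    """Find runs of genes that are adjacent and in the same order in both genomes.

    order_a, order_b: gene identifiers in chromosomal order.
    orthologs: dict mapping a gene in genome A to its ortholog in genome B.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []

    def close_block():
        # Keep the current run only if it is long enough to count as a block
        if len(current) >= min_len:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        partner = orthologs.get(gene)
        if partner is None or partner not in pos_b:
            close_block()  # a gene without ortholog interrupts the block
            continue
        if current and pos_b[partner] == pos_b[orthologs[current[-1]]] + 1:
            current.append(gene)  # ortholog directly follows the previous one
        else:
            close_block()
            current.append(gene)
    close_block()
    return blocks


# Hypothetical gene orders and ortholog table
order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
order_b = ["b4", "b5", "b1", "b2", "b3", "b9"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}

blocks = syntenic_blocks(order_a, order_b, orthologs)
```

In this toy example the two blocks are conserved internally but have swapped places between the genomes, which is the kind of rearrangement a synteny plot makes visible.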

RNA-seq de novo analysis

De novo transcriptome assembly for a frog and differential expression analysis between 2 developmental stages

What I learned:

  • Transcriptome assembly and assessment methods: Trinity
  • Transdecoder
  • KAAS

Variant call analysis

Discovering differences between an arbitrary A. thaliana genome and a reference genome

What I learned:

  • bwa
  • GATK software suite