General Info

Teacher

Teaching Assistants

Lectures/exercise sessions

  • Tuesdays 18.00-22.00.
  • Location and TA coverage

    Slides: Made available as notebooks that can be viewed (and run) either in Colab or in your local environment. To download a notebook, use File > Download As > Notebook (.ipynb).
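
    A downloaded .ipynb file is plain JSON, so it can be inspected even before Jupyter is installed. The following is a minimal sketch, assuming the nbformat 4 layout (a top-level "cells" list whose entries carry a "cell_type"); the function name and the file name slides_w2.ipynb are hypothetical and not part of the course material.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_notebook(path):
    """Count the cells of each type in a .ipynb file (nbformat 4 layout assumed)."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    # Each cell is a dict with a "cell_type" such as "markdown" or "code".
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


if __name__ == "__main__":
    # "slides_w2.ipynb" is a hypothetical file name; use your own download.
    target = Path("slides_w2.ipynb")
    if target.exists():
        print(summarize_notebook(target))
    else:
        print(f"{target} not found; download it first.")
```

    This only reads the file; running the cells still requires Colab or a local Jupyter installation.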

    Exercises and projects: Made available as notebooks. Some exercises and projects must be run locally, i.e. you cannot rely on Colab for them. To download a notebook, use File > Download As > Notebook (.ipynb).
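
    Before running a notebook locally, it can help to check that the needed packages are importable. The sketch below uses only the standard library. The package list is an assumption inferred from the topics in the week plan (the course page does not publish a requirements file), and check_packages is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed package list, inferred from the week plan topics;
# not an official requirements file for the course.
ASSUMED_PACKAGES = ["notebook", "numpy", "pandas", "matplotlib", "sklearn", "pyspark"]


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

    Run it with the same Python interpreter (for example, the same conda environment) that launches Jupyter; otherwise the result may not reflect what the notebook sees.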

    Materials

    The lectures are backed by reading material from various sources. To solve the exercises, you will need to both attend the lectures and read these materials. Supplementary materials are optional. There is also a large community behind the tools used in this course, so an online search can often turn up valuable information and guidance.

    Stream and videocast

  • Live Stream
  • Videocasts (Unfortunately, the videos from the first two lectures have out-of-sync audio and lack the projector content.)
    Weekplan

    Week Topics Slides Exercises Materials
    W1 / Aug 31
  • No lecture
  • Python brush-up
  • Setting up Python and Jupyter on your local environment
  • Working with Jupyter notebooks/Colab
  • Self-study

    Exercise notebook

    Local installation: Install miniconda and ipython-notebook, Launch the jupyter notebook
    A Whirlwind Tour of Python; Python Data Science Handbook, Ch. 1 IPython: Beyond Normal Python;
    Supplementary: learnpython.org
    W2 / Sep 7
  • Numerical Computing with NumPy.
  • Getting started with Jupyter and Google Colaboratory.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Python Data Science Handbook, Ch. 2 Introduction to NumPy
    W3 / Sep 14
  • Manipulating Tabular Data with Pandas.
  • Exploratory Data Analysis with Pandas.
  • Benchmarking and profiling.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Python Data Science Handbook, Ch. 3 Data Manipulation with Pandas
    Supplementary: Kaggle Pandas tutorials; Python for Data Analysis Book, from Ch. 5
    W4 / Sep 21
  • Pandas profiler and apply
  • Visualisation with matplotlib and pandas; Pandas profiler; plotly
  • Statistical analysis and machine learning with scikit-learn.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Python Data Science Handbook, Ch. 4-5
    W5 / Sep 28
  • Pandas Merge, join and correlation
  • Presentation of Project 1
    Slides: Slides notebook
    Exercises: Exercise session on Project 1
    W6 / Oct 5
  • Exercise session on Project 1.
    W7 / Oct 12
  • Distributed Computing with Apache Spark.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Learning Spark 2.0, Ch. 1-3
    Supplementary: A Neanderthal’s Guide to Apache Spark in Python
    Holiday Holiday week
    W8 / Oct 26
  • Optimising and tuning Spark applications.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Learning Spark 2.0, Ch. 7
    W9 / Nov 2
  • Structured Query Language (SQL) essentials
  • Using SQL queries in Pandas.
  • Presentation of Project 2.
    Slides: Slides notebook
    Exercises: Exercise notebook, Solution notebook
    Materials: Kaggle's intro to SQL; Kaggle's advanced SQL
    Supplementary: W3Schools SQL tutorial
    W10 / Nov 9
  • Exercise session on Project 2.
    W11 / Nov 16
  • Exercise session on Project 2.
    W12 / Nov 23
  • Exercise session on Project 2.
    W13 / Nov 30
  • Intro to Data Streaming Algorithms.
  • Presentation of Project 3
  • Exercise session on Project 3.
    Slides: Slides notebook
    Exercises: Work on Project 3
    Materials: Reservoir sampling, Boyer-Moore algorithm
    Exam Exam period
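
    The W13 materials name reservoir sampling and the Boyer-Moore algorithm. As a rough illustration of what such one-pass streaming algorithms look like, here is a minimal sketch. It assumes that "Boyer-Moore" refers to the majority vote algorithm (the usual meaning in a streaming context) rather than the string-search algorithm of the same name, and it uses the classic "Algorithm R" variant of reservoir sampling. The function names are hypothetical and not taken from the course notebooks.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) enters the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def majority_candidate(stream):
    """Boyer-Moore majority vote: one pass, constant memory.

    Returns the only element that can be a strict majority; a second pass
    is needed to confirm that it really occurs more than half the time.
    """
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5, random.Random(0)))
    print(majority_candidate([1, 2, 1, 1, 3, 1, 1]))  # 1
```

    Both functions read each element exactly once and never store the full stream, which is the property that makes them suitable for data that does not fit in memory.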

    Mandatory assignments

    Project    Released               Due                            Project notebook  Contribution to final grade
    Project 1  Tuesday, September 28  Monday, November 1, 20:00      Project 1         37.5 %
    Project 2  Tuesday, November 2    Monday, November 29, 20:00     Project 2         37.5 %
    Project 3  Tuesday, November 30   Wednesday, December 22, 20:00  Project 3         25 %
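
    To see how the weights in the table above combine, here is a small worked example. It assumes the final result is a plain weighted average of per-project scores on a 0-100 scale; the page does not state the grading scale or the exact formula, and the scores used are hypothetical.

```python
# Weights taken from the table above (percent of the final grade).
WEIGHTS = {"Project 1": 37.5, "Project 2": 37.5, "Project 3": 25.0}


def weighted_score(scores):
    """Combine per-project scores (assumed 0-100) into one weighted score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# The three contributions add up to the full grade.
assert sum(WEIGHTS.values()) == 100.0

# Hypothetical scores, for illustration only.
example = {"Project 1": 80, "Project 2": 60, "Project 3": 100}
print(weighted_score(example))  # 77.5
```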