General Info

Teacher

Teaching Assistants

Lectures/exercise sessions

  • Tuesdays 18.00-22.00.
  • Location

    Live streaming and recordings of the lectures

    Physical and virtual attendance at lectures and exercise sessions

    The capacity of classrooms has been reduced this year due to the COVID-19 pandemic. As a result, students will need to take turns attending lectures and exercise sessions physically and virtually. To find out which weeks you should attend physically and which weeks you should attend virtually, see the information on the welcome page of the course on CampusNet.

    Materials

    The lectures are backed by reading material from various sources. These should be seen as suggestions. There's a huge community behind the tools we are working with in this course. Suggested reading materials can be found in the Weekplan below.

    Lecture slides and exercises

    Lecture slides and exercises are made available as Colab notebooks. See the Weekplan below.

    Weekplan

    Further information and materials will be posted soon. In the first 4 weeks, we'll introduce the basic computational tools for data science with Python. In weeks 5-12, we will cover more advanced topics such as streaming, parallel computation and relational databases.

    Week Topics Slides Exercises Materials
    1: Sept 1
  • Python brush-up. (No lecture)
  • Slides: Self-study. Exercises: Self-study. Materials: A Whirlwind Tour of Python, learnpython.org.
    2: Sept 8
  • Numerical Computing with NumPy.
  • Getting started with Jupyter and Google Colaboratory.
  • Slides: Colab notebook. Exercises: Colab notebook. Materials: Python Data Science Handbook, Ch. 2.
    3: Sept 15
  • Manipulating Tabular Data with Pandas.
  • Exploratory Data Analysis with Pandas.
  • Benchmarking and profiling.
  • Slides: Colab notebook. Exercises: Colab notebook. Materials: Python Data Science Handbook, Ch. 3; Kaggle Pandas tutorials; Python for Data Analysis Book, from Ch. 5.
    4: Sept 22
  • Data Visualisation with Matplotlib, Pandas profiler, plotly.
  • Statistical analysis and machine learning with scikit-learn.
  • Slides: Colab notebook. Exercises: Colab notebook. Materials: Python Data Science Handbook, Ch. 4-5.
    5: Sept 29
  • Presentation of Project 1
  • Exercise session on Project 1
  • Colab notebook for Project 1
    6: Oct 6
  • Exercise session on Project 1.
    7: Oct 13
  • Holiday week.
    8: Oct 20
  • Distributed Computing with Apache Spark.
  • Slides: Colab notebook. Exercises: Colab notebook, Solutions. Materials: Learning Spark 2.0, Ch. 1-3; A Neanderthal’s Guide to Apache Spark in Python.
    9: Oct 27
  • Optimising and tuning Spark applications.
  • Slides: Colab notebook. Exercises: Colab notebook. Materials: Learning Spark 2.0, Ch. 7.
    10: Nov 3
  • Intro to SQL.
  • Using SQL queries in Spark and Pandas.
  • Presentation of Project 2.
  • Slides: Colab notebook. Exercises: Exercises on Kaggle's intro to SQL (or work on Project 2). Materials: Kaggle's intro to SQL, Kaggle's advanced SQL, W3Schools SQL tutorial.
    11: Nov 10
  • Exercise session on Project 2.
  • Notebook on Downloading and Sampling Data for Project 2
    12: Nov 17
  • Exercise session on Project 2.
    13: Nov 24
  • Exercise session on Project 2.
    14: Dec 1
  • Data Streaming.
  • Intro to Streaming Algorithms.
  • Presentation of Project 3.
  • Exercise session on Project 3.
  • Slides: Colab notebook. Exercises: Work on Project 3. Materials: Reservoir sampling, Boyer-Moore algorithm.

    Mandatory assignments

    Project    Released               Due                            Problem file  Contribution to final grade
    Project 1  Tuesday, September 29  Monday, November 2, 20:00      Project 1     37.5 %
    Project 2  Tuesday, November 3    Monday, November 30, 20:00     Project 2     37.5 %
    Project 3  Tuesday, December 1    Wednesday, December 23, 20:00  Project 3     25 %

    Frequently Asked Questions

    Can I skip lectures/classes due to conflicting courses, travelling, ...? There is no attendance requirement, but we recommend attending for support and coaching.