General Info

Teachers

Teaching Assistants

Lectures

  • Tuesdays 18:00-19:45.

Exercise sessions

  • Tuesdays 20:00-22:00.

Location and TA coverage

Materials

Most topics are based on the book "Mining of Massive Datasets" (third edition); see the book homepage. Additional materials will also be used. A large community uses and further develops the algorithms and tools covered in this course, so an online search can also provide valuable information and guidance.

Videocast

  • Videocasts

Project work

See the information about the project (including important dates) in this PDF file. You can also look at the following example reports: Example 1, Example 2, Example 3, Example 4, Example 5.

Weekplan

THIS SCHEDULE IS TENTATIVE AND SUBJECT TO CHANGE

Week | Topics | Slides | Exercises | Materials
W1 / Sep 02
  • Introductory lecture
  • What is Data Mining?
  • Bonferroni's Principle
  • TF.IDF measure
  • Hash functions
  Slides
  Exercises: 1.2.1, 1.2.2, 1.3.1, 1.3.2 and 1.3.3
  Materials: Ch. 1 of MMDS
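The TF.IDF measure from week 1 can be illustrated with a minimal Python sketch. It assumes the definition in Ch. 1 of MMDS: the term frequency of a word is its count in a document divided by the count of that document's most frequent word, and the IDF of a word is log2(N/n_i), where N is the number of documents and n_i the number of documents containing the word. The function name `tf_idf` and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: TF.IDF score} dict per (non-empty) document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    # n_i: number of documents in which each term occurs at least once
    doc_freq = Counter(term for c in counts for term in c)
    scores = []
    for c in counts:
        # TF is normalised by the count of the most frequent term in this document
        max_f = max(c.values())
        scores.append({
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in c.items()
        })
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
for s in tf_idf(docs):
    # show the three highest-scoring terms of each document
    print(sorted(s.items(), key=lambda kv: -kv[1])[:3])
```

A word that occurs in every document gets an IDF of log2(1) = 0, so it scores 0 no matter how often it appears.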
W2 / Sep 09
  • No lecture
  • Python recap
  • Setting up Python and Jupyter Notebook on your local environment
  • Working on tutorial tasks
  Exercises: Tutorial tasks for the NumPy, SciPy and Numba packages; Tutorial tasks for the Pandas package; Exercise sheet
W3 / Sep 16
  • MapReduce
  • Distributed File Systems
  • Cluster Computing
  Slides | Exercise sheet
  Materials: Ch. 2 of MMDS; Test Files
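The MapReduce pattern from week 3 can be imitated on a single machine to make its three phases visible: map, group by key (shuffle), and reduce. This is a sketch only: the helper names `map_fn`, `reduce_fn` and `map_reduce` are hypothetical, and a real system (for example Hadoop or Spark) runs many map and reduce tasks in parallel over a distributed file system rather than in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_fn(document):
    """Map task: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce task: sum all counts emitted for one key."""
    return key, sum(values)


def map_reduce(inputs, mapper, reducer):
    # Map phase: apply the mapper to every input independently.
    pairs = chain.from_iterable(mapper(x) for x in inputs)
    # Shuffle phase: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in groups.items())


counts = map_reduce(["a rose is a rose", "a daisy is a daisy"], map_fn, reduce_fn)
print(counts)
```

Because each map call sees only one input and each reduce call sees only one key, both phases can be split across machines without changing the result.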
W4 / Sep 23
  • Similar Items
  • Minhashing
  • Locality Sensitive Hashing
  Slides | Exercise sheet
  Materials: Ch. 3 of MMDS; Data and Template
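Minhashing and locality-sensitive hashing from week 4 can be sketched in plain Python as follows. Following the approach in Ch. 3 of MMDS, random hash functions stand in for random row permutations, and LSH splits each signature into b bands of r rows. Several details are assumptions made for this sketch: the hash family h(x) = (a*x + b) mod p, the use of `zlib.crc32` to turn shingles into integers, and the function names. Each band is bucketed by its exact values, so two documents collide in a band only if they agree on all r rows of it.

```python
import random
import zlib
from itertools import combinations

PRIME = 2_147_483_647  # 2**31 - 1, a prime modulus for the hash family


def shingles(text, k=3):
    """Set of character k-shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b)


def make_hashes(n, seed=0):
    """n random hash functions h(x) = (a*x + b) mod PRIME, stored as (a, b)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash_signature(shingle_set, hashes):
    """Minimum hash value of the set under each hash function.

    Each hash function plays the role of a random permutation of the rows
    of the characteristic matrix; the minimum marks the first row in that order.
    """
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions on which two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(signatures, bands, rows):
    """Pairs of documents whose signatures are identical in at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = {}
        for doc_id, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets.setdefault(key, []).append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumped over the lazy dog",
    "d3": "a completely unrelated sentence about databases",
}
sets = {d: shingles(t) for d, t in docs.items()}
hashes = make_hashes(100)  # 100 hash functions = 20 bands x 5 rows
sigs = {d: minhash_signature(s, hashes) for d, s in sets.items()}

print(jaccard(sets["d1"], sets["d2"]), estimate_jaccard(sigs["d1"], sigs["d2"]))
print(lsh_candidates(sigs, bands=20, rows=5))
```

Only candidate pairs returned by `lsh_candidates` would then be compared exactly, which is what lets LSH avoid examining every pair of documents.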
W5 / Sep 30
  • Frequent itemsets
  • Market-Basket Model
  • Association Rules
  • A-Priori Algorithm
  • PCY Algorithm (+ refinements)
  Slides | Exercise sheet
  Materials: Ch. 6 of MMDS
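The A-Priori idea from week 5 (a set can only be frequent if all of its subsets are frequent) can be sketched as a level-wise search in Python. This is an in-memory illustration under simplifying assumptions: the function name `apriori` and the toy baskets are hypothetical, items must be sortable, and the baskets are held in a list instead of being streamed from disk in passes as in Ch. 6 of MMDS.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, support):
    """Return {itemset (sorted tuple): count} for all frequent itemsets."""
    baskets = [set(b) for b in baskets]
    # Pass 1: count single items and keep those meeting the support threshold.
    counts = Counter((item,) for b in baskets for item in b)
    frequent = {s: c for s, c in counts.items() if c >= support}
    result = dict(frequent)
    k = 2
    while frequent:
        items = sorted({i for s in frequent for i in s})
        # Candidate k-sets: every (k-1)-subset must be frequent (monotonicity).
        candidates = {
            c for c in combinations(items, k)
            if all(sub in frequent for sub in combinations(c, k - 1))
        }
        # Pass k: count only the candidates.
        counts = Counter()
        for b in baskets:
            for c in candidates:
                if b.issuperset(c):
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"milk", "beer"},
    {"bread", "beer"},
    {"milk", "bread", "beer", "eggs"},
]
r = apriori(baskets, support=3)
# Confidence of the association rule {beer} -> bread
conf = r[("beer", "bread")] / r[("beer",)]
print(r, conf)
```

The confidence line shows how association rules follow from the itemset counts. PCY and its refinements reduce the memory needed for counting pairs, which this sketch does not attempt.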
W6 / Oct 07
  • Clustering
  • Hierarchical algorithms
  • Point assignment algorithms (k-means algorithm)
  • DBSCAN algorithm
  • CURE algorithm
  • Evaluating clusterings (e.g. Davies-Bouldin index)
  Slides | Exercise sheet
  Materials: Ch. 7 of MMDS
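The k-means algorithm and the Davies-Bouldin index from week 6 can be sketched with NumPy. The sketch assumes Lloyd-style iterations with initial centroids drawn at random from the data (other initialisation strategies exist), and the common definition DB = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the mean distance of cluster i's points to its centroid c_i; lower values mean compact, well-separated clusters. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Alternate point assignment and centroid update until centroids settle."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if its cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids


def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index; assumes k >= 2 and no empty clusters."""
    k = len(centroids)
    # s_i: mean distance of the points in cluster i to its centroid
    s = np.array([np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
                  for i in range(k)])
    worst = [max((s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))


# Two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(5.0, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids, davies_bouldin(X, labels, centroids))
```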
Holidays: Holiday week
W7 / Oct 21
  • Mining Social-Network Graphs
  • Betweenness centrality
  • Girvan-Newman algorithm
  • Modularity
  • Louvain Algorithm
  • Spectral clustering
  Slides | Exercise sheet
  Materials: Ch. 10 of MMDS; Survey on Spectral Clustering; Stanford lecture notes on community structure in networks by Leskovec
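Modularity from week 7 is the score that methods such as Girvan-Newman (to choose where to cut the dendrogram) and Louvain (to decide which moves to make) use to compare partitions of a graph. A minimal Python sketch for an undirected, unweighted graph, using the standard per-community form Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The function name and the example graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))                      # the natural two-community split
print(modularity(edges, {n: 0 for n in range(6)}))   # everything in one community
```

For this graph the two-triangle split gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.36, while putting all nodes in one community always gives Q = 0.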
W8 / Oct 28
  • Project work
W9 / Nov 04
  • Project work
W10 / Nov 11
  • Project work
W11 / Nov 18
  • Project work
W12 / Nov 25
  • Project work
W13 / Dec 02
  • Project work
Exam period: THERE IS NO EXAM IN THIS COURSE