General Info

Teachers

Teaching Assistants

Lectures

  • Tuesdays 18:00-19:45.

Exercise sessions

  • Tuesdays 20:00-22:00.

Office hour (Karl Heuer)

  • Wednesdays 13:00-14:00 (see DTU Learn announcements for exceptions)
  • Building 322, room 008
  • Location and TA coverage

    Materials

    Most topics are based on the book "Mining of Massive Datasets" (third edition); see the book homepage. Additional materials will also be used. A large community uses and actively develops the algorithms and tools covered in this course, so searching online can also turn up valuable information and guidance.

    Videocast

  • Videocasts

    Project work

    See information (including important dates) about the project in this PDF file. You can also look at the following example reports: Example 1, Example 2, Example 3, Example 4, Example 5.

    Weekplan

    THIS SCHEDULE IS TENTATIVE AND SUBJECT TO CHANGE

    Week Topics Slides Exercises Materials
    W1 / Sep 03
  • Introductory lecture
  • What is Data Mining?
  • Bonferroni's Principle
  • Tf.idf measure
  • Hash functions
  • Slides

  • Exercises: 1.2.1, 1.2.2, 1.3.1, 1.3.2 and 1.3.3
  • Ch. 1 of MMDS
    W2 / Sep 10
  • No lecture
  • Python recap
  • Setting up Python and Jupyter Notebook on your local environment
  • Working on tutorial tasks
  • Tutorial tasks for NumPy, SciPy and Numba packages
  • Tutorial tasks for Pandas package
  • Exercise sheet
    W3 / Sep 17
  • MapReduce
  • Distributed File Systems
  • Cluster Computing
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 2 of MMDS
  • Test Files
    W4 / Sep 24
  • Similar Items
  • Minhashing
  • Locality Sensitive Hashing
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 3 of MMDS
  • Data and Template
    W5 / Oct 01
  • Frequent itemsets
  • Market-Basket Model
  • Association Rules
  • A-Priori Algorithm
  • PCY Algorithm (+ refinements)
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 6 of MMDS
    W6 / Oct 08
  • Clustering
  • Hierarchical algorithms
  • Point assignment algorithms (k-means algorithm)
  • DBSCAN algorithm
  • CURE algorithm
  • Evaluating (e.g. Davies-Bouldin index)
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 7 of MMDS
    Holidays
  • Holiday week
    W7 / Oct 22
  • Mining Social-Network Graphs
  • Betweenness centrality
  • Girvan-Newman algorithm
  • Modularity
  • Louvain Algorithm
  • Spectral clustering
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 10 of MMDS
  • Survey on Spectral Clustering
  • Stanford lecture notes on community structure in networks by Leskovec
    W8 / Oct 29
  • Project Work
    W9 / Nov 05
  • Project Work
    W10 / Nov 12
  • Project Work
    W11 / Nov 19
  • Project Work
    W12 / Nov 26
  • Project Work
    W13 / Dec 03
  • Project Work
    Exam
  • Exam period