The Elements of Statistical Learning II

The Elements of Statistical Learning II (WS 2013/14)

News

  • 2013-10-15 Exam registration is possible from now until Nov 17th, 2013. The last date for withdrawing a registration is two weeks before the first exam.
  • 2013-10-15 The first tutorial will be held on Nov 6th at 12:30.
  • 2013-10-15 The first assignment is now available and can be found in the password protected area.
  • 2013-09-27 Please vote for the tutorial slot in the doodle poll (link is in the password protected area).
  • 2013-09-27 The first lecture will take place on Oct 16th.

General information

Lecturer: Thomas Lengauer
Teaching Assistant: Prabhav Kalaghatgi
Language: English

Time and location

Lecture: Wednesday, 10:00 - 12:00, Campus E2.1 (CBI building), Room 001
First lecture on Oct. 16, 2013
Tutorial: Wednesday, 12:30 - 14:00, MPI-INF, Room 023
First tutorial on Nov. 6, 2013
Office hours: Thomas Lengauer, after each lecture
Prabhav Kalaghatgi, by appointment, Campus E1.4 (MPII), Room 526

Registration

In order to successfully participate, you must register for the lecture in the LSF/HISPOS system of Saarland University. Additionally, please write an e-mail to the teaching assistant:

Subject line: [SL2] Registration
Body: Last name, first name
official e-mail address*
Your major**

*this means: mail account from Saarland University, the CBI, the MPI or similar
**e.g. bioinformatics, CS

Exam Registration

In order to take the exam, (i) register for the lecture in the LSF/HISPOS system and (ii) write an e-mail to the teaching assistant:

Subject line: [SL2] Exam Registration
Body: Last name, first name
Matriculation number
Your major
Language (eng or ger)

Course material

Lecture slides, tutorial handouts and problem sets are available in the password protected area.

Overview

The course is the second part of a two-semester course on Statistical Learning. The first part (SS 2013) concentrated on chapters 1-5 and 7-10 of the book The Elements of Statistical Learning, Springer (second edition, 2009). The second part will present the remaining chapters, focusing on advanced topics in supervised and unsupervised learning, such as kernel methods, SVMs, neural networks, random forests and clustering. The theoretical models will be illustrated with applications, many of which are challenging problems in the field of bioinformatics. As in the previous semester, there will be two hours of lecture and one hour of tutorial per week (V2/Ü1). In practice, the tutorial is held as a two-hour session every other week.

This course covers a subject that is relevant for computer scientists in general as well as for other scientists involved in data analysis and modeling. It is not limited to the field of computational biology.

Both parts of this lecture fulfill the requirements of the computer science and bioinformatics curricula as an optional course with 5 credit points (Spezialvorlesung, 5 Leistungspunkte).

Prerequisites

The course is aimed at advanced students in mathematics, computer science and general science with a mathematical background. Students should know linear algebra and have basic knowledge of statistics. Attendance of Statistical Learning I is recommended, but it is not required for students who have basic knowledge of machine learning.

Requirements for the course certificate

Theoretical assignments are handed out every other week and are due two weeks later. Additionally, there will be programming assignments, possibly in the form of several smaller projects. Theoretical problem sets will involve mathematical proofs and will test the understanding of the methods presented in the lecture and the relations between them. The R statistical programming language is required for the programming assignments, in which the methods presented in the lecture will be applied to real-world data.

To be admitted to the oral exam, you need at least 50% of the cumulative points in the theoretical problem sets and, separately, at least 50% of the cumulative points in the programming assignments.

Literature

Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer 2009. Participants of the course are encouraged to acquire this book. You can download it as a PDF file from the dedicated page on Tibshirani's web site. More information on this book, including a table of contents, can be found on the Springer web site.
Additional literature can be found in the library; the reserve list for the lecture can be found here: library reserve list for 'Elements of Statistical Learning II'
Please keep in mind that only the book by Hastie, Tibshirani and Friedman will be covered in the lecture.

Tutorials

The tutorials focus on the problem sets and also include a very brief recap of parts of the lecture. Homework assignments will cover theoretical proofs and programming exercises with roughly equal weight.

The programming language that we use is R, a language for statistical computing. It is freely available for Windows and Linux and, as a vectorized programming language, is well suited to the problems we will encounter. There are also many freely available packages (or libraries) for a variety of classification and regression tasks, and for conveniently visualizing the results of statistical analyses.
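
To give an idea of what "vectorized" means in practice, here is a minimal sketch in base R. The data values and the variable names (x, z_loop, z_vec) are made up for illustration and are not taken from the course material. The example standardizes a small numeric vector once with an explicit loop and once with a single vectorized expression.

```r
# Hypothetical toy data: five measurements (values chosen only for illustration)
x <- c(1.5, 2.0, 3.5, 4.0, 5.5)

# Loop version: standardize the vector element by element
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- (x[i] - mean(x)) / sd(x)
}

# Vectorized version: arithmetic operators act on the whole vector at once
z_vec <- (x - mean(x)) / sd(x)

# Both versions agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(z_loop, z_vec)))
```

The vectorized form is shorter and closer to the mathematical notation, which is one reason R code for statistical methods tends to stay compact.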

What can I do to prepare for the lecture?

  • Refresh your knowledge of basic statistics. Basic linear algebra will also be useful.
  • Refresh your knowledge of statistical learning, e.g. by reviewing the Statistical Learning I lecture slides and your notes.
  • (Re-)familiarize yourself with the R programming language. You might find the following tutorials useful:
    • R for Beginners by Emmanuel Paradis. Especially relevant for us are chapters 1, 2, 3 and 6.
    • An Introduction to R - the standard R introduction. This is a very detailed manual; it is therefore quite lengthy.