The Elements of Statistical Learning (WS 2018/19)

News

  • The final re-exam schedule is now available in the password-protected area.
  • Please register for the re-exam by sending an email to Michael with the subject [SL][re-exam registration]. Include your full name and your matriculation number in the body of the email. For further information, e.g. about the Doodle poll, see the password-protected area.
  • The results of the exam are now available in the password-protected area.
  • The final exam schedule is now available in the password-protected area.
  • A preliminary exam schedule is now available in the password-protected area. If your name is not on the list, please follow the instructions below or contact the TA.

General information

Lecturers: Tobias Marschall, Jilles Vreeken
Teaching Assistant: Michael Scherer
Language: English

Time and location

Lecture: Thursday, 10:00-12:00, Campus E2.1 (CBI building), Room 001. The first lecture will be held on Oct. 18, 2018.
Tutorials: Monday, 12:00-14:00, or Tuesday, 12:00-14:00, both in Campus E1.4 (MPI for Informatics), Room 021
Office hours:
  • Tobias Marschall: after each lecture
  • Jilles Vreeken: after each lecture
  • Michael Scherer: by appointment, Campus A2.4 (Genetics Department), Room 0.20

Registration

To participate successfully, you must register for the exam in the LSF/HISPOS system of Saarland University. Registration opens as soon as the exam date has been entered into the system, which usually happens a few weeks into the semester.

Course material

Lecture slides, tutorial handouts, and problem sets are available in the password-protected area.

Overview

This course covers a subject that is relevant to computer scientists in general, as well as to other scientists involved in data analysis and modeling; it is not limited to computational biology. The course fulfills the requirements of the computer science and bioinformatics curricula as a special lecture (Spezialvorlesung, 5 credit points).

Given a data set, students will learn to choose an appropriate statistical method for analyzing it, to select suitable parameters for the model produced by that method, and to assess the quality of the resulting model. Both theoretical and practical aspects will be covered.
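The workflow described above (choose a method, tune its parameters, assess the resulting model) can be sketched in a few lines. This is a minimal illustration in Python with NumPy rather than R, which the course uses; it assumes polynomial regression as the model family, the polynomial degree as the parameter to select, and k-fold cross-validation as the assessment method. The function names and the simulated data are hypothetical.

```python
import numpy as np

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Estimate the test MSE of a degree-`degree` polynomial fit by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # shuffle indices once
    folds = np.array_split(idx, k)         # k roughly equal folds
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)                   # all indices not in this fold
        coefs = np.polyfit(x[train], y[train], degree)    # least-squares fit on k-1 folds
        pred = np.polyval(coefs, x[fold])                 # predict the held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

def select_degree(x, y, degrees, k=5):
    """Return the degree with the lowest CV error, plus all CV scores."""
    scores = {d: kfold_cv_mse(x, y, d, k) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.uniform(-2, 2, 200)
    # Hypothetical simulated data: a quadratic trend plus Gaussian noise.
    y = 1 - 2 * x + 0.5 * x**2 + rng.normal(0, 0.3, 200)
    best, scores = select_degree(x, y, degrees=range(1, 7))
    for d, s in scores.items():
        print(f"degree {d}: CV MSE = {s:.3f}")
    print("selected degree:", best)
```

The same loop works for any fitting routine in place of `np.polyfit`; the essential design point is that each fold's error is measured on data the fit never saw, so the score estimates test error rather than training error.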

The course will largely follow the book An Introduction to Statistical Learning with Applications in R (2013). At times, it will draw on additional material from the book The Elements of Statistical Learning, Springer (second edition, 2009). The former is the more introductory text; the latter is more advanced. Both books are available as free PDFs, but we encourage you to acquire at least the first one in print. There will be one lecture (90 min) and one tutorial (90 min) per week. The tutorial slot will be set after the first lecture.

The course will cover the following topics:
  • Linear Regression
  • Classification Methods
  • Resampling Methods
  • Model Selection and Regularization
  • Splines, Local Regression
  • Tree-based Methods
  • Support Vector Machines
  • Unsupervised Learning Methods

Prerequisites

The course is aimed at advanced students in bioinformatics, computer science, mathematics, and other sciences with a mathematical background. Students should know linear algebra and have a basic knowledge of statistics.

Requirements for the course certificate

You need a cumulative 50% of the points in the problem sets (in both theoretical and programming exercises) to be admitted to the exam.

Literature

James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning with Applications in R (2013). The students of the course are encouraged to acquire this book.

Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer (second edition, 2009).
Both books are available online as free PDFs.

Additional literature is available in the library; see the library reserve list for 'Elements of Statistical Learning 1'.

Problem Sets

Problem sets will cover theoretical proofs and programming exercises with roughly equal weight. In general, they are due before the lecture (10:00 sharp); further details regarding the assignments will be announced in the first lecture.

The programming language used in this course is R, a language for statistical computing that is freely available for Windows, Linux, and Mac. As a vectorized language, it is well suited to the problems we will encounter. Many freely available packages (libraries) also exist for a variety of classification and regression tasks and for conveniently visualizing the results of statistical analyses.
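To illustrate what "vectorized" means, the sketch below computes a residual sum of squares twice: once with an explicit element-by-element loop and once as a single whole-array expression. It is written in Python with NumPy purely as an analogue (in R, the vectorized form would be `sum((y - y_hat)^2)`); the function names and toy values are hypothetical.

```python
import numpy as np

def rss_loop(y, y_hat):
    """Residual sum of squares, accumulated one element at a time."""
    total = 0.0
    for yi, fi in zip(y, y_hat):
        total += (yi - fi) ** 2
    return float(total)

def rss_vectorized(y, y_hat):
    """Residual sum of squares as one whole-array expression (no explicit loop)."""
    return float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

# Hypothetical toy observations and fitted values.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.0, 5.0])
print(rss_loop(y, y_hat), rss_vectorized(y, y_hat))
```

Both functions return the same value; the vectorized version is shorter, closer to the mathematical notation, and typically much faster on large arrays because the loop runs in compiled code.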

Tutorials

The tutorials focus on the problem sets and also include a brief recap of parts of the lecture. If you have any questions about the lecture, write an email to stat-learn-staff@mpi-inf.mpg.de. We also have a student mailing list (stat-learn-students@mpi-inf.mpg.de); to subscribe, send an email with the subject "ESL mailing list" to mscherer@mpi-inf.mpg.de.

What can I do to prepare for the lecture?

  • Refresh your knowledge of basic statistics. Basic linear algebra will also be useful.
  • Familiarize yourself with the R programming language. You might find the following tutorials useful:
    • R for Beginners by Emmanuel Paradis. Especially relevant for us are chapters 1, 2, 3 and 6.
    • An Introduction to R, the standard R introduction. It is a very detailed and therefore quite lengthy manual.

Acknowledgments

This course was developed by Thomas Lengauer and we thank him for providing his lecture materials.