2nd Annual

Data Science Bootcamp

January 8–12, 2018
10am to 4pm (introduction from 9am to 10am on January 8)
Design Studio, Room 430 of the Riverside Church (490 Riverside Dr, New York, NY 10027)

The Collaboratory@Columbia is pleased to announce our second annual free Data Science Bootcamp, to be held over the winter break. This week-long, immersive, hands-on workshop is designed especially for Columbia University PhD students and postdoctoral scholars who are interested in extending their existing mathematical and programming skills to include training in data science. Faculty may also apply. Designed by faculty and postdoctoral scholars from Columbia University’s Data Science Institute, the curriculum includes online learning material, introductory lectures, hands-on laboratory experiences and a capstone project.

Click here for detailed course information, including a self-assessment for basic prerequisites.

Click here to apply to participate. Seats are limited. The deadline for applications is December 15, 2017.

Course Information:

The course is a blend of online learning experiences (about 2 hours of preparation will be required per day), in-class lectures, hands-on laboratory exercises with a variety of data sets, and a capstone project. The course will use the Python programming language. Participants are required to bring their own computer to the daily sessions. Lunch and refreshments are NOT provided.

Typical Daily Schedule (unless otherwise noted)

10:00  Session 1 (1h introduction + 1h lab)
12:00  Lunch Break (1h)
1:00  Session 2 (30 min introduction + 1h lab)
2:30  Break (30 min)
3:00  Session 3 (1h lab)
Monday, January 8, 2018 (Registration at 9am)

Introduction to Data Science

  • Introduction to Data Science
  • Data Visualization
  • Probability and Regression
9:00  Welcome and Registration (1h)
Tuesday, January 9, 2018

Algorithms & Classification

  • Introduction to Algorithms
  • Introduction to Machine Learning
  • Classification
Wednesday, January 10, 2018

Machine Learning

  • Model Selection
  • Probabilistic Modeling
  • Evaluation
Thursday, January 11, 2018

Advanced Topics in Data Science

  • Natural Language Processing
  • Information Retrieval
  • Neural Networks
Friday, January 12, 2018

Capstone Project

  • Lab Project Work

Prerequisites:

  • Basics of linear algebra
  • Basics of statistics (mean, variance, etc.)
  • Basic programming skills in Python (online resources)
  • Basic understanding of data structures and algorithms
  • Basic skills for working with data files (I/O operations on CSV, TSV, ...)

About the instructors

Andreas Mueller (@amueller) is a lecturer at the Data Science Institute at Columbia University and the author of the O’Reilly book “Introduction to Machine Learning with Python”, which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library and has co-maintained it for several years. He is also a Software Carpentry instructor. Previously, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. You can find his full CV here. His mission is to create open tools that lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Tian Zheng (@tz33cu) is Professor of Statistics and Associate Director for Education of the Data Science Institute at Columbia University. She develops novel methods for exploring and understanding patterns in complex data from application domains such as biology, psychology, and climatology. Her current projects are in the fields of statistical machine learning, spatiotemporal modeling and social network analysis. Professor Zheng’s research has been recognized by the 2008 Outstanding Statistical Application Award from the American Statistical Association (ASA), the Mitchell Prize from ISBA and a Google research award. She became a Fellow of the American Statistical Association in 2014. Professor Zheng is the recipient of Columbia’s 2017 Presidential Award for Outstanding Teaching. In 2018, she will be the chair-elect of the ASA’s section on Statistical Learning and Data Science. She is on the advisory board for STATS at Sense About Science America, which aims to develop a statistically literate citizenry.

About Collaboratory@Columbia

Jointly founded by Columbia University’s Data Science Institute and Columbia Entrepreneurship, The Collaboratory@Columbia is a university-wide program dedicated to supporting collaborative curricular innovations designed to ensure that all Columbia University students receive the education and training they need to succeed in today’s data-rich world.

Data Science Bootcamp applications due by December 15, 2017

Apply Today