Training Next-Generation Health Data Scientists

Course Description

The MDSH is a two-year, 48-unit program consisting of a public health foundation course (4 units), MDSH core courses (24 units), MDSH electives (16 units), and a data science capstone course (4 units).

Public Health Foundation (4 units)

  • PUBHLT C201 Fundamentals of Public Health Exploration of foundations of public health by examining public health challenges at local, national, and global levels, and current strategies for advancing population health. Analysis of current public health issues and modern public health policies and practices.

MDSH Core Courses (24 units)

  • BIOSTAT 203A,B,C Introduction to Data Science The BIOSTAT 203 3-course sequence introduces practical data science (data ingestion, data cleaning, data wrangling, data visualization and reporting, databases) and big data computing (parallel, distributed, cluster and cloud computing) skills using computer languages R, Python, SAS, and SQL. Other topics include data ethics.

  • BIOSTAT 201A Introduction to Biostatistics Principles of biostatistics.

  • BIOSTAT 212A,B Statistical Learning The BIOSTAT 212 2-course sequence lays a rigorous foundation for commonly used data analytic tools for prediction, classification, and artificial intelligence (AI), with emphasis on applications to big and complex health data.

MDSH Elective Courses (16 units)

MDSH students take at least 4 elective courses from the following list.

  • BIOSTAT 218 Observational Health Data Science and Informatics An introduction to observational research in the health data sciences. Topics include disease cohort characterization, patient-level prediction, and population-level estimation using administrative claims and electronic health records. Lectures will cover an introduction to observational health databases, a common data model for representing patient trajectories through healthcare systems, tools to manipulate data while preserving patient privacy, the theory of patient-level prediction and causal inference from observational data, and best practices for generating reproducible and reliable observational studies. Introductory theory will demonstrate how linear and generalized linear models are used in observational studies. Weekly practical laboratories will demonstrate the methods discussed in lecture. Laboratories will use SQL and R software, and regular homework assignments will reinforce theoretical work with practical application using large-scale synthetic and real-world example databases. Students will design and complete a data analysis project that reflects the best practices covered in this course and translate their results into an oral presentation and written report.

  • BIOSTAT 217 Health Decision Making The course will provide a data analytic perspective on medical decision making in contemporary clinical research and development. Students will be introduced to evidence-based and model-based approaches in the decision sciences that harness an increasingly large and complex body of information. Particular emphasis will be placed on quantitative data analysis within the Bayesian and frequentist paradigms of statistical modeling and their connections to medical decision making. The course will adopt a hands-on approach to data analysis and medical decision making by incorporating a rich and diverse set of examples from actual clinical trials and other areas of medical research.

  • BIOSTAT 215 Survival Analysis Data science methods for survival and lifetime data.

  • BIOSTAT 231 Statistical Power and Sample Size Methods for Health Research Sample size and power analysis methods for common study designs, including comparisons of means and proportions, ANOVA, time-to-event data, group sequential trials, linear regression, cluster randomized trials, and multilevel data, with emphasis on designing randomized trials. Multiple endpoints are also discussed.

  • BIOSTAT M234 Applied Bayesian Inference Bayesian approach to statistical inference, with emphasis on biomedical applications and concepts rather than mathematical theory. Topics include large sample Bayes inference from likelihoods, noninformative and conjugate priors, empirical Bayes, Bayesian approaches to linear and nonlinear regression, model selection, Bayesian hypothesis testing, and numerical methods.

  • BIOSTAT M236 Longitudinal Data Analysis of continuous responses for which a multivariate normal model may be assumed. Students learn how to think about longitudinal data, how to plot data, and how to specify the mean and variance of a longitudinal response. Advanced topics include introductions to clustered, multivariate, and discrete longitudinal data.

  • BIOSTAT 410 Clinical Trials Design of studies to assess anti-tumor response; randomization, historical controls, p-values, size of study, and stratification in human experimentation; various types of controls; prognostic factors, survivorship studies, and design of prognostic studies; organization of clinical trials – administration, comparability, protocols, clinical standards, data collection and management.

MDSH Capstone (4 units)

  • BIOSTAT 401 Data Science Capstone A capstone project that consists of an original written analysis and an oral presentation that addresses an applied health-related data science topic and advances existing skills and techniques in healthcare or public health. Communication skills for professionals. Data ethics training.