Introduction to NYC Health Disparities Using Data Science

Mary Beth Terry, Professor, Mailman School of Public Health, Dept of Epidemiology

Abigail Greenleaf, Lecturer, Mailman School of Public Health, Heilbrunn Dept of Population, Family and Reproductive Health


Data provide the empirical evidence for public health policies and programs, and are increasingly expansive in depth, content, and geographical scope. The proposed course, “Introduction to NYC Health Disparities Using Data Science,” takes an experiential, place-based approach to teaching undergraduate students about health disparities in New York City (NYC) by introducing them to the seven steps of data science.

This semester-long course has no prerequisites and is intentionally accessible to students who lack typical foundational data science skills, such as coding and analysis, in the hope that students from groups underrepresented in STEM will participate. We will teach the data science process using the tenets of authentic learning: learning by applying knowledge to real-life problems. Students will simulate the work of public health data scientists by applying the data science steps to a dataset of their choice. Instructors will use class time to model the metacognitive processes used during each step.

Students will use publicly available NYC health data to learn the seven steps of data science: (1) writing a research question; (2) obtaining data to address the question; (3) data cleaning; (4) data exploration; (5) analysis; (6) replication and validity evaluation; and (7) presentation and summary.

The course will introduce students to R, freely available software for statistical analysis and data visualization. Assignments will be mapped to each step, and the course will culminate in presentations to NYC Department of Health officials from the Center for Health Equity. The course, designed by instructors in the Epidemiology and Population & Family Health departments at the Mailman School of Public Health, will introduce students to data science and public health in the hope that the data science workforce will become more diverse and better prepared to answer the pressing public health questions of our time.