JupyterHub in Data Science Instruction
Cloud-Hosted Browser-Based Software as a Service for Data Science Instruction
John DeNero, Ryan Lovett, Jeff Anderson-Lee, et al.

Foundations of Data Science

Foundations of Data Science (Data 8) is an innovative introduction to core concepts of computer programming and statistics using Jupyter notebooks and a custom Python library. The Foundations course is complemented by other “connector courses”, introducing diverse subjects through the lens of data science: CIV ENG 88B • COGSCI 88 • CS 88 • ESPM 88A • ESPM 88B • GEOG 88 • HIST 88 • INFO 88 • L&S 88-5 • Legal Studies 88 • MCB 88 • STAT 88 • STAT 89A • CS/Stat C100 • Stat 140 • Stat 28

Jupyter Notebooks

“Notebook documents [contain] the inputs and outputs of [an] interactive session as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and rich representations of resulting objects. These documents are internally JSON files and are saved with the .ipynb extension. Since JSON is a plain text format, they can be version-controlled and shared with colleagues.”

JupyterHub

Hosting notebooks in JupyterHub eliminates the need to replicate software installation on personal or lab devices. Students only need to visit a website to be granted access to their own pre-built instance of Jupyter. Hosting also reduces the demand on physical lab resources, gives each student the same starting point, makes results more reproducible, and, most importantly, frees up time for both learning and teaching.

Other Features

Notebooks are delivered to students via nbinteract, a notebook extension that clones notebooks from a git repository into the students' accounts. Students initiate this process by clicking "INTERACT" buttons inside the Data 8 online textbook at inferentialthinking.com. Data 8 uses okpy for grading, in conjunction with Gradescope.

We have also developed a handful of other tools: nbserverproxy, a generic web service proxy written to accommodate pycortex's WebGL-based views; nbrsessionproxy, to proxy RStudio Server sessions launched from Jupyter; and nbgdrive, to back up notebooks to Google Drive.

Deployments

The pilot course in Fall 2015 had 80 students using Intel-donated hardware. In Spring 2016, 450 students were hosted on Microsoft-donated Azure nodes. These deployments were based on Jessica Hamrick's JupyterHub for COGSCI 131. In Spring 2017 we migrated from Docker Swarm to Kubernetes on Google Compute Engine. The new software stack lets us scale easily, and we quickly deployed three hubs for 900 students.

Future Directions

Scale up (and down) to support more students using auto-scaling in Kubernetes.
Enable instructors to more easily create custom class hubs.
Add more Google Drive or Box integration for notebook storage and distribution.
Provision accounts from rosters via the okpy and Canvas APIs.

Your Courses

Are you interested in using Jupyter notebooks for your course, workshop, or intensive? We'd like to hear about it! Please contact: ds-infrastructure@lists.berkeley.edu

Further Reading

http://data.berkeley.edu
http://data8.org/
https://www.inferentialthinking.com/
https://github.com/data-8/jupyterhub-k8s
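
Appendix: a notebook as plain JSON

The Jupyter Notebooks quotation notes that notebook documents are JSON files saved with the .ipynb extension. As a rough illustration only (not part of the course tooling), the sketch below builds a tiny notebook by hand in the nbformat 4 layout and reads it back with Python's standard json module. The cell contents and the summarize helper are hypothetical and chosen just for this example.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout.
# The cell contents are illustrative, not taken from the course materials.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Hello, Data 8"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["1 + 1"],
        },
    ],
}


def summarize(nb_text):
    """Count cells by type, given the JSON text of a notebook (hypothetical helper)."""
    nb = json.loads(nb_text)
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# This string is what would be written to a file such as example.ipynb.
nb_text = json.dumps(notebook, indent=1)
print(summarize(nb_text))
```

Because the serialized notebook is ordinary text, it can be committed to a git repository and diffed like any other source file, which is what makes the git-based distribution described under Other Features possible.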