Once students have completed the certificate's four courses (DATA 601, 602, 603, and 604), they can enter the diploma program with a specialization in Data Science and complete the following courses:
Introduces more advanced tools, skills, and techniques for collecting, manipulating, visualizing, analyzing, and presenting several common types of data. Taking a data life-cycle perspective, examines data elicitation and preparation as well as the actual use of data in a decision-making context. Introduces techniques for visualizing large, complex datasets and for supporting interactive analysis and decision making on them. Focuses on critical thinking and good analysis practices that help avoid cognitive biases when designing, analyzing, and making decisions based on data.
Design of surveys and data collection; bias and efficiency of surveys. Sampling weights and variance estimation. Multi-way contingency tables and an introduction to generalized linear models, with emphasis on applications.
Extensions of the linear statistical model, including an introduction to data transformation methods, classification, and model assessment and selection. Exposure to both supervised and unsupervised learning.
Provides advanced coverage of tools and techniques for managing, processing, and mining big data and for building applications that leverage large datasets. Addresses database and distributed storage design for both SQL and NoSQL systems, and focuses on applying distributed computing tools to integrate data, apply machine learning, and build big data applications. Students will also examine the security and ethical implications of large-scale data collection and analysis.