DATA - Data Science

DATA-50000 Mathematics for Data Scientists

Study of mathematical concepts used in data science applications. Topics include differentiation and integration of functions, optimization techniques, matrix operations, eigenvalues and eigenvectors, curve fitting, and discrete mathematics.

3

DATA-50100 Probability and Statistics for Data Scientists

This course covers aspects of probability theory and statistical analysis used in data science. Students will study elementary probability theory, basic combinatorics, conditional probability and independence, Bayes’ rule, random variables, mathematical expectation, discrete and continuous distributions, estimation theory, and tests of hypotheses. This course requires the use of statistical computing with the R programming language for solving sample problems.

3

Prerequisites

DATA 50000 or prior coursework in Calculus

DATA-51000 Data Mining and Analytics

This course covers techniques for knowledge extraction from very large-scale data. Students will learn how to analyze real-world datasets using data mining techniques such as document similarity detection, association rule mining, clustering, link analysis, and predictive modeling. Topics also include applications for e-advertising and recommendation systems.

3

Prerequisites

CPSC 50200 or DATA 50000, and CPSC 50100, DATA 51100, or prior programming experience, or an undergraduate degree in Computer Science

DATA-51100 Statistical Programming

Programming structures and algorithms for large-scale statistical data processing and visualization. Students will use commonly available data analysis software packages to apply concepts and skills to large data sets and will also develop their own code using an object-oriented programming language.

3

Prerequisites

CPSC 50100 or prior programming experience

DATA-51200 Multivariate Data Analysis

This course explores statistical techniques for analysis of multivariate data. It covers exploratory factor analysis, multiple regression analysis, multiple discriminant analysis, logistic regression, multivariate analysis of variance and covariance, general linear models, and cluster analysis. Extensive use of statistical software is required.

3

Prerequisites

DATA 50100

DATA-53000 Data Visualization

The theory and practice of visualizing large, complicated data sets to clarify areas of emphasis. Human factors best practices will be presented. Programming with advanced visualization frameworks and practices will be demonstrated and used in group programming projects.

3

Prerequisites

CPSC 50100, DATA 51100, or prior programming experience

DATA-54000 Large-Scale Data Storage Systems

The design and operation of large-scale, cloud-based systems for storing data. Topics include operating system virtualization, distributed network storage, distributed computing, cloud models (IaaS, PaaS, and SaaS), and techniques for securing cloud and virtual systems.

3

Prerequisites

CPSC 50100, DATA 51100, or prior programming experience

DATA-55000 Supervised Machine Learning

This course covers methods and theory related to generating predictive models from labeled datasets. Students will be introduced to computational learning theory, study algorithms for generating predictive models, perform feature selection and hyperparameter tuning, and learn how to evaluate model performance. Examples of supervised machine learning techniques covered in the course include naïve Bayes learning, logistic regression, decision tree induction, support vector machines, and deep neural networks. Other recent developments and state-of-the-art methods related to supervised learning may also be covered. Students will be required to write programs that demonstrate machine learning techniques on real-world datasets.

3

Prerequisites

CPSC 50200 or DATA 50000, and CPSC 50100, DATA 51100, or prior programming experience

DATA-55100 Unsupervised Machine Learning

This course will survey leading algorithms for unsupervised learning and high-dimensional data analysis. The first part of the course will cover clustering algorithms and generative models of high-dimensional data, including distance/similarity measures, k-means clustering, hierarchical clustering, Fuzzy C-Means (FCM), Possibilistic C-Means (PCM), Principal Components Analysis (PCA), and Linear Discriminant Analysis (LDA). The second part of the course will cover spectral methods for dimensionality reduction, including multidimensional scaling, spectral clustering, and manifold learning. The third part of the course will cover self-organizing maps (SOMs) as well as an introduction to semi-supervised learning. Other recent developments and state-of-the-art methods related to unsupervised learning may also be covered.

3

Prerequisites

CPSC 50200 or DATA 50000, and CPSC 50100, DATA 51100, or prior programming experience

DATA-55200 Semantic Web

Expressing relationships among items in a way that enables automated, distributed analysis in an application-independent way; text mining to derive meaning from semantic networks; algorithms for processing semantic networks; developing a web of things.

3

Prerequisites

CPSC 50100, DATA 51100, or prior programming experience

DATA-56600 Digital Image Processing

This course introduces the basic concepts, methodologies, and algorithms of digital image processing, focusing on two major problems: image enhancement and restoration for easier interpretation of images, and image analysis and object recognition. Some advanced image processing and computer vision techniques (e.g., object detection and tracking or camera models and stereo vision) might also be studied in this course. The primary goal of this course is to lay a solid foundation for students to study advanced image analysis topics such as computer vision systems, biomedical image analysis, and multimedia processing and retrieval.

3

Prerequisites

DATA 50000 and CPSC 50100

DATA-59000 Data Science Master's Project

The capstone experience for students pursuing the Computer Science concentration in Data Science. Students will develop a solution for a real-world problem in data mining and analytics, document their work in a scholarly report, and present their methodology and results to faculty and peers.

3

Prerequisites

A minimum of 24 hours earned in the MS Data Science program.

DATA-59500 Data Science Master's Thesis Research

In this course, students will work with a faculty advisor on research in the field of Data Science or its applications. The student will research open problems in data science, select a topic for their thesis, and implement novel solutions, which will be documented in a formal thesis. The course will require students to form a thesis committee and defend their thesis before graduating from the program. This course is meant to be repeated three times to fulfill the concentration requirements.

3

Prerequisites

Permission from Data Science Program Director.

DATA-61000 Advanced Data Mining and Prescriptive Analytics

In this course, students will learn how to use advanced data mining techniques to improve decision making. The topics covered include generation of predictive models, optimal decision making, computational simulation systems, and expert and recommendation systems.

3

Prerequisites

DATA 51000 and DATA 51100

DATA-62500 Data Mining for Cyber Security

The application of Data Science techniques is of increasing importance in computer security. Data mining and machine learning algorithms are now extensively employed in detecting cyber-attacks, developing authentication methods that distinguish legitimate from illegitimate users, and testing the strength of existing security technologies. In this course, students will learn how to use data mining techniques to solve real-world security problems, processing datasets, training models, and deploying solutions to strengthen a system’s defenses.

3

Prerequisites

DATA 55000