Proposed Courses

Existing Courses
Proposed New Data Science Courses
Course Descriptions for Approved Electives

Existing Course

DATA 5000 [0.5 credit]
Data Science Seminar
Cloud-based distributed systems, statistics, machine learning, use of complex ecosystems of tools and platforms, and communication skills to explain advanced analytics. Students choose a project in Big Data management and/or analysis, deliver a paper, and give a class presentation on their findings.

Proposed New Data Science Courses

DATA 5001 [0.5 credit]
Fundamentals in Data Science and Analytics
Visualization and knowledge discovery in massive datasets; unsupervised learning: clustering algorithms, dimension reduction; supervised learning: pattern recognition, smoothing techniques, classification. Precludes additional credit for STAT 5703.

DATA 5908
Project – M.Sc.

DATA 5918
Project – M.I.T.

DATA 5928
Project – M.Eng.

DATA 5909
Thesis – M.Sc.

DATA 5919
Thesis – M.I.T.

DATA 5929
Thesis – M.A.Sc.

DATA 5939
Thesis – M.C.S.

DATA 6909
Dissertation

Course Descriptions for Approved Electives – For information only

Computer Science:

COMP 5101 [0.5 credit] (CSI 5311)
Distributed Databases and Transaction Processing Systems
Principles in the design and implementation of distributed databases and distributed transaction processing systems. Topics include: distributed computing concepts, computing networks, distributed and multi-database system architectures and models, atomicity, synchronization and distributed concurrency control algorithms, data replication, recovery techniques, reliability in distributed databases.

COMP 5107 [0.5 credit] (CSI 5185)
Statistical and Syntactic Pattern Recognition
Topics include a mathematical review, Bayes decision theory, maximum likelihood and Bayesian learning for parametric pattern recognition, non-parametric methods including nearest neighbor and linear discriminants. Syntactic recognition of strings, substrings, subsequences and tree structures. Applications include speech, shape and character recognition.

COMP 5111 [0.5 credit] (CSI 5153)
Data Management for Business Intelligence
Application of computational techniques to support business activities such as decision making, business understanding, data analysis, business process automation, learning from data, producing and using business models, data integration, data quality assessment and cleaning, and use of contextual data.
Also offered at the undergraduate level, with different requirements, as COMP 4111, for which additional credit is precluded.

COMP 5112 [0.5 credit] (CSI 5154)
Algorithms for Data Science
Algorithmic techniques to handle (massive/big) data arising from, for example, social media, mobile devices, sensors, and financial transactions. Algorithmic techniques may include locality-sensitive hashing, dimensionality reduction, streaming, clustering, VC-dimensions, external memory, core sets, link analysis and recommendation systems.

COMP 5113 [0.5 credit]
Machine Learning for Healthcare
Principles, techniques, technology and applications of machine learning for medical data such as medical imaging data, genomic data, physiological signals, speech and language.

COMP 5116 [0.5 credit]
Machine Learning
This course provides a broad introduction to the fundamental concepts, techniques and algorithms in machine learning.

COMP 5117 [0.5 credit]
Mining Software Repositories
Introduction to the methods and techniques of mining software engineering data. Software repositories and their associated data. Data extraction and mining. Data analysis and interpretation (statistics, metrics, machine learning). Empirical case studies.

COMP 5118 [0.5 credit]
Recent Trends in Big Data Management
Introduction to data management systems that affect our lives daily, from the systems that laid the foundations for today’s management of data in giants like Google and Facebook to the most recent trends in data management research.

COMP 5209 [0.5 credit] (CSI 5140)
Visual Analytics
Principles, techniques, technology and applications of information visualization for data analysis. Topics include human visual perception, cognitive processes, static and dynamic models of image semantics, interaction paradigms, big data visual analysis case studies.

COMP 5306 [0.5 credit] (CSI 5100)
Data Integration
Materialized and virtual approaches to integration of heterogeneous and independent data sources. Emphasis on data models, architectures, logic-based techniques for query processing, metadata and consistency management, the role of XML and ontologies in data integration; connections to schema mapping, data exchange, and P2P systems.

COMP 5704 [0.5 credit] (CSI 5131)
Parallel Algorithms and Applications in Data Science
Multiprocessor architectures from an application programmer’s perspective: programming models, processor clusters, multi-core processors, GPUs, algorithmic paradigms, efficient parallel problem solving, scalability and portability. Projects on high performance computing in Data Science, including data analytics, bioinformatics, simulations. Programming experience on parallel processing equipment.

Information Technology:

ITEC 5102/SYSC 5500 [0.5 credit]
Designing Secure Networking and Computer Systems
Network security with coverage of computer security in support of networking concepts. Covers various security issues in data networks at different protocol layers. Routing security, worm attacks, and botnets. Security of new mobile networks and emerging networked paradigms such as social networks and cloud computing.

ITEC 5103 [0.5 credit]
Cloud and Datacentre Networking
Special issues of the networking requirements in datacentres and cloud computing environments. Performance, power requirements, redundancy of datacentre networks.

ITEC 5205 [0.5 credit]
Design and Development of Data-Intensive Applications
Design and development of data-intensive applications dealing with large-scale data. Data may include spatial data, time series, text, social media and different forms of digital media. Data modeling and management techniques that enhance data analysis and improve data-intensive applications will be discussed.

ITEC 5206 [0.5 credit]
Data Protection and Legal Issues
Data privacy, security, protection, and related legal issues when dealing with data and information. Insight into the data privacy rules, regulations, laws, and policies relevant to different jurisdictions, and into the rights and responsibilities for protecting data and personal information.

ITEC 5207 [0.5 credit]
Data Interaction Techniques
Design and development of how humans (e.g., end-users, knowledge-users and expert-users) interact with the data ecosystem, including data collection, storage, analysis and visualization. Techniques, methods and tools for human interaction with data, based on the capabilities of machines and the needs of humans, will be discussed.

Statistics:

STAT 5504 [0.5 credit] (MAT 5194)
Stochastic Processes and Time Series Analysis
Stationary stochastic processes, inference for stochastic processes, applications to time series and spatial series analysis.

STAT 5509 [0.5 credit] (MAT 5196)
Multivariate Analysis
Multivariate methods of data analysis, including principal components, cluster analysis, factor analysis, canonical correlation, MANOVA, profile analysis, discriminant analysis, path analysis.

STAT 5702 [0.5 credit] (MAT 5182)
Modern Applied and Computational Statistics
Resampling and computer intensive methods: bootstrap, jackknife with applications to bias estimation, variance estimation, confidence intervals, and regression analysis. Smoothing methods in curve estimation; statistical classification and pattern recognition: error counting methods, optimal classifiers, bootstrap estimates of the bias of the misclassification error.

STAT 5713 [0.5 credit]
Advanced Data Mining
Topics from recent literature on mining complex data structures and data such as: tree/graph, sequence, web/text, stream, spatiotemporal, high-dimensional, multivariate time series, mixed-mode; clustering (EM, topic modeling, fuzzy), SVM; multi-label learning; deep learning; combining learners; network analysis/link prediction/graphical models (Bayesian, Markov networks); anomaly detection.

Systems and Computer Engineering:

SYSC 5103 [0.5 credit] (ELG 6113)
Software Agents
Agent-based programming; elements of Distributed Artificial Intelligence; beliefs, desires and intentions; component-based technology; languages for agent implementations; interface agents; information sharing and coordination; KIF; collaboration; communication; ontologies; KQML; autonomy; adaptability; security issues; mobility; standards; agent design issues and frameworks, applications in telecommunications.

SYSC 5206 [0.5 credit]
Resource Management on Distributed Systems
Principles and techniques for resource management on distributed systems including clouds, grids and data analytics platforms; management of computing and storage resources; service level agreements; performance- and energy-aware techniques for scheduling, allocation, dynamic resource provisioning; cyber-physical systems and Big Data; resource management for Big Data analytics.

SYSC 5405 [0.5 credit]
Pattern Classification and Experiment Design
Introduction to a variety of supervised and unsupervised pattern classification techniques with emphasis on correct application. Statistically rigorous experimental design and reporting of performance results. Case studies will be drawn from various fields including biomedical informatics. Also listed as BIOM 5405.

SYSC 5703 [0.5 credit] (ELG 6173)
Integrated Database Systems
Database definitions, applications, architectures. Conceptual design based on entity-relationship, object-oriented models. Relational data model: relational algebra and calculus, normal forms, data definition and manipulation languages. Database management systems: transaction management, recovery and concurrency control. Current trends: object-oriented, knowledge-based, multimedia, distributed databases.