# Graduate Diploma in Data Science

- CRICOS Code: 095994M

## What will I study?

### Overview

Successful completion of **100 credit points**, made up of:

- Core statistics subjects (50 points)
- Core computer science subjects (50 points).

In the course, you'll cover:

- Statistical modelling and inference
- Algorithms, machine learning and data mining
- Database systems
- Using a range of methods to conduct analyses
- Reporting on analytical findings.

#### Tailoring the course to you

Your subjects will be tailored to you, depending on your previous academic background.

You'll be allocated to one of four streams (Engineering and Science, Computer Science, Statistics, or Commerce and Arts) and automatically assessed for credit (advanced standing) during the selection process. If you've already studied some of the core subjects (or their equivalents), you may be granted exemptions. In these cases, you'll take additional subjects to yield 100 points in total (50 points of statistics subjects and 50 points of computer science subjects).

### Explore this course

Explore the subjects you could choose as part of this diploma.

### Engineering and Science Stream

##### Complete all of the following subjects:

- Algorithms and Complexity (12.5 points)
### Algorithms and Complexity

**AIMS**The aim of this subject is for students to develop familiarity and competence in assessing and designing computer programs for computational efficiency. Although computers manipulate data very quickly, to solve large-scale problems, we must design strategies so that the calculations combine effectively. Over the latter half of the 20th century, an elegant theory of computational efficiency developed. This subject introduces students to the fundamentals of this theory and to many of the classical algorithms and data structures that solve key computational questions. These questions include distance computations in networks, searching items in large collections, and sorting them in order.

**INDICATIVE CONTENT**

Topics covered include complexity classes and asymptotic notation; empirical analysis of algorithms; abstract data types including queues, trees, priority queues and graphs; algorithmic techniques including brute force, divide-and-conquer, dynamic programming and greedy approaches; space and time trade-offs; and the theoretical limits of algorithm power.
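The flavour of the material above can be shown with a small sketch (an illustration, not course material): binary search is a classic divide-and-conquer algorithm whose running time grows as O(log n) in the size of the input, in contrast to the O(n) cost of a linear scan.

```python
def binary_search(sorted_items, target):
    """Divide-and-conquer search over a sorted list.

    Each step halves the candidate range, so the running time grows
    as O(log n) rather than the O(n) of a linear scan.
    """
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return -1  # not found

print(binary_search(list(range(0, 100, 2)), 42))  # index 21
```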

- Programming and Software Development (12.5 points)
### Programming and Software Development

**AIMS**The aim of this subject is for students to develop an understanding of approaches to solving moderately complex problems with computers, and to demonstrate proficiency in designing and writing programs. The programming language used is Java.

**INDICATIVE CONTENT**

Topics covered will include:

- Java basics
- Console input/output
- Control flow
- Defining classes
- Using object references
- Programming with arrays
- Inheritance
- Polymorphism and abstract classes
- Exception handling
- UML basics
- Interfaces
- Generics.

- Database Systems & Information Modelling (12.5 points)
### Database Systems & Information Modelling

**AIMS**The subject introduces key topics in modern information organization, particularly with regard to structured databases. The well-founded relational theory behind modern structured query language (SQL) engines has given them as much a place behind the website of an organization and on the desktop as they traditionally enjoyed on corporate mainframes. Topics covered may include: the managerial view of data, information and knowledge; conceptual, logical and physical data modelling; normalization and de-normalization; the SQL language; data integrity; transaction processing, data warehousing, web services and organizational memory technologies. This is a core foundation subject for both the Master of Information Systems and Master of Information Technology.

**INDICATIVE CONTENT**This subject serves as an introduction to databases and data modelling from a data management perspective. Database design, from conceptual design through to physical implementation will be covered. This will include Entity Relationship modelling, normalisation and de-normalisation and SQL. Additionally the use of databases in various contexts will be explored (web based databases, connecting programs to databases, data warehousing, health contexts, geospatial databases).
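As a minimal illustration of this kind of material (a hypothetical two-table schema, not drawn from the subject), SQLite's in-memory engine can show a conceptual relationship implemented as a foreign key and queried with a join:

```python
import sqlite3

# Hypothetical schema: one entity table and one related table,
# with the relationship resolved via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Analytics');
    INSERT INTO employee VALUES (1, 'Ada', 1);
""")

# Join the two tables back together in a query.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON e.dept_id = d.id
""").fetchall()
print(rows)  # [('Ada', 'Analytics')]
```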

- Knowledge Technologies (12.5 points)
### Knowledge Technologies

**AIMS**Much of the world's knowledge is stored in the form of unstructured data (e.g. text) or implicitly in structured data (e.g. databases). In this subject students will learn algorithms and data structures for extracting, retrieving and analysing explicit knowledge from various data sources, with a focus on the web. Topics include: data encoding and markup, web crawling, regular expressions, document indexing, text retrieval, clustering, classification and prediction, pattern mining, and approaches to evaluation of knowledge technologies.

**INDICATIVE CONTENT**Introduction to Knowledge Technologies; String search; Genomics; Text processing and search; Web search and retrieval; Introduction to Data Mining; Introduction to basic Probability; Classification; Association Rules; Clustering; Evaluation measures.

Examples of projects that students may complete are:

- A method for automatically predicting the geo-location of a Twitter user on the basis of their posts
- An automatic method for tagging multilingual Wikipedia documents with Wikipedia categories
- A search engine for Twitter data, which takes into account the time stamp of the query and documents
- A search engine for web user forum data
- A search engine servicing mixed monolingual queries (as in monolingual queries from a range of languages) over a large-scale document collection
- Classification and prediction of some real world problems using machine learning techniques.
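As a toy sketch of the indexing and retrieval ideas above (an illustration, not a course exercise), an inverted index maps each term to the set of documents containing it, so that term lookups avoid scanning every document:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Inverted index: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Crude tokenisation with a regular expression.
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dogs"}
index = build_index(docs)
print(sorted(index["quick"]))  # [1, 3]
```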

- Methods of Mathematical Statistics (25 points)
### Methods of Mathematical Statistics

This subject introduces probability and the theory underlying modern statistical inference. Properties of probability are reviewed, univariate and multivariate random variables are introduced, and their properties are developed. It demonstrates that many commonly used statistical procedures arise as applications of a common theory. Both classical and Bayesian statistical methods are developed. Basic statistical concepts including maximum likelihood, sufficiency, unbiased estimation, confidence intervals, hypothesis testing and significance levels are discussed. Computer packages are used for numerical and theoretical calculations.
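For instance (an illustrative sketch, not subject material), for normally distributed data the maximum-likelihood estimates have closed forms, and comparing the ML variance estimate with the unbiased one shows why unbiasedness is treated as a separate concept:

```python
# Hypothetical sample; for a normal model the ML estimates are the
# sample mean and the *biased* variance (divide by n). The usual
# unbiased estimator divides by n - 1 instead.
data = [4.1, 5.3, 4.8, 5.9, 4.4, 5.5]
n = len(data)

mu_hat = sum(data) / n
sigma2_mle = sum((x - mu_hat) ** 2 for x in data) / n        # ML estimate
sigma2_unbiased = sum((x - mu_hat) ** 2 for x in data) / (n - 1)

print(mu_hat)  # 5.0
```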

- A First Course In Statistical Learning (25 points)
### A First Course In Statistical Learning

Supervised statistical learning is based on the widely used linear models that model a response as a linear combination of explanatory variables. Initially this subject develops an elegant unified theory for a quantitative response that includes the estimation of model parameters, hypothesis testing using analysis of variance, model selection, diagnostics on model assumptions, and prediction. Some classification methods for qualitative responses are then developed. This subject then considers computational techniques, including the EM algorithm. Bayes methods and Monte-Carlo methods are considered. The subject concludes by considering some unsupervised learning techniques.
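As a small sketch of the linear models the subject builds on (illustrative data, not course material), simple linear regression with one explanatory variable has closed-form least-squares estimates:

```python
# Hypothetical data: response y roughly twice the explanatory variable x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
# Closed-form least-squares slope and intercept.
beta1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
beta0 = ybar - beta1 * xbar

def predict(xi):
    return beta0 + beta1 * xi

print(round(beta1, 3), round(beta0, 3))  # 1.99 0.05
```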

### Computer Science Stream

##### Complete all of the following subjects:

- Internet Technologies (12.5 points)
### Internet Technologies

**AIMS**The subject will introduce the basics of computer networks through a study of layered models of computer networks and applications. The first half of the subject deals with data communication protocols in the lower layers of the OSI and TCP/IP reference models. Students will be exposed to the workings of various fundamental networking technologies such as wireless, LAN, RFID and sensor networks. The second half of the subject deals with the upper layers of the TCP/IP reference model through a study of several Internet applications.

**INDICATIVE CONTENT**

Topics covered include: Introduction to Internet, OSI reference model layers, protocols and services, data transmission basics, interface standards, network topologies, data link protocols, message routing, LANs, WANs, TCP/IP suite, detailed study of common network applications (e.g., email, news, FTP, Web), network management, current and future developments in network hardware and protocols.

- Knowledge Technologies (12.5 points)
### Knowledge Technologies

**AIMS**Much of the world's knowledge is stored in the form of unstructured data (e.g. text) or implicitly in structured data (e.g. databases). In this subject students will learn algorithms and data structures for extracting, retrieving and analysing explicit knowledge from various data sources, with a focus on the web. Topics include: data encoding and markup, web crawling, regular expressions, document indexing, text retrieval, clustering, classification and prediction, pattern mining, and approaches to evaluation of knowledge technologies.

**INDICATIVE CONTENT**Introduction to Knowledge Technologies; String search; Genomics; Text processing and search; Web search and retrieval; Introduction to Data Mining; Introduction to basic Probability; Classification; Association Rules; Clustering; Evaluation measures.

Examples of projects that students may complete are:

- A method for automatically predicting the geo-location of a Twitter user on the basis of their posts
- An automatic method for tagging multilingual Wikipedia documents with Wikipedia categories
- A search engine for Twitter data, which takes into account the time stamp of the query and documents
- A search engine for web user forum data
- A search engine servicing mixed monolingual queries (as in monolingual queries from a range of languages) over a large-scale document collection
- Classification and prediction of some real world problems using machine learning techniques.

- Methods of Mathematical Statistics (25 points)
### Methods of Mathematical Statistics

This subject introduces probability and the theory underlying modern statistical inference. Properties of probability are reviewed, univariate and multivariate random variables are introduced, and their properties are developed. It demonstrates that many commonly used statistical procedures arise as applications of a common theory. Both classical and Bayesian statistical methods are developed. Basic statistical concepts including maximum likelihood, sufficiency, unbiased estimation, confidence intervals, hypothesis testing and significance levels are discussed. Computer packages are used for numerical and theoretical calculations.

- A First Course In Statistical Learning (25 points)
### A First Course In Statistical Learning

Supervised statistical learning is based on the widely used linear models that model a response as a linear combination of explanatory variables. Initially this subject develops an elegant unified theory for a quantitative response that includes the estimation of model parameters, hypothesis testing using analysis of variance, model selection, diagnostics on model assumptions, and prediction. Some classification methods for qualitative responses are then developed. This subject then considers computational techniques, including the EM algorithm. Bayes methods and Monte-Carlo methods are considered. The subject concludes by considering some unsupervised learning techniques.

- Database Systems & Information Modelling (12.5 points)
### Database Systems & Information Modelling

**AIMS**The subject introduces key topics in modern information organization, particularly with regard to structured databases. The well-founded relational theory behind modern structured query language (SQL) engines has given them as much a place behind the website of an organization and on the desktop as they traditionally enjoyed on corporate mainframes. Topics covered may include: the managerial view of data, information and knowledge; conceptual, logical and physical data modelling; normalization and de-normalization; the SQL language; data integrity; transaction processing, data warehousing, web services and organizational memory technologies. This is a core foundation subject for both the Master of Information Systems and Master of Information Technology.

**INDICATIVE CONTENT**This subject serves as an introduction to databases and data modelling from a data management perspective. Database design, from conceptual design through to physical implementation will be covered. This will include Entity Relationship modelling, normalisation and de-normalisation and SQL. Additionally the use of databases in various contexts will be explored (web based databases, connecting programs to databases, data warehousing, health contexts, geospatial databases).

- Algorithms and Complexity (12.5 points)
### Algorithms and Complexity

**AIMS**The aim of this subject is for students to develop familiarity and competence in assessing and designing computer programs for computational efficiency. Although computers manipulate data very quickly, to solve large-scale problems, we must design strategies so that the calculations combine effectively. Over the latter half of the 20th century, an elegant theory of computational efficiency developed. This subject introduces students to the fundamentals of this theory and to many of the classical algorithms and data structures that solve key computational questions. These questions include distance computations in networks, searching items in large collections, and sorting them in order.

**INDICATIVE CONTENT**

Topics covered include complexity classes and asymptotic notation; empirical analysis of algorithms; abstract data types including queues, trees, priority queues and graphs; algorithmic techniques including brute force, divide-and-conquer, dynamic programming and greedy approaches; space and time trade-offs; and the theoretical limits of algorithm power.

##### Select one of the following subjects:

- Database Systems & Information Modelling (12.5 points)
### Database Systems & Information Modelling

**AIMS**The subject introduces key topics in modern information organization, particularly with regard to structured databases. The well-founded relational theory behind modern structured query language (SQL) engines has given them as much a place behind the website of an organization and on the desktop as they traditionally enjoyed on corporate mainframes. Topics covered may include: the managerial view of data, information and knowledge; conceptual, logical and physical data modelling; normalization and de-normalization; the SQL language; data integrity; transaction processing, data warehousing, web services and organizational memory technologies. This is a core foundation subject for both the Master of Information Systems and Master of Information Technology.

**INDICATIVE CONTENT**This subject serves as an introduction to databases and data modelling from a data management perspective. Database design, from conceptual design through to physical implementation will be covered. This will include Entity Relationship modelling, normalisation and de-normalisation and SQL. Additionally the use of databases in various contexts will be explored (web based databases, connecting programs to databases, data warehousing, health contexts, geospatial databases).

- Advanced Database Systems (12.5 points)
### Advanced Database Systems

**AIMS**Many applications require access to very large amounts of data. These applications often require reliability (data must not be lost even in the presence of hardware failures), and the ability to retrieve and process the data very efficiently.

The subject will cover the technologies used in advanced database systems. Topics covered will include: transactions, including concurrency, reliability (the ACID properties) and performance; and indexing of both structured and unstructured data. The subject will also cover additional topics such as: uncertain data; XQuery; the Semantic Web and the Resource Description Framework; dataspaces and data provenance; datacentres; and data archiving.

**INDICATIVE CONTENT**

Topics include:

- Introduction to High Performance Database Systems
- Issues of Performance and Reliability
- Transaction Processing
- Recovery from Failures
- Map Reduce Models.
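The MapReduce model in the list above can be sketched in miniature (a toy word count, not course material): a map step emits key-value pairs per input, and a reduce step aggregates the pairs by key.

```python
from collections import Counter
from itertools import chain

def map_step(doc):
    """Map: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def reduce_step(pairs):
    """Reduce: sum the counts for each word (the key)."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["to be or not to be", "be quick"]
result = reduce_step(chain.from_iterable(map_step(d) for d in docs))
print(result["be"])  # 3
```

In a real system the map and reduce steps run in parallel across many machines, with the framework shuffling pairs so that all counts for one key reach the same reducer.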

##### Plus one of the following subjects:

- Algorithms and Complexity (12.5 points)
### Algorithms and Complexity

**AIMS**The aim of this subject is for students to develop familiarity and competence in assessing and designing computer programs for computational efficiency. Although computers manipulate data very quickly, to solve large-scale problems, we must design strategies so that the calculations combine effectively. Over the latter half of the 20th century, an elegant theory of computational efficiency developed. This subject introduces students to the fundamentals of this theory and to many of the classical algorithms and data structures that solve key computational questions. These questions include distance computations in networks, searching items in large collections, and sorting them in order.

**INDICATIVE CONTENT**

Topics covered include complexity classes and asymptotic notation; empirical analysis of algorithms; abstract data types including queues, trees, priority queues and graphs; algorithmic techniques including brute force, divide-and-conquer, dynamic programming and greedy approaches; space and time trade-offs; and the theoretical limits of algorithm power.

- Models of Computation (12.5 points)
### Models of Computation

**AIMS**Formal logic and discrete mathematics provide the theoretical foundations for computer science. This subject uses logic and discrete mathematics to model the science of computing. It provides a grounding in the theories of logic, sets, relations, functions, automata, formal languages, and computability, providing concepts that underpin virtually all the practical tools contributed by the discipline, for automated storage, retrieval, manipulation and communication of data.

**INDICATIVE CONTENT**

- Logic: Propositional and predicate logic, resolution proofs, mathematical proof
- Discrete mathematics: Sets, functions, relations, order, well-foundedness, induction and recursion
- Automata: Regular languages, finite-state automata, context-free grammars and languages, parsing
- Computability briefly: Turing machines, computability, decidability.

A functional programming language will be used to implement and illustrate concepts.
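Although the subject itself uses a functional language, the automata material can be sketched language-neutrally (an illustration only): a deterministic finite automaton accepting the regular language of binary strings with an even number of 1s.

```python
# Transition function of the DFA: (state, symbol) -> next state.
TRANSITIONS = {("even", "0"): "even", ("even", "1"): "odd",
               ("odd", "0"): "odd", ("odd", "1"): "even"}

def accepts(string):
    """Run the DFA; accept iff it ends in the 'even' state."""
    state = "even"                      # start state
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state == "even"              # accepting state

print(accepts("1001"))  # True (two 1s)
```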

### Statistics Stream

##### Complete all of the following subjects:

- Introduction to Programming (12.5 points)
### Introduction to Programming

**AIMS**This subject introduces the fundamental concepts of computer programming and how to solve simple problems using a high-level procedural language, with a specific emphasis on data manipulation, transformation and visualisation.

**INDICATIVE CONTENT**Fundamental programming constructs; fundamental data structures; abstraction; basic program structures; algorithmic problem solving; use of modules.

The subject assumes no prior knowledge of computer programming.
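As a sketch of the constructs listed above (an illustration only, not a subject exercise): a function, a loop, a basic data structure, and a simple transformation of data.

```python
def celsius_to_fahrenheit(readings):
    """Transform a list of Celsius temperatures to Fahrenheit."""
    converted = []                      # a basic data structure (a list)
    for c in readings:                  # a fundamental control construct
        converted.append(c * 9 / 5 + 32)
    return converted

result = celsius_to_fahrenheit([0, 100, 37])
print([round(v, 1) for v in result])  # [32.0, 212.0, 98.6]
```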

- Algorithms and Complexity (12.5 points)
### Algorithms and Complexity


- Programming and Software Development (12.5 points)
### Programming and Software Development

**AIMS**The aim of this subject is for students to develop an understanding of approaches to solving moderately complex problems with computers, and to demonstrate proficiency in designing and writing programs. The programming language used is Java.

**INDICATIVE CONTENT**

Topics covered will include:

- Java basics
- Console input/output
- Control flow
- Defining classes
- Using object references
- Programming with arrays
- Inheritance
- Polymorphism and abstract classes
- Exception handling
- UML basics
- Interfaces
- Generics.

- Database Systems & Information Modelling (12.5 points)
### Database Systems & Information Modelling


- A First Course In Statistical Learning (25 points)
### A First Course In Statistical Learning

Supervised statistical learning is based on the widely used linear models that model a response as a linear combination of explanatory variables. Initially this subject develops an elegant unified theory for a quantitative response that includes the estimation of model parameters, hypothesis testing using analysis of variance, model selection, diagnostics on model assumptions, and prediction. Some classification methods for qualitative responses are then developed. This subject then considers computational techniques, including the EM algorithm. Bayes methods and Monte-Carlo methods are considered. The subject concludes by considering some unsupervised learning techniques.

##### Select two of the following subjects:

- Probability for Inference (12.5 points)
### Probability for Inference

This subject introduces a measure-theoretic approach to probability theory and presents its fundamental concepts and results.

Topics covered include: probability spaces and random variables, expectation, conditional expectation and distributions, elements of multivariate distribution theory, modes of convergence in probability theory, characteristic functions and their application in key limit theorems.

- Stochastic Modelling (12.5 points)
### Stochastic Modelling

Stochastic processes occur in finance as models for asset prices, in telecommunications as models for data traffic, in computational biology as hidden Markov models for gene structure, in chemistry as models for reactions, in manufacturing as models for assembly and inventory processes, in biology as models for the growth and dispersion of plant and animal populations, in speech pathology and speech recognition, and in many other areas.

This subject introduces the theory of stochastic processes including Poisson processes, Markov chains in discrete and continuous time, and renewal processes. These processes are illustrated using examples from real-life situations. It then considers in more detail important applications in areas such as queues and networks (the foundation of telecommunication models), finance, and genetics.
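For example (hypothetical transition probabilities, an illustration only), a two-state Markov chain in discrete time can be studied by iterating its distribution, which converges to the stationary distribution:

```python
# Transition probabilities of a toy weather chain (hypothetical values).
P = {"sunny": {"sunny": 0.9, "rainy": 0.1},
     "rainy": {"sunny": 0.5, "rainy": 0.5}}

# Start in 'sunny' with certainty, then apply the transition matrix
# repeatedly; the distribution approaches the stationary limit.
dist = {"sunny": 1.0, "rainy": 0.0}
for _ in range(200):
    dist = {s: sum(dist[r] * P[r][s] for r in P) for s in P}

print(round(dist["sunny"], 4))  # 0.8333, i.e. 5/6
```

Balance gives the stationary distribution directly: pi_sunny * 0.1 = pi_rainy * 0.5, so pi_sunny = 5/6.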

- Statistical Genomics (12.5 points)
### Statistical Genomics

This subject introduces the biology and technology underlying modern genomics data, features of the resulting data types including the frequency and patterns of error and missingness, and the statistical methods used to analyse them. It will include hands-on data analysis using R software. The material covered will evolve as genomics technology and practice change, and will span the following four areas: introduction to genomics technology and the resulting data, population genetics, association analysis including tests of association and major sources of confounding, heritability and prediction both in human genetics and for animal and plant breeding, and analysis of expression quantitative trait loci.

- Computational Statistics & Data Science (12.5 points)
### Computational Statistics & Data Science

Computing techniques and data mining methods are indispensable in modern statistical research and data science applications, where “Big Data” problems are often involved. This subject will introduce a number of recently developed methods and applications in computational statistics and data science that are scalable to large datasets and high-performance computing. The data mining methods to be introduced include general model diagnostic and assessment techniques, kernel and local polynomial nonparametric regression, basis expansion and nonparametric spline regression, generalised additive models, classification and regression trees, forward stagewise and gradient boosting models. Important statistical computing algorithms and techniques used in data science will be explained in detail. These include the bootstrap resampling and inference, cross-validation, the EM algorithm and Louis method, and Markov chain Monte Carlo methods including adaptive rejection and squeeze sampling, sequential importance sampling, slice sampling, Gibbs sampler and Metropolis-Hastings algorithm.
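The bootstrap mentioned above can be sketched in a few lines (illustrative data, not subject material): resample the data with replacement many times, and use the spread of the resampled statistic to estimate its standard error.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
data = [2.3, 1.9, 3.1, 2.8, 2.0, 3.5, 2.6, 1.7, 2.9, 3.0]

# Bootstrap: resample with replacement, recompute the mean each time.
boot_means = []
for _ in range(2000):
    resample = [random.choice(data) for _ in data]
    boot_means.append(statistics.mean(resample))

# The standard deviation of the bootstrap means estimates the
# standard error of the sample mean.
se_boot = statistics.stdev(boot_means)
print(round(se_boot, 3))
```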

### Commerce and Arts Stream

##### Complete all of the following subjects:

- Introduction to Programming (12.5 points)
### Introduction to Programming

**AIMS**This subject introduces the fundamental concepts of computer programming and how to solve simple problems using a high-level procedural language, with a specific emphasis on data manipulation, transformation and visualisation.

**INDICATIVE CONTENT**Fundamental programming constructs; fundamental data structures; abstraction; basic program structures; algorithmic problem solving; use of modules.

The subject assumes no prior knowledge of computer programming.

- Algorithms and Complexity (12.5 points)
### Algorithms and Complexity


- Programming and Software Development (12.5 points)
### Programming and Software Development

**AIMS**The aim of this subject is for students to develop an understanding of approaches to solving moderately complex problems with computers, and to demonstrate proficiency in designing and writing programs. The programming language used is Java.

**INDICATIVE CONTENT**

Topics covered will include:

- Java basics
- Console input/output
- Control flow
- Defining classes
- Using object references
- Programming with arrays
- Inheritance
- Polymorphism and abstract classes
- Exception handling
- UML basics
- Interfaces
- Generics.

- Database Systems & Information Modelling (12.5 points)
### Database Systems & Information Modelling


- Methods of Mathematical Statistics (25 points)
### Methods of Mathematical Statistics

This subject introduces probability and the theory underlying modern statistical inference. Properties of probability are reviewed, univariate and multivariate random variables are introduced, and their properties are developed. It demonstrates that many commonly used statistical procedures arise as applications of a common theory. Both classical and Bayesian statistical methods are developed. Basic statistical concepts including maximum likelihood, sufficiency, unbiased estimation, confidence intervals, hypothesis testing and significance levels are discussed. Computer packages are used for numerical and theoretical calculations.

- A First Course In Statistical Learning (25 points)
### A First Course In Statistical Learning