Department of Computer Science and Technology – Course pages 2021–22: Concurrent and Distributed Systems skip to primary navigationskip to content Study at Cambridge About the University Research at Cambridge Search site Home Study at Cambridge Undergraduate Courses Applying Events and open days Fees and finance Student blogs and videos Graduate Why Cambridge Course directory How to apply Fees and funding Frequently asked questions International students Continuing education Executive and professional education Courses in education About the University How the University and Colleges work History Visiting the University Term dates and calendars Map For media Video and audio Find an expert Publications Global Cambridge News Events Public engagement Jobs Give to Cambridge Research at Cambridge For staff For current students For alumni For business Colleges & departments Libraries & facilities Museums & collections Email & phone search Computer Laboratory Teaching Courses 2021–22 Concurrent and Distributed Systems Department of Computer Science and Technology Undergraduate Course pages 2021–22 Concurrent and Distributed Systems Syllabus Course materials Recordings Information for supervisors Principal lecturers: Dr David Greaves, Dr Martin Kleppmann Taken by: Part IB CST Hours: 16 Format: Video lectures and in-person Q&A sessions Suggested hours of supervisions: 4 Prerequisites: Object-Oriented Programming, Operating Systems This course is a prerequisite for: Cloud Computing Past exam questions Aims This course considers two closely related topics, Concurrent Systems and Distributed Systems, over 16 lectures. The aim of the first half of the course is to introduce concurrency control concepts and their implications for system design and implementation. The aims of the latter half of the course are to study the fundamental characteristics of distributed systems, including their models and architectures; the implications for software design; some of the techniques that have been used to build them; and the resulting details of good distributed algorithms and applications. Lectures: Concurrency Introduction to concurrency, threads, and mutual exclusion. Introduction to concurrent systems; threads; interleaving; preemption; parallelism; execution orderings; processes and threads; kernel vs. user threads; M:N threads; atomicity; mutual exclusion; and mutual exclusion locks (mutexes). Automata Composition. Synchronous and asynchronous parallelism; sequential consistency; rendezvous. Safety, liveness and deadlock; the Dining Philosophers; Hardware foundations for atomicity: test-and-set, load-linked/store-conditional and fence instructions. Lamport bakery algorithm. Common design patterns: semaphores, producer-consumer, and MRSW. Locks and invariants; semaphores; condition synchronisation; N-resource allocation; two-party and generalised producer-consumer; Multi-Reader, Single-Write (MRSW) locks. CCR, monitors, and concurrency in practice. Conditional critical regions (CCR); monitors; condition variables; signal-wait vs. signal-continue semantics; concurrency in practice (kernels, pthreads, Java, Cilk, OpenMP). Deadlock and liveness guarantees Offline vs. online; model checking; resource allocation graphs; lock order checking; deadlock prevention, avoidance, detection, and recovery; livelock; priority inversion; auto parallelisation. Concurrency without shared data; transactions. Active objects; message passing; tuple spaces; CSP; and actor models. Composite operations; transactions; ACID; isolation; and serialisability. Further transactions History graphs; good and bad schedules; isolation vs. strict isolation; 2-phase locking; rollback; timestamp ordering (TSO); and optimistic concurrency control (OCC). Crash recovery, lock-free programming, and transactional memory. Write-ahead logging, checkpoints, and recovery. Lock-free programming. Hardware and software transactional memories. Lectures: Distributed Systems Introduction to distributed systems; RPC. Avantages and challenges of distributed systems; unbounded delay and partial failure; network protocols; transparency; client-server systems; remote procedure call (RPC); marshalling; interface definition languages (IDLs). System models and faults. Synchronous, partially synchronous, and asynchronous network models; crash-stop, crash-recovery, and Byzantine faults; failures, faults, and fault tolerance; two generals problem. Time, clocks, and ordering of events. Physical clocks; leap seconds; UTC; clock synchronisation and drift; Network Time Protocol (NTP). Causality; happens-before relation. Logical time; Lamport clocks; vector clocks. Broadcast (FIFO, causal, total order); gossip protocols. Replication. Quorums; idempotence; replica consistency; read-after-write consistency. State machine replication; leader-based replication. Consensus and total order broadcast. FLP result; leader election; the Raft consensus algorithm. Replica consistency. Two-phase commit; relationship between 2PC and consensus; linearizability; ABD algorithm; eventual consistency; CAP theorem. Case studies. Conflict-free Replicated Data Types (CRDTs); collaborative text editing. Google's Spanner; TrueTime. Objectives At the end of Concurrent Systems portion of the course, students should: understand the need for concurrency control in operating systems and applications, both mutual exclusion and condition synchronisation; understand how multi-threading can be supported and the implications of different approaches; be familiar with the support offered by various programming languages for concurrency control and be able to judge the scope, performance implications and possible applications of the various approaches; be aware that dynamic resource allocation can lead to deadlock; understand the concept of transaction; the properties of transactions, how they can be implemented, and how their performance can be optimised based on optimistic assumptions; understand how the persistence properties of transactions are addressed through logging; and have a high-level understanding of the evolution of software use of concurrency in the operating-system kernel case study. At the end of the Distributed Systems portion of the course, students should: understand the difference between shared-memory concurrency and distributed systems; understand the fundamental properties of distributed systems and their implications for system design; understand notions of time, including logical clocks, vector clocks, and physical time synchronisation; be familiar with various approaches to data and service replication, as well as the concept of replica consistency; understand the effects of large scale on the provision of fundamental services and the tradeoffs arising from scale; appreciate the implications of individual node and network communications failures on distributed computation; be aware of a variety of programming models and abstractions for distributed systems, such as RPC, middleware, and total order broadcast; be familiar with a range of distributed algorithms, such as consensus, causal broadcast, and two-phase commit. Recommended reading Modern Operating Systems (free PDF available online) by Andrew S Tanenbaum, Herbert Bos Java Concurrency in Practice' (2006 but still highly relevant) by Brian Goetz Kleppmann, M. (2017). Designing data-intensive applications. O’Reilly. Tanenbaum, A.S. and van Steen, M. (2017). Distributed systems, 3rd edition. available online. Cachin, C., Guerraoui, R. and Rodrigues, L. (2011) Introduction to Reliable and Secure Distributed Programming. Springer (2nd edition). © 2021 Department of Computer Science and Technology, University of Cambridge Information provided by Dr David Greaves – edit page University A-Z Contact the University Accessibility Freedom of information Terms and conditions Study at Cambridge Undergraduate Graduate International students Continuing education Executive and professional education Courses in education About the University How the University and Colleges work Visiting the University Map News Events Jobs Give to Cambridge Research at Cambridge News Features Discussion Spotlight on... About research at Cambridge