nabg 1 Operating systems Unix and related 1 Unix.org timeline ‐ 1 1969 The Beginning The history of UNIX starts back in 1969, when Ken Thompson, Dennis Ritchie and others started working on the "little-used PDP-7 in a corner" at Bell Labs and what was to become UNIX. 1971 First Edition It had an assembler for a PDP-11/20, a file system, fork(), roff and ed. It was used for text processing of patent documents. 1973 Fourth Edition It was rewritten in C. This made it portable and changed the history of OSs. 1975 Sixth Edition UNIX leaves home. Also widely known as Version 6, this is the first to be widely available outside of Bell Labs. The first BSD version (1.x) was derived from V6. 1979 Seventh Edition It was an "improvement over all preceding and following Unices" [Bourne]. It had C, UUCP and the Bourne shell. It was ported to the VAX and the kernel was more than 40 kilobytes (K). 1980 Xenix Microsoft introduces Xenix. 32V and 4BSD introduced. 1982 System III AT&T's UNIX System Group (USG) releases System III, the first public release outside Bell Laboratories. SunOS 1.0 ships. HP-UX introduced. Ultrix-11 introduced. 1983 System V Computer Research Group (CRG), UNIX System Group (USG) and a third group merge to become UNIX System Development Lab. AT&T announces UNIX System V, the first supported release. Installed base 45,000. 1984 4.2BSD University of California at Berkeley releases 4.2BSD, which includes TCP/IP, new signals and much more. X/Open formed. 1984 SVR2 System V Release 2 introduced. At this time there are 100,000 UNIX installations around the world. 1986 4.3BSD 4.3BSD released, including an internet name server. SVID introduced. NFS shipped. AIX announced. Installed base 250,000. 2 Unix.org timeline ‐ 2 1987 SVR3 System V Release 3 including STREAMS, TLI, RFS. At this time there are 750,000 UNIX installations around the world. IRIX introduced. 1988 POSIX.1 published. Open Software Foundation (OSF) and UNIX International (UI) formed. Ultrix 4.2 ships. 
Late 1993 SVR4.2MP Novell transfers rights to the "UNIX" trademark and the Single UNIX Specification to X/Open. COSE initiative delivers "Spec 1170" to X/Open for fasttrack. In December Novell ships SVR4.2MP , the final USL OEM release of System V 1994 Single UNIX Specification BSD 4.4‐Lite eliminated all code claimed to infringe on USL/Novell. As the new owner of the UNIX trademark, X/Open introduces the Single UNIX Specification (formerly Spec 1170), separating the UNIX trademark from any actual code stream. 1998 UNIX 98 The Open Group introduces the UNIX 98 family of brands, including Base, Workstation and Server. First UNIX 98 registered products shipped by Sun, IBM and NCR. The Open Source movement starts to take off with announcements from Netscape and IBM. UnixWare 7 and IRIX 6.5 ship. 2001 Single UNIX Specification, Version 3 Version 3 of the Single UNIX Specification unites IEEE POSIX, The Open Group and the industry efforts. Linux 2.4 kernel released. IT stocks face a hard time at the markets. The value of procurements for the UNIX brand exceeds $25 billion. AIX 5L ships. 2009 UNIX at 40 IDC on UNIX market ‐‐ says UNIX $69 billion in 2008, predicts UNIX $74 billion in 2013 2010 UNIX on the Desktop Apple reports 50 million desktops and growing ‐‐ these are Certified UNIX systems. There is a kind of international standard for Unixes – implementations should try to be compatible 3 Unix systems • Unix – Well it started as proprietary – though copies were freely given out by AT&T – Then start getting licensed versions like Xenix that had to be purchased – Berkeley BSD? 
• Written on an ARPA grant – ARPA rules state that source must be publicly available for review – Still, ambiguities about ownership • There have been attempts to claim proprietary rights over all Unix variants – requiring all users to pay license fees 4 GNU • And anyway, an operating system just supports programs running on a computer – Manage a file system – Manage processes in memory – Schedule activities, look after I/O • But of course, you need more – An editor • How else are you going to write the program you want to run? – A compiler‐assembler‐linker suite • How else are you going to get your program into executable form? – Lots of libraries • You don't want to write everything from scratch! How about – Higher level interface to I/O (not just read/write char, read/write block) – Maths (functions like rand(), sin(), …) – Utilities (functions like sort(), …) 5 GNU is not Unix • Project started at MIT by Stallman ~1983 – Eventual aim to produce – "The GNU operating system is the Unix‐like operating system, which is entirely free software, that we in the GNU Project have developed since 1984. – A Unix‐like operating system consists of many programs. The GNU system includes all of the official GNU packages. It also includes many other packages, such as the X Window System and TeX, which are not GNU software." 6 nabg 2 GNU packages • They didn't make that much progress with the operating system! – "We had also started a kernel, the GNU Hurd, which runs on top of Mach. 
Developing this kernel has been a lot harder than we expected; the GNU Hurd started working reliably in 2001, but it is a long way from being ready for people to use in general." • But the GNU project did create all those other parts like compilers, editors, various libraries, … 7 GNU packages • A few of the packages … – gcc : GNU compiler collection • Compilers for C, C++, Objective C, Java, Ada, Fortran – Emacs: "Extraordinarily powerful text editor" – Shell utilities such as: • which (identifies the full path name of an executable file) • time (user, system, real time of program execution) • ed (line editor) and sed (scriptable version of the editor) – Bash – Bourne again shell; a reworking and extension of Bourne's original shell for Unix – Coreutils – commands to list files, display disk usage, change file permissions and ownership, rename files, sort a file, od, … – Glibc – the main C library 8 GNU packages • A few of the packages … – Ghostscript – PostScript/PDF interpreter – GTK+ – toolkit for the X Window System – Gzip – Make – RCS – version control – … • Without GNU, you could have a Unix-like system – but not be able to do anything on it. – Stallman gets very hissy over people talking about the "Linux operating system" 9 Minix • MINIX is a Unix‐like computer operating system based on a microkernel architecture created by Andrew S. Tanenbaum for educational purposes; MINIX also inspired the creation of the Linux kernel. • MINIX (from "mini‐Unix") was first released in 1987, with its complete source code made available to universities for study in courses and research. • It's about 12,000 lines of C code – you can read it if you get Tanenbaum's textbook 10 Minix • MINIX 3 2005 – Serves as an example for the new edition of Tanenbaum and Woodhull's textbook – Designed to be "usable as a serious system on resource‐limited and embedded computers and for applications requiring high reliability." 
11 Linux • "Linux is a Unix‐like and POSIX‐compliant computer operating system assembled under the model of free and open source software development and distribution." • The defining component of Linux is the Linux kernel – an operating system kernel first released on 5 October 1991 by Linus Torvalds. • The main form of distribution is through Linux distributions. – A Linux distribution (often called a distro for short) is an operating system built on top of the Linux kernel and often around a package management system. – Because most of the kernel and supporting packages are free and open source software, Linux distributions have taken a wide variety of forms – from fully featured desktop, server, laptop, netbook, mobile phone, and tablet operating systems to minimal environments. 12 nabg 3 Linux distributions • There are many – See the family tree Is that too small? A Linux distribution usually includes a very large collection of free and open‐source software of all sorts. The software is usually adapted to the distribution and then packaged into software packages by the distribution maintainers. The software packages are available online in so-called repositories, on various servers around the world. 13 Find it in Wikipedia Debian family Ubuntu subgroup 14 Red Hat family 15 Slackware family And there are other smaller groupings 16 Linux • A kernel – with additions 17 18 nabg 4 So, what does a modern operating system do, and how does it work? Initial overview 19 Modern – i.e. post Unix version 4 ~1973 • By ~1973, the designs for operating systems had evolved – Some things were no longer part of an OS • Helper functions like itoa and atoi (which were part of early OSs and which I included in the mini‐OS for the PDP‐11 demos) had moved into libraries that could be linked with application code; they would run as ordinary user code. 
• Editors etc became ordinary applications that could be launched by shell commands • Even the shell was no longer part of the OS – just another program; – configuration data specified the shell program that was to run for a new terminal connection. 20 Just the essentials • The kernel – Software that manages and allocates computer resources • CPU • Memory • File space • Devices • A 'shell' is still essential, it's just separate and not privileged in any way – A command interpreter and 'job control language' – separate from the software kernel, but still an essential part of the system – The Unix shell demonstrated an approach to implementing complex tasks by running a sequence of communicating processes; similar composition approaches are now standard in the shells for operating systems • You need some familiarity with shell scripting, redirection, pipes etc 21 OS – 1 – an API • System call application programmer interface – The OS provides services • Create a file, open file, read, write, start new process, terminate process, allocate more memory, … – These services are made available to application programs through "system calls" • These are simple wrappers for the "trap" requests that are built into the OS • System calls will appear as functions (typically in C) in a standard library (<unistd.h>) • Most programs don't use this low‐level API directly, instead using other libraries that provide higher level abstractions – E.g. don't use the functions in <unistd.h> for your I/O, use functions from <stdio.h> or <iostream> 22 OS – 2 – A file system • The OS must provide a file system – Some parts are apparent to users • Access and ownership – File permissions, "owner", "group" • Organisation – Directory tree structure and files – Some are largely hidden • How are disk blocks allocated as a program creates and then writes to a file? • How can the OS "seek" to a particular position in a file? • How can processes share files concurrently and how can a process claim exclusive access if it needs this? 
• Can block transfers be optimised – reading ahead, delayed writes, parts of files held in main memory, … ? 23 OS – 3 ‐ processes • The OS must provide for multiple processes (multi‐tasking) – This has several aspects • Creation (and termination) of processes • Communication between processes • Scheduling of processes • Allocation of main memory to processes • Management of main memory and secondary memory (virtual memory systems) – The creation of processes is apparent through the OS API and forms a part of many applications (e.g. the shell – almost all the shell commands involve the creation of a short‐lived process); the other aspects are mainly of concern to those implementing the OS • (though occasionally a systems administrator might want to intervene – changing process priorities, terminating processes etc) 24 nabg 5 OS – 4 – Devices • The OS must manage transfers of data – The OS should almost entirely hide details of data transfers from the application level¶ • You certainly don't want to have to write very different code for data transfers to terminals, data transfers to disks, and data transfers to networks – Transfers of data take time – and applications typically need to wait until transfers are complete • So the OS needs to integrate device management and process management ¶Occasionally, an application may have a legitimate reason for changing how a device is handled – e.g. setting input into a non-blocking mode (instead of using a blocking read, the application issues a non‐blocking read – this immediately returns with whatever data are already available, maybe just a few bytes or none). The OS will need to provide a system call that lets an application request such adjustments to standard handling. 
25 OS – 5 – Startup and closedown mechanisms • When a computer is started, the OS must – Find out what disk systems are connected and map them into a structure that it can use to find files – Set up handlers for network connections (ability to respond to TCP/IP connection requests) – Start a selection of processes (e.g. a group of Apache servers, a MySQL server, …) – Set up a login handler – … • Traditionally, this was a leisurely process that started with an "init" program being loaded; this init would then read a script ("start X1; wait till X1 running; start X2; …"); it wasn't fast, but then large time‐share systems only got restarted maybe once a week; laptop computers are turned on and off frequently – and no‐one wants to wait 5–10 minutes for their system to be ready; consequently, startup mechanisms are now much more elaborate – with many tasks proceeding concurrently (with interlocks, e.g. MySQL needs the network up before it can complete its initialization) 26 Diagram based on that in Bach's early (1985) book on the design of Unix [Diagram: at user level, user programs and libraries; a system call interface (entered via traps) leads into the kernel, where the file subsystem (with its buffer cache and character/block device drivers over the hardware control layer) sits alongside the process control subsystem (inter‐process communication, scheduler, memory management); below the kernel level is the hardware.] 27 OS‐1:API OS‐2:Filesystem OS‐3:Processes OS‐4:Devices OS process Overview – all the following sections will need to talk about processes and their properties, so we need a bit more explanation first 28 Process • A "process" is a program that is being executed – It may be running on a CPU – It may be ready, waiting for a CPU – It may be waiting for an I/O request to complete (until the request is complete it cannot use the CPU); less commonly, it may be waiting for another process to finish. 
• Every shell command that you give, every "application you launch by double clicking" results in one or more processes being run by the OS processes 29 Seeing what processes are running • Try looking at what processes are running on some time‐shared multi‐user Linux / Unix machine processes Another useful command – "top" – displays the processes ordered by how much CPU they are consuming (a bit like Task Manager on Windows) 30 nabg 6 ps processes 31 Processes • UID – user id – root ‐ the OS (or the systems administrator if you will) – nabg, tgh612, jy761 – users – daemon • PID – process id • PPID – parent process id • C – a measure of CPU usage (0 simply means small, not really 0) • STIME – when started • TTY – identifies controlling terminal • TIME – ~number of minutes and seconds of CPU used so far • CMD – identifies command being run The ps listing was from a computer running SunOS – this doesn't have the standard Linux arrangement of the pid for init being 1 – it can be anything in SunOS; also something (4285) ran that started init (4302) and other processes on this SunOS system processes 32 Processes belonging to root • It's possible for a systems administrator to be logged in and running some programs, but the vast majority of the processes belonging to root are parts of the OS that run as applications (rather than being part of kernel code) • In the example, we have – init – this is the process that really starts up a Unix/Linux system – automountd – it will look after mounting new file systems (e.g. the sys admin inserts a flash drive on a server USB port) – cron – it's a standard program on Unix/Linux that looks after programs that must be run at specific times (e.g. 
a backup program that saves recent changes to disk) – /usr/lib/saf/sac – monitors ports – /usr/lib/inet/inetd – inetd “server” (launches server programs if a request arrives at a registered port and server not yet running) – … processes 33 Processes belonging to OS • Most of the processes that are running on a computer at any particular moment will belong to the OS – Check with ‘ps’ command on Unix/Linux or with Task Manager on Windows • They perform tasks like – Managing the print queue – Listening for network connections – Handling scheduling of jobs that must run at particular times – Network file system support • Some run as ‘root’ – These have privileged access to the file system and can use certain system calls that are not available to user programs • Others run as pseudo users – ‘nobody’, ‘lp’, ‘cron’ – These run just like ordinary user programs – no privileges; but are still essential parts of OS 34 Fragments from another ps listing virindi server (Linux) init and other OS processes (Has this machine been running without restart for more than 1 year? Must need patching! Apache web server group; mysql server Me logged in via ssh in bash and running ps –ef User jun – has been logged in for over one year processes 35 Daemons, tty etc • On the original Unix time‐share system, processes were – either OS programs like those just listed, – or programs launched by users logged in at terminals • These would have an identified user and details of the “teletype” (TTY) where the user was logged in • As systems became more complex, you started to get long running server applications – accessed either across the network or via local inter‐process communication systems – These had different characteristics • No controlling terminal • Running (hopefully ‘forever’) in the background • Often responsible for a group of subprocesses (e.g. 
the Apache web server has a daemon process that can create additional web server processes if the number of clients goes up) • 'Daemon' – Unix evolved ways of setting up these server processes processes 36 nabg 7 Daemons • The example ps listing included – statd, nfs4bcd, lockd – these processes are all parts of the overall "Networked File System" • The banshee computer used in this example has its own local disks used for the OS, but all user files (below /home) are shared with other computers on a file server accessed by Network File System – rpcbind – lookup service for applications using (old fashioned) remote procedure call (including maybe NFS) – kcfd – this process is involved in cryptographic tasks that involve high CPU usage • Maybe things like SSL communications processes 37 User processes • When the snapshot was taken, none of the users were actually running anything – all logged in users were idle in their bash shells – Well, I was running ps ‐ef • Usually, when a user enters a command to bash: – One (or more) new processes are created and start to run – The bash shell process waits for the(se) process(es) to finish • A user can "run a process in the background" $ ./MyProg < input.txt > output.txt & – The & at the end of the command line puts the process for the MyProg program into the background – The bash shell doesn't wait for that process to terminate; it returns immediately, allowing the user to enter another command. processes 38 Orphan processes • Unix/Linux is a strange world: parents normally outlive their children – If you launched processes into the background from the shell, and then tried to logout, part of the processing of the logout would involve killing off those background processes before your shell process really terminated. – It is possible to arrange for a child to outlive a parent process (e.g. 
use the nohup command when launching a background process) • Such a process is an “orphan” • It gets adopted by “init” whose PID will appear in the orphan’s PPID field processes 39 Zombies • (Linux/Unix provides a real world of adventure for the would‐be sys admin – not only do you deal with daemons, care for orphans, you must also handle zombies) • A zombie process is one where the program has terminated but the data in the OS process table, and the process id, have not yet been released. • Unix/Linux expects parent processes to be caring of their children – If you start a child process, then presumably you want to know whether it succeeded or failed! – Parent processes are expected to use waitpid() (or similar) to check what happened when the child process ran. – The data entries in the process table are kept until the return status information has been returned to the parent. processes 40 Zombies • If you create child processes, and never check return status with waitpid, you will gradually fill up the OS’s process table with zombies – This is a favourite amongst students enrolled in CSCI212; every year they crash the banshee server by overwhelming it in a mass zombie attack. • Daemon servers – like httpd (web), ftpd (ftp), sshd (ssh) – do create child processes while having no interest in their fates and so don’t check with waitpid – But a properly written daemon does not create zombies. • A proper daemon process is set up using a convoluted sequence of OS calls that inform the OS that this daemon will be an uncaring parent and its children are to be left to die alone and their deaths pass without acknowledgement. 
processes 41 The OS and processes • The OS must keep track of the state of each process – – When a process is running, some of its data are in CPU registers – If the process has to surrender the CPU, the OS must save those data values so that they can be restored when a process next gets to run on a CPU • If the CPU has a data cache, the OS must similarly write all data from the cache back into memory whenever it decides that the process that was running on that CPU should temporarily stop processes 42 nabg 8 Process’s memory areas • Processes on Unix/Linux have at least 5 areas of memory 1. Code – ideally “execute only” (some machines have hardware features that can enforce this) • If code is “execute only” (or “read only”) it may be possible to share the code – E.g. you can have 5 apache web server processes running but only one copy of the Apache code in memory. 2. Stack 3. Static data • Static data – all the global variables and filescope variables 4. Heap • Heap – free storage area for malloc/free (new/delete) • System call allows process to request that its heap be enlarged 5. A process entry in the operating system’s own area of memory. 
• Some of the data in this may be mapped into user address space and be readable by the process; much is private to the OS • An OS may support "shared data areas" – Block of data mapped into the address spaces of several processes processes 43 Standard Linux terminology • Linux uses the term "segments" for the various user memory elements – Code is the "text segment" – Stack and heap are the stack and heap segments – The static data segment will distinguish the "initialized static segment" and the "zero‐filled static segment" (alternatively named "bss") • The following would go in the "initialized static segment" – int startval = 100; static float pi = 3.14159; char* msg = "hi mom"; • While the following would be in the zero‐filled part – int map[256][128]; int usecount; … Unfortunately, the term "segment" is also used for one of the ways of hardware-assisted mapping of 'virtual' addresses to physical memory addresses. This usage is quite distinct. You will learn about hardware segmentation in an OS subject. processes 44 Simplified Linux virtual memory model • A simplified version of the Linux virtual memory model has the user memory areas as one large virtual address space • There are many different mechanisms built into the hardware of different kinds of computer for mapping virtual addresses to actual memory addresses – Learn about them in an OS subject – Many exploit "pages" • 4K block of virtual address space mapped onto some 4K block of bytes in memory (a "page frame"); actual physical addresses of contiguous blocks of virtual memory aren't necessarily contiguous Text (i.e. 
code) Static Initialized Zeroed Heap Stack Virtual addresses not yet mapped into physical memory Program “brk” (or “break”) point – program can ask OS to expand heap by adjusting brk point Those parts of OS’s process data that are accessible to program processes 45 OS’s data for process • The OS will maintain a lot of information in the data areas that it uses for process management, including – A save area for the data in registers (the OS saves the values of all CPU registers when it removes a process from a CPU) • No direct access from process code – A communication area for environment variables • Key/value pairs • Can be accessed by process using calls like getenv & putenv • Also used for the command line arguments – File descriptor entries • Process’s I/O streams indexed by file descriptor number – these are pointers to entries in general “open file/device” tables belonging to OS – Pointers to buffers being used for process’s files • OS has a collection of buffer structures; this collection is shared by all processes – Status information • Ready, waiting for I/O request … to complete, … – Identification data • Process id, user id, effective user id, current working directory, … processes 46 Example – process starting and stopping • Following examples illustrate how processes get started using system calls – fork – exec – Also wait processes 47 fork • Create new process (“parent process ‘forks’ creating new child process”) – New (child) process will get: • The same code segment – If OS doesn’t support shared code, child will have to get a copy of code. But all Unix/Linux systems should be able to support shared code. • An exact copy of the stack • An exact copy of the static data and heap • A new entry in the OS’s collection of processes • The “parent” process and the “child” process share a little of the data in their OS process structures – E.g. 
the entries for file descriptors 0, 1, & 2 point to the same file/character device (any data in buffers gets copied) processes 48 nabg 9 fork • One difference for parent and child – – Different values for the return from fork • Parent gets process id of child • Child gets 0 • So code following the fork() system call can act in different ways for parent and child processes 49 Fork demo • Process starts and prints welcome message • It forks – Child prints text, “sleeps”, prints text and terminates – Parent prints text, asks OS to suspend it until child terminates; after child has terminated, parent process will restart, print text, and terminate. Welcome message Parent Child Report on child termination Child’s start message Child’s end message SleepingWaiting Fork processes 50 Fork demo waitpid – wait for specific child process to finish processes Child Parent 51 Fork demo 2 • System call fork() – Returns • ‐ve value : unable to start new process (OS’s process table is full) • 0: value for child • Integer > 0 : process id of child (value for parent) processes 52 Fork demo 3 • Following the fork() call, both processes continue executing the same code – but can take different paths – Child sleeps a little then quits – using exit to return a value (8‐bit integer) that can be checked by parent – Parent asks OS to let it wait until child has finished. WEXITSTATUS is a macro that selects specific bits (with exit value) from status word used to report how child finished; other bits in status word relate to things like signals (mentioned later) processes 53 Fork demo runs Output from parent and child will be interleaved in an arbitrary fashion, the processes are sharing the same output device. processes 54 nabg 10 Do you want to execute the same program? • Sometimes, yes! 
– (Slightly simplified example) • Apache web server process receives a request for a new client connection • But it’s too busy to handle the extra request • So it forks a new Apache process that will handle the request – Apache code being run by both parent and child. • Usually, no! – You’ve created a new process to execute some other program. processes 55 Launch new program from bash shell • Two main steps – – 1st: • Fork – call to operating system, “please duplicate this process” – 2nd: (you now have two processes running bash) • Original process – call to operating system “please let me sleep until the new child process finishes – then wake me” • New child process – call to operating system “I don’t want to run ‘bash’; I want to run ‘cat’ – make it happen” processes 56 exec system call • OS has an exec system call – Arguments identify the file with the executable code that is to run instead of the current code. • OS steps handling an exec call: 1. Discard stack, static data, and heap data areas of process making exec request (and code segment if OS duplicates the code on fork); the only data remaining are those in the OS’s process table. 2. Read first part of disk file with executable (getting information on the size of its code (‘text’), static data, and initial heap) 3. Find space in memory for these new segments and for a stack • Maybe check to see if the same code already loaded for some other process – could then share it. 4. Load code if necessary 5. Initialize static data 6. 
Start new program processes 57 Exec demo • Two programs needed – One for parent process • It forks, waits for child to finish, reports – all more or less as in previous example – One for child process • Quite different code, just reporting its entry to life • Code in parent process will need a path name for the file with the executable for the child – This gets a bit too messy in a NetBeans/Windows/Cygwin system, due partly to the way NetBeans hides executables in sub‐sub‐subdirectories of its project directory! So, the code was transferred to a proper Linux system for execution! The two executables were in the same directory on that Linux system. processes 58 Parent Exec system call processes 59 Child processes 60 nabg 11 Running (on Linux) processes 61 exec() system calls • There are a number of variants with differing argument lists etc – example uses execl() processes 62 execl() • execl() – 1st argument is path to executable – 2nd argument is “name of program” – 3rd and subsequent • These would be the command line arguments – And a sentinel marker at end! • It is one of those functions that takes a variable number of arguments (how many command line args do you want this time? In the example it was 0 arguments, so apart from path only needed program name.) Sentinel needed so that code can recognize when it has all values. 
processes 63 Real examples
• Example to illustrate what happens when you run a command at shell level
– You are in bash shell
– You enter the command cat testfile.txt
• The cat program displays the contents of a text file on the terminal
processes 64 In the computer – 1
[Memory‐map diagram: one block per region, each block representing thousands of consecutive memory bytes. OS regions: interrupt & trap vector and OS trap dispatcher; system stack; device handlers; OS buffers for I/O; OS code for I/O management; OS data structures for I/O requests and for processes; OS code for scheduling processes, trap handling functions, and the filesystem (files & directories). User regions: bash code, user application 1 stack, user application 1 static data & heap (you – running bash); user application 2 code, stack, and static data & heap (user 2 – waiting for input); a block of memory not in use. Layout of user memory is illustrative, not standard Linux.]
processes 65 In the computer – 2 – after fork
[Same diagram; a second user application 1 stack and static data & heap region now exists – you, also running bash and sharing the bash code segment. The parent is still in bash but has invoked waitpid().]
processes 66 In the computer – 3 – dealing with exec() request from child process
[Same diagram; the child bash has invoked execl and the OS is discarding its data segments.]
processes 67 In the computer – 4 – out with old, in with new
[Same diagram; you are kind of still in existence, but the only data left is your entry in the OS process table.]
processes 68 In the computer – 5 – running cat
[Same diagram; the child now has the cat code segment plus a fresh stack and static data & heap – you are running cat!]
processes 69 In the computer – 6 – after child exit()
[Same diagram; back to the initial state – you are running bash again.]
processes 70 See what really happens
• Most versions of Unix or Linux will have a command that lets you trace the system calls being made by a program as it runs
– strace, truss, … – the name varies depending on the version of Unix/Linux that you are using
• Can strace the “cat testfile.txt” command run from shell
• Obviously details of this example are “not examinable” – it is simply a more detailed illustration of what happens when a process is started, and a hint to those students that are really interested in OSs that they can discover a lot through tools like strace.
processes 71 strace cat testfile.txt : 1
neil@neil‐OptiPlex‐760:~/tmp$ strace cat testfile.txt
execve("/bin/cat", ["cat", "testfile.txt"], [/* 39 vars */]) = 0
brk(0) = 0x2141000
access("/etc/ld.so.nohwcap", F_OK) = ‐1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, ‐1, 0) = 0x7f940c150000
access("/etc/ld.so.preload", R_OK) = ‐1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=81942, ...}) = 0
mmap(NULL, 81942, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f940c13b000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = ‐1 ENOENT (No such file or directory)
open("/lib/x86_64‐linux‐gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\30\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1815224, ...}) = 0
execve – a different variant of exec, arguments in an array
brk(0) – code wanted to know the size of existing memory in use (top of heap)
These requests – well, they are all to do with starting the load process (to load the cat executable)
72 Some of the system calls
• execve
– exec() variant
• brk
– Usually asks for more memory; with 0 as argument it's essentially asking how much existing memory
• access
– Checking access permissions on a file (F_OK simply checks whether the file exists)
• mmap
– Modern Linux systems have this mmap system call as well as brk; it is again a request for more memory – but it can be mapped anywhere into the process address space, not limited to expansion of the heap; the first example mmap request is simply “give me an 8k block of bytes”
• open
– It opened ld.so.cache (as file 3) – this file contains information about libraries
• fstat and stat
– stat gets information about a file (fstat if using a file descriptor)
• mmap
– 2nd mmap call basically loaded data from ld.so.cache
• Continues in similar fashion loading libc
– GNU's standard library of C functions;
– cat program is
getting all the functions it requires
processes 73 strace cat testfile.txt : 2
…
fstat(3, {st_mode=S_IFREG|0755, st_size=1815224, ...}) = 0
mmap(NULL, 3929304, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f940bb70000
mprotect(0x7f940bd25000, 2097152, PROT_NONE) = 0
mmap(0x7f940bf25000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b5000) = 0x7f940bf25000
mmap(0x7f940bf2b000, 17624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, ‐1, 0) = 0x7f940bf2b000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, ‐1, 0) = 0x7f940c13a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, ‐1, 0) = 0x7f940c139000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, ‐1, 0) = 0x7f940c138000
arch_prctl(ARCH_SET_FS, 0x7f940c139700) = 0
mprotect(0x7f940bf25000, 16384, PROT_READ) = 0
mprotect(0x60a000, 4096, PROT_READ) = 0
mprotect(0x7f940c152000, 4096, PROT_READ) = 0
munmap(0x7f940c13b000, 81942) = 0
Sorting out its memory requirements
processes 74 More system calls
• mprotect
– Change process's ability to access memory –
• PROT_NONE The memory cannot be accessed at all.
• PROT_READ The memory can be read.
• PROT_WRITE The memory can be modified.
• PROT_EXEC The memory can be executed.
• arch_prctl
– With ARCH_SET_FS it is setting the FS segment base register (used for thread‐local storage)
• munmap
– Free some memory
75 strace cat testfile.txt : 3
mprotect(0x7f940c152000, 4096, PROT_READ) = 0
munmap(0x7f940c13b000, 81942) = 0
brk(0) = 0x2141000
brk(0x2162000) = 0x2162000
open("/usr/lib/locale/locale‐archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7220736, ...}) = 0
mmap(NULL, 7220736, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f940b48d000
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("testfile.txt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=53, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "This is a file with some text.\nA"..., 32768) = 53
write(1, "This is a file with some text.\nA"..., 53This is a file with some text. And that is about it. ) = 53
read(3, "", 32768) = 0
close(3) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
Finally! cat opens the file testfile.txt on file descriptor 3, reads the data (53 bytes), writes it to stdout (the file's contents appear interleaved in the trace), then closes its files and exits
processes 76 fork – exec – “family tree”
• Unix/Linux starts with just one process – init
– Typically gets identifier 1
• It forks to create other processes, these fork again to create still more processes
– Process numbers used to be 16 bit (not on some modern Linux); OS would just search for the next lowest unused process number
– OS records allocated process number and parent process number as part of OS process data
• Typically, in the Unix/Linux world parents outlive children
– Parent process forks – can wait for child to terminate (as in example) or simply forget about it
• In many cases, termination of parent process will automatically result in forcible termination of child processes
– You log out of your terminal session after setting a program to run in the background – that process gets terminated.
• Sometimes a parent can terminate and leave living descendants –
– These “orphan processes” are “adopted” by init i.e.
their parent process id is changed to the pid for the init process
processes 77 fork – exec – “family tree”
• Example in that ps listing shown earlier
– nabg 14570 25387 0 14:25:19 pts/7 0:00 ps –ef
• Me, as process 14570, running command ps –ef
• Parent process 25387
– nabg 25387 25321 0 10:07:11 pts/7 0:00 /usr/bin/bash
• Me as process 25387, running bash
• Parent process 25321
– nabg 25321 25320 0 10:07:04 ? 0:02 /usr/lib/ssh/sshd
• Me as process 25321; I logged into the banshee computer via ssh so I am running an instance of the sshd daemon process
• Parent process 25320
– root 25320 24114 0 10:07:04 ? 0:00 /usr/lib/ssh/sshd
• Process 25320, leader of a group of sshd daemon processes; its parent 24114
– root 24114 4285 0 Dec 11 ? 75:41 /usr/lib/ssh/sshd
• Process 24114, real leader of all ssh daemon processes; its parent 4285
– Process 4285 ?
• On this SunOS system, this is a now‐departed process that started init, ssh, utmpd, statd, …
78 strace, ps, and other tools for the nascent Linux implementor!
• On your own system – where you have ‘root’ access via su or sudo – you can look inside the /proc virtual filesystem.
– You will find that this has
• a variety of files containing information about the computer and OS
• Subdirectories for each currently executing process
– Best to create a simple process like one running this program
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv)
{
pid_t procid = getpid();
fprintf(stdout, "process id is %d\n", procid);
sleep(1500);
exit(0);
}
that will tell you the process number and then give you about 25 minutes to look around
– Then you can explore all sorts of information about it in its subdirectory (use process number); this subdirectory will be deleted when the process ends.
– Then give your program something more to do and discover how processes really work
processes 79 Memory in processes
• Let us grow the stack – and watch it grow
(long is 4 bytes on this Windows‐XP/Cygwin system)
processes 80 Stack growth (downwards)
• Program simply calls a recursive function 10 times; at each level of recursion add a little more than 16 kbyte to the stack
– Can get the address of a local in the function ~ address of top of stack
– Printing values in decimal (hex would be more typical for Linux system programs)
processes 81 Heap
• All heap management operations are built around the system call brk()
– But this provides a very low level control!
• And brk() is not even in Cygwin's <unistd.h>!
– More often, a wrapper function declared in <unistd.h> – sbrk – is used as the lowest level function
82 brk() and sbrk()
NAME
brk, sbrk ‐ change data segment size
SYNOPSIS
#include <unistd.h>
int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
brk() sets the end of the data segment to the value specified by end_data_segment, when that value is reasonable, the system does have enough memory and the process does not exceed its max data size.
sbrk() increments the program's data space by increment bytes. sbrk() isn't a system call, it is just a C library wrapper. Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
RETURN VALUE
On success, brk() returns zero. On error, ‐1 is returned, and errno is set to ENOMEM.
On success, sbrk() returns a pointer to the start of the new area. On error, ‐1 is returned, and errno is set to ENOMEM.
processes 83 Demo of low level heap management
• Program
– uses sbrk to expand the heap and get the current “break” point (address of top of heap);
– Memory is written to – just to show that it is acceptable
– brk is used to “release memory”
• In principle, it's been removed from our address space – but the OS may ignore the request (it did when this example was run, the “released memory” was still writeable!)
Program must be run on a proper Linux system – Cygwin doesn't support a direct call to brk()
84 [Slides 84–85: program listing and its output omitted]
processes 86 It runs
[Diagram of the address space: text (i.e. code), static initialized data, static zeroed data, heap, stack; markers show the initial “break”, the break after the 1st claim (and again when memory was freed), the break after the 2nd claim, and the break after the 3rd claim]
87 Free store management
• Very few application programs work directly with brk() and sbrk()
• Most use the malloc() and free() functions in stdlib (C++'s “new” and “delete” will be implemented using malloc() and free())
processes 88 malloc
• malloc
– On 1st use, it will claim a ‘large’ block of memory using sbrk; this block will be initialized –
• malloc will treat the memory that it manages as a linked list of blocks of defined size; initially there is just one block whose size is ~whatever malloc asked for with sbrk()
– malloc carves out a block of bytes from its large memory area, returning the start address, and updating its own “free list”
– It may make subsequent calls to sbrk if the program needs more heap space than was claimed in the initial step.
– It may release memory
• The OS may ignore calls to brk that release memory.
• The OS may simply leave the heap to be released in one go when the program terminates
– Most systems use virtual memory and will page out regions of memory that aren't being used; any large area claimed with malloc and then later released with free will soon end up in 2ndry storage – which is why the OS may choose to ignore release requests
processes 89 malloc
• Program request will be something like malloc(1024) (request for 1024 bytes)
[Diagram: a 1028‐byte block; the block size is stored at the front, and the address returned to the program points just past it]
Buggy programmers often write outside of the memory blocks that they get allocated. Most typically, this occurs when using an allocated array e.g. int data[256] and writing to data[‐1] or data[256]. Obviously, this trashes the “blocksize” data of this block or the data in a neighbouring block.
You can use debugging variants of malloc and free that put additional headers and trailers on each allocated block. These contain specific bit patterns. When the debug‐malloc or debug‐free functions are called, they always check that all blocks are properly terminated – exiting the program on error.
90 free
• free(void* ptr) should only be called with a ptr value originally returned by malloc
– free inserts “previous” and “next” links so that the released area is back in the freelist used by malloc
[Diagram: the freed block, located from the address given to free; its block size is stored at the front, and links to the previous and next free blocks are written into the freed space]
processes 91 Linked list of blocks
• Memory owned by malloc
[Diagram: malloc's memory as a chain of blocks – a block still in use carries just a blocksize record; a free block carries a blocksize record plus previous and next pointers; the head of the free list points to the first free block, whose previous pointer is NULL, and the last free block has a NULL next pointer]
processes 92 Linked list of blocks
• Regions merged on free
[Diagram: before the free() request, the block being freed sits between free neighbours; while reconstructing the free list, adjacent free regions are merged; on completion a single larger free block replaces them]
processes 93 malloc free demo
[Slides 93–95: program listing and its output omitted]
96 Malloc and free demo
• Allocate a number of blocks
– Demo shows malloc makes an initial claim for a large area of memory, no further expansion needed
• Free a few
• Allocate again
– Space that was made by freeing blocks gets used
processes 97 Errors in system calls
[Man‐page excerpts: “errno is set appropriately” … “errno is set to ENOMEM”]
processes 98 errno
• The integer variable errno is used by the OS to record data identifying a specific error condition if something went wrong when processing a system call.
• errno is one of the system variables (along with the environment vector, argc, argv etc) that is mapped into the user address space and which can be read by a program
processes 99 Associated functions
• <stdio.h> has void perror(const char* msg);
• <string.h> has char* strerror(int errnum);
• These functions look up the value of the errnum in a table of error messages
– strerror returns the string
– perror prints, on stderr, the msg string given as argument followed by the standard error message
• Considered good practice to check after any system call, or any library function that uses errno
int s;
s = socket(PF_INET, SOCK_STREAM, 0);
if (s == -1) {
perror("socket error");
}
processes 100 Enough on processes for now
• You should now have some idea as to what a process is, and how one might interact with the OS via system calls
• It's relatively rare for programs to make system calls directly – mostly the calls are made via C wrapper functions in libraries such as <unistd.h>, <stdlib.h>
• The system calls most common in simple programs (either direct calls or calls via libraries) are:
– exit
– open, close, read, write
– brk (via malloc)
processes 101 So, back to the OS system call interface
The things that the OS can do for you
102 OS – 1 – an API
• System call application programmer interface
– The OS provides services
• Create a file, open file, read, write, start new process, terminate process, allocate more memory, …
– These services are made available to application programs through “system calls”
• These are simple wrappers for the “trap” requests that are built into the OS
• System calls will appear as functions (typically in C) in a standard library (declared in <unistd.h>)
• Most programs don't use this low‐level API directly, instead using other libraries that provide higher level abstractions
– E.g. don't use functions in <unistd.h> for your I/O, use functions from <stdio.h> or <iostream>
103 System calls
• The original Unix had about 60, modern Linux variants have 200‐300!
– Several groups • Process control • File management • Communication • Device management • Information management – E.g. ability to set the system’s clock OS : system call API 104 Illustration –early Unix version’s system calls (Unix V6 list) • break • chdir • chmod • chown • close • creat • dup • exec • exit • fork • fstat • getgid • getpid • getuid • gtty • indir • kill • link • mknod • mount • nice • open • pipe • profil • ptrace • read • seek • setgid • setuid • signal • sleep • stat • stime • stty • sync • time • times • umount • unlink • wait • write OS : system call API 105 Calls for managing processes • Already introduced some of these – Fork – create another process – Exec – “overlays calling process with named file” i.e. changes the code being run – Exit – terminate a process – Break – Process asks for more memory – Wait – stop and wait for one of your children to die – Sleep – stop execution for an interval – … OS : system call API 106 Calls for managing processes • A few more exotic process management system calls – Nice – Process asks to change priority • priority is used by OS scheduler code when choosing which ‘ready’ process to run next – Kill – send a kill signal to other process • signals explained more shortly – Profil – collect profiling data • At regular intervals, record value in PC register – used to find which parts of your code take most time – Ptrace – For debugging, parent process (debugger) can control execution of child – Times – get user, system, and real time used by a process OS : system call API 107 I/O • “Unix had a drastically simplified file model compared to many contemporary operating systems: treating all kinds of files as simple byte arrays. 
The file system hierarchy contained machine services and devices (such as printers, terminals, or disk drives), providing a uniform interface, but at the expense of occasionally requiring additional mechanisms such as ioctl and mode flags to access features of the hardware that did not fit the simple "stream of bytes" model.” OS : system call API 108 I/O : Just the 4+ calls
• I/O calls:
– Open
– Close
– Read
– Write
– And, where applicable,
• seek
OS : system call API 109 I/O: also …
• There are a couple of supplementary system I/O calls:
– dup
• Duplicate a file descriptor
– Gives the program two access paths to the same ‘file’
– It's more typically used to change the file descriptor numbers that are being used, rather than to really create extra paths
– Pipe
• A mechanism for inter‐process communication, often 2‐way
– Before fork, “parent program” uses pipe system call
» OS sets up a couple of data buffers where data can be written
» OS allocates file descriptors to read from and write to buffer
– After fork
» Parent can “write to pipe”, data in buffer, child can “read from pipe”
» If 2‐way communication, child can “write to pipe”, data go in other buffer, parent can “read from pipe”
• Unidirectional flow via pipe (memory buffers) is a common mechanism when a processing task involves a sequence of steps, each implemented by a separate program – like gcc – C preprocessor, compiler steps, linker; much more efficient than writing to and reading from a file
OS : system call API 110 OS sys calls : filesystem, also users/groups, permissions
• mount
– “A removable file system has been mounted on the block structured device special”
• At start up, system has to “mount” all file systems – building them into a single file hierarchy
• Subsequently, can mount exchangeable disks (and these days USB flash drives etc)
• unmount
– Remove a file system
• mknod
– Create a “device special file”
• Super‐user only – creates special file entry representing a disk partition, for insertion into
file‐hierarchy
– In V6, was also used to create directories but now there is a separate mkdir system call
OS : system call API 111 OS sys calls : filesystem, also users/groups, permissions
• creat
– Create a new file
• link, unlink
– Create/remove a 2ndry reference to a file (or directory)
– Unlink works as “delete” if it is removing the last reference to a file or directory
• chdir
– Change the “working directory”
• chmod
– Change access permissions on a file
• chown
– Change ownership of a file
OS : system call API Some of the simpler shell commands, e.g. shell's chmod, are simply packaged system calls
112 OS sys calls : filesystem, also users/groups, permissions
• stat and fstat
– Read data defining properties of a file
• Get back a struct with data such as
– Number of links to this file
– User id of owner
– Time last accessed
– Time last modified
– …
OS : system call API 113 OS sys calls : communications
• Pipe – already mentioned
• Signal
– Signals were an additional (rather crude) form of inter‐process communication (more for OS to program communication really)
– “A signal is generated by some abnormal event, initiated either by user at a terminal (quit, interrupt), by a program error (bus error, etc.), or by request of another program (kill). Normally all signals cause termination of the receiving process, but a signal call allows them either to be ignored or to cause an interrupt to a specified location.”
• Lots of different signals defined – many are errors that hardware could detect
– Illegal instruction, floating point exception, bus error, bad arg to system call, …
– The signal system call allowed a program to nominate a function as a “signal handler” for a particular type of signal
• Receipt of that kind of signal would result in the function being called, and possibly the program being resumed, rather than program termination.
OS : system call API 114 OS sys calls : Device management
• There was little provision for device management in Unix V6
– ioctl –
• the main system call for more modern Unixes, turns up in V7
• Provides for things like non‐blocking I/O
• V6 had stty and gtty system calls that could get information about how tty (terminal) connections were being handled, or modify the settings.
OS : system call API 115 OS sys calls : Information management
• Again, fairly limited in Unix V6; calls only available to super‐user
– stime
• Set system clock
– sync
• Flush all I/O buffers, make sure all disk operations completed
OS : system call API 116 Modern Linux does have rather more
OS : system call API 117 OS provides more services these days – so more OS system calls!
• 1980s added
– Networking
• So need calls to handle sockets, set host name etc
– Distinctions like actual and effective user id of process, multiple groups (originally Unix users could only be in one “group”)
• Extra calls to get/set group, effective user id etc
– File control and I/O control
– mmap memory handling (supplementing break and supporting shared memory)
– Logging
– Additional inter‐process communication mechanisms
– mprotect – setting access controls on code and data segments
– Quotas and resource management
– Poll (select) mechanism for handling multiple I/O streams
– …
OS : system call API 118 System calls rare in typical program code
• Although everything works through these low level system calls, it is rare for application programs to make the calls directly
– Almost always, application programmers use C (C++) libraries that provide
• At least, simple wrappers to system calls
• Often, a higher level conceptual interface with more elaborate functionality and more convenient data structures
– E.g. malloc and free rather than brk and sbrk
OS : system call API 119 Carefully read the documentation before coding system calls!
• Many sys calls return pointers to data structures with lots of information fields
– In early days, functions were often implemented along the lines:
struct demo *syscall(arg1, arg2, arg3)
{
static struct demo results;
…
return &results;
}
– i.e. the results are placed in a static variable of the function, whose address is returned
– But some systems' functions would malloc a structure and return a pointer to space in the heap.
• Do you free the result struct when you've finished with it?
– Of course, if the result was a pointer to a static you must NOT invoke free(), but you must invoke free() if the result was a heap pointer
OS : system call API 120 Carefully read the documentation before coding system calls!
• Another problem with code using static variables is that such code is not thread safe
– With multi‐threaded programs now much more common this is a serious problem
• So often the libraries now have
– Old style functions
• For compatibility with existing code
– New variants that are thread‐safe (“re‐entrant”)
• With these, there is usually a more complex function signature as you must pass additional arguments – the addresses of structures that you have created to hold the function results.
OS : system call API 121 OS – Management of Users and Filesystem
122 OS – 2 – A file system
• The OS must provide a file system
– Some parts are apparent to users
• Access and ownership
– File permissions, “owner”, “group”
• Organisation
– Directory tree structure and files
– Some are largely hidden
• How are disk blocks allocated as a program creates and then writes to a file?
• How can the OS “seek” to a particular position in a file?
• How can processes share files concurrently and how can a process claim exclusive access if it needs this?
• Can block transfers be optimised – reading ahead, delayed writes, parts of files held in main memory, … ?
123 Users
• Once file‐systems became available (early/mid‐1960s), the operating systems needed the concept of “user”
– Individual users own specific files
– Individual users may be subject to quotas on resource usage and priorities for getting jobs run
• Maybe differing resource rights – some users, but not all, can run jobs that request private volume disks or tapes to be mounted
• Often, users are part of groups
– This may affect sharing of files or other use of resources
• Clearly some users need special privileges
– The sys admin has to be able to
• Set the time
• Maybe prevent any additional processes being created so that the system can wind down before being closed for hardware/software maintenance
• A variety of user management schemes were prototyped on early OSs, some with multiple levels of privilege
Users 124 Unix
• Unix started with
– Ordinary users
– “root”
• The “super‐user”
• root can do anything
– Mount/unmount devices
– Change priority / kill any process
– Access / modify any file or directory
• Groups were added fairly early
– Initially, a user could only be in one group; later, support was provided for multiple groups
Linux is now reworking the ‘multiple levels of privilege’ that existed in some other earlier OSs. A variety of ‘capabilities’ can be assigned to processes.
Users 125 Creating users
• The systems administrator can create users and groups via the commands useradd and groupadd (command names vary with Unix/Linux version); admin can then modify the user record adding membership of different groups
Users 126 Being systems admin (root)
• You should have Linux – either as dual boot or a virtual machine
• You will be systems admin for your Linux
• You can
– Login as root
• Generally regarded as foolish – too much risk of accidentally deleting your entire Unix
– Login with a normal account and use “su root” (set user command)
• Still risky – and requires knowledge of root password
– Installing and using “sudo”
• Safest option
• Sudoers file identifies accounts with sudo access
– Can limit commands that are available via sudo (though not commonly done)
• All activities using sudo get logged.
• Uses sudoer's own user password – so don't have to give out real root password
Login as root should only be possible on the system console
127 /etc/passwd
• Details of users and groups are stored in /etc/passwd and /etc/group; Part of /etc/passwd:
root:x:0:0:Super-User:/root:/sbin/sh
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
postgres:x:90:90:PostgreSQL Reserved UID:/:/usr/bin/pfksh
aa008:x:175060:202:Ahmed Almahrouq,,,,,100%000:/home/ug/g/aa008:/share/bin/uow_sh
aa009:x:132956:14000:Ali Aghassi,,,,,100%000:/home/pg/e/aa009:/share/bin/uow_sh
Users 128 /etc/passwd
• Each account gets a record in the file
• Fields in record
root:x:0:0:Super-User:/root:/sbin/sh
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
aa008:x:175060:202:Ahmed Almahrouq,,,,,100%000:/home/ug/g/aa008:/share/bin/uow_sh
• Field 1: (root, lp, aa008) : user name
• Field 2: (x) : it used to be a hash of the password
• Field 3: (0, 71, 175060) : user id
• Field 4: (0, 8, 202) : group id
• Field 5: (Super‐User, Line …, Ahmed) : user info
– A comma separated list that can include – real name, contact phone number, room, …
• Field 6: (/root, /usr/spool/lp, /home/ug/g/aa008) :
home directory • Field 7: (/sbin/sh, none, /share/bin/uow_sh) : shell Users 129 Shells • Last field of user record is the shell program that will be looking after user once login is complete – Some “users” don’t get shells, e.g. lp • User lp is a fiction, it is a defined “user” for the OS process that manages print spooling – As mentioned in overview, programs nowadays never output directly to a printer; print output is “spooled” (sent to a disk file in a system owned directory); the process managing printing will take files and print them • “lp” should never login – so gets no shell • Although part of the OS, it doesn’t need ‘root’ privileges Users 130 Shells • Root and aa008 get different shells – Root gets a standard (SunOS Unix) shell – aa008 is a student account and gets a locally defined, slightly modified version of the bash shell Users 131 Passwords • The /etc/passwd file used to contain hashed versions of user passwords – When user logged in, the password that they entered was processed by the same hashing function and the two hashed values were compared (they had to be identical) • The file /etc/passwd has to be readable by all (it’s used by lots of programs, e.g. finger – a program that tells you about another user whose id you know); so attackers could easily get a copy of the file with the encrypted passwords. • Hashed password strings cannot be converted back into clear text passwords – but they were still vulnerable to attacks. 
Passwords
• Hackers couldn’t reverse the hash coding, but they could use “dictionary attacks”
– They built dictionary files containing popular passwords (‘password’, ‘itsme’, ‘1234’, …) along with the strings output by the standard Unix password hashing function
• File organised with hash values as keys and clear text passwords as values
– They could then simply check whether any password entry in /etc/passwd matched an entry in their dictionary file
• Most Unix/Linux systems now use “shadow passwords” to reduce the ease of dictionary attacks
– The /etc/passwd file contains ‘x’ in the password field rather than the hashed password
– A separate file, /etc/shadow, that is only readable by programs running as ‘root’ (e.g. login) contains the userids and the real hashed passwords
/etc/group
• The group file’s records have a group name, a (usually empty) password field, a group id, and a list of usernames, separated by colons:
root::0:yuan,david,cdf,pdg,andy,xpdg,lls,hardbn,blane,sdunster,aconte,klove,agalka,steve,ruiy,onge,abeleski,eburns,barra,miker,deantr,dward,ssteele
robotics::402:phillip,koren
unimovie::404:risque
Userid and groupid
• By default, after you login, your userid gets associated with every process that you run and every file that you create; you have a default groupid that can also be used for file access; your userid is also used by those parts of the OS that apply resource limits – like a quota on your disk space.
Changing user id & group id
• System calls allow a process to change user id and group id; a process run by root can always do anything that it wants; processes launched by users must be given permission to change id!
• Many Unix/Linux utilities are “set user id” programs
– E.g.
‘cron’
• cron can be run by a user entering the command to their shell; it is used to run jobs at specific times; the command line arguments identify the executable, the time at which to run, etc.
• cron must change data in its work files – so it has to change the process’s effective user id from the user’s id to cron’s id
[Diagram: the cron process – an OS process that reads the work files and launches tasks at appropriate times; the cron program (run via a shell command) – adds a new task to the work files]
Changing user id & group id
• If you have the permissions (only ‘root’ has them by default) you can change the directory entry for an executable to mark it as a “set user id program”
– It will then run using the userid of the file owner rather than the userid of the person who invoked the executable.
• Daemons, like apache, often run as root but the child processes that they launch run using the userid of a less privileged user – like “www-data”
– When the daemon forks a child, the child’s first task is to change its effective user-id
Changing user id & group id
• The shell commands chown and chgrp can be used to change details of file ownership and group
– You can change the group of any file that you own from its default to any other group of which you are a member.
– You would need sudo privilege to change a file’s ownership
Access control lists
• Modern Linux variants also support “access control lists” (ACLs)
• ACLs allow more fine-grained permissions to be granted on files.
Disks and partitions
• Disk drives typically have their storage capacity divided into a number of “partitions”
• Partitions can be:
– Swap area
– Data area
– “File-system”
Disks : swap area
• Swap area
– Typically, the total virtual memory of all processes exceeds actual physical memory, so some processes are “swapped out”
• Can be completely removed from main memory to disk
• Or paged – parts of code and data not recently accessed are moved to disk
Disks : data area
• Data area
– “Raw mode device”
• Most commonly, “raw mode” access is for database systems
– Can achieve higher performance by allocating tracks, sectors, and blocks in accord with the database’s own usage pattern rather than storing data in OS files
» Simple database systems – like sqlite, mysql, personal editions of oracle/db2 – will use OS files to store their data
Disks : “file-system”
• File-system
– Holds regular files and directories
– There are many possible versions
• ‘ext2’ – the traditional Linux system
• Newer Linux file systems use ‘journaling’ – a different approach to recording changes to files and directories
– Advantage – these systems allow a quicker restart after a system crash
File system structure
• A disk partition used for a file system will contain
– A boot block
– A “super block”
– An “i-node” table
– Data blocks
These are conceptual entities. The boot block and super block may require many disk blocks to store them.
File system structure
• A boot block
– All partitions contain boot blocks; the boot loader (BIOS) will be configured to read the boot block of a specific partition on a particular disk
– The configured boot block has the startup code needed to read in the OS, initialize interrupt vectors etc., and start the OS
• A “super block”
– Contains configuration data
• Size of the i-node table
• Size of a block in this file system (used to be 512 bytes, now more typically 4 kbytes)
• Size of the file system – total number of data blocks
• Free i-nodes and disk blocks
– Varies with file system; may use a freelist or bitmap representation
– This data may be in a separate ‘group descriptor’ record rather than the superblock
File system structure
• i-node table
– i-nodes are data structures used to hold information about files
– Every file (and directory, and symbolic link) has an i-node
– So the size of the i-node table defines an upper limit on the number of files in this disk partition
• Data blocks
– Data blocks are used for the files, the directories, and for pointers (use explained with i-nodes below)
Files – access and organization
• There are two issues when designing a file system
– File access mechanism
– Conceptual organization (things like the directory hierarchy)
• We want the access mechanism to be simple and fast – so we look for things like fixed size tables that can be indexed by a file identifier and will contain details of the blocks that make up the file
Allocating disk blocks to files
• Simple approach (as in early OSs and 1980s micro-computer systems) – contiguous allocation
– Disadvantages
• Need to know the file size in advance (ask for a number of blocks when creating the file)
• Leads to fragmentation of free space on disk
– Have to regularly run a program that moves files so as to coalesce free blocks
– Advantages
• Fast when reading the whole file sequentially
• Minimal information required to define the file’s blocks (1st block, and number of blocks)
[Diagram: disk blocks represented as an array – Directory | File-1 | Free (was file 2, deleted) | File-3 | File-4 | File-7 (was part of file 5, deleted) | Free (was part of file 5, deleted) | File-6 | Free]
Another simple scheme …
• Blocks in a file form a linked list
– The last 4 bytes of each block are not file data; instead these bytes contain the block number of the next block of the file
• When a file is deleted, its blocks simply get prepended to a list of free blocks on this disk
[Diagram: blocks scattered over the disk – File-1/1, File-2/2, File-2/1, File-1/3, File-2/3, File-1/2, File-1/4, File-3/1 – each file’s blocks chained in order, with the free blocks chained on a freelist]
Linked list schemes
• Tried in some OSs – but not very practical
– OK for text files that are always read sequentially, but problematic for record structured (binary) files, and doesn’t support random access
– Creation and deletion of files soon leads to severe fragmentation – the blocks that make up a file are scattered all over the disk, necessitating lots of movement of the disk heads and therefore slow access
How about this …
• Data structure defining a file:
– File name
– File type (file, directory, soft link, character device, …)
– File ownership and access rights
– Overall byte count
– Array of block numbers (pointers to disk blocks)
• 1st block of file
• 2nd block of file
• 3rd block of file
• …
• 999th block of file
• Schemes like this were tried
– No need to specify the file size in advance
– Random access for a record structured file is possible
• But most files are small, so we don’t want a big array of pointers
– The array size limits the maximum size of any file
Unix/Linux i-node
• Unix wanted something like the last scheme – but better!
– File size unrestricted, and not specified at creation time
– Random access to any block in the file is quick
– Small files (the majority in any file system) are represented compactly
– Files should be able to contain holes
• If you want something like a large hash map represented in a disk file, you don’t want to allocate blocks for address ranges that contain no data
• “All problems in computer science can be solved by another level of indirection”
i-node information
• i-node record structure
– File type (file, directory, symbolic link, …)
– Owner and group ids
– Access permissions for owner, group, and other
– Timestamps – creation, last update, last use
– Number of ‘hard links’ to the file
– Size of file in bytes (highest byte number)
– Number of blocks actually allocated
– And pointers to blocks
i-node – pointers to blocks
[Diagram: the i-node’s pointer array – the first 12 entries point directly at data blocks 0–11 of the file; the next entry points to a single indirection block (the disk block numbers of the next 1024 blocks of the file); the next to a double indirection block (block numbers of 1024 pointer blocks); the last to a triple indirection block (block numbers of 1024 double indirection pointer blocks)]
i-node – pointers to blocks
• The i-node contains 12 direct pointers to blocks
– Blocks 0–11 of the file
– If the disk uses 4096 byte blocks, that’s ~48 kbytes; most files are smaller; so any part of a typical file can be accessed from data held in the i-node.
• The i-node has a 13th pointer entry – but this points to a data block that contains pointers to blocks in the file
– With 4 kbyte blocks and 4-byte block numbers, that allows for another 1024 blocks in the file
• To access any of blocks 12 to 1035 of the file, the OS ‘seek’ must get the pointer block address from the i-node, load the pointer block, look up the block number in the pointer block, and then get the block
• The i-node has two more entries
– 14th entry – double indirection pointers
– 15th entry – triple indirection pointers!
i-node
• i-node
– It’s fixed size, relatively small, and provides all the information needed to access most files
• Kerrisk – ‘The Linux Programming Interface’ – suggests 95% of files in a typical system
– Its indirection block entries allow for a maximum file size of ~4 terabytes
• Most complex access
– From the i-node get the address of the triple indirection block
– Read the triple indirection block, find the appropriate entry for the double indirection block
– Read the double indirection block, find the appropriate entry for the single indirection block
– Read the single indirection block, look up the disk block number of the logical block in the file
Linux ext2 disk partition layout
Use of block groups helps avoid the blocks comprising a file from being scattered too widely over a disk. The i-node table for a block group is not too far from the files.
(Big files may end up with data blocks from more than one block group.) Free data blocks are tagged in the data-block bitmap (which must fit in one disk block), so a block group can reference 128 Mbytes of storage if using 4 kbyte disk blocks.
File system : logical structure
• Unix adopted the file system scheme pioneered in Multics – a single rooted tree of directories and files
– When Unix/Linux starts, it checks all the mounted data storage devices and combines their file systems into the single tree
• If data storage devices are added later, e.g. a USB disk or flash drive gets plugged in, the automountd process will examine it and link it into the file system
In other OSs, e.g. Windows, the file systems on separate volumes are distinct – so you check for files on your C: drive or your E: drive etc.
File hierarchy
[Figure: example directory tree]
Standard Unix/Linux file system
Directory – Purpose
/ – Root directory
/bin – Essential command binaries (executable programs on disk)
/opt – Optional application packages
/boot – Boot loader files – kernel code etc.
/root – Home directory for root
/dev – Devices
/sbin – System binaries – init, mount, …
/etc – Configuration files
/srv – “Site specific data served by the system”
Standard Unix/Linux file system
/home – User directories
/tmp – Temporary files
/lib – Libraries used by programs in /bin and /sbin
/usr – Secondary hierarchy of applications and libraries
/usr/bin – Programs
/usr/include – Header files
/usr/lib – Libraries for binaries in /usr/bin and /usr/sbin
/usr/sbin – Less important system binaries and services
/usr/src – Source code for applications etc.
/usr/share – Shared data
Standard Unix/Linux file system
/usr/local – Locally installed applications etc.
/usr/local/bin – Programs
/usr/local/lib – Libraries
/usr/local/include – Header files
/usr/local/doc – Documentation
/usr/local/share –
Shared data
/media – Mount point for CDs and DVDs etc.
/mnt – Mount point for temporary volumes e.g. flash drives
/var – ‘Variable files’
/var/cache – Place where applications can temporarily cache costly computed data
/var/lib – State information
/var/log – Log files
/var/spool – Print queues, outgoing mail queue, …
/home
• Entries in a directory are not ordered by filename – so linear searches are necessary when doing a directory lookup
• It is therefore unwise to have directories with many thousands of files.
• An organisation like UOW with 20000+ users will create a hierarchy within /home
– /home/staff
– /home/pg
– /home/ug
• /home/ug/a … /home/ug/z
Your Linux
• Packages that you install will go in
– /usr/include, /usr/bin, and /usr/lib
– /usr/local/include, /usr/local/bin, /usr/local/lib
– (Where they go depends a bit on the Linux distribution and its package manager; one ‘distro’ may rank a package as mainstream and put it in /usr, while another will consider it minority interest and put it in /usr/local; /usr/local is appropriate if different machines in a cluster have different versions of software – clustered machines might share /usr on some networked file system and all use the same versions)
Your Linux
• /var/log is the place to look for log files from applications like Apache
– You need sudo access
– Error messages from applications, and records that might indicate hacker attacks, will be in these logs
• /etc is the place to look for configuration files for applications that you install, e.g. mysql or Apache/PHP
– Sometimes you need to edit these config files, e.g. to allow ‘user directories’ support in your Apache
Files, directories, and links
• Unix/Linux has ~four file types
– Directories
– Normal files
– Links
– Devices (block, character, socket, …)
• A special type for devices; rare, mostly
in /dev
• Unix/Linux doesn’t take account of file “types” based on file name extensions (.doc, .png, .gif, .pdf, …) with bindings to particular applications, as is done in Windows
– Some Unix/Linux applications do use file name extensions to determine how a file is processed
• E.g. the Apache web server uses these to determine the “Content-type:” header that will be added to a response for a requested file
Directories
• The OS recognizes directories from the type information in their i-nodes and treats them specially
– While you can “open” a directory, you cannot use “read” or “write”
• The contents of directories can only be accessed via the readdir system call
• A directory is
– Conceptually, a table of entries
• i-node number, file-name (max 255 characters)
– Look up a file-name, get back an i-node number
– Actually, just a big byte stream
Directories
• The OS handles file create requests by allocating an i-node number, and inserting this number along with the filename into the appropriate directory.
• When a file is deleted, the OS reclaims the disk blocks and recycles the i-node; but in the directory, it simply replaces the i-node number with 0
– Directories therefore tend to grow as files are created and deleted
Files
• Just a sequence of bytes
– Well, maybe – there can be holes where there are no bytes!
Links
• Links:
– Basically, these allow the same file to appear (possibly under different names) in several directories
– There are “hard links” and “soft links”
• A hard link has an i-node number and a name
– There is no difference between the original directory entry and other entries subsequently created as links
• Creation of a hard link increments the link count entry in the file’s i-node
– An unlink (delete) operation on a file reduces the count; the file is only deleted when the count goes down to 0
• Soft links are special files – the data in a soft link file is the (full path) name of the real file
– Soft links don’t get tidied up by the system, and may remain after the real file has been deleted
Hard links are limited to a single ‘file system’.
Links
• The OS will not allow ‘hard links’ to directories
– Such links would convert the tree-structured file hierarchy into a graph, possibly including cycles!
• A cyclic file system graph would greatly complicate the work of all programs that must search for and process files!
Access permissions
• Unix/Linux distinguishes three classes of potential users for files
– Owner
– Member of the group that the owner assigned to the file
– Others
• And there are three kinds of usage
– Read
– Write
– Execute
• For a file – run as a program
• For a directory – allow access to files in the directory
If ‘others’ have execute permission on a directory, e.g.
$HOME/public_html, they cannot search that directory to discover file names, but they can look up a known file name to get its i-node.
Access permissions
• Permissions are represented as bits in the mode field of the i-node for the file or directory
– Three sets (user, group, other), each of three bits
• r, w, x – read, write, and execute
• Listings of directories will show file access permissions
nabg@zim:~$ ls -l VirindiSetup/
total 128
drwx------ 2 nabg staff 4096 2009-02-25 15:49 2009setups
-rw------- 1 nabg staff 92160 2007-07-23 10:14 BitDemo.tar
-rwxr-xr-x 1 nabg staff 7915 2012-11-27 10:47 dav_svn.conf
-rw------- 1 nabg staff 1251 2012-11-27 10:10 dav_svn.groups
-rwxr-xr-x 1 nabg staff 956 2009-02-26 10:40 Groups.pl
-rw------- 1 nabg staff 2805 2009-02-26 11:01 Groups.txt
drwxr-xr-x 4 nabg staff 4096 2009-02-26 11:24 prototype
drwxr-xr-x 3 nabg staff 4096 2007-07-23 16:37 trycheckouts
Only I have access to the contents of directory 2009setups. Only I have access to the file Groups.txt. Anyone can run the program Groups.pl, or look in the prototype directory.
Directory listings
• That directory listing had ‘d’ and ‘-’ entries – differentiating directories and ordinary files.
– If you looked in /dev, you would see other types like ‘c’ (character special device) or ‘b’ (block device)
crw--w---- 1 root tty 4, 24 2013-12-03 12:40 tty24
crw--w---- 1 root tty 4, 25 2013-12-03 12:40 tty25
brw-rw---- 1 root floppy 2, 0 2013-12-03 12:40 fd0
– You might even find ‘p’ (pipe) or ‘s’ (socket) – but these will probably only be found in a process’s subdirectory of /proc
Directory listings – links
• Hard links appear as ordinary file entries
• Soft links are tagged as links (‘l’ – that is the letter L, not the digit 1) and the file/directory that they link to is identified
nabg@zim:~/tmp$ pwd
/home/staff/n/nabg/tmp
nabg@zim:~/tmp$ ls -l
total 0
nabg@zim:~/tmp$ echo 'Hello World' > file1.txt
nabg@zim:~/tmp$ ls -l
total 0
-rw-r--r-- 1 nabg staff 12 2013-12-31 09:04 file1.txt
nabg@zim:~/tmp$ ln ./file1.txt file2.txt
nabg@zim:~/tmp$ ls -l
total 0
-rw-r--r-- 2 nabg staff 12 2013-12-31 09:04 file1.txt
-rw-r--r-- 2 nabg staff 12 2013-12-31 09:04 file2.txt
Directory listings – links
nabg@zim:~/tmp$ echo "Hi mom" >> file2.txt
nabg@zim:~/tmp$ ls -l
total 0
-rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file1.txt
-rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file2.txt
nabg@zim:~/tmp$ ln -s ./file1.txt file3.txt
nabg@zim:~/tmp$ ls -l
total 0
-rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file1.txt
-rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file2.txt
lrwxrwxrwx 1 nabg staff 11 2013-12-31 09:05 file3.txt -> ./file1.txt
nabg@zim:~/tmp$ cat file3.txt
Hello World
Hi mom
Directory listings – i-nodes
• You can find out what i-nodes your files use, should you really wish to know
nabg@zim:~/tmp$ ls -li
total 0
22816837 -rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file1.txt
22816837 -rw-r--r-- 2 nabg staff 19 2013-12-31 09:05 file2.txt
22816838 lrwxrwxrwx 1 nabg staff 11 2013-12-31 09:05 file3.txt -> ./file1.txt
file1.txt and file2.txt are the same file – so the same i-node
number; file3.txt is a link file that holds a string with the name of the linked file.
Directory listings – hidden files
• Files whose names begin with ‘.’ are not usually shown in directory listings
– The convention is that programs that need to create private configuration files or data files in your directory will give them names that begin with ‘.’ (or place them in directories whose names begin with ‘.’); the intention is that these work files and directories don’t show up and get confused with files that you have created yourself
– You can get these files and directories displayed by using option -a (all) in a directory listing command
Directory listings
nabg@zim:~$ pwd
/home/staff/n/nabg
nabg@zim:~$ ls -l
total 4276
-rw------- 1 nabg staff 0 2006-12-22 18:26 afile
drwx-----x 6 nabg staff 4096 2010-09-30 14:55 Backup_INFO-874752S
drwxr-xr-x 7 nabg staff 4096 2013-07-17 14:37 CSCI222Examples
-rw-r--r-- 1 nabg staff 4091097 2013-07-16 11:17 CSCI399labinfo.pdf
drwxr-xr-x 2 nabg staff 4096 2013-07-23 15:09 Desktop
-rwx------ 1 nabg staff 724 2007-01-05 11:03 Document.rtf
drwxr-xr-x 2 nabg staff 4096 2013-07-09 09:17 Documents
nabg@zim:~$ ls -la
total 4624
drwx-----x 64 nabg staff 12288 2013-12-31 08:58 .
drwxr-xr-x 25 root root 4096 2013-12-11 06:45 ..
drwx------ 4 nabg staff 4096 2013-07-23 14:46 .adobe
-rw------- 1 nabg staff 0 2006-12-22 18:26 afile
-rw------- 1 nabg staff 1663 2009-07-28 14:28 .asadmintruststore
drwx-----x 6 nabg staff 4096 2010-09-30 14:55 Backup_INFO-874752S
-rwx-----x 1 nabg staff 21 2013-12-20 03:01 .BackupNominated.log.txt
-rwx-----x 1 nabg staff 222 2011-02-02 11:05 .BackupProfile.log.txt
-rw------- 1 nabg staff 7635 2013-12-30 20:30 .bash_history
-rw-r--r-- 1 nabg staff 24 2000-03-02 17:06 .bash_logout
Directories
• Unix/Linux starts each directory with two entries
– ‘.’ i.e. self – an entry with the i-node of the directory
– ‘..’ i.e.
parent – an entry with the i-node of the parent directory
• The OS simply keeps track of the current working directory; the .. entry allows a user to climb back up the file tree hierarchy
Directories
nabg@zim:~$ pwd
/home/staff/n/nabg
nabg@zim:~$ ls -lai
total 4624
8189181 drwx-----x 64 nabg staff 12288 2013-12-31 08:58 .
5646123 drwxr-xr-x 25 root root 4096 2013-12-11 06:45 ..
nabg@zim:~$ cd ..
nabg@zim:/home/staff/n$ pwd
/home/staff/n
nabg@zim:/home/staff/n$ ls -lai
total 124
5646123 drwxr-xr-x 25 root root 4096 2013-12-11 06:45 .
64 drwxr-xr-x 30 root root 4096 2012-09-17 15:28 ..
8189181 drwx-----x 64 nabg staff 12288 2013-12-31 08:58 nabg
/ (root)
• The parent of / (root) is / itself (defining the top of the file hierarchy)
nabg@zim:/$ pwd
/
nabg@zim:/$ ls -lai
total 104
2 drwxr-xr-x 25 root root 4096 2012-04-27 10:43 .
2 drwxr-xr-x 25 root root 4096 2012-04-27 10:43 ..
• i-node numbers
– 0 – means no i-node (used to indicate a deleted entry in a directory)
– 1 – i-node 1 is a ‘file’ comprising all the ‘bad blocks’ that have been found in the disk partition
• The OS can read after a write to check that the data are OK; if this fails it marks the block as bad so that it will not get used
– 2 is the i-node of root
chroot
• Sometimes it is appropriate to alter other directories so that, like the root directory, they set their parent to reference themselves
– Only a process running as root can use chroot
– The purpose of a ‘chroot jail’ is to prevent directory navigation by hostile elements
• E.g.
a directory used for anonymous ftp access may have a chroot jail – this stops a malicious person using ‘..’ to navigate up through the file system (trying to get to files that are private)
File system example
[Figure: example file system tree]
Example C program navigating the file hierarchy
• Program:
– Input: a pathname of a directory
– Output:
• Number of searchable subdirectories
• Number of files in directory and subdirectories
• Pathname and size of largest file
• System calls (packaged via C wrappers)
– opendir, readdir, closedir – directory access
– lstat – information about a file
dirent.h – Unix/Linux directories
• dirent.h contains function prototypes:
DIR *opendir(const char *dirname)
Opens a directory stream corresponding to the directory named by dirname. The directory stream is positioned at the first entry. Upon successful completion, opendir() returns a pointer to an object of type DIR. Otherwise, a null pointer is returned and errno is set to indicate the error.
dirent.h – Unix/Linux directories
struct dirent *readdir(DIR *dirp)
Returns a pointer to a structure representing the directory entry at the current position in the directory stream specified by the argument dirp, and positions the directory stream at the next entry. When an error is encountered, a null pointer is returned and errno is set to indicate the error. When the end of the directory is encountered, a null pointer is returned and errno is not changed. The memory location pointed to by the return value is managed by the library and may change on subsequent calls to readdir. It should not be free’d by the user.
dirent.h – Unix/Linux directories
int closedir(DIR *dirp)
Closes the directory stream referred to by dirp. Upon return, dirp may no longer point to an accessible object of the type DIR. If a file descriptor is used to implement type DIR, that file descriptor will be closed. Upon successful completion, closedir() returns 0.
Otherwise, -1 is returned and errno is set to indicate the error.
dirent.h
• There are also functions that can be used to read specific entries in the directory, and to ‘rewind’ a directory if you need to run through the files repeatedly.
• Structs defined:
– A DIR data structure contains data such as the file descriptor that is being used, the directory name, etc. It is rarely manipulated by the program, just used in calls.
– A dirent data structure contains information such as the name of the file, its type (directory, regular file, link, …), its i-node number, etc.
sys/stat.h
• This declares function prototypes and structures used to get detailed information about a file
– The stat and lstat functions get details of a file given its pathname
• stat follows symbolic links (since these may be stale, it can fail for target files that no longer exist)
• lstat returns the information about the link file itself for those directory entries that are symbolic links
• struct stat contains lots of data, including
– i-node number, mode (file type & access data), number of links, owner id, group id, size, number of ‘blocks’ actually occupied on disk, usage times, etc.
• There are defined macros and functions that should be used to read data from a struct stat
Program
[Slide shows code:] Get the pathname of a directory from stdin; recursively search below that directory, checking the sizes of all regular files.
[Slide shows code:] Recursive function – its arguments are an opened DIR* and the directory name. It loops through all entries in the directory; readdir() fills in “entry” with the data for the next entry. Ignore the entries for “.” and “..”. Create the full pathname of the entry (directory name plus entry name).
[Slide shows code:] Use the lstat() function to get the ‘file’ information about the current entry in the directory.
[Slide shows code:] If the current entry is a directory – try to open it (skip the entry if access permissions prevent opening) and make a recursive call. If the current entry is a regular file – check its size, note it if it is the largest file so far, and increment the count. strdup() uses malloc to create space, and then copies a string.
It runs …
[Slide shows the program’s output, including output from the perror() function, which looks up error codes and prints explanations for errors.]
Deeper into the OS’s file manager
Buffering of files
Buffered I/O
• The system calls read and write typically will not initiate disk transfers!
• Generally:
– Read:
• The OS will read ahead, maybe as much as 128 kbytes of a file into system buffers after the file is opened (128k >> average size of a data file)
• read will simply copy data from a system buffer to an application buffer
– Write:
• The OS will buffer the data – it won’t wait for them to be written to the file.
• Result
– Read and write system calls usually return immediately, allowing the user process to continue, so minimizing the time that it occupies space in main memory.
I/O buffering
• Most applications can rely on the system’s defaults for buffering data, but sometimes it’s advantageous to overrule these defaults to achieve higher performance
– The need to improve performance is most likely to arise in applications that perform frequent updates of a binary record structured file (the system defaults for buffering don’t work well with such files)
I/O buffering
• The OS will set aside some amount of memory for file buffers
– Traditionally, there was an explicit ‘buffer cache’; more modern Linux systems may combine the space allocation aspects with the main ‘page management’ for free space in memory.
• Writes will copy data into a buffer; when it is full (or at the program’s request) the OS will arrange an actual transfer operation
• Reads will copy data from the buffer into the process’s memory
• System read and write requests (unistd.h) transfer specified numbers of bytes; these calls are relatively expensive and it’s best to arrange that read/write requests are for reasonable amounts of data, e.g. 512 bytes, rather than have numerous requests for small 4-byte or 8-byte fragments.
I/O buffering
• It is moderately common for a block that was ‘written to disk’ to be re-read shortly after
– Particularly common if updating binary record structured files.
• Buffer blocks that have been written out get appended to a buffer free-list, but still have associated data defining what block of what file they represent
• The OS will check in the buffer pool when asked for a block before requesting an actual read.
• It is also not uncommon for different processes to want to read the same file – so checks through the buffer pool help here as well.
201 Buffer structures • This example from Bach's presentation of Unix Sys-V (not a current Linux) – Example intended simply to illustrate the complexities of low-level OS 202 Buffer structures • Disk I/O buffers represented by 'buffer header' data structures, and data blocks (the same size as disk blocks) – Contain data identifying block (device, block number, …); linked in doubly linked lists [Diagram: buffer header fields – device num, block num, status, ptr to next buf on hash queue, ptr to previous buf on hash queue, ptr to next buf on free list, ptr to previous buf on free list, ptr to data area] 203 Buffer queues and freelist • Buffers get allocated to disk blocks and are placed in these doubly linked hash queues • Once data has been read from buffer (or written out to disk), the buffer remains in the hash queue but is appended to a free-list of buffers – Least recently used buffer at head of free list 204 Buffer 'hash queues' • OS will check in-memory buffers before initiating a new read or write operation – E.g. • Buffer already has been read, it's in memory, don't need to read again • Buffer has been written to – so it's more up to date than data on disk; use it. • Could be a large number of buffers – So to reduce length of search, group them • Hash block number (Bach's example uses blocknumber % 4) • Separate queues for each hash value 205 Buffer hash queues [Diagram: buffers grouped into four hash queues by blknum mod 4 (0, 1, 2, 3), with a freelist threaded through them; example block numbers include #16, #24, #43, and #50] 206 Finding buffers • OS would find #16, #24, and #50 in memory, but not #333 – #24 might be marked as locked by another process – requesting process would end up in some wait queue associated with buffer access – #16 would be removed from freelist and marked with appropriate status (being read, or being written, locked/not-locked) • For #333, buffer manager would take oldest buffer on freelist (#43) and relink into appropriate hash queue 207 Buffering and stdio • Details of how OS actually manages buffers are only apparent to people working on the OS code.
• Application programmers can make use of functions in stdio (or iostream) to change some aspects of how buffering is performed int setvbuf(FILE*, char*, int mode, size_t size) – Can provide own buffer (in static data segment or heap) – Can set mode – • _IONBF – don't buffer, write immediately (default for stderr) • _IOLBF – line buffered (default for stdin/stdout connected to a terminal) • _IOFBF – full (block) buffering, default for disk streams 208 [Diagram: data in user program passes via stdio printf into the stdio buffer, then via the write system call into a system buffer, and finally by a write request to the disk. Use fflush to force data from stdio buffer to system buffer; fsync, fdatasync, sync force data out to disk] 209 Forcing writes • Programmer can use fflush to make certain all data sent from stdio's buffers to system's kernel buffer • Calls fsync, fdatasync, sync allow program to request that data are actually written to disk – fdatasync updates just the data, fsync also makes sure that inode information (meta-data) are also updated 210 Even deeper into the OS's file manager Scheduling of disk operations 211 Scheduling disk accesses • At a still deeper level within the OS¶, you may find algorithms and data structures used to schedule disk operations. – Objective – minimize the slow head movements • arrange to minimize number of changes of direction (inwards to lower track numbers, or outwards to higher track numbers) • This is done by carefully re-ordering the block requests as they are inserted into the queue of requests for read/write access ¶Actually, some of this code may be in the disk controller! These days most disks have their own CPUs running code, as well as a cache of blocks. The OS "writes to disk" – the disk unit simply stores the data in its cache, only later transferring the data to disk surface. The disk unit's CPU will be executing a scheduling program held in ROM.
212 Disk access scheduling Want to keep heads tracking inwards (until have dealt with 'innermost' request) then reversing and tracking outwards 213 Disk access scheduling – 'Elevator algorithm' • Logical block number gets converted into disk's surface, track, sector, and block – but for this example work with logical block numbers as if they were real. • Suppose heads going inwards (to lower block numbers) and currently around block 16738, seeking to block 10444 (the only request in queue) – Queue 10444, Current 16738 • Now add requests for 17827, 31914, 15666, 9817 – Queue 15666, 10444, 9817, (will reverse direction) 17827, 31914 • Disk heads reach 15666 and transfer occurs • Add more requests – 11233, 24898, 16812, 2044, 49777, 12619, 23876 – Queue 12619, 11233, 10444, 9817, 2044, reverse, 16812, 17827, 23876, 24898, 31914, 49777 Elevator algorithm – similar algorithms control elevators that head up, then head down, possibly picking up extra passengers at intermediate floors 214 Solid state memory (NAND-based flash memory etc) • Semi-conductor random-access memory replaced magnetic core technology for the main computer memory back in the 1970s; but 'DRAM' requires power to maintain data (actually has to continuously refresh the bits). • Various alternative semi-conductor technologies – e.g. 'bubble memory' – were devised that could store data without power (1970s-1980s). • NAND-based flash memory (ask Wikipedia!) started to be introduced in ~1995; initially used for USB drives, memory cards in cameras etc. • By 2010, flash memory was starting to replace disks; it is now quite common for tablets and laptops to have a solid-state drive. 215 Solid state memory will change OS • With rotating disks, have two time-consuming steps to access a block – Very slow movement of disk heads to required track (slow in computer terms – milliseconds!) – Slow rotational latency as wait for block to rotate around • With solid-state devices, all blocks are equally readily accessed – no delays.
• Lots of code in an OS is typically devoted to optimising the sequence of requests for disk reads and writes to minimize head movements – all this code will become redundant when solid-state devices become the norm. • Less need for OS to use DRAM buffering as well • So expect some simplifications to OS code over next few years Currently flash is roughly 10× the cost of disk per gigabyte; that will probably change 216 Process management 217 OS – 3 – processes • The OS must provide for multiple processes (multi-tasking) – This has several aspects • Creation (and termination) of processes • Communication between processes • Scheduling of processes • Allocation of main memory to processes • Management of main memory and secondary memory (virtual memory systems) – The creation of processes is apparent through the OS API and forms a part of many applications (e.g. the shell – almost all the shell commands involve the creation of a short-lived process); the other aspects are mainly of concern to those implementing the OS • (though occasionally a systems administrator might want to intervene – changing process priorities, terminating processes etc) 218 Processes • Have already introduced the process management system calls that are apparent to programmers – fork, exec, wait, exit • What else is there? – Threads • Brief overview – you don't learn about threading until 200-level (CSCI204, CSCI212, and CSCI213) – Inter-process communication – The OS's task scheduler • Mainly left to be covered in a subsequent OS subject – Memory management and virtual memory • Mainly left to be covered in a subsequent OS subject Processes 219 Threads • Threads are for concurrent activities (like multi-tasking of processes); with threads it's concurrency within a single process. • Why? – Sometimes, it's because you have several tasks that can proceed independently • E.g.
Something like Microsoft Word – One thread attending to your typing in text, adding text to document data structure – Possibly another thread updating the display of text – Another thread that is busy in the spell checker; this thread arranges for incorrectly spelt words to be underlined • Maybe you have a problem where you can perform multiple parallel computations on data (e.g. image analysis); but these problems are often better suited to specialist 'Single Instruction Multiple Data' processors – Usually, it's to look after multiple I/O streams! Processes: threading 220 Multiple I/O streams • Think of your browser trying to load a big complex web page with lots of images and special effects like a slide-show, and updates about activities of your friends – If you can handle only one I/O stream at a time, it's going to be slow! • You will have to wait – The HTML page will download – but can't be shown yet – The style sheet will download – the browser may be able to display a rough outline of the page – The first image must be fetched – the browser can show that – The 2nd image gets fetched – … – Much nicer if your browser starts multiple connections to the server(s) and starts downloading files at the same time Processes: threading 221 Multiple I/O streams? • The easy way to program I/O is to use the normal potentially blocking I/O requests – Want to read data across the network from a remote server – well, you'll certainly have to wait. – If you have a single threaded program, then waiting for input means you cannot do anything else – so you can't handle other inputs, and your browser cannot do any preliminary page layout Processes: threading 222 Handling multiple I/O streams – the hard ways 1. You could use ioctl to switch the input streams to non-blocking, then you would write complex, error prone, and resource-consuming code that loops around repeatedly checking the different input streams to check for data 2.
You could use specialised ‘select’ (or ‘poll’) system calls – These allow you to tell the operating system that you are expecting input to arrive on any one of a number of connections – and you would like your process to sleep until data are available • This avoids those loops that waste CPU power, but the rest of the coding is just as hard and error prone as non‐blocking I/O Processes: threading 223 Multiple I/O streams with threads • If you use multiple threads, you can use standard blocking I/O – When one thread blocks on a read request (because data haven’t arrived yet), OS can schedule another ready thread within the same process. • Switching CPU attention to a different thread is less costly than switching to a different process – No need to change memory mapping, the threads share the memory space – Reduced need to flush and re‐load instruction and data caches. • Note – some OSs (including many versions of Linux) use the same OS data structures to represent threads and processes; then having lots of threads consumes space in main OS process table but may have advantage of not requiring two separate sections of scheduling code – one scheduling processes, another scheduling threads within a process. (This is a topic more suited to a more advanced OS subject.) 
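The thread-per-stream idea above can be illustrated with POSIX threads. This is a sketch only: the handle_stream() function simulates a handler with a trivial computation where a real one would sit in a blocking read(), and all names here are invented for the example.

```c
/* Sketch: one thread per input stream; each thread is free to block
 * independently while the OS schedules the others. The "streams" are
 * simulated here by a trivial computation. */
#include <pthread.h>

#define NSTREAMS 4

static int results[NSTREAMS];

static void *handle_stream(void *arg)
{
    int id = *(int *)arg;
    /* A real handler would block in read() here; while it blocks, the
     * OS can run the other threads of this same process. */
    results[id] = id * 10;             /* stand-in for "bytes handled" */
    return NULL;
}

/* Launch one handler thread per stream, join them, sum their results. */
int run_streams(void)
{
    pthread_t tid[NSTREAMS];
    int ids[NSTREAMS];
    for (int i = 0; i < NSTREAMS; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, handle_stream, &ids[i]) != 0)
            return -1;
    }
    int total = 0;
    for (int i = 0; i < NSTREAMS; i++) {
        pthread_join(tid[i], NULL);    /* wait for each handler to finish */
        total += results[i];
    }
    return total;                      /* 0 + 10 + 20 + 30 */
}
```

Note the threads share the results[] array directly – the shared address space mentioned above – so no pipes or copying are needed to collect their output.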
Processes: threading 224 Threaded servers • Server programs – Oracle (DB2, SQLServer, …) database engine – WebServer – Apache, nginx – Application server – Jboss, Glassfish, … – These are typically multi-threaded • Often "thread per connection" – When client connects, a thread gets allocated to handle that client; this thread waits for and responds to subsequent requests (like "run this SQL statement" or "give me this HTML page") – When client disconnects, thread is either destroyed and resources reclaimed or is "recycled" and made ready to handle next client Processes: threading 225 Threaded server [Diagram: a threaded server process with shared data – main thread: create socket, bind to port, activate, then forever() { accept client; create new thread to deal with this client; start thread in "handle_client" } – each handle_client thread: forever() { read next request; if "bye" then break; handle request; send response }, then thread dies – May have shared data – needs synchronization locks] My "threaded server" slide originally in my version of CSCI213 226 Processes: threading Many CPUs? Many threads running concurrently • With symmetric multi-processor designs (and now multi-core chips) you can have many CPUs so it's quite possible that the OS will allocate more than one CPU to a process so it can have multiple threads running.
• Thread packages vary – Sun's own thread package had features that allowed you to define thread groups and request specific numbers of CPUs be allocated (if available) to particular thread groups – But generally, you don't have much control over the OS's scheduling of threads within a process (other than possibly a priority value associated with each thread) Processes: threading 227 Threads need their own stacks! • Threads can share – Code segment – Static data segment – Heap – Those areas of OS data made available to a process • But threads must have their own stacks! – Each thread is going to be invoking functions that invoke other functions (possibly recursively), and so they need their own space to store the call sequence, arguments, local data space of called functions etc. Processes: threading 228 Where do thread-stacks go? • At one time (I think), they went on the heap – When you created a thread, you specified a stack size as one of the arguments. The thread library claimed a block of space on the heap. • Nowadays, the thread-stacks are mapped into your virtual memory but this is often done in a manner other than by expanding the heap Processes: threading 229 Thread stacks in simplified Linux virtual memory model [Diagram: virtual address space with text (i.e. code), static (initialized and zeroed), heap, and stack regions; virtual addresses not all mapped into physical memory (new stacks get mapped of course); the program "brk" (or "break") point, where the program can ask the OS to expand the heap by adjusting the brk point; those parts of the OS's process data that are accessible to the program; three stacks shown, created for three threads created by the main thread of the process] Some programs, e.g.
“application servers”, may have hundreds of threads – so layout of virtual memory can become complex (and crowded) Processes: threading 230 OS – 3 – processes • The OS must provide for multiple processes (multi-tasking) – This has several aspects • Creation (and termination) of processes • Communication between processes • Scheduling of processes • Allocation of main memory to processes • Management of main memory and secondary memory (virtual memory systems) – The creation of processes is apparent through the OS API and forms a part of many applications (e.g. the shell – almost all the shell commands involve the creation of a short-lived process); the other aspects are mainly of concern to those implementing the OS • (though occasionally a systems administrator might want to intervene – changing process priorities, terminating processes etc) 231 Inter-process communication • There are several mechanisms for communication among processes running on the same machine, including – Signals – Use of files (with file locking) – Pipes – Sockets • Will be covered as a special case of the sockets used for communication between different computers – Semaphores – Shared memory • Processes running on different machines communicate via networks – At lowest level, ports & sockets • Usually, the port & socket layer is hidden – Other libraries provide higher level abstractions (that work via ports & sockets) such as "remote procedure call", "remote object invocation (Java RMI, CORBA)", or message passing via a message queue server Processes: inter-process communication 232 Signals • Not really much communication – it's commands like "die"! • Signals apply within a "process group" ~ a process and the descendant processes that it has created (sometimes, creation of a child process can also arrange to create a new process group).
– Restriction to process groups is to select which of your processes the signal applies to • Process groups belong to "sessions" – a session is ~ your shell and anything started from your shell – Signals cannot be sent to processes from different sessions • This is to prevent you from sending a kill signal to a process belonging to another student on the same time-shared Unix system! – (Well 'root' and OS can send signals to any process) Processes: inter-process communication 233 Signals • Signals have identifiers (integer values) – and that's the only information that they can convey • Some signals are nothing more than status data that the OS uses to notify a controlling shell process when an application process dies because of a programming error (e.g. arithmetic overflow) • OS sorts out the receipt of a few of the other signals (these are mostly those relating to waiting/ready-to-run status conditions). • For the others – on receipt of a signal, the receiving process will (by default) roll over and die. – These signals are only useful as a communications mechanism when the code of the process has set up a signal-handling function that allows it to do something other than die of shock! Processes: inter-process communication 234 Signals • There is a set of signals defined in the POSIX standard. – Some are used by the OS to terminate a process when something drastic goes wrong – • SIGSEGV (segmentation fault i.e. addressing error from something like a bad pointer) • SIGBUS (other addressing error, e.g.
trying to load a 4-byte float with an odd address – again most likely a corrupted pointer) • SIGILL (illegal instruction) • SIGFPE – erroneous arithmetic operations • … – Others (changing ready/waiting status etc) that are used by OS include • SIGCHLD – when a child process terminates, this signal is sent to parent process (OS sorts it all out, if parent was waiting for child – via waitpid() – it will be marked as ready to run) • SIGSTOP/SIGCONT – suspend/resume a process • SIGTRAP – used by debuggers Processes: inter-process communication 235 Signals • More signals defined in the POSIX standard – these often get sent by users to their own processes – SIGINT – • when you have launched a process from shell, you can usually kill it with control-C (a SIGINT signal is sent to child processes of your shell session) – SIGKILL – • use a command to kill a particular process that you own – SIGUSR1 & SIGUSR2 – employed for control of some applications Processes: inter-process communication 236 Example of use of signals for control • Long running (daemon) servers (database engines, web-servers, application servers) typically have no control console session – so there is no easy way for a sys-admin to send commands to change behaviours (e.g. temporarily suspending processing of some class of requests) – Some servers provide for sys admin control by maintaining an open "port" to which the sys-admin can send requests (these days, often relying on HTTP and a web style client) – But if only simple controls are required, the SIGUSR1 and SIGUSR2 signals may suffice • SIGUSR1 – on receipt, server does something like write a checkpoint dump of its activities and then continues Processes: inter-process communication 237 Sending signals • Signals are sent via "kill" – Shell level command • Find the process id (from 'top' or 'ps –ef'), e.g.
2931 • kill -9 2931 – C function wrapping the system call • kill(pid, sig) – arguments are the process id of the process to be "killed" and the signal to send (SIGSTOP, SIGINT, …) Processes: inter-process communication 238 Handling signals • C (C++) program can define signal handler functions – C library function sets up association between programmer defined signal handling function and signal number • If OS gets a signal for one of those handled, it temporarily suspends current execution of program and causes handling function to run – If this function doesn't terminate execution, the OS will allow program to resume when the function returns. Processes: inter-process communication 239 Handler function defined for: SIGUSR1 – standard application defined signal communications, and for: SIGINT – the normal control-C "kill from shell" signal Handler functions associated with signals 240 Sending signals to specified processes using shell's "kill" command SIGUSR1 – can be (and in this case is) handled by code in application; SIGKILL – outright kill, nothing can stop this Control-C from console, normally kills process but here is trapped by application code 241 Files for inter-process communication • Obviously – One process can write/update a file – Another process can read it – (File on disk, or in modern Linux "memory mapped" – entire file read into memory and changes made to memory resident copy) • Equally obviously – Don't want to cause confusion by having • Multiple processes trying to update a file – Their updates might get interleaved, corrupting the data in the file • One process overwriting a file as another is reading it – the reading process is quite likely to get a mix of old and new data Processes: inter-process communication 242 Files for inter-process communication • Generally need some form of locking mechanism if using files for communication – Along lines of • "Test & Set" lock – If test&set was successful – use file, when finished release lock – Otherwise – either sleep a bit and then try
again, or (depending on OS call) get blocked until lock is released Processes: inter-process communication 243 Locking files • Linux does have a flock() system function (in <sys/file.h>) and a process can request an exclusive lock (the request function is blocking, i.e. it doesn't return until the exclusive lock has been granted) and can later release the lock. • Processes should be able to collaborate using this locking mechanism – These locks are "advisory" – they only work for collaborating processes. – There is nothing to stop another process from just opening and using a supposedly locked file. • Making a group of processes work properly using such a locked file can be tricky – there are odd issues relating to when written data get flushed and whether a reader will see the latest data. Processes: inter-process communication 244 FIFO (more practical version of communication via files on disk) • If you need to connect two processes so that one acts as a data producer (writer) and another independent process as a data consumer (reader) it will probably be easier to use a "named pipe" or "FIFO" rather than to try to set up a file and co-ordinate access via locks. • The mkfifo() system call creates a special type of file (shows as prwx------ in a directory listing) • Can then use a low level open() request to get a file descriptor for writing in the producer process, or for reading in the consumer process. • Named pipes (FIFOs) are rarely used. 245 Processes: inter-process communication Pipes • Pipes are much older and are more commonly used than FIFOs.
– Pipes were a feature of the earliest Unix systems • Pipes are used to connect a parent process to a spawned child process • A pipe request sets up a unidirectional data stream – It is defined with two 'file descriptors' – one allows for reading, the other for writing • For two way communication, you set up two pipes – this set up step is done before the parent forks to create parent and child processes • Subsequently – the parent can write data for the child process to read on one stream – The child process can send results back to the parent on the other stream 246 Processes: inter-process communication Example of using pipe : Apache web-server and spawned CGI process • The Apache web-server could handle requests for static content (HTML pages, images, etc) itself, but dynamic content (e.g. the response data for a form request) had to be generated by a separate "common gateway interface" program – a program written in C, or a Perl script etc – The CGI program could generate the content of a response page, but it was the Apache web-server that had to return this response and deal with details of the HTTP protocol • The parent Apache process and the CGI program had to work together and exchange data – for this they used pipes.
247 Processes: inter-process communication This description is ~correct for early versions of Apache and CGI programs; modern versions are a little more complex Serving the web 248 Processes: inter-process communication [Diagram: browser on web client displays a form; the HTTP request is 'posted' over the Internet to the Apache web-server; Apache 'forks off' a child CGI process to run the Perl script, connected to it by pipes; the Perl script (CGI program) reads the data from the form, checks the data, logs the pizza order, and composes an acknowledgement with order number, price, and estimated delivery time; the composed response page from the CGI program, with response headers added by Apache, is returned to the browser] Example 'pipe' code • Parent program mainline – Set up two pipes – Fork child – Write message through pipe-1 to child process – Read response from child on pipe-2 – Exit • Child process – Read message from pipe-1 – Reverse string – Write reversed string as response on pipe-2 249 Processes: inter-process communication [Slides 250–253: code listing for this pipe example] waitpid(childpid,&status,0) • Status – Integer • Actually really a struct with different fields, but each field is 1-byte and they are packed into an integer. • Fields: – Signal » If child process was killed by a signal from OS (e.g. SIGSEGV) the signal value is stored in this field – Exit code » Integer N in child's exit(N) or return N (in main()) that terminated child process 254 Processes: inter-process communication Fork-exec & pipes • Example simply showed fork, same code in both parent and child, no problem for child process to find the pipe file descriptors. • But suppose you want different code (as when Apache forks off a child process that is to run a Perl script)? • Child process would invoke exec and load a different program • Where are pipe1[] and pipe2[] in the new program?
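The two-pipe parent/child exchange described above might be coded along these lines. This is a sketch, not the lecture's own listing: the names run_pipe_demo and reverse_in_place and the message text are invented for the example.

```c
/* Sketch of the two-pipe example: parent sends a string down one pipe,
 * the child reverses it and replies on the other. */
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

void reverse_in_place(char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n / 2; i++) {
        char t = s[i];
        s[i] = s[n - 1 - i];
        s[n - 1 - i] = t;
    }
}

/* Returns 0 on success, leaving the child's reply in 'reply'. */
int run_pipe_demo(char *reply, size_t cap)
{
    int down[2], up[2];            /* down: parent->child, up: child->parent */
    if (pipe(down) < 0 || pipe(up) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                /* child: read, reverse, reply */
        char buf[64];
        ssize_t n = read(down[0], buf, sizeof buf - 1);
        if (n < 0) n = 0;
        buf[n] = '\0';
        reverse_in_place(buf);
        write(up[1], buf, (size_t)n);
        _exit(0);
    }
    const char *msg = "hello child";            /* parent side */
    write(down[1], msg, strlen(msg));
    close(down[1]);                /* child's read() then sees end of data */
    ssize_t n = read(up[0], reply, cap - 1);
    reply[n < 0 ? 0 : n] = '\0';
    waitpid(pid, NULL, 0);         /* reap the child */
    return 0;
}
```

Note how the child inherits the pipe file descriptors across fork() with no extra work – the situation that breaks down once exec() loads a different program, as the next slides explain.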
255 Processes: inter-process communication Fork-exec & pipes • The "file descriptors" – identifiers of data streams – would be available to the new program because they are held in the shared OS memory • But the new program wouldn't be able to find which they were! • So, when pipes are used in a fork-exec scheme, there is a bit more work to be done – Rearrange the plumbing! 256 Processes: inter-process communication Fork-exec & pipes : rearranging the plumbing • There is a little piece of standard code that gets pasted into the code: – Create the two uni-directional pipes. – Fork – If child process • Re-arrange plumbing! – Downstream-read connection re-mapped to file-descriptor 0 (stdin) – Upstream-write connection re-mapped to file-descriptor 1 (stdout) • Now exec and load other program code – Other program can read from stdin and write to stdout • Standard code works by child process closing stdin (0) and stdout (1) and then using the dup2() system call to re-assign the appropriate file-descriptors from the two pipes. 257 Processes: inter-process communication Semaphores • 1970s Unix didn't offer much in terms of primitives for synchronising activities of different processes. • There was a kind of "hack" – The calls mkdir() or open() (with file-modes specifying create, write, …) will fail if the directory or file already exists – So you use the existence of a file as a form of lock • Try to create 'lock directory/file' – Success, go ahead do work, delete – Fail, sleep a bit, try again 258 Processes: inter-process communication Hacking process synchronisation • Demo: – Supposedly multiple processes all using same code where need exclusive access to a set of files. – Each process running code of form • Do – Claim synchronisation lock – Work – Release lock – Idle • Repeat – Synchronisation lock using the "create file hack"! • The file created is simply the lock – it's not part of the set of files for which processes need exclusive access.
• Works by invoking "open(file, …)" with options that specify "create" etc; the create request fails if the lock file already exists. 259 Processes: inter-process communication 260 261 Attempt to claim resource: Return from function on success (with lock file created) If fail to get resource, sleep for a random amount then try again. 262 Release resource: Just unlink (delete) that lock file 263 Supposed work functions – workabit (here would be using the set of files for which exclusive access is required), idleforabit (here would be doing other work) 264 Four processes launched concurrently, each running same code. Their "work" on exclusive access files gets interleaved without problems. Synchronisation with lock file hack • It's easy • It works • It's still a hack • As Unix evolved, more refined mechanisms were introduced. 265 Semaphores • In early 1980s, Unix acquired "semaphores" • sem·a·phore, noun: 1. an apparatus for conveying information by means of visual signals, as a light whose position may be changed. 2. any of various devices for signalling by changing the position of a light, flag, etc. • In the nineteenth century, railway stop/go signals were often referred to as semaphores 266 Processes: inter-process communication The Unix kernel would have included similar locks for internal use from the start; these system semaphores represent an extra service offered to user processes.
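The create-a-file locking hack demonstrated in the preceding slides can be sketched with open()'s O_CREAT|O_EXCL flags; the lock path and function names here are invented for illustration.

```c
/* Sketch: using O_CREAT|O_EXCL file creation as a mutual-exclusion lock.
 * open() fails with errno == EEXIST if another process holds the lock. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try once to claim the lock; returns 1 on success, 0 if already held,
 * -1 on unexpected error. A real process would sleep and retry on 0. */
int try_claim(const char *lockpath)
{
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    close(fd);                       /* the file's existence IS the lock */
    return 1;
}

/* Release the resource: just unlink (delete) the lock file. */
void release(const char *lockpath)
{
    unlink(lockpath);
}
```

The O_EXCL test-and-create is atomic in the kernel, which is what makes the hack safe; a separate "check then create" sequence would have a race condition.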
Semaphores • Linux now has two kinds of semaphores handled by different libraries – One type of semaphore (in the POSIX library) is used for co-ordinating threads within a process – The second kind, "System V semaphores" (based on a feature of Unix System V from the early 1980s), is for co-ordinating processes • You will learn more about semaphores and other co-ordination primitives for threads in CSCI204, CSCI212, and CSCI213 – It's easier to explore co-ordination issues with multi-threaded programs rather than with lots of separate processes 267 Processes: inter-process communication Semaphores • Using "System V semaphores" may be the right way to handle process co-ordination, but it's rather long-winded. • These "semaphores" are OS managed resources belonging to individual users; the OS will have a limit on the total number of semaphores that can be created. • Each is actually a set of semaphores – the number needed in the set is specified at creation time (mostly you want just 1) • A semaphore set has to be created, and the semaphores initialized, before they are needed by the working processes. 268 Processes: inter-process communication Semaphores • There are shell commands for listing the semaphore sets that exist in the OS and for deleting semaphore sets that you created but no longer need. • Semaphore sets are created with unique identifiers – These are generated by a library function • They are kept unique by incorporating a reference to a file that you own and a project number.
– The creation function returns the identifier; this value is needed by the programs that utilize the semaphore set 269 Processes: inter-process communication Semaphores • The example involves two projects – First project has a simple C program that • Creates a semaphore set with a single semaphore • Initializes the semaphore – Initial state should correspond to "resource is free and available for use" • Returns the identifier – Second project is a variant of the last example, now using the semaphore to control access • The request to use the semaphore blocks if it is not available, so no longer need the "sleep, retry" arrangement 270 Processes: inter-process communication 271 Program to create a semaphore set, initialize it, and print its identifier 272 Just calls to the system library functions semget() and semctl() Semaphores • The semaphores have to be created before the programs using them get run – If you don't do this, you end up with race conditions in the programs • Which one creates the semaphore? • Others must not start until the semaphore is both created and initialized • If you have created the semaphores, then you can use them 273 Processes: inter-process communication 274 Same demo code as before with new versions of claimResource and releaseResource that will work with the semaphore in the semaphore set. Identifier of semaphore set must be supplied. 275 Use semop() operations on semaphore (op = -1 is (attempt to) claim) (op = 1 is release) 276 Viewing (and deleting) semaphore sets using shell commands Semaphores • They may be the correct way of handling co-ordination (as opposed to hacking with lock files) but they do seem rather clumsy. • And they only provide co-ordination • What about real communication – exchanging data among the processes?
• That came a little later with "shared memory" 277 Processes: inter-process communication Shared memory • Several variants have been created at various times – starting with Unix System V – and now Linux will typically offer a number of mechanisms. • Here just briefly consider Linux's version of System V shared memory. – It's just a block of bytes that can be used by your different processes • Your code has to perform pointer tricks and byte copy operations to get data into or out of shared memory – There are no co-ordination mechanisms supplied • So you will also set up semaphores for co-ordination – It's the fastest means of sharing data • (Lots faster than pipes, FIFOs, ports-&-sockets) 278 Processes: inter-process communication Alternative "POSIX" shared memory mechanisms are described in http://man7.org/training/download/posix_shm_slides.pdf Using shared memory • It's much like using semaphores – You create a shared memory segment with an id • The OS maintains it; the same shell commands can be used to view/delete instances as were used for semaphores – Your processes attach the segment • Basically, they just get a void* pointer to some space – They then do what they want with it, casting the pointer to something more useful and copying data – Remember to get rid of it when no longer used • Easiest to use a shell command 279 Processes: inter-process communication Using shared memory • Example – "Set up" process creates a shared memory object that the OS will manage • Object will hold a data structure that will allow two processes to play a game of noughts and crosses – "Player" processes started • Player processes communicate via the shared memory object, taking turns to make moves in the game until the game is won or drawn – Shell command used to remove the shared memory object when it's no longer required 280 Processes: inter-process communication Shared memory object • Some limitations!
– You create the object specifying its size in bytes – When using the object, you "map it into your address space" – and get a void* pointer – You have to type cast the void* pointer into a pointer to some form of struct – That struct should not contain any pointer fields! • So don't have member fields like string or vector (C++ STL classes) as these have embedded pointers • The struct can have fields that are simple built-in types (int, float, char, double), or fixed-size arrays of simple types. 281 Processes: inter-process communication Struct • For this example, the struct represents the state of a noughts & crosses game 282 Processes: inter-process communication Setup program • Tasks 1. Create a key 2. Create a shared memory structure with that key, getting back an identifier 3. Print the identifier – it's needed by the other programs 4. Map into memory 5. Initialize the structure 283 Processes: inter-process communication 284 The file must exist, but isn't really used for anything other than acting as part of the data from which to generate a unique key 285 Use ftok() to create the key (Can check for the key in a listing of the shared memory segments managed by the OS) 286 Create the shared memory object in the OS, get its id (for input to other processes) and (just for interest) find where it was mapped into the address space. 287 Type cast the void* pointer to a pointer to struct. Initialize the fields in that struct 288 Can check its presence among the other shared memory objects Playing games • Player processes need to be co-ordinated!
– This is handled somewhat naively in this example • A field in the shared memory struct identifies whose turn it is (nought-player or cross-player) • Processes co-ordinate using approximately this scheme – Repeat until game is won or drawn » Repeat • Check if it's my turn – if so break from loop • Sleep a bit » Make my move and update shared struct » Change field so it's now the other guy's turn 289 Processes: inter-process communication Wait for turn 290 291 • Signals: • Winning process will send a SIGUSR1 signal to the losing process • Processes need to start by setting up a signal handler • Once signal handling is set up, the program 1. Prompts for the identifier of the shared memory segment 2. Maps the structure into memory 3. Starts play 292 Signals are sent with the kill(pid, sig) call; the signal handling routine reports the end of the game and then "detaches" the shared memory segment. 293 Get the identifier for the shared memory segment (the id, not the "key" used to generate it) Map it into own address space Type cast the pointer to the appropriate pointer type 294 Playstart – determine whether nought or cross player. Then loop until game finished: wait for turn (claimGame()) user plays (showboard(), getmove()) check for wins or losses if not game over, change turn setting to other player Terminate game play Re-initialize the shared memory structure so that it can be used again 295 Check status field in shared memory, maybe sleep a bit before another check 296 Game mechanics – nothing of interest, just for completeness 297 Game mechanics 298 Game mechanics Reset default state in shared memory structure (OS will look after it while we are away) Detach from this process 299 Mapped to different addresses • Set up program: 0x109805600 • Noughts player: 0x60244000 • Crosses player: 0x5ab4b000 300 Processes: inter-process communication
[Diagram: the virtual address spaces of process 1 and process 2 each contain text (i.e. code), static, and stack segments; both map the same shared memory segment in real memory] Game over man 301 Tidy up 302 Remove the shared memory segment from the set managed by the OS when no longer needed Shared memory & semaphores • You won't see the like of them again in your undergraduate CS studies – too advanced. • Where would they be used for real? – The Apache web-server is one example • There is a controlling instance of the Apache web-server (running as root) that organises the work of spawned Apache child processes (that run as www-data) that do the actual work of handling page requests – These child Apaches may themselves be multi-threaded • The control Apache co-ordinates the activities of the others through data structures held in shared memory 303 Processes: inter-process communication Apache code • You can download and look at the code of something like Apache (not advised until you have progressed a bit further in your studies) • You will find – Code using semaphores in proc_mutex.c – Code using shared memory in shm.c 304 Processes: inter-process communication Shared memory • Heavily used by OS-related processes like a window manager - 305 Just a few of the shared memory segments created for me (as listed by the ipcs -m command) Processes: inter-process communication Ports & sockets • The most commonly encountered inter-process communication mechanism • Created as part of Berkeley Unix ~1980 to connect to the new "Internet" – Primarily for communication between processes on different computers • Supporting both packet-oriented and stream-oriented connections – In 1980, shared memory, FIFOs etc. did not exist, so a variant of ports & sockets was supplied for communication between local Unix processes 306 Processes: inter-process communication Ports and sockets for inter-process communication on one machine • In C code using sockets, the "socket()" function has an argument specifying the "address domain" of the socket – Could be for
local communications (two processes on the same machine) – AF_UNIX – Could be for processes on machines connected via the Internet – AF_INET • Socket library code can optimise a little if both processes are on the same machine – Messages don't have to be packaged for the Internet – Simply get copied by the OS from one system buffer to another • It's not uncommon – but probably AF_UNIX won't turn up in any example that you run 307 Ports and sockets for inter-process communication on one machine • You will frequently use the AF_INET setting even though your processes are running on the same machine – Simply a matter of convenience when doing ports-&-sockets exercises; saves you from having to be logged in on two different workstations – one for your "client" and one for your "server" • When doing web exercises you frequently run a web-browser on the same machine as your web-server – Again, it's just for convenience; avoids the need to use multiple machines 308 Ports & sockets • You will be meeting these a lot!
– CSCI110 • When explaining the network connection in web – CSCI212 • Programming network communications in C++ – CSCI213 • Programming network communications in Java – Java uses wrapper classes to hide some of the messier details – Also touched on in security and distributed systems subjects 309 Processes: inter‐process communication Network communications • Early networks (1960s) – – Computers connected by dedicated “phone” links with slow speed serial I/O • Link allocated to specific programs using some OS call • Programs on the two connected computers using their own application defined protocol – Messages typically something like: » Start of message marker (“stx” character) » Two (or more) bytes encoding message length » Bytes of message (7‐bit ascii code with 1‐bit parity) » Two or four bytes for a checksum » End of message (“etx” character) • Communications links unreliable, code needs to parity check each character and confirm checksums; frequent need to throw away corrupted message and ask for re‐transmission. – “ack” acknowledge character (message received ok) “nak” (message bad) 310 Processes: inter‐process communication Early usage – things like Sabre airline seat reservation system, banking, … Modern “packet switched” networks 311 ARPAnet ‐ 1969 Modern “packet switched” networks • Modern networking technology evolved from ARPAnet (1969) • ARPAnet – Not created to withstand nuclear war (urban myth) – It was created to allow access to ARPA funded computer centres by ARPA funded researchers at universities and research institutes all over US • By late 1960s, campus wide “timeshare” systems were relatively common; but ARPA needed something better. 
• ARPA was funding a few large timeshare systems that they needed to make available to widely dispersed collaborative research groups – a kind of remote login 312 Processes: inter-process communication ARPA – Advanced Research Projects Agency [Diagram: TTYs connect to a TIP (terminal controller); TIPs connect to IMPs (packet switchers / routers), which interconnect the main ARPA computers – Sigma, IBM "stretch", PDP-10, PDP-6, Multics, IBM 360/75] ARPA • Initially, the ARPA network supported only one program – an early version of "telnet" – telnet (modern version still used a little) allows you to start a login session on a remote computer • Once started you are logged in, using an appropriate shell program, and can create and edit files, run programs etc. • Arrangement: – Your teletype terminal (TTY) connected to a low-powered "terminal interface processor" (tip) • tip was located on your campus, would have supported maybe a dozen terminals – tip connected onto an interface message processor (imp) • imp ~ modern "router", interconnected to other imps that form the network; forwards your messages, and returns responses to your tip – Timeshare computers 314 Processes: inter-process communication ARPAnet – early "Telnet" days • On timeshare systems, it was common to use a terminal concentrator (small, low-powered computer) that looked after many ttys, so the connection to the "imp" looked simply like another group of local terminals each running shell sessions 315 Processes: inter-process communication Data packets routed through network • ARPAnet used data packets – Packet – a little struct, something like this (for a packet from terminal to timeshare server computer) • ID of server computer (timeshare machine) (1..254; one byte) • ID of client (identifying TIP & TTY terminal) • Byte count • Data bytes • Some validation data (checksum or something more sophisticated) – Original packets from the terminal had only one data byte, responses from the server typically the bytes for one line of output – TIP composed request
packets from TTY and forwarded them, sent response lines character-by-character to TTY 316 Processes: inter-process communication Redundant links • Ideally, there were to be multiple routes between an imp attached to a tip and any of the host computers – (Not so that ARPAnet could survive nuclear attack¶) – Links in those days were much less reliable than modern technology • Essentially telephone lines with modem-like connections – Easily disrupted by electronic noise (e.g. an electric train running past the phone cable) – Easily destroyed by lightning striking the cable (or anywhere on the ground close to the modem at the end of the cable) – Often cut (backhoes, poles knocked down) – And data traffic on links often bursty – a specific link temporarily overloaded, so find a more circuitous route that is in less demand. 317 Processes: inter-process communication ¶ARPAnet was being used by groups developing software like "algebra systems" – a forerunner of the modern Mathematica program; this research wouldn't be a priority in the event of nuclear war! "imp" (router) software • Imps used sophisticated algorithms for managing packet delivery – At regular intervals, imps exchanged data about their links and the amount of data traffic on different links – So each imp could maintain a kind of map of the network and use this, along with information about current data traffic, to work out the best route for a packet • An imp would check each packet received – if not valid, the packet would be discarded • Packets could also be discarded if an imp was too busy and didn't have space to store the packet in memory for forwarding 318 Processes: inter-process communication Delayed acknowledgement • Of course, cannot simply lose packets!
• Each packet sent would have to be acknowledged – to make sure data transfers were complete and valid • But at the network speeds used in 1969 it would take several seconds for a packet to cross the USA and an acknowledgement to return – (and remember, sending 1-character data packets from a tty) • So the software incorporated mechanisms that allowed for acknowledgements to be delayed 319 Processes: inter-process communication • Sender: send 1, send 2, send 3, send 4, (pause), send 5, send 6, send 7, send 8, send 9, send 4 again • Receiver: receive & acknowledge 1, receive & acknowledge 3, receive & acknowledge 2, (pause), receive & acknowledge 5, receive & acknowledge 6, receive & acknowledge 7 (4 was never acknowledged, so it is eventually re-sent) Early scheme! These transmit windows are a lot more elaborate these days! Out of order packets? • The software in the tip and time-share computer dealt with issues like – re-transmitting packets that weren't acknowledged – re-ordering packets so that the data was in the correct order before being given to a program or being used to print lines on the tty 321 Processes: inter-process communication ARPAnet - growing • Immediately successful • Lots of other US universities and research institutes sought to attach their main research machines to the net 322 Processes: inter-process communication [Maps of ARPAnet growth: Dec 1970, Sep 1973, Jul 1977] ARPAnet – becoming more sophisticated • Not just telnet access.
• Communications protocols (protocol = set of rules) made more sophisticated – Low level protocols used by imps to route packets through the network – Higher level protocols so that there can be multiple services • telnet from imp to time-share machine • File transfer protocol (ftp) for time-share machine to time-share machine • Emails 323 Processes: inter-process communication ARPAnet – email & news • Email systems had earlier been developed for campus-wide time-share systems – Essentially just special files ('mailboxes'); one among each user's set of files – Mail program added messages to the recipient's mailbox file – Viewer program displayed the contents of the mailbox • By ~1973, ARPAnet-wide email – making it easier for distributed research groups to collaborate on research projects • "Newsgroups" – Started (~1977) as a form of multi-recipient mail-outs – Forerunner of today's blogs and bulletin boards 324 Processes: inter-process communication Other changes in 1970s – more computers • Cheaper computers – Smaller time-share systems like PDP-10 • Not as powerful, and nowhere near as costly as Multics, IBM370/168, … – Mini-computers like PDP-11 • Result of new computers – Most campuses had many computers, not just one single large time-share system 325 Processes: inter-process communication Other changes in 1970s – local networks • Lots of small PDP-11s, Data-General Novas etc – none having a decent sized disk, and none having printers, plotters etc • Solution – – Share the scarce resources – Link the little computers together on a local network and have one machine more richly endowed with peripheral devices • Add some software that lets computers share devices – – File manager for shared disk – Something like a "spooler" to handle printer output • This really was the motivation for a variety of local networking technologies that appeared in the mid-1970s 326 Processes: inter-process communication Ring-networks, ethernet • Variety of technologies – "token ring", ethernet, … –
Ethernet survives in modified form as the basis of most modern local networks 327 Processes: inter-process communication [Diagram: a token ring, where messages pass around the ring from computer to computer sharing a disk and a printer; and an Ethernet, where messages from the minicomputers are broadcast on a shared cable] By ~1980 • ARPAnet needs to be replaced – Original scheme, and network addressing, assumes a small number of time-shared computers – Now campuses have local networks with many machines • Replacement for ARPAnet will have to link up these local networks and provide access to a specific machine on these local networks • Need for an Internet – a network of networks 328 Processes: inter-process communication Internet • An Internet address – Network identifier, computer identifier • Layered software for communications – Lowest level (generally ignored) ("Data layer") • The device drivers that work the network interface cards to send & receive bits on the network – Packets are just blocks of bits – The Internet Protocol level ("Network layer") • The software (in the routers – previously known as imps) that moves packets from source machine to destination machine – Communications level ("Transport layer") • Explicitly send and receive packets – or create the appearance of input and output data streams – Applications level • What messages do the applications want to send. 329 Processes: inter-process communication IP addresses • As of ~1980, a 32-bit address – In principle, 32 bits allowed ~4,000,000,000 computers on the internet; – In practice, it's less.
• Some bits serve as a network identifier • Remaining bits serve to identify a computer in that local network • Allocation of bits for network and bits for computer has been refined over time – In subsequent subjects, you will be introduced to the original scheme with its A, B, and C network classes – because that scheme is easier to understand than the subsequently refined "classless inter-domain routing" (CIDR) addressing 330 Processes: inter-process communication Currently, the Internet is being changed over from using these 32-bit addresses (IP-V4) to 128-bit addresses (IP-V6) so as to allow for more devices. Examples in these notes will continue to use IP-V4 addresses – they are easier to understand than IP-V6, though the principles remain the same. Your network identifier • Network identifiers (in modern terms, "domain addresses") are allocated by a central authority – Originally, Stanford Research Institute – Nowadays, IANA – the Internet Assigned Numbers Authority • IANA delegates allocation to regional authorities who further delegate responsibility to internet service providers 331 Processes: inter-process communication Your computer identifier • Allocated by the system's administrator for your domain – For your PC at home, it's effectively been allocated an identifier by the sys-admin of your ISP • Well, there are complications if you are running a little home network – learn about things like NAT (Network Address Translation) and the 192.168 network in some future subject – For computers on a campus like UOW: • UOW got a network address from the regional authority • Local systems administrators choose how to assign computer identifiers to individual machines – Lots of refinements: dynamic allocation, sub-nets, … – Learn about these in later subjects 332 Processes: inter-process communication IP addresses • Address in a large domain e.g.
UOW • Address in a small domain 333 Processes: inter-process communication [Diagram: in a large domain, the leading bits make up the network number (domain address) and the remaining bits identify a sub-net & a computer number; in a small domain, the leading bits make up the network number and the remaining bits identify a computer] Dotted decimal notation • Network addresses may have to be listed in documents, and have to be entered into programs etc – Certainly wouldn't want to enter data in binary 10000010100000100100000000000001 too much opportunity for error! – A convention arose that you represented the value as a sequence of four decimal values each in the range 0..255 (each equivalent to 8 bits of the address) so you might represent an address by a sequence like 130.130.64.1 – This dotted decimal notation is simply a convenience; you cannot generally place any interpretation on the decimal values 334 Processes: inter-process communication IP addresses – "domain" addresses • The network part of an IP address is a "domain address" – UOW is a domain – Amazon is a domain – YouTube is a domain – (your ISP is a domain – or has a domain address for the connections that it provides to individual clients like you) – … • Domain addresses are the responsibility of IANA or its delegates • Domains have administrators who are responsible for the computer addresses within the domain (sub-net/computer, or just computer) 335 Processes: inter-process communication Getting the address of a computer • Initially (Internet up to ~1984) there was a simple list of hostnames and IP addresses available from Stanford Research Institute (who then administered the net) • That didn't scale, so the "Domain Name System" (DNS) was created – DNS is an address lookup system • Really it has two stages 1. Get in contact with an authoritative "name server" computer in the domain that has the computer you want 2. Ask that "name server" computer for the complete IP address 336 Processes: inter-process communication DNS • DNS relies on "name server" processes running on Internet computers.
– On Unix/Linux, this would be the "named" daemon process – (Will subsequently refer to "name server computers"; a computer running "named" can frequently run other work as well) • A name server computer has two tasks – It acts as an intermediary for processes running on any of the machines in the same domain; it's the name server computer that works with the other name servers that make up DNS and which actually finds out addresses for machines that you want to contact – www.amazon.com, new.bbc.co.uk, … – It responds to requests from outside its domain – outsiders asking for the addresses of specific machines within the domain 337 Processes: inter-process communication 338 Processes: inter-process communication [Diagram: the UOW domain, the Amazon domain, and some other domain in .com each connect to the Internet through a network gateway and each run a computer running "named"; as well as a name server per domain, there is a name server for ".com"] Domain DNS name-servers • The systems administrator for a domain configures the local "named" server – It's here that you get the tables with the data that define the individual computer numbers for the complete IP addresses of machines within the domain • But a name server program, running in say UOW, that is trying to help find the address of e.g. www.ibm.com will have to be able to find the address of the IBM computer that is running IBM's name service 339 Processes: inter-process communication DNS naming hierarchy • The DNS system resolved the issue of how to find the name servers for specific domains by using a hierarchical naming system. 340 Processes: inter-process communication [Diagram: the original ~1984 hierarchy – the root "." above the top-level domains .edu .com .net .gov .mil .org .int .arpa, with e.g. .hp .ibm .oracle under .com and .mit .yale .usc under .edu] DNS domain hierarchy • Domains (i.e.
local networks belonging to organisations) were grouped by organisation type – .edu • Universities – .com • Commercial companies – .net • Internet organisation itself – .gov • US government research laboratories working with the original Internet – .org • Non-commercial organisations – e.g. IEEE, ACM, … • And a "root" level for the tree-structured hierarchy "." 341 Processes: inter-process communication DNS domain hierarchy • Root and "top-level" domains – All universities (MIT, Yale, USC, …) in the .edu domain contribute to the costs of running a .edu domain name server • This .edu nameserver knows the IP addresses for the MIT nameserver, the Yale nameserver, … – All commercial organisations (IBM, HP, Oracle, …) in .com contribute to the costs of running a .com name server • This .com nameserver knows the IP addresses for the IBM nameserver, the Oracle nameserver, … – Everyone contributes to the cost of a root nameserver • This "." nameserver knows the IP addresses for the .com nameserver, the .edu nameserver etc 342 Processes: inter-process communication Actually, all nameservers are duplicated; this avoids parts of the network becoming out of reach if a nameserver machine goes down Using the hierarchy when looking up domain names - 1 • Assume, for example, that the MIT nameserver machine has just been re-initialized – It can read its own files to create data defining the IP addresses of all machines in mit.edu – But the only address outside mit.edu that it will know is that of the root nameserver (".") • This (actually these, as there are multiple copies) runs at a fixed IP address that is widely published – Root nameservers have included: 192.228.79.201, 198.41.0.
343 Processes: inter-process communication Using the hierarchy when looking up domain names - 2 • A user in mit.edu runs a program that wants to contact dev3.ibm.com • The mit.edu nameserver takes on responsibility for finding this address – It contacts the "." nameserver and asks for the address of a .com nameserver – It contacts the .com nameserver and asks for the address of an ibm.com nameserver – It contacts the ibm.com nameserver and asks for the address of the computer called dev3 in the ibm.com domain – It returns this IP address to the user's program • The user's program now has the IP address for a server running a program that it wants to contact (e.g. an ftp server) 344 Processes: inter-process communication Caching DNS results • A nameserver machine keeps a cache of domain names and the addresses of their "name server" computers – In the example, the mit.edu nameserver now knows the address of a .com nameserver and the ibm.com nameserver • If subsequently asked for hp.com, it goes directly to the .com nameserver; it no longer has to go to "." • If subsequently asked for sales.ibm.com, it goes directly to the ibm.com nameserver. 345 Processes: inter-process communication DNS lookup: a complex procedure • Resolving a DNS name is a complex procedure involving many server machines exchanging messages • But for a programmer, it's been made easy!
• The C libraries include a function "gethostbyname" that handles the lookup (or the more modern getaddrinfo() on current Linuxes) – Check in the local /etc/hosts file in case the machine name (or domain name) is listed there (it's an efficiency hack for addresses you often need) – If not in /etc/hosts, contact the domain name server for the local domain and ask it to resolve the address • The local domain name server checks its cache, and only goes through the long lookup procedure if necessary 346 Processes: inter-process communication You may remember setting up a new Windows system; at one point you would have been asked for the IP address of your local name server – this would have been listed in the documentation for your ISP connection Once you have the IP address … • Once you have the IP address of the machine that you wish to contact, you can start transmitting packets 347 Processes: inter-process communication Address lookup can take several seconds if there are no relevant cached data. It's largely wait time; waiting for all those message exchanges amongst the nameservers. IP packets • Packets sent across the internet have a structure like: 348 Processes: inter-process communication The IP addresses An IP-V4 packet could have up to 64Kbytes of data. On the main links of the modern Internet, IP-V6 allows much larger packets to be used Routing packets • The routers that make up the network get the destination address for each arriving packet from its header, and work out which neighbouring router should receive it next – The Internet is now too large for routers to have maps for all addresses – Instead routers use rules that take account of how blocks of addresses have been allocated • E.g.
– IANA allocated to ARIN a block of ~2 million addresses, 208.128.0.0/11 (the first 11 of the 32 bits identify the block) – ARIN (which looks after the US & Canada) could have allocated the block 208.130.28.0/22 (a block of ~1000 addresses that forms part of the 208.128.0.0/11 group) to a particular company – A router in Australia seeing 208.130.28.17 would know just to send the packet to Canada; another router in Canada would have more local information and could forward it to the actual network entry point for the company 349 Processes: inter-process communication Routing packets • You can learn more at Wikipedia. • "Link-state" algorithms – A bit like the scheme in the original small ARPAnet • "When applying link-state algorithms, a graphical map of the network is the fundamental data used for each node. To produce its map, each node floods the entire network with information about the other nodes it can connect to. Each node then independently assembles this information into a map. Using this map, each router independently determines the least-cost path from itself to every other node using a standard shortest paths algorithm such as Dijkstra's algorithm. The result is a tree graph rooted at the current node, such that the path through the tree from the root to any other node is the least-cost path to that node. This tree then serves to construct the routing table, which specifies the best next hop to get from the current node to any other node." – Sometimes used as the basis for routing within a domain 350 Processes: inter-process communication Routing packets • "Path vector" algorithms for inter-domain routing – Individual nodes in domains act as "speakers"; only speaker nodes communicate routing data 351 Processes: inter-process communication IP layer TCP & UDP layer • IP software, now in the routers and switches, looks after the transfer of single packets of data between machines. • But we want communication between specific programs running on those machines.
– Two higher level protocols were developed during the later years of ARPAnet – these protocols were adapted for the Internet and still form the basis for most traffic • TCP & UDP 352 Processes: inter-process communication Program to program communication • A mechanism for defining program to program communication became necessary when ARPAnet needed to support more than just telnet – file transfer, mail, … • The mechanism chosen involved additional structure in the packets sent across the network – A subordinate "protocol" header that was added immediately after the IP header • Information in this protocol header section identifies the programs that are communicating and gives details of how they will exchange messages 353 Processes: inter-process communication [Diagram: IP header | Protocol header | Application data – IP packet structure (for ARPAnet after ~1971, and the modern Internet)] Identifying programs – really entry points for communications • The concept of a port (from a Latin word for door) was introduced – A port would be a data structure managed by the OS • An identifying number – Other related data – like the process id of the process that has been allocated this port • One or more buffers (character arrays) for storing incoming data • One or more buffers for storing outgoing data • Possibly a queue of pending connection requests 354 Processes: inter-process communication Server ports • A server program, such as ftp, would ask the local OS to reserve one of the ports (ftp actually uses ports 20 and 21) – Published details of the service would include its standard port number • The original services like ftp, echo, … are said to have "well known port numbers" – the port numbers that these programs use were fixed almost half a century ago • The server program would then make a request to the OS to activate this port – i.e. prepare it to receive connection requests.
355 Processes: inter‐process communication Server ports • Once the server port had been activated, the server program would make a blocking OS call – Accept connection • Server process marked as blocked waiting for connection • … • Eventually, a client would connect. The OS wakens the server providing – Details of the client (such as the client's IP address, just in case the server program is interested) – Newly allocated input and output buffers that can be used for data exchange between the server program and the client process 356 Processes: inter‐process communication Client ports • A client program has to be allocated a port number – It's needed to route responses back to the correct program • Client code simply asks its local OS to allocate a temporary port number while it is setting up a connection to a server (with known IP address and ‘well known port’) 357 Processes: inter‐process communication User datagram protocol UDP • This simple protocol allows a “client” to send a single packet of data to a server – Hopefully, the server will respond, if only with an acknowledgement of the submitted data • But the UDP protocol doesn't provide any guarantee of delivery (sometimes known as ‘unreliable datagram protocol’) • Used for simple services like “ping” and for programs where occasional loss of a packet won't matter – E.g.
some “voice over IP” services – loss of one packet of digitized voice shows as a “poor quality phone line” 358 Processes: inter‐process communication UDP header – in data block following IP packet header Port numbers UDP – single block of data transferred Transmission Control Protocol TCP • TCP inherited all the flow‐control code from the Telnet/IP protocol of the original ARPAnet • Now a very sophisticated protocol that – Completely hides the packetized nature of communications • Client and server treat the connection much as if it were a read/write file; always accessible – Deals with all problems like missed packets and out‐of‐sequence packets – Handles flow control • An application program using TCP doesn't have to send “slow down” messages – such things are all in the library code 359 Processes: inter‐process communication TCP • Packet structure more complex – Things like sequence number/acknowledgement number are there to deal with packets getting lost, or arriving out of sequence (library code rearranges them so that the application sees a valid data stream) – “Urgent” pointer – used with things like “slow down” messages 360 Processes: inter‐process communication Port numbers Secure Sockets Layer • When, in the mid 1990s, TCP/IP started being used for commercial transactions across the growing World Wide Web, an extra layer was added – Encryption • Negotiated between client (e.g. browser) and server (e.g. Apache web server) immediately after the TCP/IP connection is established – Sequence of standard messages across the new connection • Data bytes are encrypted once the encryption mechanism is established • Learn about this aspect in future security subjects 361 Processes: inter‐process communication Client‐server using TCP/IP • Why would you want to contact a remote computer?
– It’s almost always a matter of having a server that owns a data collection, and a client that needs to access that data collection • Retrieve data, submit additional data – Just think web • Usually retrieving data – Google, tell me … – Amazon, what have you got for sale … – Youtube, show me • Adding data – Facebook, add this posting about my fascinating life, opinions, … 362 Processes: inter‐process communication Passive server Active client • Server – Starts up when host OS restarted • (Our simple demo servers will be started by user command) – Prepares “well known port” – Waits for client(s) – Responds to clients when they connect • Client – Starts on user command – Establishes connection to server – Transmits command entered by user through to server program – Displays server response Processes: inter‐process communication Ports & sockets example • Data service – Server has a data collection with information about “countries” • Country ~ just a struct with fields like country name, area, population, …; country code acts as unique id (e.g. 
ru, cn, us, br, …) – Client connects • Loops – User enters country code – Request sent, response received, country details displayed 364 Processes: inter‐process communication Countries service • Server logs client connections and requests • Client usage 365 Countries ‐ projects • TCPServer and TCPClient applications – Both Linux applications; for the demo, the client can also run on Windows using Cygwin to emulate a Linux environment 366 Processes: inter‐process communication Some structs 367 Processes: inter‐process communication Struct declarations shared by the projects; structs “request” and “response” are used for communications, “country” only in the server Data 368 Processes: inter‐process communication Part of server, a small table of data about countries (top 10 by area) (try the CIA factbook on the web if you really want some data) Finding country (part of server code) 369 Processes: inter‐process communication Server structure 370 Processes: inter‐process communication Lots of #include statements! The struct and function declarations for networking have become widely scattered. (The sockets interface has variants for different protocols – e.g. some radio protocols, and the long‐outdated AppleTalk – and so is separate from the TCP/IP‐specific functions etc.) Setting up a server 371 Processes: inter‐process communication Create a socket ~ an input/output file‐descriptor Create a data structure describing how we want the OS to set up the port – note the htonl() & htons() functions; these convert integers to the standard “network representation” (remember big‐endian and little‐endian from the discussion of data?)
Bind ~ essentially lay claim to the chosen OS port listen ~ activate the port to receive clients Handling clients – one at a time 372 Processes: inter‐process communication The accept call blocks until a client connects; get back info about the client and a new file‐descriptor attached to input & output buffers Handling a request 373 Processes: inter‐process communication Receive request; will get a zero‐length request if the client has dropped the TCP/IP connection – this terminates the client‐handling loop. Generate response Send response Server runs 374 Processes: inter‐process communication Client 375 Processes: inter‐process communication Establish connection • Use DNS lookup to convert the server name to an IP address – For simplicity, the example assumes IPv4 and uses the older (thread‐unsafe) functions • Open a socket (~ input/output file descriptor) • Build a data structure describing the server • Ask the OS to connect using the information in the data structure and the socket created – The OS allocates an unused port for use by the client • If the connection succeeds, will be able to write requests and read responses on the socket 376 Processes: inter‐process communication Client: establish connection 377 Processes: inter‐process communication DNS lookup!
Create socket ~ input/output stream Build struct describing server Connect Client – command loop • Client applications are almost always interactive with a user entering requests and reviewing responses – So, a simple loop getting user commands until the “quit” command is entered • For each different command (there is only one – “get country data” – in this example) – Send appropriate request to server – Wait for response – Display response 378 Processes: inter‐process communication 379 Processes: inter‐process communication Forever loop – exited when quit command entered 380 Processes: inter‐process communication Send request Receive response Display response Details, details • Note how the code receiving the response loops until it has got all the bytes expected (advantage of using fixed‐size structs in communications) • It's overkill in this little example – but necessary in practice • TCP/IP could in principle deliver a small packet with only part of a response; need to wait until all data has arrived before starting to process it – (Didn't bother in the server; requests are < 4 bytes – they'll never be split into multiple packets!) 381 Processes: inter‐process communication Client runs 382 Processes: inter‐process communication Ports, sockets, & you • You will get several ports & sockets exercises – this subject, CSCI212, possibly CSCI204, … – Large parts are “cut‐&‐paste” • Server setup code • Client establish connection • Send and receive a struct defined for the application protocol • Useful to work with ports & sockets • But don't use this level of code in a real application! • There are many very good C++ class libraries (Qt‐networking, Boost, …) that provide a much higher‐level abstraction for network requests to a server – exploit these libraries and you will build better applications more quickly.
383 Processes: inter‐process communication Shell commands related to ports & sockets • Because ports & sockets are so widely used, there are several shell commands that can be used to check on activity, including – tcpdump • Look at the data traffic – netstat • Display information about socket connections – nc • Simplified client or server (!) for testing and scripted actions, port scanning, … – ss • Socket information – … 384 Processes: inter‐process communication Some of these shell commands require sudo privileges tcpdump • tcpdump –A host 130.130.64.204 and tcp – Show all tcp traffic from 130.130.64.204, and list contents as text • (it's the client asking for country dz, and the reply Algeria) 385 Processes: inter‐process communication netstat • Netstat can provide information about ports in use: – Show sockets • That are listening for connections • Numerical addresses • Include program pid and name • Using tcp or udp 386 Processes: inter‐process communication My server netstat • Ask netstat for more – Discover a huge number of Unix sockets! • (AF_UNIX connections between local processes using ports and sockets) – Told you that this usage is quite common • These belong to OS processes 387 Processes: inter‐process communication tcpdump, netstat and you • You will find these commands useful when you start programming with ports and sockets – You can check that you are opening ports correctly – You can monitor the actual message exchanges when first testing your application 388 Lots of TCP/IP servers • Computers are often configured with a large number of TCP/IP servers that are started when the OS has initialized the machine, and which run “forever” – Mail forwarder – Apache web server – MySQL database server – Memcached/Redis or other caching service – NFS (network file system) processes – … • Some of these servers will involve single processes, others may start many – E.g.
a “root” Apache and several www‐data “Apache” processes (Same code, different “users”) • Lots of processes – lots of space taken up in memory 389 Processes: inter‐process communication inetd • Some server processes are useful – but don't have high demand. It's better if these servers are only started when needed, and are configured so that they will shut down if idle for more than a certain time. • inetd is a system process that can manage such TCP/IP servers – Has a “database” with details of the servers that it manages – Can start a server on demand 390 Processes: inter‐process communication inetd • inetd “database” – Port number – Program that should run if a request arrives at that port and there is no process currently bound to that port • The systems administrator edits the database file recording details of the less‐used servers and their ‘well known ports’ • The inetd daemon process is started; it registers its interest in the ports it knows about; sleeps – A request at any of those ports will result in the OS awakening inetd – inetd forks and execs the required program and hands over the port 391 Processes: inter‐process communication OS – 3 – Processes • The OS must provide for multiple processes (multi‐tasking) – This has several aspects • Creation (and termination) of processes • Communication between processes • Scheduling of processes • Allocation of main memory to processes • Management of main memory and secondary memory (virtual memory systems) – The creation of processes is apparent through the OS API and forms a part of many applications (e.g. the shell – almost all the shell commands involve the creation of a short‐lived process); the other aspects are mainly of concern to those implementing the OS • (though occasionally a systems administrator might want to intervene – changing process priorities, terminating processes etc) 392 Processes (still) • What else?
– Scheduling – Memory management – Virtual memory • These topics will be significant elements in subsequent OS‐related subjects – CSCI212 and CSCI330 393 Scheduling • Look at all those processes – They all need CPU time 394 Process scheduling OS task for scheduling • Processes have differing CPU needs – Long compute – High‐priority handling of I/O traffic – Regular small amounts of processing • Many of the OS's own processes (those parts of the OS that run as user processes – print spool, cron, … – or as privileged processes (user=root)) need to run at regular intervals – Programs with lots of interactive I/O to a terminal • The OS has to allocate time on the CPU (or a CPU, in now‐common symmetric multi‐processing systems) so as to satisfy the varying demands of the processes. 395 Process scheduling When to schedule? • Enter the kernel of the OS whenever there is an interrupt from an I/O device, or a supervisor call (svc); could consider re‐scheduling before return from every interrupt or svc – That would be too frequent: • Unnecessary time spent in the OS examining process states.
• Cost of switching process – Updating caches, memory mapping system etc • But the OS cannot simply allow a current process to run until it releases the CPU 396 Process scheduling Basic scheduling strategy • You will learn much more about scheduling strategies in later subjects • Typical basic scheduling strategy – “Round robin” with a time limit • OS keeps ready processes on a queue – Picks front process – Allocates CPU and sets timer – Process runs until either » It makes a blocking supervisor call (wait for something) • in which case the OS moves it from the “ready process queue” to an appropriate list of “waiting processes” » It uses up its time; the OS will put the process back on the end of the “ready process queue”; the OS will then schedule the next process from the queue » The OS handles an interrupt that results in a high‐priority process changing state from “waiting” to “ready to run” 397 Process scheduling Practical scheduling strategy • In practice, scheduling is more complex than simple “round robin” because some processes have priority – OS needs to maintain several different queues that correspond to differing priority levels • Will need to balance running of the highest‐priority process against the risk of “starving” a process – keeping it waiting in memory for long periods of time while other small tasks run • Within a given priority level, can use “round robin” allocation • Will need to use different time amounts for different levels – if running compute‐bound jobs, should allocate a larger time quantum • May move processes between different priority levels – If a process has used all its time on the last few occasions it's been allocated the CPU, it could get moved to a lower‐priority queue with a larger time quantum 398 Process scheduling Data structures for scheduling • In the OS area, there will be a stack of queues (this diagram based on one by Bach illustrating Unix V5) 399 Process scheduling [Diagram: queues for Swapper, Waiting for disk I/O, Waiting for buffer, Waiting for inode, Waiting for TTY in, Waiting for TTY out, Waiting for child exit, then User 0, User 1, … User n – queues of waiting/ready processes at the same priority level, with lower priorities (and often larger time quanta) further down. If the OS has handled an interrupt that changed the state of one of these waiting processes, it will reschedule before return.] Memory • Major component in the OS – keeping track of how user processes are mapped into real memory 400 [Diagram: three processes (9234, 11236, 19483), each with Text (i.e. code), Static, and Stack segments mapped into real memory; 19483 happens to be running the same code as 11236 – the OS has to notice this and arrange to share the code.] Virtual memory • But it isn't simply a matter of mapping a process's address space into real memory • With virtual memory, a process's address space may be mapped partly into main memory and partly into blocks on a “swapping disk” • Current “virtual memory” systems have evolved from the operating systems of the 1960s where the OS arranged to transparently swap blocks of bytes between real memory and a high‐speed disk or drum • In some cases, the objective was to make it seem as if the available memory was larger than it really was. • In other cases, it was more a matter of fitting more processes into memory so that the scheduler would have a better chance of keeping the CPU(s) occupied – Modern virtual‐memory systems are motivated more along these lines 401 Virtual memory • At any particular point in its execution, a typical process is accessing only a small part of its address space – The code of the particular group of functions implementing a data processing algorithm – A subset of its data • So you don't need all a program's address space mapped into real memory – just the sections currently in use.
– The other parts can be held on a fast “swapping disk” 402 Pages (naïve version) • (Paging is widely utilized – but there are alternatives; take a specialist OS subject) • The OS conceptualizes memory as being made up of blocks of bytes referred to as “pages” – Page size will be the same as the disk block size on the swap disk, e.g. 4 Kbytes • An address in a process's virtual address space will be interpreted as a page number and a location in page – the low‐order 12 bits would be used for the location on page in a system using 4‐Kbyte pages 403 [Diagram: a 32‐bit address split into a 20‐bit page number (in the process's address space) and a 12‐bit location on page; the page number is mapped to a page of real memory, and the actual address used combines the mapped page frame with the unchanged location on page.] Paging (naïve version) • Paging requires extra hardware in the “decode address” part of the CPU 1. Address is generated (normal interpretation of hardware addressing mode as illustrated in assembly language exercises earlier) 2. “Location on page” part is separated out • For example, will assume a 32‐bit address space with 4‐Kbyte pages; the bottom 12 bits are the “location on page” 3. Page part to be mapped • There could be 1 million pages! (For our example address and page size) • But the entire justification is that at any one time only a few are being used 4.
Hardware will assist in mapping the few “in use” pages 404 Paging (naïve version) • Address translation hardware could have a set of ~16 paging registers – Paging registers have two parts • Page address (the 20 address bits) in the process's address space • A page frame address (again assume 20 bits) – the location in real memory where that page of the process has been placed • It's an associative store – An in‐parallel comparison of the “page address in process address space” against the page address in each of the 16 registers • If an entry is found, then get the mapping address from the page frame part 405 406 [Diagram: the page‐number part of the decoded address is compared, in parallel, against the 16 entries of the page‐mapping hardware (page address in process address space vs. page frame in real memory); on a match, the corresponding page‐frame address replaces the page number to form the real‐memory address.] But what if the page address is not matched? • If the code – Calls another function not in the current set of in‐memory pages – Or accesses a data page not in the current set of in‐memory pages • Then the page number will not be found in the paging‐register associative store and the address decoding unit cannot create the actual real‐memory address. • Get an interrupt – “Page Fault” 407 Page fault interrupt • OS handles this 1. Is the address in process space valid? • It might be a program error; check validity of address. If the address is invalid, send a “SIGSEGV” signal to kill the process and schedule some other process to run. 2. It is a valid address, so a different page is needed in memory; make space by removing one of the process's existing pages; pick a page for removal 3.
Create two requests for transfers to the swapper disk 1. Write out the chosen existing page 2. Read in the replacement page – the one that matches the address that caused the problem 4. Place the process on the “waiting for swap” queue 5. Schedule another process. 6. … 7. Eventually, an interrupt from the swap disk will tell the OS that the replacement page has loaded; the OS can reschedule the process that page‐faulted. 408 Paging A little more complex in practice … • Realistic paging algorithms, address mapping hardware, and supporting data structures are a little more complex! – A process will typically have a few hundred pages in memory, many thousands more out on the swap disk, but only 16 described in the paging hardware – maybe we don't need to swap because the desired page is in memory; it's just the entries in the page‐addressing hardware that need to be changed, and then the same process can continue. – If we do need to swap pages to disk, there are lots of subtleties in the algorithm used to pick the best page to swap • Lots to learn in a future OS subject. 409 Devices 410 OS – 4 – Devices • The OS must manage transfers of data – The OS should almost entirely hide details of data transfers from the application level¶ • You certainly don't want to have to write very different code for data transfers to terminals, data transfers to disks, and data transfers to networks – Transfers of data take time – and applications typically need to wait until transfers are complete • So the OS needs to integrate device management and process management ¶Occasionally, an application may have a legitimate reason for changing how a device is handled – e.g. setting input into a non‐blocking mode (instead of using a blocking read, the application issues a non‐blocking read – this immediately returns with whatever data are already available, maybe just a few bytes or none). The OS will need to provide a system call that lets an application request such adjustments to standard handling.
411 Device handling • Known only to a select few – The Linux gurus who write device drivers! • Device handling is unlikely to be covered in any detail in any university subject – If you want to learn about this aspect, apprentice yourself to a Linux developer! • In Unix/Linux, have this approach where there is a uniform “file‐system interface” (open, close, read, write, ioctl, mount, unmount) that is supported by all device drivers 412 Drivers 413 Diagram based on one by Bach – The Design of the Unix Operating System [Diagram: a character‐device switch table dispatches open/close/read/write/ioctl calls, and a block‐device switch table dispatches open/close/read/write/mount/unmount calls via the buffer cache (open, close, strategy), to drivers and their device interrupt handlers, which are reached through the interrupt vector.] Devices on computer • All I/O devices appear as “files” in /dev – A few extra elements in the directory entry such as • Major number ~ identifies driver • Minor number ~ identifies instance of device – Could have many terminals; they would all use the same driver (major number) but have distinct minor numbers • Systems programs for managing devices (e.g. adding an entry to /dev when a flash drive is plugged in) – Linux: udevd, libudev etc 414 OS – 5 – Startup and closedown mechanisms • When a computer is started, the OS must – Find out what disk systems are connected and map them into a structure that it can use to find files – Set up handlers for network connections (ability to respond to TCP/IP connection requests) – Start a selection of processes (e.g.
a group of Apache servers, a MySQL server, …) – Set up a login handler – … • Traditionally, this was a leisurely process that started with an “init” program being loaded; this init would then read a script (“start X1; wait till X1 running; start X2; …”); it wasn't fast, but then large time‐share systems only get restarted maybe once a week; laptop computers are being turned on and off frequently – and no‐one wants to wait 5–10 minutes for their system to be ready; consequently, startup mechanisms are now much more elaborate – with many tasks proceeding concurrently (with interlocks, e.g. MySQL needs the network up before it can complete its initialization) 415 Waking up your Linux system init, upstart etc Only of interest to systems administrators! 416 Administering your Linux • A standard Linux distro should have suitable defaults in its initialization scripts so that you should not need to change any aspect of initialization for a ‘vanilla’ system. • But as you add services – – MySQL, Apache, Redis, MongoDB, … • it gets more likely that you will need to modify aspects of your system's initialization – Your Linux system's package manager will add start‐up script entries for any services associated with additional packages that you install; the problem usually is that third‐party packages like Redis etc. are updated separately from your Linux and its package manager – inconsistencies frequently arise with regard to initialization, and these inconsistencies can lead to the start‐up process stalling or individual services failing to start • And you often have to make minor edits to the configuration files of individual services to match the features that you prefer, e.g.
– allow/disallow ‘user directories’ in Apache – Allow remote access to your MySQL server 417 Configuration files for applications • You will find application‐specific subdirectories inside /etc (mostly require sudo privileges to make changes) – These subdirectories contain the files with application‐specific configuration data – E.g. /etc has subdirectories • apache2 – Files and subdirectories apache2.conf, mods‐available, mods‐enabled (Modules incorporated into your web‐server) • mysql – conf.d, my.cnf, … • ssh • php5 – E.g. might have to edit some of the files in this subdirectory if you needed to add support for an Oracle database (you need to set a path for the Oracle extension files that must be dynamically linked) 418 Initialization scripts • Traditionally, the init process was started once the kernel had been loaded, and it would follow scripts in /etc/ and /etc/init • The Linux standard has several ‘levels’ defined representing different run configurations (Linux implementations don't all implement all these distinct run levels): Run level – Information: 0 – Halt; 1 – Single‐user text mode (without networking); 2 – Multi‐user mode, but no networking; 3 – Full multi‐user text mode (normal); 4 – Not used (user‐definable); 5 – Same as 3 but with a display manager; 6 – Reboot 419 Initialization scripts • As systems administrator, you choose which ‘run level’ you wish to use for your Linux – 1 Single user (sys admin debugging) – 2 Isolated Linux – 3 or 5 – normal startup (text shells, or with display manager) – 4 locally defined setup with particular features • Init will run the appropriate scripts using data from directories like /etc/rc1.d, and scripts in /etc/init.d – The entries in rc[1‐5].d are links to the scripts in /etc/init.d that are to run; the entry names incorporate a numbering scheme that determines the order in which the scripts will run 420 Initialization scripts • You can read these shell scripts, e.g.
the /etc/init.d/apache2 script – Checks configuration data in /etc/apache2/ – Checks that you have the executable installed! – (Other checks) – Checks to see if there is already an Apache instance running – All being well, it starts the httpd daemon process 421 upstart • The trouble with init is that it is slow – It runs a long sequence of shell commands • Many of these involve creating a subprocess that does some checking and returns a status result • Init must wait each time for the result of the sub‐process – It must interrogate all disks, to find partitions and incorporate all file systems that it finds into the file hierarchy – lots of waiting for disks – When setting up network connections, it will need to converse with other machines – lots of waiting for network traffic • So, post 2000, increasing interest in alternatives – ‘upstart’ (~2006) becoming possibly the most popular of these 422 Upstart – ‘an event‐driven program’ • Upstart (which is probably what your Linux system will use) uses .conf files in /etc/init to hold data about the various services that must be started. • In principle, it can start lots of sub‐processes, each running one of the .conf scripts that it finds in /etc/init • Since there are interdependencies among services, some sub‐processes must complete (or partially complete) before others can start • Upstart uses ‘events’ to manage synchronization of its sub‐processes 423 Upstart events • As they complete various actions, scripts place ‘event’ records in upstart's control data structure • The scripts used to start services specify what events they need before proceeding with a next step 424 Upstart example • Part of the upstart script for ssh.conf (which will start the secure‐shell remote‐login daemon) is start on filesystem stop on runlevel [!2345] respawn • It can start once the ‘filesystem’ event has been posted (i.e.
the disks have been interrogated and init has built the OS's model of the file hierarchy); it stops if the runlevel changes to anything outside 2–5, e.g. to 6, ‘reboot’ 425
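For comparison, a complete upstart job file follows the same pattern. This is a hypothetical sketch only – the job name, description, and executable path are invented for illustration; the start/stop/respawn stanzas are the standard upstart ones:

```
# Hypothetical /etc/init/countries-server.conf
description "countries demo TCP server"

# Wait for the file systems and for a real network interface
# (not just loopback) before starting.
start on (filesystem and net-device-up IFACE!=lo)
stop on runlevel [!2345]

# Restart the daemon automatically if it dies.
respawn
exec /usr/local/bin/TCPServer
```

upstart reads every such .conf file in /etc/init, starts jobs as their start‐on events are posted, and posts further events as jobs complete – which is how the interdependencies described above are synchronized.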