CMPSC 311, Introduction to Systems Programming

Introduction to Sockets

Reading
    CS:APP
        Sec. 1.8, Systems Communicate with Other Systems Using Networks
        Ch. 11, Network Programming, Intro
        Sec. 11.1, The Client-Server Programming Model
        Sec. 11.2, Networks
        Sec. 11.3, The Global IP Internet
        Sec. 11.4, The Sockets Interface
    APUE, Ch. 16
    Ethernet Alliance
    ICANN = Internet Corporation For Assigned Names and Numbers
    IANA = Internet Assigned Numbers Authority
    Port Numbers
    List of Internet top-level domains
    Internet Systems Consortium, Internet Domain Survey
    The Cooperative Association for Internet Data Analysis

Network Programming with Sockets
    Client-Server Model
    IP Networks
    Sockets Interface
        functions - socket, connect, bind, listen, accept

A simple point-to-point model
    process --- host --- connection --- host --- process

Client-Server Model for application design
    Client (many, ephemeral)
    Server (one, permanent)
    these are processes, not specific machines
    Resource (managed by the server)
    Service
    Transaction
        Request (client to server)
        Action (server)
            access the resource, perhaps modify it
        Response (server to client)
        Receipt (client)

Server design
    initialize
    loop forever
        wait for client request
        take action
        respond to client
    cleanup

Client design
    initialize
    send request to server
    wait for server to respond
    use response

Networks - host, connector, "wire"
    a Local Area Network is constructed with one type of "wire" and one set of communication hardware and protocols
        for example, Ethernet with hubs and bridges
    a Wide Area Network is constructed from multiple Local Area Networks
        similar principles, everything is compatible, but it covers a larger geographic area
    an internet is constructed from multiple LANs and WANs, without requiring them to be directly compatible
        routers are used for the interconnections
        the routers are responsible for translating data and protocols between incompatible networks

IP Network - a specific way to identify hosts and exchange data
IP Internet - same but more general
    layered connectors, host --- hub --- bridge --- router
    layered protocol software
        use software abstractions to cover up the differences between physical networks

Protocol = rules for exchanging data
    what to expect next?
    how to identify what arrives?
    Naming scheme
        internet addresses
        layering -- hosts are identified by internet addresses, which are uniform, not by LAN/WAN addresses, which are not
            this is an abstraction mechanism
    Delivery mechanism
        packet = header + payload
            header = source address, destination address, packet size, etc.
            payload = data sent from source to destination
        layering -- a packet at one level of the network can become the payload for a packet at the next level of the network, by adding another header
            this is a form of encapsulation

Internet Protocol, building the Global IP Network in layers
    client or server code       user code
    sockets interface           system calls
    TCP/IP                      OS kernel code
    hardware interface          interrupts
    network adapter             hardware

    TCP = Transmission Control Protocol
    IP = Internet Protocol

Recall, process --- host --- connection --- host --- process
    IP provides naming and delivery of packets (datagrams)
        host-to-host
        but, datagrams can be lost or duplicated
    UDP = User Datagram Protocol (unreliable delivery of datagrams)
        process-to-process
    TCP provides reliable delivery of packets
        process-to-process
        based on expectation, acknowledgement, retransmission if necessary

process --- host --- connection --- host --- process
    The hosts might use different memory architectures.
    "Network Byte Order" = big-endian
    functions are provided to convert between host byte order (big- or little-endian) and network byte order (big-endian)
        htonl (host to network, long int, 32 bits)
        htons (host to network, short int, 16 bits)
        ntohl, ntohs (network to host, same sizes)
    some of this can also be handled by the network adapter, in hardware

IPv4 (Internet Protocol version 4), 32-bit IP address
IPv6, 128-bit IP address

A host is mapped to an IP address (numeric).
An IP address is mapped to an Internet Domain Name (symbolic).

Internet Address (IPv4)
    struct in_addr
        one field, unsigned int s_addr, in network byte order
    130.203.16.27, etc.
    "dotted decimal notation" is used so people don't have to read hexadecimal
        hex 82cb101b --> "dotted hex" 82.cb.10.1b --> "dotted decimal" 130.203.16.27

Internet Domain Name
    abc.gov, etc.
    because no one really wants to mess with numbers ... that's why we have computers
    because applying some kind of hierarchy to the names helps to understand them
    DNS = Domain Name System

Lookup Functions (old)
    gethostbyname(), gethostbyaddr()
    These are obsolete (removed from the 2008 POSIX standard), but easier to explain.

    #include <netdb.h>
    struct hostent * gethostbyaddr(const void *addr, socklen_t len, int type);
    struct hostent * gethostbyname(const char *name);

    The return values are pointers to static data, so a later call may overwrite the result of an earlier one.
    struct hostent {
        name, domain name
        aliases, NULL-terminated array of domain names
        address type (AF_INET, address family, Internet)
        address length
        addresses, NULL-terminated array of struct in_addr *
    };

    // Mac OS X
    struct hostent {
        char  *h_name;        /* official name of host */
        char **h_aliases;     /* alias list */
        int    h_addrtype;    /* host address type */
        int    h_length;      /* length of address */
        char **h_addr_list;   /* list of addresses from name server */
    };

    // example usage, but not complete
    struct in_addr addr;            // internet address
    addr.s_addr = something;        // unsigned int, in network byte order
    struct hostent *hostp;
    hostp = gethostbyaddr(&addr, sizeof(addr), AF_INET);
    if (hostp != NULL)              // NULL means the lookup failed
        printf("%s\n", hostp->h_name);

Lookup Function (new)
    int getaddrinfo(const char * restrict nodename,
                    const char * restrict servname,
                    const struct addrinfo * restrict hints,
                    struct addrinfo ** restrict res);

    nodename is either a valid host name or a numeric host address string consisting of a dotted decimal IPv4 address or an IPv6 address
    servname is either a decimal port number or a service name listed in services(5)
    hints is optional; it describes the kind of socket the caller wants; set to NULL for defaults
    If the call is successful, *res points to a linked list of addrinfo structures, which the caller releases with freeaddrinfo()

    struct addrinfo {
        int              ai_flags;      /* input flags */
        int              ai_family;     /* protocol family for socket */
        int              ai_socktype;   /* socket type */
        int              ai_protocol;   /* protocol for socket */
        socklen_t        ai_addrlen;    /* length of socket-address */
        struct sockaddr *ai_addr;       /* socket-address for socket */
        char            *ai_canonname;  /* canonical name for service location */
        struct addrinfo *ai_next;       /* pointer to next in list */
    };

    ai_family, ai_socktype, and ai_protocol can be used later in a call to socket().
    For each addrinfo structure in the list, the ai_addr member points to a filled-in socket address structure of length ai_addrlen.
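The byte-order functions, the dotted decimal notation, and getaddrinfo() can be tied together in one small experiment. The following is a minimal sketch, not a definitive implementation: lookup_numeric() and print_address() are hypothetical helper names, and the sketch assumes IPv4 only and purely numeric inputs (AI_NUMERICHOST and AI_NUMERICSERV), so that no DNS or services lookup takes place.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve a numeric IPv4 string and a numeric port with getaddrinfo().
 * On success, store the address and port in HOST byte order and return 0;
 * otherwise return the getaddrinfo() error code (or -1 for an empty list).
 * lookup_numeric() is a hypothetical helper, not part of any library.
 */
int lookup_numeric(const char *host, const char *port,
                   uint32_t *addr_out, uint16_t *port_out)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* assumption: IPv4 only */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name lookups */

    int error = getaddrinfo(host, port, &hints, &res);
    if (error != 0)
        return error;
    if (res == NULL)
        return -1;

    /* With ai_family == AF_INET, ai_addr points to a struct sockaddr_in. */
    struct sockaddr_in sin;
    memcpy(&sin, res->ai_addr, sizeof(sin));
    *addr_out = ntohl(sin.sin_addr.s_addr);   /* network -> host byte order */
    *port_out = ntohs(sin.sin_port);

    freeaddrinfo(res);
    return 0;
}

/* Print an address held in host byte order as hex and as dotted decimal. */
void print_address(uint32_t addr)
{
    struct in_addr in;
    char text[INET_ADDRSTRLEN];

    in.s_addr = htonl(addr);                  /* host -> network byte order */
    if (inet_ntop(AF_INET, &in, text, sizeof(text)) == NULL)
        return;
    printf("hex %08x --> %s\n", (unsigned) addr, text);
}
```

Calling lookup_numeric("130.203.16.27", "80", &a, &p) from main() should leave a equal to 0x82cb101b and p equal to 80, which matches the dotted hex example earlier in these notes. The conversions through ntohl() and ntohs() are what make the comparison independent of the host's byte order.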
    struct sockaddr will be described shortly

Internet Connections
    point-to-point, between processes
    full-duplex, data moves in both directions
    reliable, bytes arrive in the order in which they were sent, and are never lost or duplicated

Socket = connection endpoint
    socket address = internet address (32 or 128 bits) + port number (16 bits)
        denoted by address:port
    a client's port number is assigned by the OS in response to a request, and is temporary
        an ephemeral port
    a server's port number is advertised and permanent
        a well-known port
        See /etc/services for a list of well-known ports (with web links to IANA)

Connection = two socket addresses
    clientaddr:clientport, serveraddr:serverport

Sockets overview, as a sequence of function calls

    Client      action                       Server
    socket                                   socket
                                             bind
                                             listen
    connect     connection request --->      accept
    write       service request --->         read
    read        <--- service response        write
    close       EOF --->                     read
                                             close

    client - read(), write() and close() using the return value from socket()
    server - read(), write() and close() using the return value from accept()

    After returning from accept(), the server could call fork() or pthread_create() to allow the transaction with this client to be handled independently of the other clients.
    After returning from close(), the "child server" would then call exit() or pthread_exit() to clean up this transaction.
    The "parent server" never terminates.

Sockets
    The OS kernel sees an end-point for communication.
    The client and server programs see an open file.
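The "open file" view can be tried without any network at all. The following is a rough sketch under a loud assumption: it uses socketpair() with AF_UNIX to get two already-connected descriptors in one process, which is not an Internet connection, but the read(), write() and close() calls are the same ones a client and server would use. socketpair_echo() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send msg through one end of a connected socket pair and read it back
 * from the other end, using only read(), write() and close().
 * Returns the number of bytes received, or -1 on error.
 * Assumption: AF_UNIX sockets stand in for a real network connection.
 */
ssize_t socketpair_echo(const char *msg, char *buf, size_t buflen)
{
    int sv[2];                      /* two connected socket descriptors */

    if (buflen == 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t sent = write(sv[0], msg, len);    /* same call as for a file */
    close(sv[0]);                   /* the peer sees EOF after the data */
    if (sent != (ssize_t) len) {
        close(sv[1]);
        return -1;
    }

    /* Read until EOF or until the buffer is full (keep room for '\0'). */
    size_t total = 0;
    while (total + 1 < buflen) {
        ssize_t n = read(sv[1], buf + total, buflen - 1 - total);
        if (n < 0) {
            close(sv[1]);
            return -1;
        }
        if (n == 0)                 /* EOF: the other end was closed */
            break;
        total += (size_t) n;
    }
    buf[total] = '\0';
    close(sv[1]);
    return (ssize_t) total;
}
```

With a 32-byte buffer, socketpair_echo("hello", buf, sizeof(buf)) should return 5 and leave "hello" in buf. Closing sv[0] before reading is what produces the EOF shown in the function-call sequence above.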
    struct sockaddr, generic socket address, 16 bytes
        protocol family, 2 bytes, indicates meaning of additional data
        additional data, 14 bytes
    struct sockaddr_in, Internet-style socket address, 16 bytes
        address family, always AF_INET
        port number, 16 bits, in network byte order (a server's port numbers are advertised)
        IP address, 32 bits (IPv4), in network byte order
        padding, 8 bytes

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);
    simple usage: domain = AF_INET, type = SOCK_STREAM, protocol = 0
    called by the client and by the server, as two independent actions
    returns a socket descriptor, partially opened, eventually used with read(), write(), etc., like a file descriptor from open()
    an "active socket", but it's not connected to anything yet

int connect(int sockfd, const struct sockaddr *server_addr, socklen_t addrlen);
    called by the client
    sockfd came from socket()
    wait for connection to server, or failure
    If successful, sockfd can now be used with read() and write() (the socket is connected).
    second arg, pass struct sockaddr_in * (cast to struct sockaddr *)
        before the call, set the address family, the server's IP address (for example, with info from gethostbyname()) and the server's port number
        connect() only reads this structure, it does not fill in any fields
    third arg, pass sizeof(struct sockaddr_in)
    open_clientfd(), CS:APP Fig. 11.16

int bind(int sockfd, const struct sockaddr *server_addr, socklen_t addrlen);
    called by the server
    sockfd came from socket()
    associates server's socket address with the socket descriptor

int listen(int sockfd, int backlog);
    This call, by the server, distinguishes server from client.
    sockfd is now a "listening socket". Up to this point, it was an "active socket". It still isn't connected.
    backlog indicates how many pending connection requests to queue (pick a large number)
    open_listenfd(), CS:APP Fig. 11.17

int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);
    listenfd came from socket() and has been passed to bind() and listen(), it's a "listening descriptor".
    Wait for a connection request, return a "connected descriptor" for use with read() and write().
    The client's info is stored through addr.
    Before the call, *addrlen holds the size of the buffer that addr points to; after return, it holds the size of the address that was stored.
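The five calls above can be exercised in a single process over the loopback interface. This is a minimal sketch under stated assumptions, not the CS:APP code: open_loopback_listener(), loopback_transaction() and read_fully() are hypothetical helper names, the server binds to 127.0.0.1 with port 0 so that the OS picks a free port, and one process plays both client and server (this relies on the kernel queuing the connection request until accept() is called).

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes; return 0 on success, -1 on error or early EOF. */
static int read_fully(int fd, char *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r <= 0)
            return -1;
        got += (size_t) r;
    }
    return 0;
}

/*
 * Server side: socket(), bind(), listen() on 127.0.0.1.
 * Port 0 asks the OS to choose a free port; the chosen port is returned
 * through port_out in host byte order. Returns listenfd, or -1 on error.
 */
int open_loopback_listener(uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    if (listenfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(listenfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listenfd, 8) < 0 ||
        getsockname(listenfd, (struct sockaddr *) &addr, &len) < 0) {
        close(listenfd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return listenfd;
}

/*
 * One request/response transaction, both roles in one process.
 * Returns 0 if the client reads back exactly what it sent, -1 otherwise.
 */
int loopback_transaction(const char *request)
{
    char seen_by_server[64], reply[64];
    size_t n = strlen(request);
    uint16_t port;
    int result = -1, connfd = -1;

    if (n == 0 || n > sizeof(reply))
        return -1;
    int listenfd = open_loopback_listener(&port);
    if (listenfd < 0)
        return -1;

    /* Client side: socket(), then connect() to serveraddr:serverport. */
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    server.sin_port = htons(port);
    int clientfd = socket(AF_INET, SOCK_STREAM, 0);

    if (clientfd >= 0 &&
        connect(clientfd, (struct sockaddr *) &server, sizeof(server)) == 0) {
        /* Server side: accept() returns a new, connected descriptor. */
        struct sockaddr_in client;
        socklen_t clientlen = sizeof(client);
        connfd = accept(listenfd, (struct sockaddr *) &client, &clientlen);
    }
    if (connfd >= 0 &&
        write(clientfd, request, n) == (ssize_t) n &&        /* request  */
        read_fully(connfd, seen_by_server, n) == 0 &&
        write(connfd, seen_by_server, n) == (ssize_t) n &&   /* response */
        read_fully(clientfd, reply, n) == 0 &&
        memcmp(reply, request, n) == 0)
        result = 0;

    if (connfd >= 0)
        close(connfd);
    if (clientfd >= 0)
        close(clientfd);
    close(listenfd);
    return result;
}
```

Calling loopback_transaction("hello") from main() should return 0 on a machine with a working loopback interface. Note that the server reads and writes through connfd, the descriptor returned by accept(), never through listenfd.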
    client
        read(), write() and close() using the return value from socket()
        after connect() has returned
    server
        read(), write() and close() using the return value from accept()
        after socket(), bind() and listen() have returned

Summary
    Client
        create a socket by calling socket(), save return value as clientfd
        identify server by host address, port number
        associate server with clientfd by using connect()
        use clientfd with read(), write(), close()
    Server
        once only,
            create a socket with socket(), save return value as listenfd
            identify server by host address, port number
            associate server with listenfd by using bind() and listen()
        in a loop,
            call accept() with listenfd
            upon return, the server has a new client
                save return value as connfd
                use connfd with read(), write(), close()
    The point-to-point client-server socket connection is between clientfd (in the client) and connfd (in the server). This connection is ephemeral.

Example, an iterative echo server
    From CS:APP Sec. 11.4.9
    For the Rio functions, see Unix System-Level I/O
    Client
        echoclient.c
    Server
        echo.c
        echoserveri.c

Example, a concurrent echo server based on processes
    From CS:APP Sec. 12.1.1

    #include "csapp.h"

    /* Reap every child that has already terminated, without blocking. */
    void sigchld_handler(int sig)
    {
        while (waitpid(-1, 0, WNOHANG) > 0)
            ;
        return;
    }

    void echo(int connfd);   // see above, echo.c

    int main(int argc, char *argv[])
    {
        int listenfd, connfd, port;
        socklen_t clientlen;
        struct sockaddr_in clientaddr;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <port>\n", argv[0]);
            exit(1);
        }
        port = atoi(argv[1]);

        Signal(SIGCHLD, sigchld_handler);
        listenfd = Open_listenfd(port);
        while (1) {
            clientlen = sizeof(clientaddr);   /* value-result argument, reset before each Accept */
            connfd = Accept(listenfd, (SA *) &clientaddr, &clientlen);
            if (Fork() == 0) {
                Close(listenfd);   /* Child closes its listening socket */
                echo(connfd);      /* Child services client */
                Close(connfd);     /* Child closes connection with client */
                exit(0);           /* Child exits */
            }
            Close(connfd);         /* Parent closes connected socket (important!) */
        }
    }

Example, a concurrent echo server based on threads
    From CS:APP Sec. 12.3.8

    #include "csapp.h"

    void echo(int connfd);   // see above, echo.c
    void *thread(void *vargp);

    int main(int argc, char *argv[])
    {
        int listenfd, *connfdp, port;
        socklen_t clientlen;
        struct sockaddr_in clientaddr;
        pthread_t tid;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <port>\n", argv[0]);
            exit(1);
        }
        port = atoi(argv[1]);

        listenfd = Open_listenfd(port);
        while (1) {
            /* Each thread gets its own heap copy of the descriptor, so the
               next Accept cannot overwrite it before the thread reads it. */
            connfdp = Malloc(sizeof(int));
            clientlen = sizeof(clientaddr);
            *connfdp = Accept(listenfd, (SA *) &clientaddr, &clientlen);
            Pthread_create(&tid, NULL, thread, connfdp);
        }
    }

    /* thread routine */
    void *thread(void *vargp)
    {
        int connfd = *((int *)vargp);
        Pthread_detach(pthread_self());
        Free(vargp);
        echo(connfd);
        Close(connfd);
        return NULL;
    }

Example, adapted from Mac OS X man page getaddrinfo(3)
    The following code tries to connect to host "www.kame.net" and service "http" via a stream socket. It loops through all the addresses available, regardless of address family. If the destination resolves to an IPv4 address, it will use an AF_INET socket. Similarly, if it resolves to IPv6, an AF_INET6 socket is used. Observe that there is no hardcoded reference to a particular address family. The code works even if getaddrinfo() returns addresses that are not IPv4/v6.
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <err.h>
    #include <string.h>
    #include <unistd.h>

    /* the statements below belong inside a function */
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;        /* any address family */
    hints.ai_socktype = SOCK_STREAM;
    int error = getaddrinfo("www.kame.net", "http", &hints, &res);
    if (error) {
        errx(1, "%s", gai_strerror(error));   // error reporting and exit
    }

    int s = -1;
    const char *cause = NULL;
    for (struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        s = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (s < 0) {
            cause = "socket";
            continue;                   /* try the next address */
        }
        if (connect(s, p->ai_addr, p->ai_addrlen) < 0) {
            cause = "connect";
            close(s);
            s = -1;
            continue;                   /* try the next address */
        }
        break;   /* okay, we got one */
    }
    if (s < 0) {
        err(1, "%s", cause);            // error reporting and exit
    }
    freeaddrinfo(res);

Last revised, 14 May 2012