Computer Systems Fundamentals

4.5. TCP Socket Programming: HTTP

Processes running at the application layer of the protocol stack are not fundamentally different from non-networked concurrent applications. The process has a virtual memory space, can exchange data through IPC channels, may interact with users through STDIN and STDOUT, and so on. The primary differences between such distributed application processes and non-networked processes are that the data is exchanged via an IPC channel based on a predefined communication protocol, and that this channel has a significantly higher likelihood of intermittent communication failures. The peer process on the other host may be built by the same development team, it may be a customized open-source server, or it may be a proprietary network service.
So long as both processes agree to abide by the protocol specification, writing distributed applications is not drastically different from writing other concurrent applications that use IPC. In this section, we will demonstrate how to use TCP sockets to implement the basic functionality of HTTP, the protocol that underlies web-based technologies.

4.5.1. Hypertext Transfer Protocol (HTTP)

Figure 4.5.1: Basic request-response structure of HTTP running on top of TCP

HTTP is the protocol that defines communication for web browsers and servers. Readers who have built personal or professional web pages have relied on this protocol, even if they were unaware of the details of its operation. HTTP is a simple request-response protocol, defined in RFC 2616. To be precise, HTTP is a stateless protocol, in the sense that neither the client nor the server preserves any state information between requests; the server processes each request independently from those that arrived previously. HTTP applications use TCP connections for their transport layer, and Figure 4.5.1 shows the basic structure of HTTP in relation to the functions that establish the socket connection. The client (a web browser) sends an HTTP request to the server and receives a response.

Example 4.5.1

Both HTTP requests and responses begin with a sequence of header lines, each ending in a two-character sequence denoted as CRLF (carriage return-line feed, or "\r\n" in C strings). The first line of a request must be the designated Request line, and the first line of a response must be the designated Response line; each must adhere to a given structure. After the first line, all other headers are optional, but they provide the client and server with additional useful information. At the end of the header lines, there is a single blank line (consisting of only CRLF).
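
The header structure described above can be sketched in C. The functions below are a minimal illustration, assuming the message is held in a null-terminated string; the names header_section_length() and count_header_lines() are hypothetical and are not part of any library. The first locates the blank line that ends the headers, and the second counts the header lines that precede it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes in the header
   section of msg, including the blank line (CRLF CRLF) that ends it,
   or 0 if no blank line is present yet. */
size_t
header_section_length (const char *msg)
{
  assert (msg != NULL);
  const char *end = strstr (msg, "\r\n\r\n");
  if (end == NULL)
    return 0;
  return (size_t) (end - msg) + 4;
}

/* Hypothetical helper: count the header lines (including the first
   line) that appear before the blank line. Returns 0 if the header
   section is incomplete. */
size_t
count_header_lines (const char *msg)
{
  size_t total = header_section_length (msg);
  if (total == 0)
    return 0;

  size_t lines = 0;
  const char *cursor = msg;
  const char *stop = msg + total - 2; /* start of the blank line */
  while (cursor < stop)
    {
      /* Every header line ends in CRLF, so this search cannot fail */
      const char *crlf = strstr (cursor, "\r\n");
      lines++;
      cursor = crlf + 2;
    }
  return lines;
}
```

Any bytes after the returned length belong to the message body, which is empty for a typical GET request.
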
The figure below shows a sample HTTP header for a GET request, which is the type of request that indicates the client is asking for a copy of a file; in contrast, a POST request occurs when the client is writing data back to the server. In the figure below, the client is requesting http://example.com/index.html, based on a link from https://link.from.com.

Figure 4.5.3: Sample HTTP headers for a GET request

The netcat tool is a useful way to explore the details of HTTP without a web browser. [1] Using netcat, you can interact directly with a remote HTTP server, typing the lines of the protocol itself. This tool is useful for text-based protocols like HTTP but cannot easily be used for protocols that use binary-formatted data. Consider the following example of a command-line session with netcat:

    $ netcat -v example.com 80
    Warning: Inverse name lookup failed for `93.184.216.34'
    example.com [93.184.216.34] 80 (http) open
    GET / HTTP/1.1
    Host: example.com
    Connection: close

    HTTP/1.1 200 OK
    [...more lines here, omitted for brevity...]

To use netcat, you specify the hostname (example.com) and the port number (80) to access. After the command prompt, the first two lines are printed by netcat (in verbose mode with the -v flag) to indicate that it has connected to the server. The next four lines (the GET, Host, Connection, and blank lines) were typed manually by the user to request the contents of http://example.com/.

The Host header is required for HTTP/1.1, as many web servers are operated by third-party providers. In the case of example.com, the web server is operated by a cloud service provider, fastly.net. That is, the server at 93.184.216.34 is not serving content exclusively for example.com; there are several other domains that can be accessed from the same IP address. The Host header, then, tells fastly.net which specific domain name you are trying to reach. The lines beginning with HTTP/1.1 200 OK are the response from the server.
The structure of an HTTP response is explained below. We omit the full response here, as it consists of several lines of HTTP headers and HTML code that are not critical to the current discussion.

Writing the messages for an HTTP header is straightforward, as the headers are just concatenated text output. Code Listing 4.11 illustrates the general structure of this task. The client creates a buffer and copies the required Request line into the beginning. The string concatenation function, strncat(), appends the other lines to the buffer, and the buffer is written to the socket. Note that the length variable keeps track of how much space remains in the buffer, which is always the capacity (500) minus the length of the string already in the buffer.

    /* Code Listing 4.11:
       Constructing and sending an HTTP GET request
     */

    /* Fragment: requires <string.h> and <unistd.h>; socketfd is
       assumed to be an already-connected TCP socket */
    size_t length = 500;
    char buffer[length + 1];
    memset (buffer, 0, sizeof (buffer));

    /* Copy first line in and shrink the remaining length available */
    strncpy (buffer, "GET /web/index.html HTTP/1.0\r\n", length);
    length = 500 - strlen (buffer);

    /* Concatenate each additional header line */
    strncat (buffer, "Accept: text/html\r\n", length);
    length = 500 - strlen (buffer);

    /* Other lines are similar and omitted... */

    write (socketfd, buffer, strlen (buffer));

Bug Warning

C's string functions are notorious sources of buffer overflow vulnerabilities. One common way these vulnerabilities arise is with repeated calls to strncat(), such as the omitted lines in Code Listing 4.11. The problem is that each call reduces the amount of space left in the buffer. As this happens, the length parameter passed to strncat() each time must shrink to match only the remaining size of the buffer, not the original size. Using the original size each time would make it possible for strings to be concatenated beyond the end of the buffer.
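
One way to reduce the bookkeeping described in the Bug Warning is to track how many bytes are already used and let vsnprintf() enforce the limit on each append. The sketch below is one possible approach, not the method used in Code Listing 4.11; the helpers append_line() and build_get_request() are hypothetical names introduced only for illustration, and the request uses the same HTTP/1.1 header lines as the netcat session.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append formatted text at offset *used in a
   buffer of the given capacity. Returns false, leaving the existing
   string intact, if the new text would not fit. */
bool
append_line (char *buffer, size_t capacity, size_t *used,
             const char *fmt, ...)
{
  assert (buffer != NULL && used != NULL && *used < capacity);
  size_t remaining = capacity - *used;

  va_list args;
  va_start (args, fmt);
  int written = vsnprintf (buffer + *used, remaining, fmt, args);
  va_end (args);

  /* vsnprintf reports the length it needed; reject truncated output */
  if (written < 0 || (size_t) written >= remaining)
    {
      buffer[*used] = '\0'; /* undo the partial append */
      return false;
    }
  *used += (size_t) written;
  return true;
}

/* Hypothetical helper: build a GET request for path on host. Returns
   the request length, or 0 if the buffer is too small. */
size_t
build_get_request (char *buffer, size_t capacity, const char *host,
                   const char *path)
{
  assert (buffer != NULL && capacity > 0);
  size_t used = 0;
  buffer[0] = '\0';

  if (append_line (buffer, capacity, &used, "GET %s HTTP/1.1\r\n", path)
      && append_line (buffer, capacity, &used, "Host: %s\r\n", host)
      && append_line (buffer, capacity, &used, "Connection: close\r\n")
      && append_line (buffer, capacity, &used, "\r\n"))
    return used;
  return 0;
}
```

Calling build_get_request (buffer, sizeof (buffer), "example.com", "/") would produce the same four lines that were typed by hand in the netcat session, and the returned length could then be passed to write() along with the buffer.
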

Example 4.5.2

The figure below shows a sample response from a web server for the request in Example 4.5.1. The response begins with a required Response line that lets the client know the request was successful. The optional headers indicate that the body of the message (after the blank line) consists of 37 bytes of HTML text. (Note that the body of the message in Example 4.5.1 was empty, which is typical for GET requests.) The body of the response is the contents of the file index.html stored in the web server's designated root directory. The newline character at the end of the HTML code is not required by the HTTP protocol; rather, it is simply a character stored in the file, as most text editors place a newline at the end of the file.

From the perspective of HTTP, the body of the message is a meaningless stream of bytes; the content type only matters to the client (the web browser) so that the client knows how to handle the data. Specifically, the second line of the message body is an HTML header, demarcated with the <head>...</head> tag structure. This header has no meaning to HTTP itself.

Figure 4.5.6: Sample HTTP response to the request from Example 4.5.1

4.5.2. BNF Protocol Specification

The key features of the HTTP specification in RFC 2616 are structured as BNF declarations. To understand how these declarations structure the protocol, consider the required request and response lines. Every HTTP request must begin with a Request-Line and every response must begin with a Status-Line:

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

The SP designates a single space, while CRLF designates the carriage return-line feed. Although whitespace is typically insignificant in HTML, it is significant when processing HTTP headers; spaces and CRLF characters are required in particular places so that the message can be interpreted correctly.
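
To illustrate how strictly the Status-Line declaration constrains a message, the following C sketch checks each element in order: the HTTP-Version, one SP, a three-digit Status-Code, one SP, the Reason-Phrase, and the closing CRLF. The struct status_line type and the functions expect(), parse_digits(), and parse_status_line() are hypothetical names for this example, and the 63-character limit on the reason phrase and the 4-digit cap on version numbers are arbitrary assumptions rather than values taken from RFC 2616.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the fields of a Status-Line */
struct status_line
{
  int major;
  int minor;
  int code;
  char reason[64]; /* assumed limit for this sketch */
};

/* Advance *cursor past literal if it appears there */
static bool
expect (const char **cursor, const char *literal)
{
  size_t len = strlen (literal);
  if (strncmp (*cursor, literal, len) != 0)
    return false;
  *cursor += len;
  return true;
}

/* Read between 1 and max_digits decimal digits into *value */
static bool
parse_digits (const char **cursor, int max_digits, int *value)
{
  int count = 0;
  *value = 0;
  while (count < max_digits && isdigit ((unsigned char) **cursor))
    {
      *value = *value * 10 + (**cursor - '0');
      (*cursor)++;
      count++;
    }
  return count > 0;
}

/* Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF */
bool
parse_status_line (const char *line, struct status_line *out)
{
  assert (line != NULL && out != NULL);
  const char *cursor = line;

  if (!expect (&cursor, "HTTP/") || !parse_digits (&cursor, 4, &out->major)
      || !expect (&cursor, ".") || !parse_digits (&cursor, 4, &out->minor)
      || !expect (&cursor, " "))
    return false;

  /* The Status-Code must be exactly three digits */
  const char *code_start = cursor;
  if (!parse_digits (&cursor, 3, &out->code) || cursor - code_start != 3
      || !expect (&cursor, " "))
    return false;

  /* The Reason-Phrase runs up to, but not including, the CRLF */
  size_t reason_len = strcspn (cursor, "\r\n");
  if (reason_len >= sizeof (out->reason))
    return false;
  memcpy (out->reason, cursor, reason_len);
  out->reason[reason_len] = '\0';
  cursor += reason_len;

  return expect (&cursor, "\r\n");
}
```

Because each separator is matched literally, a line with two spaces between fields or with a missing CRLF is rejected, which mirrors the point that whitespace is significant in HTTP headers.
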

For requests, there are several valid Methods that can be used, with GET and POST being the two most common. A GET request corresponds to reading a file to display in the web browser; the body of the request would be empty in that case. A POST request, on the other hand, occurs when the web browser is sending data back to the server, such as when a user enters data in a form and submits it; the message body after the blank line contains the data to send, which in this case is the form contents.

Readers with experience writing HTML code may be familiar with query strings and cookies. As described previously, a query is part of the standard URI structure that begins with a '?' and can provide information to the server about how to process the request. For instance, the URL http://example.com/help.html?topic=login indicates that the user is looking for help logging in. The Request-URI in this case is /help.html?topic=login, containing the query string. [2] Cookies, on the other hand, are another technology used to provide data to the server; for instance, a cookie may contain an authentication token or a username to keep track of the user from one request to the next. Cookies are stored in their own HTTP header, but they are not described in RFC 2616. Instead, a separate document, RFC 6265, defines the structure of cookies and how to use them. Ultimately, though, from the perspective of HTTP as described above, cookies are simply small pieces of data stored in another optional header field.

4.5.3. HTTP/1.1 Persistent Connections

The last part of an HTTP Request-Line is the version, which corresponds to the first field of the Status-Line that begins the response. The messages in Example 4.5.1 and Example 4.5.2 used the version HTTP/1.0, which is the basic request-response protocol we have discussed so far. HTTP/1.1 introduces persistent connections, which are commonly used in modern web applications.
With a standard HTTP/1.0 request, the TCP socket connection is closed when the server sends the response. As such, if a client needs to request more data, the client must establish a new connection and start over. With an HTTP/1.1 persistent connection, the TCP connection is closed only after the client sends a request that explicitly asks to close the connection.