mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

What is mTCP?

mTCP is a high-performance user-level TCP stack for multicore systems. Scaling the performance of short TCP connections is fundamentally challenging due to inefficiencies in the kernel. mTCP addresses these inefficiencies from the ground up - from packet I/O and TCP connection management all the way to the application interface.

Figure 1. mTCP overview

Besides adopting well-known techniques, our mTCP stack (1) translates expensive system calls into shared-memory access between two threads on the same CPU core, (2) allows efficient flow-level event aggregation, and (3) performs batch processing of RX/TX packets for high I/O efficiency. On an 8-core machine, mTCP improves the performance of small message transactions by a factor of 25 compared with the latest Linux TCP stack (kernel version 3.10.12), and by a factor of 3 compared with the best-performing research system. It also improves the performance of various popular applications by 33% (SSLShader) to 320% (lighttpd) compared with those on the Linux stack.

Why User-level TCP?

Many high-performance network applications spend a significant portion of their CPU cycles on TCP processing in the kernel (e.g., ~80% inside the kernel for lighttpd). Even worse, these CPU cycles are not used effectively: according to our measurements, Linux spends more than 4x the cycles that mTCP needs to handle the same number of TCP transactions. Can we design a user-level TCP stack that incorporates all existing optimizations into a single system? Can we bring the performance of existing packet I/O libraries to the TCP stack? To answer these questions, we built a TCP stack at the user level. User-level TCP is attractive for many reasons:

- Easily depart from the kernel's complexity
- Directly benefit from the optimizations in high-performance packet I/O libraries
- Naturally aggregate flow-level events via packet-level I/O batching
- Easily preserve the existing application programming interface

Event-driven Packet I/O Library

Several packet I/O systems allow high-speed packet I/O (~100M packets/s) from a user-level application. However, they are not suitable for implementing a transport layer because (i) they waste CPU cycles by polling NICs and (ii) they do not allow multiplexing between RX and TX. To address these challenges, we extend the PacketShader I/O engine (PSIO) for efficient event-driven packet I/O. The new event-driven interface, ps_select(), works similarly to select() except that it operates on the TX/RX queues of the NIC ports of interest. For example, mTCP specifies the NIC interfaces it wants to monitor for RX and/or TX events along with a timeout in microseconds, and ps_select() returns as soon as any event of interest becomes available. The use of PSIO brings the opportunity to amortize the overhead of various system calls and context switches throughout the system, in addition to eliminating per-packet memory allocation and DMA overhead. For more detail about PSIO, please refer to the PacketShader project page.
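ps_select() itself is part of the extended io_engine shipped with mTCP and operates on NIC TX/RX queues rather than file descriptors; its exact signature is documented in the release. Purely as a rough illustration of the pattern it enables (block with a microsecond timeout until an event of interest arrives, instead of burning cycles by busy-polling), the following sketch shows the same loop structure with the standard select() call on an ordinary socket. It is not the PSIO API.

/*
 * Illustrative sketch only: an ordinary UDP socket stands in for a NIC RX
 * queue, and the standard select() call stands in for ps_select().  The
 * point is the event-driven structure, not the actual packet I/O path.
 */
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);   /* register interest in RX events              */
        /* TX interest would be registered only when packets are queued,    */
        /* so an idle stack sleeps instead of spinning on the NIC.          */

        struct timeval tv = { .tv_sec = 0, .tv_usec = 100 };  /* usec timeout */
        int n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0) { perror("select"); break; }
        if (n == 0) continue;            /* timeout: no event, nothing to do */

        if (FD_ISSET(fd, &rfds)) {
            char buf[2048];
            ssize_t len = recv(fd, buf, sizeof(buf), 0);
            if (len > 0)
                printf("RX event: %zd bytes\n", len);
        }
    }
    close(fd);
    return 0;
}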
User-level TCP Stack

mTCP uses a separate-TCP-thread-per-application-thread model. Since coupling TCP jobs with the application thread could break time-based operations such as handling TCP retransmission timeouts, we create a separate TCP thread for each application thread, affinitized to the same CPU core. Figure 2 shows how mTCP interacts with the application thread. Applications communicate with the mTCP threads via library functions that allow safe sharing of the internal TCP data.

Figure 2. Thread model of mTCP

While designing the TCP stack, we consider the following primitives for performance scalability and efficient event delivery:

- Thread mapping and flow-level core affinity
- Multicore and cache-friendly data structures
- Batched event handling
- Optimizations for short-lived connections

Our TCP implementation follows the original TCP specification, RFC 793. It supports basic TCP features such as connection management, reliable data transfer, flow control, and congestion control. mTCP also implements popular options such as timestamp, MSS, and window scaling. For congestion control, mTCP implements NewReno.

Application Interface

Our programming interface preserves the most commonly used semantics as much as possible, for easy migration of applications. We introduce our user-level socket API and event system below.

User-level socket API

mTCP provides a BSD-like socket interface; for each BSD socket function, there is a corresponding mTCP function call (e.g., accept() -> mtcp_accept()). In addition, we provide some of the fcntl() and ioctl() functionality that is frequently used with sockets (e.g., setting a socket as nonblocking, getting/setting the socket buffer size).

User-level event system

As shown in Figure 3, we provide an epoll-like event system. Applications fetch events through mtcp_epoll_wait() and register events through mtcp_epoll_ctl(), which correspond to epoll_wait() and epoll_ctl() in Linux.

Figure 3. Sample event-driven mTCP application

As Figure 3 shows, you can program with mTCP just as you do with Linux epoll and sockets. One difference is that every mTCP function takes an mctx (mTCP thread context) argument, so that each thread manages its resources independently for core scalability.

Performance

We first show mTCP's scalability with a benchmark in which a server sends a short (64B) message per transaction. All servers are multi-threaded with a single listening port. Figure 4 shows the performance as a function of the number of CPU cores. Linux scales poorly due to its shared accept queue, and Linux with SO_REUSEPORT scales better but not linearly, whereas mTCP scales almost linearly with the number of CPU cores. On 8 cores, mTCP shows 25x, 5x, and 3x higher performance than Linux, Linux+SO_REUSEPORT, and MegaPipe, respectively.

Figure 4. Small message transaction benchmark

To gauge the performance of lighttpd in a realistic setting, we run a test with the static file workload extracted from SpecWeb2009, as Affinity-Accept and MegaPipe did. Figure 5 shows that mTCP improves the throughput by 3.2x, 2.2x, and 1.5x over Linux, REUSEPORT, and MegaPipe, respectively. For lighttpd, we changed only ~65 LoC to use mTCP-specific event and socket function calls. For multi-threading, a total of ~800 lines were modified out of lighttpd's ~40,000 LoC.

Figure 5. Performance of lighttpd for the static file workload from SpecWeb2009

Experiment setup:
- 1x Intel Xeon E5-2690 @ 2.90 GHz (octa-core)
- 32 GB RAM (4 memory channels)
- 1-2 Intel dual-port 82599 10 GbE NICs
- Linux 2.6.32 (for mTCP), Linux 3.1.3 (for MegaPipe), Linux 3.10.12
- ixgbe-3.17.3

Publications

mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems
EunYoung Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park
In Proceedings of USENIX NSDI 2014

Source code and Documentation

Check out the latest release of mTCP at our github!
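To give a flavor of the interface described under Application Interface above, here is a minimal sketch of a single-threaded mTCP accept-and-echo loop. It is a sketch rather than code from the release: the header names, constants (MTCP_EPOLLIN, MTCP_EPOLL_CTL_ADD), configuration file path, and exact signatures are assumptions based on the mtcp_ naming convention and the epoll-like event system described above, and error handling is omitted. The manual pages and the sample applications in the release are the authoritative reference.

/*
 * Minimal sketch, with assumed headers/constants/signatures; see the
 * release's manual pages and sample applications for the real API.
 */
#include <mtcp_api.h>
#include <mtcp_epoll.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_EVENTS 1024

int main(void)
{
    /* Initialize mTCP from a configuration file, then create a per-core
     * thread context (mctx); every mTCP call below takes this context.  */
    if (mtcp_init("mtcp.conf") < 0)          /* assumed config file name  */
        return 1;
    mctx_t mctx = mtcp_create_context(0);    /* run on CPU core 0         */

    int ep = mtcp_epoll_create(mctx, MAX_EVENTS);

    /* Socket setup mirrors the BSD API: socket/bind/listen -> mtcp_*.    */
    int listener = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
    mtcp_setsock_nonblock(mctx, listener);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(80);
    mtcp_bind(mctx, listener, (struct sockaddr *)&addr, sizeof(addr));
    mtcp_listen(mctx, listener, 4096);

    /* Register the listening socket for read (new-connection) events.    */
    struct mtcp_epoll_event ev;
    ev.events = MTCP_EPOLLIN;
    ev.data.sockid = listener;
    mtcp_epoll_ctl(mctx, ep, MTCP_EPOLL_CTL_ADD, listener, &ev);

    struct mtcp_epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = mtcp_epoll_wait(mctx, ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.sockid == listener) {
                /* Accept a new connection and watch it for incoming data. */
                int c = mtcp_accept(mctx, listener, NULL, NULL);
                if (c >= 0) {
                    ev.events = MTCP_EPOLLIN;
                    ev.data.sockid = c;
                    mtcp_epoll_ctl(mctx, ep, MTCP_EPOLL_CTL_ADD, c, &ev);
                }
            } else {
                /* Echo whatever arrives, then close the connection.       */
                char buf[4096];
                int c = events[i].data.sockid;
                int len = mtcp_read(mctx, c, buf, sizeof(buf));
                if (len > 0)
                    mtcp_write(mctx, c, buf, len);
                mtcp_close(mctx, c);
            }
        }
    }
    /* not reached */
    mtcp_destroy_context(mctx);
    mtcp_destroy();
    return 0;
}

Note how the mctx thread context is threaded through every call; a multi-threaded server would create one such context per CPU core and run a loop like this in each application thread.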
Manual pages of the mTCP library are available at: http://shader.kaist.edu/mtcp/index_man.html

Our release contains the source code of mTCP, the extended io_engine, sample applications (a simple web server and an HTTP request generator), and ported applications (lighttpd and ApacheBench (ab)).

Press Coverage

Intel Developer Zone

People

Students: EunYoung Jeong, Shinae Woo, Muhammad Asim Jamshed, and Haewon Jeong
Faculty: KyoungSoo Park, Dongsu Han, and Sunghwan Ihm

We can be reached collectively through our mailing list: mtcp-user at list.ndsl.kaist.edu. Subscribe to our mailing list here.

Last modified: April 2, 2014 / Networked & Distributed Computing Systems Lab