Internet Engineering Task Force M. Welzl Internet-Draft University of Oslo Intended status: Experimental D. Damjanovic Expires: April 9, 2010 University of Innsbruck October 6, 2009 MulTFRC: TFRC with weighted fairness draft-welzl-multfrc-01.txt Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on April 9, 2010. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract This document specifies the MulTFRC congestion control mechanism. MulTFRC is derived from TFRC and can be implemented by carrying out a Welzl & Damjanovic Expires April 9, 2010 [Page 1] Internet-Draft MulTFRC October 2009 few small and simple changes to the original mechanism. Its behavior differs from TFRC in that it emulates a number of TFRC flows with more flexibility than what would be practical or even possible using multiple real TFRC flows. Additionally, MulTFRC better preserves the original features of TFRC than multiple real TFRCs do. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Specification . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Section 3 of RFC 5348 . . . . . . . . . . . . . . . . . . 4 2.2. Section 4 of RFC 5348 . . . . . . . . . . . . . . . . . . 5 2.3. Section 5 of RFC 5348 . . . . . . . . . . . . . . . . . . 6 2.4. Section 6 of RFC 5348 . . . . . . . . . . . . . . . . . . 7 2.5. Section 8 of RFC 5348 . . . . . . . . . . . . . . . . . . 8 2.6. Appendix A of RFC 5348 . . . . . . . . . . . . . . . . . . 9 3. Usage Considerations . . . . . . . . . . . . . . . . . . . . . 9 3.1. Which applications should use MulTFRC? . . . . . . . . . . 9 3.2. Setting N . . . . . . . . . . . . . . . . . . . . . . . . 9 4. Security Considerations . . . . . . . . . . . . . . . . . . . 10 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.1. Normative References . . . . . . . . . . . . . . . . . . . 11 6.2. Informative References . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 Welzl & Damjanovic Expires April 9, 2010 [Page 2] Internet-Draft MulTFRC October 2009 1. Introduction "TCP-friendliness", the requirement for a flow to behave under congestion like a flow produced by a conformant TCP (introduced by the name of "TCP-compatibility" in [RFC2309]), has been put into question in recent years (cf. 
[Bri07]). As an illustrative example, consider the fact that not all data
transfers are of equal importance to a user. A user may therefore want to
assign different priorities to different flows between two hosts, but
TCP(-friendly) congestion control would always let these flows use the same
sending rate.

Users and their applications already bypass TCP-friendliness in practice:
since multiple TCP flows can saturate a bottleneck better than a single one,
some applications open multiple connections as a simple workaround. The
"GridFTP" [All03] protocol explicitly provides this function as a performance
improvement. Some research efforts were therefore carried out to develop
protocols in which a weight can be applied directly to the congestion control
mechanism, allowing a single flow to be as aggressive as a number of parallel
TCP flows. The first, and best known, such protocol is MulTCP [Cro+98], which
emulates N TCPs in a rather simple fashion. Improved versions were published
later, e.g. Stochastic TCP [Hac+04] and Probe-Aided (PA-)MulTCP [Kuo+08].
These protocols could be called "N-TCP-friendly", i.e. as TCP-friendly as N
TCPs.

MulTFRC, defined in this document, does with TFRC [RFC5348] what MulTCP does
with TCP. In [Dam+09], it was shown that MulTFRC achieves its goal of
emulating N flows better than MulTCP (and improved versions of it) and has a
number of other benefits. For instance, MulTFRC with N=2 is more reactive
than two real TFRC flows are, and it has a smoother sending rate than two
real TFRC flows do. Moreover, since it is only one mechanism, a protocol that
uses MulTFRC can send a single data stream with the congestion control
behavior of multiple data streams, without the need to split the data and
spread it over separate connections. Depending on the protocol in use, N real
TFRC flows can also be expected to have N times the overhead of a MulTFRC
flow with the same value of N for, e.g., connection setup and teardown.

The core idea of TFRC is to achieve TCP-friendliness by explicitly
calculating an equation that approximates the steady-state throughput of TCP
and sending no faster than the result. The core idea of MulTFRC is to replace
this equation in TFRC with the algorithm from [Dam+08], which approximates
the steady-state throughput of N TCP flows. MulTFRC can be implemented via a
few simple changes to the TFRC code. It is therefore defined here by
specifying how it differs from the TFRC specification [RFC5348].

Welzl & Damjanovic Expires April 9, 2010 [Page 3]

Internet-Draft MulTFRC October 2009

2. Specification

This section lists the changes to [RFC5348] that must be applied to turn
TFRC into MulTFRC. The small number of changes ensures that many original
properties of a single TFRC flow are preserved, which is often the most
appropriate choice (e.g. it would probably not make sense for a MulTFRC flow
to detect a data-limited interval differently than a single TFRC flow
would). It also makes MulTFRC easy to understand and implement. Experiments
have shown that these changes are enough to attain the desired effect.

2.1. Section 3 of RFC 5348

While the TCP throughput equation requires the loss event rate, round-trip
time and segment size as input, the algorithm to be used for MulTFRC
additionally needs the number of packets lost in a loss event.
The equation, specified in [RFC5348] as

                                 s
   X_Bps = ----------------------------------------------------------
            R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8)*p*(1+32*p^2)))

is replaced with the following algorithm, which returns X_Bps, the average
transmit rate of N TCPs in bytes per second:

   If (N < 12) {
      af = N * (1-(1-1/N)^j);
   } Else {
      af = j;
   }
   af = max(min(af, ceil(N)), 1);
   a = p*b*af*(24*N^2 + p*b*af*(N-2*af)^2);
   x = (af*p*b*(2*af-N) + sqrt(a))/(6*N^2*p);
   q = min(2*j*b/(x*(1+3*N/j)), N);
   z = t_RTO*(1+32*p^2)/(1-p);
   If (q*z/(x*R) >= N) {
      q = N;
   } Else {
      q = q*z/(x*R);
   }
   X_Bps = ((1-q/N)/(p*x*R) + q/(z*(1-p)))*s;

Where:

   s is the segment size in bytes (excluding IP and transport protocol
   headers).

Welzl & Damjanovic Expires April 9, 2010 [Page 4]

Internet-Draft MulTFRC October 2009

   R is the round-trip time in seconds.

   b is the maximum number of packets acknowledged by a single TCP
   acknowledgement.

   p is the loss event rate, between 0 and 1.0, of the number of loss
   events as a fraction of the number of packets transmitted.

   j is the number of packets lost in a loss event.

   t_RTO is the TCP retransmission timeout value in seconds.

   N is the number of TFRC flows that MulTFRC should emulate. N is a
   positive rational number; a discussion of appropriate values for this
   parameter, and reasons for choosing them, is provided in Section 3.2.

   ceil(N) gives the smallest integer greater than or equal to N.

   x, af, a, z and q are temporary floating point variables.

Section 3.1 of [RFC5348] contains a recommendation for setting a parameter
called t_RTO. Here, the use of the algorithm for calculating the RTO
specified in [RFC2988] is RECOMMENDED instead. Further, that section
proposes a simplification of the equation as a result of setting t_RTO in a
specific way; this part of the TFRC specification does not apply here.

Section 3.1 of [RFC5348] also contains a discussion of the parameter b for
delayed acknowledgements and concludes that the use of b=1 is RECOMMENDED.
This is also the case for MulTFRC.

Section 3.2.2 of [RFC5348] specifies the contents of feedback packets. In
addition to the information listed there, a MulTFRC feedback packet also
carries j, the number of packets lost in a loss event.

2.2. Section 4 of RFC 5348

The procedure for updating the allowed sending rate in section 4.3 of
[RFC5348] ("action 4") contains the statement:

   Calculate X_Bps using the TCP throughput equation.

which is replaced with the statement:

   Calculate X_Bps using the algorithm defined in section 3.

Welzl & Damjanovic Expires April 9, 2010 [Page 5]

Internet-Draft MulTFRC October 2009

2.3. Section 5 of RFC 5348

Section 5.2 of [RFC5348] explains how a lost packet that starts a new loss
event should be distinguished from a lost packet that is a part of the
previous loss event interval. Here, the number of packets lost in a loss
event is additionally counted, and therefore this section is extended with:

   If S_new is a part of the current loss interval, LP_0 (the number of
   lost packets in the current interval) is increased by 1. On the other
   hand, if S_new starts a new loss event, LP_0 is set to 1.

Section 5.4 of [RFC5348] contains the algorithm for calculating the average
loss interval that is needed for the calculation of the loss event rate, p.
MulTFRC also requires the number of lost packets in a loss event, j. In
[RFC5348], the calculation of the average loss interval is done using a
filter that weights the n most recent loss event intervals, and setting n to
8 is RECOMMENDED. The same algorithm is used here for calculating the
average loss interval.
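Purely for illustration, the per-interval loss counts required by the
extended Section 5.2 behavior above could be maintained as sketched below in
C, next to the loss interval history that the receiver already keeps for
[RFC5348]. The array-based history, the function name and the variable names
are assumptions of this sketch, not part of the specification.

   #define N_INTERVALS 8  /* n = 8 loss intervals, per [RFC5348] */

   /* Loss history kept by the receiver.  Index 0 describes the current
    * (most recent) loss interval; higher indices are older intervals.
    * I[] holds the loss interval lengths that [RFC5348] already
    * maintains; LP[] holds the per-interval lost-packet counts that
    * MulTFRC adds. */
   static double I[N_INTERVALS + 1];
   static double LP[N_INTERVALS + 1];

   /* Called when a lost packet S_new is detected (Section 5.2 of
    * [RFC5348]).  new_loss_event is non-zero if S_new starts a new
    * loss event rather than belonging to the current loss interval. */
   void multfrc_note_lost_packet(int new_loss_event)
   {
       int i;

       if (new_loss_event) {
           /* Shift the history: the current interval becomes the
            * previous one, and a new interval is opened. */
           for (i = N_INTERVALS; i > 0; i--) {
               I[i] = I[i - 1];
               LP[i] = LP[i - 1];
           }
           I[0] = 0;   /* length of the new interval, updated elsewhere */
           LP[0] = 1;  /* the lost packet that opened the new interval */
       } else {
           /* The loss belongs to the current loss interval. */
           LP[0] = LP[0] + 1;
       }
   }

The following paragraphs specify how these per-interval counts are averaged,
with the same filter as for the loss intervals, to obtain j.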
For the number of lost packets in a loss event interval, j, the weighted
average number of lost packets in the n most recent loss intervals is taken,
and the same filter is used. For calculating the average number of packets
lost in a loss event interval, we use the same loss intervals as for the
calculation of p. Let LP_0 to LP_k be the number of lost packets in the k
most recent loss intervals. The algorithm for calculating I_mean in Section
5.4 of [RFC5348] (page 23) is extended by adding, after the last line
("p = 1 / I_mean;"):

   LP_tot = 0;
   If (I_tot0 > I_tot1) {
      for (i = 0 to k-1) {
         LP_tot = LP_tot + (LP_i * w_i);
      }
   } Else {
      for (i = 1 to k) {
         LP_tot = LP_tot + (LP_i * w_(i-1));
      }
   }
   j = LP_tot/W_tot;

In section 5.5 of [RFC5348] (page 25), the algorithm that ends with
"p = min(W_tot0/I_tot0, W_tot1/I_tot1);" is extended by adding:

Welzl & Damjanovic Expires April 9, 2010 [Page 6]

Internet-Draft MulTFRC October 2009

   LP_tot = 0;
   If (I_tot0 > I_tot1) {
      for (i = 0 to k-1) {
         LP_tot = LP_tot + (LP_i * w_i * DF_i * DF);
      }
      j = LP_tot/W_tot0;
   } Else {
      for (i = 1 to k) {
         LP_tot = LP_tot + (LP_i * w_(i-1) * DF_i);
      }
      j = LP_tot/W_tot1;
   }

2.4. Section 6 of RFC 5348

The steps to be carried out by the receiver when a packet is received in
section 6.1 of [RFC5348] ("action 4") contain the statement:

   3) Calculate p: Let the previous value of p be p_prev. Calculate the
      new value of p as described in Section 5.

which is replaced with the statement:

   3) Calculate p and j: Let the previous values of p and j be p_prev and
      j_prev. Calculate the new values of p and j as described in
      Section 5.

The steps to be carried out by the receiver upon expiration of the feedback
timer in section 6.2 of [RFC5348] ("action 1") contain the statement:

   1) Calculate the average loss event rate using the algorithm described
      in Section 5.

which is replaced with:

   1) Calculate the average loss event rate and the average number of
      lost packets in a loss event using the algorithm described in
      Section 5.

This statement is added at the beginning of the list of initial steps to
take when the first packet is received, in section 6.3 of [RFC5348]:

   o Set j = 0.

Welzl & Damjanovic Expires April 9, 2010 [Page 7]

Internet-Draft MulTFRC October 2009

Section 6.3.1 of [RFC5348] discusses how the loss history is initialized
after the first loss event. TFRC approximates the rate to be the maximum
value of X_recv so far, assuming that a higher rate introduces loss.
Therefore, j for this rate is approximated by 1, and the number of packets
lost in the first interval is set to 1. This is accomplished by the
following change. The first sentence of the fourth paragraph (in section
6.3.1) is:

   TFRC does this by finding some value, p, for which the throughput
   equation in Section 3.1 gives a sending rate within 5% of X_target,
   given the round-trip time R, and the first loss interval is then set
   to 1/p.

which is replaced with:

   TFRC does this by finding some value, p, for which the throughput
   equation in Section 3.1 gives a sending rate within 5% of X_target,
   given the round-trip time R, and j equal to 1. The first loss interval
   is then set to 1/p.

The second-to-last paragraph in section 6.3.1 ends with:

   Thus, the TFRC receiver calculates the loss interval that would be
   required to produce the target rate X_target of 0.5/R packets per
   second, for the round-trip time R, and uses this synthetic loss
   interval for the first loss interval. The TFRC receiver uses 0.5/R
   packets per second as the minimum value for X_target when
   initializing the first loss interval.
which is replaced with:

   Thus, the TFRC receiver calculates the loss interval that would be
   required to produce the target rate X_target of 0.5/R packets per
   second, for the round-trip time R, and for j equal to 1. This
   synthetic loss interval is used for the first loss interval. The TFRC
   receiver uses 0.5/R packets per second as the minimum value for
   X_target when initializing the first loss interval.

2.5. Section 8 of RFC 5348

Section 8.1 explains details of calculating the original TCP throughput
equation, which this document replaces with a new algorithm. That section
therefore does not apply to MulTFRC.

Welzl & Damjanovic Expires April 9, 2010 [Page 8]

Internet-Draft MulTFRC October 2009

2.6. Appendix A of RFC 5348

Appendix A of [RFC5348] provides a terminology list for TFRC, which is
extended as follows:

   N: number of emulated TFRC flows.

   j: number of packets lost in a loss event.

3. Usage Considerations

The "weighted fairness" service provided by a protocol using MulTFRC is
quite different from the service provided by traditional Internet transport
protocols. This section intends to answer some questions that this new
service may raise.

3.1. Which applications should use MulTFRC?

Like TFRC, MulTFRC is suitable for applications that require a smoother
sending rate than standard TCP provides. Since such applications are
typically multimedia applications, TFRC has largely been associated with
them (and [RFC5348] mentions "streaming media" as an example). Since timely
transmission is often more important for them than reliability, multimedia
applications usually do not keep retransmitting packets until their
successful delivery is ensured. Accordingly, TFRC usage was specified for
the Datagram Congestion Control Protocol (DCCP) [RFC4342], but not for
reliable data transfers.

MulTFRC, on the other hand, provides an altogether different service. For
some applications, a smoother sending rate may not be particularly
desirable, but it might also not be considered harmful, while the ability to
emulate the congestion control of N flows may be useful for them. This could
include reliable transfers such as the transmission of files. Possible
reasons to use MulTFRC for file transfers include the assignment of
priorities according to user preferences, increased efficiency with N > 1,
the implementation of low-priority "scavenger" services, and resource
pooling [Wis+08].

3.2. Setting N

N MUST be set at the beginning of a transfer; it MUST NOT be changed while a
transfer is ongoing. The effects of changing N during the lifetime of a
MulTFRC session on the dynamics of the mechanism are yet to be investigated;
in particular, it is unclear how often N could safely be changed, and how
"safely" should be defined in this context. Further research is required to
answer these questions.

Welzl & Damjanovic Expires April 9, 2010 [Page 9]

Internet-Draft MulTFRC October 2009

N is a positive floating point number that can also take values between 0
and 1, making MulTFRC applicable as a mechanism for what has been called a
"Lower-than-Best-Effort" (LBE) service. Since MulTFRC does not reduce its
sending rate early, when delay increases, in the way that some alternative
proposals for such a service do (e.g. TCP-LP [Kuz+06], TCP Nice [Ven+02] or
4CP [Liu+07]), it can probably be expected to be more aggressive than these
mechanisms when they share a bottleneck. This also means that MulTFRC is
less prone to starvation. Values between 0 and 1 could also be useful if
MulTFRC is used across multiple paths to realize resource pooling [Wis+08].
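To illustrate how a given value of N translates into a computed sending
rate, the following self-contained C program evaluates the algorithm from
Section 2.1 for several values of N, including a value below 1. It is a
non-normative sketch; the function name, the example network conditions and
the output format are chosen purely for illustration.

   #include <math.h>
   #include <stdio.h>

   /* Non-normative C translation of the transmit rate algorithm in
    * Section 2.1.  Inputs follow the definitions given there; the
    * return value is X_Bps, the transmit rate in bytes per second. */
   static double multfrc_x_bps(double s, double R, double b, double p,
                               double j, double t_RTO, double N)
   {
       double af, a, x, q, z;

       if (N < 12.0)
           af = N * (1.0 - pow(1.0 - 1.0 / N, j));
       else
           af = j;
       af = fmax(fmin(af, ceil(N)), 1.0);

       a = p * b * af
           * (24.0 * N * N + p * b * af * (N - 2.0 * af) * (N - 2.0 * af));
       x = (af * p * b * (2.0 * af - N) + sqrt(a)) / (6.0 * N * N * p);
       q = fmin(2.0 * j * b / (x * (1.0 + 3.0 * N / j)), N);
       z = t_RTO * (1.0 + 32.0 * p * p) / (1.0 - p);
       if (q * z / (x * R) >= N)
           q = N;
       else
           q = q * z / (x * R);

       return ((1.0 - q / N) / (p * x * R) + q / (z * (1.0 - p))) * s;
   }

   int main(void)
   {
       /* Example conditions (illustrative only): 1460-byte segments,
        * 100 ms RTT, b = 1, a 1% loss event rate, one lost packet per
        * loss event, and a 400 ms retransmission timeout. */
       const double s = 1460.0, R = 0.1, b = 1.0;
       const double p = 0.01, j = 1.0, t_RTO = 0.4;
       const double n_values[] = { 0.5, 1.0, 2.0, 4.0, 6.0 };
       unsigned int i;

       for (i = 0; i < sizeof(n_values) / sizeof(n_values[0]); i++)
           printf("N = %.1f -> X_Bps = %.0f bytes/s\n", n_values[i],
                  multfrc_x_bps(s, R, b, p, j, t_RTO, n_values[i]));
       return 0;
   }

Under such fixed conditions, the computed rate grows roughly in proportion
to N. On a real bottleneck, a larger N also increases the loss event rate
experienced by the aggregate, which is one reason why the utilization gain
discussed in the following paragraphs flattens out quickly.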
Setting N to 1 is also possible. In this case, the only difference between
TFRC and MulTFRC is that the underlying model of TFRC assumes that all
remaining packets following a dropped packet in a "round" (less than one
round-trip time apart) are also dropped, whereas the underlying model of
MulTFRC does not make this assumption. In other words, unlike the equation
used in MulTFRC, the equation underlying TFRC assumes that loss always
occurs in bursts. Which choice is better depends on the specific network
situation; large windows and queuing schemes other than Drop-Tail seem to
make it less likely that the burst-loss assumption matches reality. This
document does not make any recommendation about which mechanism to use if
only one flow is desired.

Since TCP has been extensively studied, and the aggression of its congestion
control mechanism is emulated by TFRC, we can look at the behavior of a TCP
aggregate in order to find a reasonable upper limit for N in MulTFRC. From
[Alt+06], N TCPs (assuming non-synchronized loss events over connections)
can utilize roughly 100 - 100/(1+3N) percent of a bottleneck link. This
means that a single flow can only achieve 75% utilization, whereas 3 flows
already achieve 90%. The theoretical gain that can be achieved by adding a
flow declines with the total number of flows - e.g., while going from 1 to 2
flows yields a relative performance gain of about 14.3% (from 75% to roughly
86% utilization), the gain becomes less than 1% beyond 6 flows (which
already achieve about 95% link utilization). Since the link utilization of
MulTFRC can be expected to be roughly the same as the link utilization of
multiple TCPs, the approximation above also holds for MulTFRC. Thus, setting
N to a much larger value than the values mentioned above will only yield a
marginal benefit in isolation but can significantly affect other traffic.
Therefore, the maximum value of N that a user can set for MulTFRC SHOULD NOT
exceed 6.

4. Security Considerations

It is well known that a single uncontrolled UDP flow can cause significant
harm to a large number of TCP flows that share the same

Welzl & Damjanovic Expires April 9, 2010 [Page 10]

Internet-Draft MulTFRC October 2009

bottleneck. This potential danger is due to the total lack of congestion
control. Because this problem is well known, and because UDP is easy to
detect, UDP traffic will often be rate-limited by service providers.

If MulTFRC is used within a protocol such as DCCP, which will normally not
be considered harmful and will therefore typically not be rate-limited, its
tunable aggression could theoretically make it possible to use it for a
Denial-of-Service (DoS) attack. In order to avoid such usage, the maximum
value of N MUST be restricted. If, as recommended in this document, the
maximum value for N is restricted to 6, the impact of MulTFRC on TCP is
roughly the same as the impact of 6 TCP flows would be. The combined
congestion control behavior of 6 TCPs clearly falls far short of such an
attack.

With transport protocols such as TCP, SCTP or DCCP, users can already be
more aggressive than others by opening multiple connections. If MulTFRC is
used within a transport protocol, this effect becomes more pronounced -
e.g., 2 connections with N set to 6 each exhibit roughly the same congestion
control behavior as 12 TCP flows. The N limit SHOULD therefore be
implemented as a system-wide parameter such that the sum of the N values of
all MulTFRC connections does not exceed it. Alternatively, the number of
connections that can be opened could be restricted.

5.
Acknowledgements This work was partially funded by the EU IST project EC-GIN under the contract STREP FP6-2006-IST-045256. 6. References 6.1. Normative References [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, September 2008. 6.2. Informative References [All03] Allcock, W., "GridFTP: Protocol Extensions to FTP for the Grid", Open Grid Forum Document GFD.20, 2003. [Alt+06] Altman, E., Barman, D., Tuffin, B., and M. Vojnovic, "Parallel TCP Sockets: Simple Model, Throughput and Validation", Proceedings of Infocom 2006, April 2006. Welzl & Damjanovic Expires April 9, 2010 [Page 11] Internet-Draft MulTFRC October 2009 [Bri07] Briscoe, B., "Flow rate fairness: dismantling a religion", ACM SIGCOMM Computer Communication Review vol. 37, no. 2, (April 2007), pp. 63-74, April 2007. [Cro+98] Crowcroft, J. and P. Oechslin, "Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP", ACM SIGCOMM Computer Communication Review vol. 28, no. 3 (July 1998), pp. 53-69, 1998. [Dam+08] Damjanovic, D., Welzl, M., Telek, M., and W. Heiss, "Extending the TCP Steady-State Throughput Equation for Parallel TCP Flows", University of Innsbruck, Institute of Computer Science, DPS NSG Technical Report 2, August 2008. [Dam+09] Damjanovic, D. and M. Welzl, "MulTFRC: Providing Weighted Fairness for Multimedia Applications (and others too!)", ACM SIGCOMM Computer Communication Review vol. 39, issue 9 (July 2009), 2009. [Hac+04] Hacker, T., Noble, B., and B. Athey, "Improving Throughput and Maintaining Fairness using Parallel TCP", Proceedings of Infocom 2004, March 2004. [Kuo+08] Kuo, F. and X. Fu, "Probe-Aided MulTCP: an aggregate congestion control mechanism", ACM SIGCOMM Computer Communication Review vol. 38, no. 1 (January 2008), pp. 17-28, 2008. [Kuz+06] Kuzmanovic, A. and E. Knightly, "TCP-LP: low-priority service via end-point congestion control", IEEE/ACM Transactions on Networking (ToN) Volume 14, Issue 4, pp. 739-752., August 2006, . [Liu+07] Liu, S., Vojnovic, M., and D. Gunawardena, "Competitive and Considerate Congestion Control for Bulk Data Transfers", Proceedings of IWQoS 2007, June 2007. [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998. [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000. Welzl & Damjanovic Expires April 9, 2010 [Page 12] Internet-Draft MulTFRC October 2009 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, March 2006. [Ven+02] Venkataramani, A., Kokku, R., and M. Dahlin, "TCP Nice: a mechanism for background transfers", Proceedings of OSDI '02, 2002. [Wis+08] Wischik, D., Handley, M., and M. Braun, "The Resource Pooling Principle", ACM Computer Communication Review Volume 38, Issue 5 (October 2008), October 2008, . Authors' Addresses Michael Welzl University of Oslo PO Box 1080 Blindern Oslo, N-0316 Norway Phone: +47 22 85 24 20 Email: michawe@ifi.uio.no Dragana Damjanovic University of Innsbruck Technikerstr. 
21 A Innsbruck, A-6020 Austria Phone: +43 512 507 96803 Email: dragana.damjanovic@uibk.ac.at Welzl & Damjanovic Expires April 9, 2010 [Page 13]