Network Working Group                                     A. El-Sherbiny
Request for Comments: 5564                                      M. Farah
Category: Informational                                         UN-ESCWA
                                                             I. Oueichek
                                            Syrian Telecom Establishment
                                                             A. Al-Zoman
                                                          SaudiNIC, CITC
                                                                May 2009


        Linguistic Guidelines for the Use of the Arabic Language
                          in Internet Domains

Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

IESG Note

   This RFC is not a candidate for any level of Internet Standard.
   The IETF disclaims any knowledge of the fitness of this RFC for any
   purpose and notes that the decision to publish is not based on IETF
   review apart from IESG review for conflict with IETF work.  The RFC
   Editor has chosen to publish this document at its discretion.  See
   RFC 3932 for more information.

Abstract

   This document constitutes technical specifications for the use of
   Arabic in Internet domain names and provides linguistic guidelines
   for Arabic domain names.  It addresses Arabic-specific linguistic
   issues pertaining to the use of the Arabic language in domain names.

Table of Contents

   1. Introduction
   2. Arabic Language-Specific Issues
      2.1. Linguistic Issues
           2.1.1. Diacritics (Tashkeel) and Shadda
           2.1.2. Kasheeda or Tatweel (Horizontal Character Size
                  Extension)
           2.1.3. Character Folding
      2.2. Supported Character Set
      2.3. Arabic Linguistic Issues Affected by Technical Constraints
           2.3.1. Numerals
           2.3.2. The Space Character
   3. Summary and Conclusion
   4. Security Considerations
   5. Acknowledgments
   6. References
      6.1. Normative References
      6.2. Informative References

1.  Introduction

   The Internet Engineering Task Force (IETF) issued in March 2003 a set
   of RFCs for Internationalized Domain Names (IDN) ([1], [2], and [3]),
   which were planned to become the de facto standard for all languages.
   In 2007 and 2008, the following working drafts were released that
   propose revisions to the IDNA protocol:

   o  Internationalized Domain Names for Applications (IDNA):
      Background, Explanation, and Rationale [5]

   o  Internationalized Domain Names in Applications (IDNA): Protocol
      [6]

   o  An updated IDNA criterion for right-to-left scripts [7]

   o  The Unicode code points and IDNA [8]

   These documents are known collectively as "IDNA2008".

   This document constitutes a technical specification for the
   implementation of the IDN standards in the case of the Arabic
   language.  It will allow the use of standard language tables to write
   domain names in Arabic characters.  Therefore, it should be
   considered as a logical extension to the IDN standards.  It thus
   presents guidelines for the proper use of Arabic characters with the
   IDN standards in an Arabic language context.

   This document reflects the recommendations of the Arab Working Group
   on Arabic Domain Names (AWG-ADN), established by the League of Arab
   States (LAS), based on standardisation efforts of the United Nations
   Economic and Social Commission for Western Asia (UN-ESCWA) and on
   that group's document, "Guidelines for an Arabic Internet Domain
   Name" [9].  This document is also in full harmony with recent
   rigorous discussions that took place within the major language
   communities that use the Arabic script in their languages.

   This document provides guidelines for the ways Arabic characters may
   be used for registering Internet domain names and how
   language-specific issues should be handled.  A few rules are
   recommended for application at the protocol level.

   The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY"
   in this document are to be interpreted as described in RFC 2119 [4].

   Comments on this document are solicited and should be addressed to
   the working group's mailing list at ESCWA-ICTD@un.org and/or the
   author(s).

2.  Arabic Language-Specific Issues

   The main objective of the creation of Arabic domain names is to have
   a vehicle to increase Internet use amongst all strata of the
   Arabic-speaking communities.  Furthermore, a non-user-friendly domain
   name would further add to the ambiguity and the eccentricity of the
   Internet to the Arabic-speaking communities, thus contributing
   negatively to the spread of the Internet and leading to further
   isolation of these communities at the global level.

   Hence, there have been intensive efforts (especially those
   spearheaded by Dr. Al-Zoman and contributed to by UN-ESCWA and its
   Arabic Domain Names Task Force (ADN-TF)) to reach consensus on a
   multitude of linguistic issues with the following goals:

   o  To define the accepted Arabic character set to be used for writing
      domain names in Arabic, which is the subject of this document.

   o  To define the top-level domains of the Arabic domain name tree
      structure (i.e., Arabic gTLDs and ccTLDs).  This goal will be
      handled in a separate document.

   The first meeting of the AWG-ADN, held in Damascus in
   January-February 2005, gave special attention to the following:

   o  Simplification of the domain names, whenever possible, to
      facilitate the interaction of the Arabic user with the Internet.

   o  Adoption of solutions that do not lead to confusion either in
      reading or in writing, provided that this does not compromise the
      linguistic correctness of used words.

   o  Mixing Arabic and non-Arabic letters in the domain name label is
      not acceptable.

2.1.  Linguistic Issues

   There are a number of linguistic issues that have been proposed with
   respect to the use of the Arabic language in domain names.  This
   section will highlight some of them.  This section is based on the
   papers of Dr. Al-Zoman ([10] and [11]) and on the report of the first
   meeting of AWG-ADN [12].  For details, the reader is encouraged to
   review these references.

2.1.1.  Diacritics (Tashkeel) and Shadda

   Tashkeel and Shadda are accent marks placed above or below Arabic
   letters to produce proper pronunciation.  They are thus used to
   distinguish the meanings of different words that share the same base
   characters.

   Neither Tashkeel nor Shadda is permitted in zone files when
   registering domain names in the Arabic language, although they are
   permitted in the current edition of IDNA2008.  They can be supported
   or ignored, if necessary, in the user interface with local mappings
   and can be stripped before IDNA processing.
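
   The stripping step mentioned above can be sketched as follows.  This
   is a minimal illustration, not part of the guidelines: the helper
   name strip_tashkeel is hypothetical, and the sketch assumes that a
   user interface chooses to remove the eight marks U+064B through
   U+0652 listed in this section rather than reject the input.

```python
# Hypothetical helper (name and behavior are illustrative assumptions):
# remove Tashkeel and Shadda before a label is handed to IDNA processing.
# The range U+064B..U+0652 covers the eight marks listed in Section 2.1.1.
TASHKEEL_AND_SHADDA = frozenset(chr(cp) for cp in range(0x064B, 0x0652 + 1))


def strip_tashkeel(label: str) -> str:
    """Return the label with Tashkeel and Shadda removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL_AND_SHADDA)


# KAF, FATHA, TEH, FATHA, BEH, FATHA: the three FATHA marks are dropped.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(strip_tashkeel(vocalized) == "\u0643\u062A\u0628")
```

   Whether an interface strips the marks silently or warns the user is
   a local policy choice that this sketch does not address.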

   The following are the Unicode presentations of Tashkeel and Shadda:

      U+064B ARABIC FATHATAN
      U+064C ARABIC DAMMATAN
      U+064D ARABIC KASRATAN
      U+064E ARABIC FATHA
      U+064F ARABIC DAMMA
      U+0650 ARABIC KASRA
      U+0651 ARABIC SHADDA
      U+0652 ARABIC SUKUN

2.1.2.  Kasheeda or Tatweel (Horizontal Character Size Extension)

   Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain
   names and should be disallowed for Arabic language domain names.  The
   Kasheeda is not a letter and does not have an effect on
   pronunciation.  It is used to extend the horizontal length or change
   the shape of the preceding letter for graphical representation
   purposes in Arabic writing.  Accordingly, it has no value for the
   writing of domain names.  The same applies to all languages using the
   Arabic script.  The authors recommend that it be disallowed at the
   protocol level.

2.1.3.  Character Folding

   Character folding is the process where multiple letters (that may
   have some similarity with respect to their shapes) are folded into
   one shape.  Examples of such Arabic characters include:

   o  Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a word

   o  Folding different forms of Hamzah (U+0622, U+0623, U+0625, U+0627)

   o  Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of a
      word

   o  Folding Waw with Hamzah Above (U+0624) and Waw (U+0648)

   With respect to the Arabic language, character folding is not
   acceptable because it changes the meaning of words and is against the
   principle of spelling rules.  Replacing a character valid for use in
   domain names with another character also valid for use in domain
   names, which may have a similar shape, will give a different meaning.
   As a result, a single word would represent several words, namely all
   the combinations of the folded characters.  Hence, the other words
   will be masked by a single word [10].
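
   The masking effect can be seen in a small sketch.  The folding table
   below is hypothetical: it covers only the four examples given in
   this section, and the direction of each mapping is an assumption
   made for illustration.  It is shown to demonstrate the problem, not
   as a recommended mapping.

```python
# Hypothetical folding table, for demonstration only.  The guidelines
# state that character folding should NOT be allowed; this sketch shows
# how folding makes two distinct labels collide.
FOLD_ANYWHERE = {
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0624": "\u0648",  # WAW WITH HAMZA ABOVE  -> WAW
}
FOLD_AT_END = {
    "\u0629": "\u0647",  # TEH MARBUTA  -> HEH (end of word only)
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH (end of word only)
}


def fold(word: str) -> str:
    """Apply the illustrative folding table to a single word."""
    folded = [FOLD_ANYWHERE.get(ch, ch) for ch in word]
    if folded:
        folded[-1] = FOLD_AT_END.get(folded[-1], folded[-1])
    return "".join(folded)


a = "\u0645\u062F\u0631\u0633\u0629"  # ends with TEH MARBUTA
b = "\u0645\u062F\u0631\u0633\u0647"  # ends with HEH
print(a != b, fold(a) == fold(b))
```

   The two inputs are different strings, yet they fold to the same
   result, so only one of them could ever be registered.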

   Misspellings and handwriting errors do occur, leading to the mixing
   of different characters, although this is not the case in published
   and printed materials.

   One of the motivations of this effort is to preserve the language,
   particularly with the spread of the globalization movement.  Within
   this context, character folding works against this motivation since
   it would have a negative effect on the principle and ethics of the
   language.  Technology should work to preserve the language and not to
   destroy it.  Thus, character folding should not be allowed.

   The case of digits is treated in a separate section below.

2.2.  Supported Character Set

   A domain name to be written in Arabic must be composed of a sequence
   of the following Unicode characters and the FULL STOP (U+002E) to
   separate the labels.  These are based on Unicode version 5.0.

   The tables below are constructed using an inclusion-based approach.
   Thus, characters that are not part of these tables are prohibited.
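
   The inclusion-based rule amounts to a membership test on each
   character of a label.  The following is a minimal sketch under
   stated assumptions: the function names are hypothetical, the ranges
   are transcribed from the two tables in this section, and the name is
   assumed to have been split into labels at FULL STOP beforehand.

```python
from typing import List

# Ranges transcribed from the two tables in Section 2.2 (inclusive).
# U+0640 (Tatweel) and U+064B..U+0652 (Tashkeel and Shadda) fall in the
# gaps between these ranges and are therefore rejected.
ALLOWED_RANGES = (
    (0x0621, 0x063A),  # HAMZA .. GHAIN
    (0x0641, 0x064A),  # FEH .. YEH
    (0x0660, 0x0669),  # ARABIC-INDIC DIGIT ZERO .. NINE
    (0x0030, 0x0039),  # DIGIT ZERO .. NINE
    (0x002D, 0x002D),  # HYPHEN-MINUS
)


def is_allowed(ch: str) -> bool:
    """True if the character appears in one of the two tables."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def disallowed_chars(label: str) -> List[str]:
    """List the code points in a label that the tables do not include."""
    return [f"U+{ord(ch):04X}" for ch in label if not is_allowed(ch)]


print(disallowed_chars("\u0645\u0640\u0648"))  # label containing Tatweel
```

   This check covers only the character repertoire.  Other rules in
   this document, such as numeral homogeneity, need separate checks.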

   +---------+-------------------------------------+
   | Unicode | Character Name                      |
   +---------+-------------------------------------+
   | 0621    | ARABIC LETTER HAMZA                 |
   | 0622    | ARABIC LETTER ALEF WITH MADDA ABOVE |
   | 0623    | ARABIC LETTER ALEF WITH HAMZA ABOVE |
   | 0624    | ARABIC LETTER WAW WITH HAMZA ABOVE  |
   | 0625    | ARABIC LETTER ALEF WITH HAMZA BELOW |
   | 0626    | ARABIC LETTER YEH WITH HAMZA ABOVE  |
   | 0627    | ARABIC LETTER ALEF                  |
   | 0628    | ARABIC LETTER BEH                   |
   | 0629    | ARABIC LETTER TEH MARBUTA           |
   | 062A    | ARABIC LETTER TEH                   |
   | 062B    | ARABIC LETTER THEH                  |
   | 062C    | ARABIC LETTER JEEM                  |
   | 062D    | ARABIC LETTER HAH                   |
   | 062E    | ARABIC LETTER KHAH                  |
   | 062F    | ARABIC LETTER DAL                   |
   | 0630    | ARABIC LETTER THAL                  |
   | 0631    | ARABIC LETTER REH                   |
   | 0632    | ARABIC LETTER ZAIN                  |
   | 0633    | ARABIC LETTER SEEN                  |
   | 0634    | ARABIC LETTER SHEEN                 |
   | 0635    | ARABIC LETTER SAD                   |
   | 0636    | ARABIC LETTER DAD                   |
   | 0637    | ARABIC LETTER TAH                   |
   | 0638    | ARABIC LETTER ZAH                   |
   | 0639    | ARABIC LETTER AIN                   |
   | 063A    | ARABIC LETTER GHAIN                 |
   | 0641    | ARABIC LETTER FEH                   |
   | 0642    | ARABIC LETTER QAF                   |
   | 0643    | ARABIC LETTER KAF                   |
   | 0644    | ARABIC LETTER LAM                   |
   | 0645    | ARABIC LETTER MEEM                  |
   | 0646    | ARABIC LETTER NOON                  |
   | 0647    | ARABIC LETTER HEH                   |
   | 0648    | ARABIC LETTER WAW                   |
   | 0649    | ARABIC LETTER ALEF MAKSURA          |
   | 064A    | ARABIC LETTER YEH                   |
   | 0660    | ARABIC-INDIC DIGIT ZERO             |
   | 0661    | ARABIC-INDIC DIGIT ONE              |
   | 0662    | ARABIC-INDIC DIGIT TWO              |
   | 0663    | ARABIC-INDIC DIGIT THREE            |
   | 0664    | ARABIC-INDIC DIGIT FOUR             |
   | 0665    | ARABIC-INDIC DIGIT FIVE             |
   | 0666    | ARABIC-INDIC DIGIT SIX              |
   | 0667    | ARABIC-INDIC DIGIT SEVEN            |
   | 0668    | ARABIC-INDIC DIGIT EIGHT            |
   | 0669    | ARABIC-INDIC DIGIT NINE             |
   +---------+-------------------------------------+

        Source: Supporting the Arabic Language in Domain Names [10]

      Table 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF)

   +---------+-----------------+
   | Unicode | Character Name  |
   +---------+-----------------+
   | 0030    | DIGIT ZERO      |
   | 0031    | DIGIT ONE       |
   | 0032    | DIGIT TWO       |
   | 0033    | DIGIT THREE     |
   | 0034    | DIGIT FOUR      |
   | 0035    | DIGIT FIVE      |
   | 0036    | DIGIT SIX       |
   | 0037    | DIGIT SEVEN     |
   | 0038    | DIGIT EIGHT     |
   | 0039    | DIGIT NINE      |
   | 002D    | HYPHEN-MINUS    |
   +---------+-----------------+

        Source: Supporting the Arabic Language in Domain Names [10]

      Table 2: CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F)

2.3.  Arabic Linguistic Issues Affected by Technical Constraints

   In this section, technical aspects of some linguistic issues are
   discussed.

2.3.1.  Numerals

   In the Arab countries, there are two sets of numerical digits used:

   o  Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) mostly used in the western
      part of the Arab world.

   o  Set II: (U+0660, U+0661, U+0662, U+0663, U+0664, U+0665, U+0666,
      U+0667, U+0668, U+0669) mostly used in the eastern part of the
      Arab world.

   Both sets may be supported in the user interface; however, the rule
   of numeral homogeneity must be observed.  The rule specifies that
   digits from the Arabic-Indic set of numerals (U+0660 to U+0669)
   should not be allowed to mix with ASCII digits (U+0030 to U+0039)
   within the same Arabic domain name label.  Thus, the appearance of a
   digit from one set prevents the use of any digit from the other set.

2.3.2.  The Space Character

   The space character is strictly disallowed in domain names, as it is
   a control character.  Instead, the hyphen (Al-sharta, i.e., U+002D)
   is proposed as a separator between Arabic words to avoid confusion
   that can take place if the words are typed without a separator.  It
   is acceptable to use the hyphen to separate words within the same
   domain name label.

3.  Summary and Conclusion

   The proposed guidelines are in full accordance with the IETF IDN
   standards and take into account Arabic-language-specific issues
   within a compromise between grammatical rules of the Arabic language
   and ease of use of that language on the Internet.  In summary, the
   guidelines specify that, in Arabic domain names:

   o  Accent marks (Tashkeel and Shadda) are not permitted.

   o  Character folding is not permitted.

   o  If a numeral from the Arabic-Indic or ASCII digit sets appears in
      a label, numeral homogeneity is required.

   o  The hyphen must be used as a word separator instead of space.

4.  Security Considerations

   No particular security considerations could be identified regarding
   the use of Arabic characters in writing domain names.  In particular,
   any potential visual confusion between different character strings is
   avoided using the guidelines proposed in this document.

5.  Acknowledgments

   ESCWA ICT Division provided support and funding for the development
   of this document with the objective of reaching a standard for
   comprehensive Arabic domain names.  Thanks are due to SaudiNIC for
   its continuous efforts in supporting the development of Arabic domain
   names.

   John Klensin and Harald Alvestrand reviewed the document and provided
   useful editorial and substantive support to enrich it.

6.  References

6.1.  Normative References

   [1]   Faltstrom, P., Hoffman, P., and A. Costello,
         "Internationalizing Domain Names in Applications (IDNA)",
         RFC 3490, March 2003.

   [2]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
         for Internationalized Domain Names (IDN)", RFC 3491,
         March 2003.

   [3]   Costello, A., "Punycode: A Bootstring encoding of Unicode for
         Internationalized Domain Names in Applications (IDNA)",
         RFC 3492, March 2003.

   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

6.2.  Informative References

   [5]   Klensin, J., "Internationalized Domain Names for Applications
         (IDNA): Definitions, Background and Rationale", Work in
         Progress, September 2008.

   [6]   Klensin, J., "Internationalized Domain Names in Applications
         (IDNA): Protocol", Work in Progress, September 2008.

   [7]   Alvestrand, H. and C. Karp, "An updated IDNA criterion for
         right-to-left scripts", Work in Progress, July 2008.

   [8]   Faltstrom, P., "The Unicode Codepoints and IDNA", Work in
         Progress, July 2008.

   [9]   United Nations Economic and Social Commission for Western Asia
         (UN-ESCWA), "Guidelines for an Arabic Domain Name System
         (ADNS)", Work in Progress, November 2007.

   [10]  Al-Zoman, A., "Supporting the Arabic Language in Domain Names",
         October 2003, <http://www.arabic-domains.org/docs/NIC-docs/
         SupportingArabicDomainNmaes.pdf>.

   [11]  Al-Zoman, A., "Arabic Top-Level Domains", Paper presented in
         Expert Group Meeting on Promotion of Digital Arabic Content,
         the United Nations, Economic and Social Commission for Western
         Asia, Beirut, June 2003.

   [12]  League of Arab States, "Report of the first meeting of AWG-ADN,
         Damascus", February 2005,
         <http://www.arabic-domains.org/ar/intrnational-entites.php>.
         This document is in Arabic.

Authors' Addresses

   Ayman El-Sherbiny
   Information and Communication Technology Division
   ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   EMail: El-sherbiny@un.org


   Mansour Farah
   Information and Communication Technology Division
   ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   EMail: farah14@un.org


   Ibaa Oueichek
   Syrian Telecom Establishment
   Damascus
   Syria

   EMail: oueichek@scs-net.org


   Abdulaziz H. Al-Zoman, PhD
   SaudiNIC, General Directorate of Internet Services
   IT Sector, CITC
   King Abdulaziz City for Science and Technology
   PO Box 6086
   Riyadh 11442
   Saudi Arabia

   EMail: azoman@citc.gov.sa