:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
   :deprecated:
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.3

When identifying things (such as host names) on the Internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is performed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, for example to allow only identifications consisting
of "printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before strings are passed onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and which other optional parts of the
``stringprep`` procedure are part of the profile. One example of a
``stringprep`` profile is ``nameprep``, which is used for internationalized
domain names.

The module :mod:`stringprep` only exposes the tables from :rfc:`3454`. Because
these tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.
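
Before the list, here is a brief sketch of how the two kinds of functions might
be called. It assumes Python 3, where each function takes a single-character
string. The sample characters are chosen only for illustration and are not
taken from this documentation.

```python
import stringprep

# Set tables: the characteristic function answers a membership question.
print(stringprep.in_table_c11(" "))        # ASCII space is in table C.1.1
print(stringprep.in_table_c12("\u00a0"))   # NO-BREAK SPACE is in table C.1.2
print(stringprep.in_table_b1("\u00ad"))    # SOFT HYPHEN is in table B.1
print(stringprep.in_table_c11("a"))        # an ordinary letter is not a space

# Mapping tables: the function returns the value associated with the key.
print(stringprep.map_table_b2("A"))        # case-folds to "a"
print(stringprep.map_table_b2("\u00df"))   # SHARP S maps to the string "ss"
```

Note that a mapping function may return a string longer than one character,
so its result should be treated as a string rather than a single code point.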


.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode
   3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).


.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).


.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1
   and C.1.2).


.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1
   and C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).


.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional
   property "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional
   property "L").
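
Because the module exposes only the tables, combining them into a profile is
left to the caller. The following is a minimal sketch of how that might look,
loosely modelled on the steps a profile such as ``nameprep`` goes through (map,
normalize, prohibit, check bidirectional text). The names ``PROHIBITED`` and
``prepare`` are hypothetical, the selection of tables is an assumption made for
illustration, and the code assumes Python 3; it is not the module's own API.

```python
import stringprep
from unicodedata import ucd_3_2_0 as unicode32  # the tables target Unicode 3.2

# Hypothetical selection of "prohibited output" tables for this sketch.
PROHIBITED = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prepare(text):
    """Hypothetical helper: map, normalize, prohibit, then check bidi."""
    # Map: drop characters listed in B.1, case-fold the rest with B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )

    # Normalize: apply NFKC using the Unicode 3.2 database.
    normalized = unicode32.normalize("NFKC", mapped)

    # Prohibit: reject any character found in one of the chosen C tables.
    for ch in normalized:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError("prohibited character %r" % ch)

    # Bidi: a string containing D.1 characters must not contain D.2
    # characters, and must both start and end with a D.1 character.
    is_ral = [stringprep.in_table_d1(ch) for ch in normalized]
    if any(is_ral):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (is_ral[0] and is_ral[-1]):
            raise ValueError("right-to-left text must start and end with R/AL")

    return normalized


print(prepare("Ex\u00adAmple"))
```

Keeping the prohibition tables in a tuple makes it easy to swap in a different
selection when sketching another profile.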