:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. It uses :dfn:`format strings` (explained below) as compact
descriptions of the layout of the C structs and the intended conversion to/from
Python values. This can be used to handle binary data stored in files or received
from network connections, among other sources.

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; its argument is a string describing what
   is wrong.
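
A minimal sketch of handling :exc:`error`, written for Python 3 (where packed data is a ``bytes`` object rather than a ``str``). The helper ``try_pack_byte`` is hypothetical, and the sketch assumes, as current CPython does, that packing a value outside the range of the ``'B'`` code raises :exc:`error`.

```python
import struct

def try_pack_byte(value):
    """Hypothetical helper: pack one unsigned byte, or return None on failure."""
    try:
        return struct.pack('<B', value)
    except struct.error as exc:
        # The exception argument is a message describing what went wrong.
        print('struct.error: %s' % exc)
        return None

ok = try_pack_byte(255)    # fits in an unsigned char
bad = try_pack_byte(256)   # out of range for 'B', so struct.error is caught
```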


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.
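
To illustrate the "must match exactly" rule, the following Python 3 sketch packs two big-endian shorts and then supplies too few values. The helper ``pack_pair`` is hypothetical; the point is that :func:`pack` raises :exc:`error` instead of padding or truncating the argument list.

```python
import struct

def pack_pair(*values):
    """Hypothetical helper: pack exactly two big-endian shorts."""
    return struct.pack('>hh', *values)

data = pack_pair(1, 2)          # two values for two 'h' codes
print(data)                     # b'\x00\x01\x00\x02'

try:
    pack_pair(1)                # one value for two 'h' codes
    rejected = False
except struct.error as exc:
    rejected = True
    print('rejected:', exc)
```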


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer*, starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5
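
The sketch below (Python 3) assumes a ``bytearray`` as the writable *buffer*. It writes two little-endian fields side by side at explicit offsets; bytes that are not written keep their previous value.

```python
import struct

buf = bytearray(8)                      # eight zero bytes, writable
struct.pack_into('<H', buf, 0, 0x1234)  # fills bytes 0-1
struct.pack_into('<I', buf, 2, 7)       # fills bytes 2-5; bytes 6-7 stay zero
print(bytes(buf))                       # b'4\x12\x07\x00\x00\x00\x00\x00'
```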


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item. The
   string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).
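
Because the result is always a tuple, a single value is usually extracted with tuple unpacking. A minimal Python 3 sketch, which also shows the length check rejecting data one byte longer than ``calcsize(fmt)``:

```python
import struct

(count,) = struct.unpack('>I', b'\x00\x00\x01\x00')  # note the trailing comma
print(count)  # 256

try:
    struct.unpack('>I', b'\x00\x00\x01\x00\x00')  # 5 bytes, but calcsize is 4
    length_ok = True
except struct.error:
    length_ok = False
```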


.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the amount
   of data required by the format (``len(buffer[offset:])`` must be at least
   ``calcsize(fmt)``).

   .. versionadded:: 2.5
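
:func:`unpack_from` is convenient for reading a fixed header out of a larger buffer without slicing it first. The record layout below (one leading byte, a 2-byte big-endian length, then the payload) is a made-up example, not a real protocol; the code is a Python 3 sketch.

```python
import struct

# Hypothetical record: 1 flag byte, 2-byte big-endian length, payload, extra data.
packet = b'\xff' + b'\x00\x05hello' + b'trailing junk'

(length,) = struct.unpack_from('>H', packet, 1)   # start reading at offset 1
start = 1 + struct.calcsize('>H')
payload = packet[start:start + length]
print(length, payload)  # 5 b'hello'
```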


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.
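
With a standard-size prefix such as ``'<'`` (described below), the result of :func:`calcsize` does not depend on the platform, which makes it handy for sizing records. A Python 3 sketch:

```python
import struct

header = '<hhiq'   # short, short, int, long long: standard sizes, no padding
size = struct.calcsize(header)
print(size)        # 16, i.e. 2 + 2 + 4 + 8
print(len(struct.pack(header, 1, 2, 3, 4)) == size)  # True
```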

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+-------+
| Format | C Type                  | Python             | Notes |
+========+=========================+====================+=======+
| ``x``  | pad byte                | no value           |       |
+--------+-------------------------+--------------------+-------+
| ``c``  | :ctype:`char`           | string of length 1 |       |
+--------+-------------------------+--------------------+-------+
| ``b``  | :ctype:`signed char`    | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``B``  | :ctype:`unsigned char`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``?``  | :ctype:`_Bool`          | bool               | \(1)  |
+--------+-------------------------+--------------------+-------+
| ``h``  | :ctype:`short`          | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``H``  | :ctype:`unsigned short` | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``i``  | :ctype:`int`            | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``I``  | :ctype:`unsigned int`   | integer or long    |       |
+--------+-------------------------+--------------------+-------+
| ``l``  | :ctype:`long`           | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``L``  | :ctype:`unsigned long`  | long               |       |
+--------+-------------------------+--------------------+-------+
| ``q``  | :ctype:`long long`      | long               | \(2)  |
+--------+-------------------------+--------------------+-------+
| ``Q``  | :ctype:`unsigned long   | long               | \(2)  |
|        | long`                   |                    |       |
+--------+-------------------------+--------------------+-------+
| ``f``  | :ctype:`float`          | float              |       |
+--------+-------------------------+--------------------+-------+
| ``d``  | :ctype:`double`         | float              |       |
+--------+-------------------------+--------------------+-------+
| ``s``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``p``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``P``  | :ctype:`void \*`        | long               |       |
+--------+-------------------------+--------------------+-------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2
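
Because ``'q'`` and ``'Q'`` are always available in standard modes, a 64-bit value can be exchanged portably by pairing them with an explicit byte-order character (described below). A Python 3 sketch:

```python
import struct

big = 2**40 + 5                       # needs more than 32 bits
data = struct.pack('<q', big)         # always 8 bytes in standard mode
print(len(data), struct.unpack('<q', data)[0] == big)  # 8 True

# 'Q' is unsigned: eight 0xff bytes decode to the largest value, 2**64 - 1.
print(struct.unpack('<Q', b'\xff' * 8)[0] == 2**64 - 1)  # True
```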

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.
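
A quick Python 3 check of that equivalence, using the standard-size ``'<'`` prefix so the size is platform-independent:

```python
import struct

# A repeat count is shorthand for writing the code several times.
a = struct.pack('<4h', 1, 2, 3, 4)
b = struct.pack('<hhhh', 1, 2, 3, 4)
print(a == b, struct.calcsize('<4h'))  # True 8
```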

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.
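
A Python 3 sketch of both halves of that rule; it assumes, as current CPython does, that a space between a count and its code is reported as :exc:`error`.

```python
import struct

# Spaces between codes are allowed and ignored...
spaced = struct.calcsize('<h h l')
packed = struct.calcsize('<hhl')

# ...but a space between a count and its code is rejected.
try:
    struct.calcsize('<4 h')
    split_count_ok = True
except struct.error:
    split_count_ok = False
```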

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).
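
The difference between ``'s'`` and ``'c'`` counts, and the padding and truncation rules, in a short Python 3 sketch (where strings handled by :mod:`struct` are ``bytes``):

```python
import struct

# '5s' is one 5-byte field; '5c' is five separate 1-byte fields.
print(struct.unpack('5s', b'hello'))   # (b'hello',)
print(struct.unpack('5c', b'hello'))   # (b'h', b'e', b'l', b'l', b'o')

# Packing pads short strings with null bytes and truncates long ones.
short = struct.pack('5s', b'hi')
long_ = struct.pack('5s', b'hello world')
print(short, long_)                    # b'hi\x00\x00\x00' b'hello'

# '0s' is a single empty string.
print(struct.unpack('0s', b''))        # (b'',)
```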

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but the string returned can never contain more than 255 characters.
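
A Python 3 sketch of the Pascal-string layout with a count of 6 (one length byte plus up to five string bytes), covering both the padded and the truncated case:

```python
import struct

# '6p' always occupies 6 bytes: a length byte, then up to 5 string bytes.
fits = struct.pack('6p', b'abc')
print(fits)                      # b'\x03abc\x00\x00'

cut = struct.pack('6p', b'abcdefgh')   # too long: only count-1 = 5 bytes kept
print(cut)                       # b'\x05abcde'
print(struct.unpack('6p', cut))  # (b'abcde',)
```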

For the ``'I'``, ``'L'``, ``'q'`` and ``'Q'`` format characters, the return
value is a Python long integer.

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.
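
A Python 3 sketch of the ``'?'`` rules in standard (``'<'``) mode: arbitrary objects are packed by truth value, and a non-zero byte other than 1 still unpacks as ``True``.

```python
import struct

# Each object's truth value is packed as a single 0 or 1 byte.
packed = struct.pack('<???', [], 'non-empty', 42)
print(packed)                              # b'\x00\x01\x01'

# When unpacking, any non-zero byte becomes True.
print(struct.unpack('<?', b'\x07'))        # (True,)
```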

By default, C numbers are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.
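
The effect of the byte-order characters on a single 4-byte unsigned value, as a Python 3 sketch:

```python
import struct

value = 0x01020304
little = struct.pack('<I', value)
big = struct.pack('>I', value)
print(little)   # b'\x04\x03\x02\x01'  (least significant byte first)
print(big)      # b'\x01\x02\x03\x04'  (most significant byte first)
print(struct.pack('!I', value) == big)  # True: network order is big-endian
```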

Native byte order is big-endian or little-endian, depending on the host system.
For example, Motorola and Sun processors are big-endian; Intel and DEC
processors are little-endian.
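
One way to find out which native order applies is ``sys.byteorder``. This Python 3 sketch assumes ``sys.byteorder`` reports the same order :mod:`struct` uses for ``'='``, and checks that ``'='`` matches the corresponding explicit prefix.

```python
import struct
import sys

# Pick the explicit prefix that matches this host's native byte order.
explicit = '<' if sys.byteorder == 'little' else '>'
native = struct.pack('=H', 1)
print(sys.byteorder, native == struct.pack(explicit + 'H', 1))
```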

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively. :ctype:`_Bool` is 1 byte.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.
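
A Python 3 sketch of standard sizes and of the ``'@'`` versus ``'='`` difference. The native result is deliberately not pinned down, since it depends on the platform's C compiler.

```python
import struct

# Standard sizes for short, int, long, long long, float, double, _Bool.
sizes = [struct.calcsize('<' + code) for code in 'hilqfd?']
print(sizes)    # [2, 4, 4, 8, 4, 8, 1]

# '=' uses standard sizes and no alignment: 1 (signed char) + 4 (int) = 5 bytes.
standard = struct.calcsize('=bi')

# '@' uses native sizes and alignment, so pad bytes may precede the int;
# the exact total varies by platform.
native = struct.calcsize('@bi')
print(standard, native)
```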

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system. The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.
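
A Python 3 sketch of the ``'P'`` restriction; it assumes, as current CPython does, that using ``'P'`` after ``'='`` is reported as :exc:`error`.

```python
import struct

# Native mode: 'P' is the platform pointer size (for example 8 on 64-bit hosts).
pointer_size = struct.calcsize('@P')
print(pointer_size)

try:
    struct.calcsize('=P')      # '=' is not treated as native ordering
    equals_p_ok = True
except struct.error:
    equals_p_ok = False
print(equals_p_ok)             # False
```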

Examples (all using native byte order, size and alignment, on a big-endian
machine with 4-byte longs)::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8
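
The native results above depend on the host. This Python 3 sketch reproduces the same bytes on any machine by requesting big-endian, standard-size fields explicitly with ``'>'``:

```python
from struct import pack, unpack, calcsize

data = pack('>hhl', 1, 2, 3)   # big-endian, standard sizes, no padding
print(data)                    # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(unpack('>hhl', data))    # (1, 2, 3)
print(calcsize('>hhl'))        # 8
```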

Hint: to align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat count
of zero. For example, the format ``'llh0l'`` specifies two pad bytes at the
end, assuming longs are aligned on 4-byte boundaries. This only works when
native size and alignment are in effect; standard size and alignment do not
enforce any alignment.
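
A Python 3 sketch comparing native and standard mode for the trailing ``'0l'``. The native numbers depend on the size and alignment of :ctype:`long` on the host.

```python
import struct

# Native mode: '0l' pads the end up to the next long boundary.
plain = struct.calcsize('@llh')
aligned = struct.calcsize('@llh0l')
print(plain, aligned)   # e.g. 10 12 with 4-byte longs, 18 24 with 8-byte longs

# Standard mode ignores alignment, so the trailing '0l' adds nothing.
standard = struct.calcsize('<llh0l')
print(standard == struct.calcsize('<llh'))  # True
```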

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Struct Objects
--------------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to the
   format string *format*. Creating a Struct object once and calling its methods
   is more efficient than calling the :mod:`struct` functions with the same format
   since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer[, offset=0])

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.
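
A Python 3 sketch of reusing one compiled :class:`Struct` for many fixed-size records. The record layout (a little-endian unsigned int id followed by two shorts) and the name ``POINT`` are hypothetical.

```python
import struct

# Hypothetical fixed-size record: uint32 id, int16 x, int16 y, little-endian.
POINT = struct.Struct('<Ihh')
print(POINT.size)   # 8

# Pack three records back to back with the same compiled format.
records = b''.join(POINT.pack(i, i * 2, -i) for i in range(3))

# Walk the buffer one record at a time using unpack_from and size.
points = [POINT.unpack_from(records, off)
          for off in range(0, len(records), POINT.size)]
print(points)       # [(0, 0, 0), (1, 2, -1), (2, 4, -2)]
```

Compiling the format once and stepping through the buffer by :attr:`size` avoids both re-parsing the format string and slicing the data for each record.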