|
1 .TH PCRESTACK 3 |
|
2 .SH NAME |
|
3 PCRE - Perl-compatible regular expressions |
|
4 .SH "PCRE DISCUSSION OF STACK USAGE" |
|
5 .rs |
|
6 .sp |
|
7 When you call \fBpcre_exec()\fP, it makes use of an internal function called |
|
8 \fBmatch()\fP. This calls itself recursively at branch points in the pattern, |
|
9 in order to remember the state of the match so that it can back up and try a |
|
10 different alternative if the first one fails. As matching proceeds deeper and |
|
11 deeper into the tree of possibilities, the recursion depth increases. |
|
12 .P |
|
13 Not all calls of \fBmatch()\fP increase the recursion depth; for an item such |
|
14 as a* it may be called several times at the same level, after matching |
|
15 different numbers of a's. Furthermore, in a number of cases where the result of |
|
16 the recursive call would immediately be passed back as the result of the |
|
17 current call (a "tail recursion"), the function is just restarted instead. |
|
18 .P |
|
19 The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and |
|
20 hardly uses recursion at all. The limit on its complexity is the amount of |
|
21 workspace it is given. The comments that follow do NOT apply to |
|
22 \fBpcre_dfa_exec()\fP; they are relevant only for \fBpcre_exec()\fP. |
|
23 .P |
|
24 You can set limits on the number of times that \fBmatch()\fP is called, both in |
|
25 total and recursively. If the limit is exceeded, an error occurs. For details, |
|
26 see the |
|
27 .\" HTML <a href="pcreapi.html#extradata"> |
|
28 .\" </a> |
|
29 section on extra data for \fBpcre_exec()\fP |
|
30 .\" |
|
31 in the |
|
32 .\" HREF |
|
33 \fBpcreapi\fP |
|
34 .\" |
|
35 documentation. |
|
36 .P |
|
37 Each time that \fBmatch()\fP is actually called recursively, it uses memory |
|
38 from the process stack. For certain kinds of pattern and data, very large |
|
39 amounts of stack may be needed, despite the recognition of "tail recursion". |
|
40 You can often reduce the amount of recursion, and therefore the amount of stack |
|
41 used, by modifying the pattern that is being matched. Consider, for example, |
|
42 this pattern: |
|
43 .sp |
|
44 ([^<]|<(?!inet))+ |
|
45 .sp |
|
46 It matches from wherever it starts until it encounters "<inet" or the end of |
|
47 the data, and is the kind of pattern that might be used when processing an XML |
|
48 file. Each iteration of the outer parentheses matches either one character that |
|
49 is not "<" or a "<" that is not followed by "inet". However, each time a |
|
50 parenthesis is processed, a recursion occurs, so this formulation uses a stack |
|
51 frame for each matched character. For a long string, a lot of stack is |
|
52 required. Consider now this rewritten pattern, which matches exactly the same |
|
53 strings: |
|
54 .sp |
|
55 ([^<]++|<(?!inet))+ |
|
56 .sp |
|
57 This uses very much less stack, because runs of characters that do not contain |
|
58 "<" are "swallowed" in one item inside the parentheses. Recursion happens only |
|
59 when a "<" character that is not followed by "inet" is encountered (and we |
|
60 assume this is relatively rare). A possessive quantifier is used to stop any |
|
61 backtracking into the runs of non-"<" characters, but that is not related to |
|
62 stack usage. |
|
63 .P |
|
64 This example shows that one way of avoiding stack problems when matching long |
|
65 subject strings is to write repeated parenthesized subpatterns to match more |
|
66 than one character whenever possible. |
|
67 . |
|
68 .SS "Compiling PCRE to use heap instead of stack" |
|
69 .rs |
|
70 .sp |
|
71 In environments where stack memory is constrained, you might want to compile |
|
72 PCRE to use heap memory instead of stack for remembering back-up points. This |
|
73 makes it run a lot more slowly, however. Details of how to do this are given in |
|
74 the |
|
75 .\" HREF |
|
76 \fBpcrebuild\fP |
|
77 .\" |
|
78 documentation. When built in this way, instead of using the stack, PCRE obtains |
|
79 and frees memory by calling the functions that are pointed to by the |
|
80 \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables. By default, these |
|
81 point to \fBmalloc()\fP and \fBfree()\fP, but you can replace the pointers to |
|
82 cause PCRE to use your own functions. Since the block sizes are always the |
|
83 same, and are always freed in reverse order, it may be possible to implement |
|
84 customized memory handlers that are more efficient than the standard functions. |
|
85 . |
|
86 .SS "Limiting PCRE's stack usage" |
|
87 .rs |
|
88 .sp |
|
89 PCRE has an internal counter that can be used to limit the depth of recursion, |
|
90 and thus cause \fBpcre_exec()\fP to give an error code before it runs out of |
|
91 stack. By default, the limit is very large, and unlikely ever to operate. It |
|
92 can be changed when PCRE is built, and it can also be set when |
|
93 \fBpcre_exec()\fP is called. For details of these interfaces, see the |
|
94 .\" HREF |
|
95 \fBpcrebuild\fP |
|
96 .\" |
|
97 and |
|
98 .\" HREF |
|
99 \fBpcreapi\fP |
|
100 .\" |
|
101 documentation. |
|
102 .P |
|
103 As a very rough rule of thumb, you should reckon on about 500 bytes per |
|
104 recursion. Thus, if you want to limit your stack usage to 8Mb, you |
|
105 should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can |
|
106 support around 128000 recursions. The \fBpcretest\fP test program has a command |
|
107 line option (\fB-S\fP) that can be used to increase the size of its stack. |
|
108 . |
|
109 .SS "Changing stack size in Unix-like systems" |
|
110 .rs |
|
111 .sp |
|
112 In Unix-like environments, there is not often a problem with the stack unless |
|
113 very long strings are involved, though the default limit on stack size varies |
|
114 from system to system. Values from 8Mb to 64Mb are common. You can find your |
|
115 default limit by running the command: |
|
116 .sp |
|
117 ulimit -s |
|
118 .sp |
|
119 Unfortunately, the effect of running out of stack is often SIGSEGV, though |
|
120 sometimes a more explicit error message is given. You can normally increase the |
|
121 limit on stack size by code such as this: |
|
122 .sp |
|
123 struct rlimit rlim; |
|
124 getrlimit(RLIMIT_STACK, &rlim); |
|
125 rlim.rlim_cur = 100*1024*1024; |
|
126 setrlimit(RLIMIT_STACK, &rlim); |
|
127 .sp |
|
128 This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then |
|
129 attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must |
|
130 do this before calling \fBpcre_exec()\fP. |
|
131 . |
|
132 .SS "Changing stack size in Mac OS X" |
|
133 .rs |
|
134 .sp |
|
135 Using \fBsetrlimit()\fP, as described above, should also work on Mac OS X. It |
|
136 is also possible to set a stack size when linking a program. There is a |
|
137 discussion about stack sizes in Mac OS X at this web site: |
|
138 .\" HTML <a href="http://developer.apple.com/qa/qa2005/qa1419.html"> |
|
139 .\" </a> |
|
140 http://developer.apple.com/qa/qa2005/qa1419.html. |
|
141 .\" |
|
142 . |
|
143 . |
|
144 .SH AUTHOR |
|
145 .rs |
|
146 .sp |
|
147 .nf |
|
148 Philip Hazel |
|
149 University Computing Service |
|
150 Cambridge CB2 3QH, England. |
|
151 .fi |
|
152 . |
|
153 . |
|
154 .SH REVISION |
|
155 .rs |
|
156 .sp |
|
157 .nf |
|
158 Last updated: 09 July 2008 |
|
159 Copyright (c) 1997-2008 University of Cambridge. |
|
160 .fi |