Jordan Sissel

Wed, 04 Jun 2008

PCRE, and how to not write an API.

From the pcreapi(3) manpage:
"The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The length passed in ovecsize should always be a multiple of three. If it is not, it is rounded down."

The 'vector' in question is used by pcre to store offset information for captured groups. It's a good and simple way to figure out where each capture starts and ends.
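
To make the arithmetic in that manpage passage concrete, here is a minimal C sketch of the layout. The helper names (ovector_usable_pairs, ovector_size_for, capture_length) are hypothetical and not part of PCRE. The sketch assumes, as I understand the pcreapi documentation, that pair n sits at ovector[2*n] and ovector[2*n + 1], that pair 0 covers the whole match, and that the second offset points just past the end of the substring.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE): how many (start, end) pairs an
 * ovector of `ovecsize` ints can report. Per the manpage text, the size is
 * rounded down to a multiple of three and only the first two-thirds carry
 * offsets, at two ints per pair. */
int ovector_usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    int rounded = ovecsize - (ovecsize % 3);  /* round down to a multiple of 3 */
    return rounded / 3;                       /* (2/3 of rounded) / 2 ints per pair */
}

/* Hypothetical helper: smallest ovecsize that can report the whole match
 * (pair 0) plus `ncaptures` capture groups, including the workspace third. */
int ovector_size_for(int ncaptures)
{
    assert(ncaptures >= 0);
    return (ncaptures + 1) * 3;
}

/* Length of capture n, assuming ovector[2*n] is the start offset and
 * ovector[2*n + 1] is the offset just past the end of the substring. */
int capture_length(const int *ovector, int n)
{
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

Under these assumptions an array of 30 ints can report at most 10 pairs, the whole match plus 9 groups, and the last 10 ints exist only as workspace for pcre_exec().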

What doesn't make sense is the part about the remaining third of the vector. Why wouldn't pcre_exec() simply allocate that scratch space itself? In the meantime, I'm left wondering why I am allocating parts of an array I am told are unusable. I hope there's a good reason. Perhaps some unknown efficiency is gained from doing it this way.

Permalink: /geekery/pcre-wtf
posted at: 02:11


2 responses to 'PCRE, and how to not write an API.'

Justin Mason posted at Wed Jun 4 06:00:19 2008...
Possibly, it's to allow you to perform the memory allocation upfront, so as to avoid the overhead of the internal implementation calling malloc().  I've seen that before.

In this case though, I doubt it -- I would guess they're just recursing and reusing that vector as-is, to store the opaque "capturing subpatterns" data.  Definite bad code smell off that, if so.

Jordan Sissel posted at Wed Jun 4 13:03:12 2008...
Yeah, I thought it must be to avoid additional malloc() calls, but even then it doesn't make total sense.

Is it really much more efficient to do it this way than, say, keeping your 'magical' vector inside the pcre* struct and realloc()'ing it whenever num_captures > length_of_magical_vector?
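
For illustration, the alternative floated above could look roughly like the following C sketch. Everything here is hypothetical: toy_regex is a stand-in type, not the real pcre struct, and the sizing rule of one int per capture is an assumption made only to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compiled pattern; NOT the real pcre struct. */
typedef struct {
    int *scratch;      /* internal workspace, owned by the struct */
    int scratch_len;   /* number of ints currently allocated */
} toy_regex;

/* Ensure the internal workspace holds at least `num_captures` ints.
 * Grows only when needed. Returns 0 on success, -1 if allocation fails. */
int toy_regex_reserve(toy_regex *re, int num_captures)
{
    assert(re != NULL && num_captures >= 0);
    if (num_captures <= re->scratch_len)
        return 0;                      /* big enough already: no allocator call */
    int *grown = realloc(re->scratch, (size_t)num_captures * sizeof *grown);
    if (grown == NULL)
        return -1;                     /* the old buffer is still valid */
    re->scratch = grown;
    re->scratch_len = num_captures;
    return 0;
}

/* Release the workspace and reset the struct to its empty state. */
void toy_regex_free(toy_regex *re)
{
    free(re->scratch);
    re->scratch = NULL;
    re->scratch_len = 0;
}
```

With this shape the allocator is only hit when a match needs more workspace than any earlier one did. One likely trade-off is that mutable scratch space inside the compiled pattern makes it unsafe to share that pattern between threads without locking, which a caller-supplied vector avoids.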

It's probably not worth speculating more, since it just makes my brain hurt ;)

