minimail - cut+paste perl code to parse/create mbox files and mail messages
minimail is a collection of functions that parse and produce mailbox
files and individual mail messages. It is not a module (although
uncommenting one line and calling it minimail.pm would turn it into a
module). It is intended to be compact enough to cut and paste directly into
perl scripts that don't want to require non-standard perl modules. In other
words, it is intended to be yet another alternative to MIME-tools.
MIME-tools does things that this code doesn't (such as uuencode
decoding). And minimail does things that MIME-tools doesn't, such as
reading and writing mailbox files correctly (repairing incorrectly formatted
ones along the way), automatically encoding and decoding mail headers and
MIME header parameters, and transparently unravelling winmail.dat
attachments (aka MS-TNEF). minimail is much smaller (about 3% of the
size of MIME-tools and the other modules it requires and about 20% of the
size of MIME-Lite) and so takes much less time during program startup.
formail(sub { <> }, sub { $mail = shift })
From_ lines are unquoted.
mail2str($mail)
From_ are quoted and the string result will definitely be terminated with a
blank line. This means that mailbox files with blank lines missing between
mail messages and with unquoted From_ lines will be automatically repaired
with the code below. (Incidentally, malformed nested multipart body parts
are also repaired.)
formail(sub { <> }, sub { print mail2str(shift) });
mail2multipart($mail)
mail2singlepart($mail)
mail2mbox($mail)
insert_header($mail, $header[, $language[, $charset]])
en and iso-8859-1, respectively.
append_header($mail, $header[, $language[, $charset]])
replace_header($mail, $header[, $language[, $charset]])
delete_header($mail, $header, $recurse)
insert_part($mail, $part, $index)
append_part($mail, $part)
replace_part($mail, $part, $index)
delete_part($mail, $index)
header($mail, $header)
=?charset?[qb]?...?=), and the charset is us-ascii or iso-8859-*, they are
decoded. They are also unfolded. If this is not what you want, use
$mail->{header} or $mail->{headers} directly (or change the code).
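As a rough illustration of the decoding and unfolding described above, here is a minimal sketch using only core Perl modules. The helper names decode_word() and decode_header_text() are hypothetical and are not part of minimail; the actual code may handle more cases or behave differently.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper, not part of minimail: decode one encoded-word.
sub decode_word {
    my ($whole, $charset, $encoding, $data) = @_;
    # Leave character sets other than us-ascii and iso-8859-* untouched.
    return $whole unless $charset =~ /^(?:us-ascii|iso-8859-\d+)$/i;
    return decode_base64($data) if lc $encoding eq 'b';
    $data =~ tr/_/ /;                              # "_" encodes a space
    $data =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;    # "=XX" encodes one byte
    return $data;
}

# Hypothetical helper: unfold a header and decode its encoded-words.
sub decode_header_text {
    my ($text) = @_;
    $text =~ s/\r?\n(?=[ \t])//g;                  # unfold continuation lines
    $text =~ s/(\?=)[ \t]+(?==\?)/$1/g;            # drop space between encoded-words
    $text =~ s/(=\?([^?\s]+)\?([qQbB])\?([^?\s]*)\?=)/decode_word($1, $2, $3, $4)/ge;
    return $text;
}

my $raw     = "Subject: =?iso-8859-1?q?caf=E9_au?=\n =?ISO-8859-1?B?IGxhaXQ=?=";
my $decoded = decode_header_text($raw);
```

The result is a byte string in the original character set, which is why the charset information is no longer available after decoding.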
headers($mail)
header_names($mail)
param($mail, $header, $param)
param1*0="a" param1*1="b" param2*="charset'lang'%63"), it is decoded (if
us-ascii or iso-8859-*) and reassembled.
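The reassembly of rfc2231 parameters can be sketched roughly as follows. This is an illustration only: the helper name params() is hypothetical, the sketch expects parameters to be separated by semicolons, and it does not check the character set the way the text above says minimail does.

```perl
use strict;
use warnings;

# Hypothetical helper, not minimail's param(): collect rfc2231 parameters
# from a header value, joining continuations and decoding extended values.
sub params {
    my ($value) = @_;
    my (%pieces, %out);
    while ($value =~ /;\s*([\w.-]+)(?:\*(\d+))?(\*)?\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^;\s]+))/g) {
        my ($name, $index, $extended, $quoted, $token) = (lc $1, $2, $3, $4, $5);
        my $text = defined $quoted ? $quoted : $token;
        $text =~ s/\\(.)/$1/g if defined $quoted;   # undo quoted-pair escapes
        $pieces{$name}[ $index // 0 ] = [ $extended, $text ];
    }
    for my $name (keys %pieces) {
        my $first = 1;
        for my $piece (grep { defined } @{ $pieces{$name} }) {
            my ($extended, $text) = @$piece;
            if ($extended) {
                # Only the first piece carries the charset'language' prefix.
                $text =~ s/^[^']*'[^']*'// if $first;
                $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
            }
            $out{$name} .= $text;
            $first = 0;
        }
    }
    return \%out;
}

my $p = params(q{attachment; filename*0*=iso-8859-1'en'caf%E9; filename*1=".txt"; size=42});
```

Here $p->{filename} holds the two continuation pieces joined into one value, with the percent-encoded byte decoded.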
mimetype($mail, $parent)
octet/application when the encoding is invalid.
encoding($mail)
filename($part)
body($mail)
message/rfc822.
message($mail)
message/rfc822. Must not be called on a multipart message or a mail
message whose mimetype is not message/rfc822.
parts($mail[, $part])
newparam($name, $value[, $language[, $charset]])
"; name=value" which can be used as part of the $header argument in
functions like append_header() and as part of any header value in the
function newmail(). If the value contains non-ascii characters, and the
$language and $charset parameters are not supplied, they default to
en and iso-8859-1, respectively.
newmail(...)
[A-Z]*   - Arbitrary mail headers: e.g. From To Subject
type     - Content-Type: e.g. image/png
charset  - Content-Type's charset parameter: e.g. iso-8859-1
encoding - Content-Transfer-Encoding: e.g. base64
filename - Content-Disposition's filename parameter
body     - body of the message (don't use with parts or message)
parts    - array-ref of parts (don't use with body or message)
message  - body of message/rfc822 message (don't use with body or parts)
mbox     - Mbox From_ header
Supplying body implies text/plain. Supplying parts implies
multipart/mixed. Supplying message implies message/rfc822. Default
disposition is inline for text/* and message/rfc822 or attachment for all
other types. The default charset is iso-8859-1 when body contains non-ascii
characters, us-ascii otherwise. Default encoding is determined from the
type and nature of the mail message and its data. You shouldn't have to
supply encoding unless you want to create messages with 8bit encoding. If
the mail message really is a mail message, and not just a body part, Date,
MIME-Version and Message-ID headers are automatically included if they
have not been supplied by the caller.
Less important parameters are:
disposition - Content-Disposition: i.e. inline or attachment
created     - Content-Disposition's creation-date parameter
modified    - Content-Disposition's modification-date parameter
read        - Content-Disposition's read-date parameter
size        - Content-Disposition's size parameter
description - Content-Description
language    - Content-Language
duration    - Content-Duration
location    - Content-Location
base        - Content-Base
features    - Content-Features
alternative - Content-Alternative
id          - Content-ID
md5         - Content-MD5
Note: If you supply filename but not body (or message or parts), and the
filename refers to a readable file, then the following parameters will be
determined automatically: body, modified, read, size.
The rest of the less important parameters are just shortcuts for standard MIME headers. There is no support beyond that for any of them.
A mail message (or body part) is a hash containing some of the following entries:
mbox               - mailbox From_ header
warn               - parser errors in the form: X-Warning: ...
headers            - arrayref of mail headers in order of appearance
header             - hashref by name of arrayrefs of mail headers
body               - text of singlepart mail message
mime_type          - mimetype of the mail message or body part
mime_parts         - arrayref of mail messages (body parts)
mime_message       - message of a message/rfc822 mail message
mime_boundary      - boundary for a multipart mail message
mime_preamble      - any text before the first multipart boundary
mime_epilogue      - any text after the last multipart boundary
mime_prev_boundary - saved boundary of message after mail2singlepart
mime_prev_preamble - saved preamble of message after mail2singlepart
mime_prev_epilogue - saved epilogue of message after mail2singlepart
Note that body, mime_parts and mime_message are mutually exclusive and that mime_type only exists when mime_parts or mime_message exist.
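Because body, mime_parts and mime_message are mutually exclusive, a mail hash can be walked with a simple three-way recursion. The sketch below is only an illustration: the hash keys come from the list above, but every value is made up, header entries are omitted, and the helper name leaf_bodies() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data: a multipart message holding one text part and one
# message/rfc822 part. Only the hash keys are taken from the documentation.
my $mail = {
    mime_type     => 'multipart/mixed',
    mime_boundary => 'example-boundary',
    mime_parts    => [
        { body => "hi\n" },
        {
            mime_type    => 'message/rfc822',
            mime_message => { body => "nested\n" },
        },
    ],
};

# Hypothetical helper: return the text of every singlepart leaf, in order.
sub leaf_bodies {
    my ($m) = @_;
    return map { leaf_bodies($_) } @{ $m->{mime_parts} } if $m->{mime_parts};
    return leaf_bodies($m->{mime_message}) if $m->{mime_message};
    return defined $m->{body} ? ($m->{body}) : ();
}

my @bodies = leaf_bodies($mail);
```

In real code the accessor functions such as parts(), message() and body() are presumably the better way to do this; the sketch only shows how the entries relate to each other.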
Parsing example: Repair mailbox files
formail(sub { <> }, sub { print mail2str(shift) });
Building example: A mail message with attachments
print mail2str(newmail(
    To      => 'you@there.com',
    From    => 'me@here.com',
    Subject => 'test',
    parts   => [
        newmail(body => "hi\n"),
        newmail(body => $png, type => 'image/png', filename => 'hi.png'),
        newmail(message => newmail(qw(To to@you From from@me body hi))),
    ],
));
The header() and headers() functions automatically decode rfc2047
encoded headers when the character set is us-ascii or any of the
iso-8859-* character sets. This is an attempt to satisfy the following
requirement in rfc2047:
The program must be able to display the unencoded text if the character set is "US-ASCII". For the ISO-8859-* character sets, the mail reading program must at least be able to display the characters which are also in the ASCII set.
The problem is that rather than discarding iso-8859-* characters that are
not also us-ascii, minimail decodes and "displays" them. This is
arguably more useful but knowledge of the character set is lost and the
characters will be interpreted as being in your character set. No doubt,
this could be the wrong thing if your character set is very different from
that used by the originators of the mail messages being parsed.
If this is likely to cause you problems, don't use header() or
headers(). Use $mail->{headers} instead, which is a reference to an
array of raw encoded headers. Or delete or transform any high bit characters
in the results of these functions. Or change the code so that it doesn't
automatically decode character sets that you don't want it to.
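Deleting or transforming high bit characters could look something like the sketch below. The sample value is made up and stands in for a decoded header value; the third option is only valid under the assumption that the original character set really was iso-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical decoded header value containing one high bit byte (0xE9).
my $value = "caf\xE9 au lait";

# Option 1: delete every high bit byte.
(my $ascii_only = $value) =~ tr/\x80-\xFF//d;

# Option 2: replace every high bit byte with a placeholder.
(my $masked = $value) =~ tr/\x80-\xFF/?/;

# Option 3: assuming the bytes are iso-8859-1, convert them to UTF-8 bytes.
my $utf8 = encode('UTF-8', decode('iso-8859-1', $value));
```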
rfc2822, rfc2045, rfc2046, rfc2047, rfc2231, rfc2183 (also rfc3282, rfc3066, rfc2424, rfc2557, rfc2110, rfc3297, rfc2912, rfc2533, rfc1864, rfc2387, rfc2076, rfc4012).
The mailbox format used is the mboxrd format described in
http://www.qmail.org/man/man5/mbox.html.
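For readers unfamiliar with mboxrd, its From_ quoting rule can be sketched in a few lines: when writing, any body line that consists of zero or more ">" characters followed by "From " gets one more ">" prepended, and reading removes exactly one ">". The helper names below are hypothetical and this is not minimail's own code.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the mboxrd From_ quoting rule.
sub quote_from {
    my ($body) = @_;
    $body =~ s/^(>*From )/>$1/mg;    # add one level of quoting
    return $body;
}

sub unquote_from {
    my ($body) = @_;
    $body =~ s/^>(>*From )/$1/mg;    # remove one level of quoting
    return $body;
}

my $body   = "Hello\nFrom here on it gets tricky\n>From was already quoted\n";
my $stored = quote_from($body);
my $back   = unquote_from($stored);
```

Because already quoted lines gain a further ">", the round trip is reversible, which is what distinguishes mboxrd from the older mboxo format.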
20070803 raf <raf@raf.org>