minimail - cut+paste perl code to parse/create mbox files and mail messages


minimail is a collection of functions that parse and produce mailbox files and individual mail messages. It is not a module (although uncommenting one line and calling it would turn it into a module). It is intended to be compact enough to cut and paste directly into perl scripts that don't want to require non-standard perl modules. In other words, it is intended to be yet another alternative to MIME-tools. MIME-tools does things that this code doesn't, such as uuencode decoding. minimail, in turn, does things that MIME-tools doesn't, such as reading and writing mailbox files correctly (repairing incorrectly formatted ones along the way), automatically encoding and decoding mail headers and MIME header parameters, and transparently unravelling winmail.dat attachments (aka MS-TNEF). minimail is also much smaller (about 3% of the size of MIME-tools and the other modules it requires, and about 20% of the size of MIME-Lite), so it takes much less time to load during program startup.


formail(sub { <> }, sub { $mail = shift })
Parses a mailbox or a mail message. Calls the first argument to retrieve input lines and calls the second argument with each mail message found. Terminates when the first argument returns undef or when the second argument returns false. Quoted From_ lines are unquoted.
mail2str($mail)
Returns a string version of a mail message. If the mail message includes a mailbox header, lines in the body starting with From_ are quoted and the resulting string is guaranteed to end with a blank line. This means that mailbox files with missing blank lines between mail messages or with unquoted From_ lines are automatically repaired by the code below. (Incidentally, malformed nested multipart body parts are also repaired.)
 formail(sub { <> }, sub { print mail2str(shift) });
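The From_ quoting that formail() undoes and mail2str() reapplies can be illustrated independently of minimail. What follows is only a minimal sketch, assuming the mboxrd convention (a body line made of zero or more ">" characters followed by "From " gains one ">" when written and loses one when read). The names quote_from_lines and unquote_from_lines are hypothetical and are not minimail functions.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating mboxrd From_ quoting; not minimail code.

# Add one ">" to every body line that looks like a (possibly quoted) From_ line.
sub quote_from_lines
{
    my ($body) = @_;
    $body =~ s/^(>*From )/>$1/mg;
    return $body;
}

# Remove one ">" from every quoted From_ line.
sub unquote_from_lines
{
    my ($body) = @_;
    $body =~ s/^>(>*From )/$1/mg;
    return $body;
}

my $body = "Hello\nFrom here on it gets tricky\n>From an earlier reply\n";
my $quoted = quote_from_lines($body);
print $quoted;
print unquote_from_lines($quoted) eq $body ? "round trip ok\n" : "round trip failed\n";
```

Because quoting always adds exactly one ">" and unquoting always removes exactly one, the round trip is lossless even for lines that were already quoted.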
Converts a singlepart mail message into a multipart mail message with a single body part (i.e. the body of the original mail message). Returns the mail message. Does nothing to mail messages that are already multipart mail messages.
mail2singlepart($mail)
Converts a multipart mail message with a single body part into a singlepart mail message whose body is the original body part. Returns the mail message. Does nothing to mail messages that are already singlepart mail messages or multipart mail messages with multiple parts. Acts recursively.
Converts a mail message into a mailbox item. Does nothing to mail messages that are already mailbox items. This affects the result of mail2str().
insert_header($mail, $header[, $language[, $charset]])
Inserts a new mail header before any existing mail headers. If the header contains non-ascii characters, it will be encoded in accordance with RFC2047. If the $language and $charset parameters are not supplied, they default to en and iso-8859-1, respectively.
append_header($mail, $header[, $language[, $charset]])
Appends a new mail header after any existing mail headers.
replace_header($mail, $header[, $language[, $charset]])
Replaces all instances of a mail header with a new mail header.
delete_header($mail, $header, $recurse)
Deletes all headers that match the $header pattern. If the $recurse parameter is provided and non-zero, matching headers in internal body parts will also be deleted.
insert_part($mail, $part, $index)
Inserts the given body part at the given index. The $part parameter must have been produced by formail() or newmail(). The $mail parameter must already be a multipart mail message.
append_part($mail, $part)
Appends the given body part.
replace_part($mail, $part, $index)
Replaces the body part at the given index with the given body part.
delete_part($mail, $index)
Deletes the body part at the given index.
header($mail, $header)
Returns a list of values of headers with the given name. RFC2822 comments are removed. If any of the values contain RFC2047 encoded words (i.e. =?charset?[qb]?...?=), and the charset is us-ascii or iso-8859-*, they are decoded. They are also unfolded. If this is not what you want, use $mail->{header} or $mail->{headers} directly (or change the code).
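The RFC2047 decoding described for header() can be sketched as follows. This is an illustration written under stated assumptions, not minimail's own code: decode_word and decode_encoded_words are hypothetical names, and the sketch does not remove RFC2822 comments or unfold headers. Like header(), it only decodes us-ascii and iso-8859-* encoded words and leaves any others untouched.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helpers illustrating RFC2047 decoding; not minimail code.

# Decode one encoded word, or return it unchanged if the charset is not
# us-ascii or iso-8859-*.
sub decode_word
{
    my ($word, $charset, $encoding, $text) = @_;
    return $word unless $charset =~ /^(?:us-ascii|iso-8859-\d+)$/i;
    return decode_base64($text) if lc $encoding eq 'b';
    $text =~ tr/_/ /;                            # In Q encoding, "_" means space
    $text =~ s/=([0-9A-Fa-f]{2})/chr hex $1/eg;  # =XX is a hex-encoded byte
    return $text;
}

# Decode every encoded word found in a header value.
sub decode_encoded_words
{
    my ($value) = @_;
    # Whitespace between two adjacent encoded words is not part of the text
    $value =~ s/(\?=)\s+(=\?)/$1$2/g;
    $value =~ s{(=\?([^?*\s]+)(?:\*[^?\s]*)?\?([qQbB])\?([^?\s]*)\?=)}
               {decode_word($1, $2, $3, $4)}eg;
    return $value;
}

print decode_encoded_words("Subject: =?us-ascii?Q?hello_world?="), "\n";
```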
headers($mail)
Returns a list of all complete headers with decoding and unfolding performed as with header().
Returns a list of the names of headers present in the given mail message.
param($mail, $header, $param)
Returns the value of the given parameter of the given MIME header of the given mail message. header() is used for RFC2047 decoding. If the parameter has been split or encoded in accordance with RFC2231 (i.e. param1*0="a" param1*1="b" param2*="charset'lang'%63"), it is decoded (if us-ascii or iso-8859-*) and reassembled.
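The RFC2231 reassembly described for param() can be sketched like this. It is a simplified illustration, not minimail's code: rfc2231_param is a hypothetical name, the raw parameters are assumed to have been split into a hash already (with surrounding double quotes removed), and, unlike param(), the sketch discards the charset without checking it.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating RFC2231 reassembly; not minimail code.
# $raw maps raw parameter names (e.g. "filename*0*") to their raw values.
sub rfc2231_param
{
    my ($raw, $name) = @_;

    # A plain, unsplit, unencoded parameter needs no work
    return $raw->{$name} if exists $raw->{$name};

    # Collect "name*" (one encoded value) or "name*N" / "name*N*" (sections)
    my @sections;
    for my $key (keys %$raw)
    {
        next unless $key =~ /^\Q$name\E(?:\*(\d+))?(\*)?$/ && (defined $1 || defined $2);
        push @sections, { index => $1 // 0, encoded => defined $2, value => $raw->{$key} };
    }
    return unless @sections;

    my $result = '';
    for my $section (sort { $a->{index} <=> $b->{index} } @sections)
    {
        my $value = $section->{value};
        if ($section->{encoded})
        {
            # Only the first section carries the charset'language' prefix
            $value =~ s/^[^']*'[^']*'// if $section->{index} == 0;
            $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/eg;
        }
        $result .= $value;
    }
    return $result;
}

my %raw = ('param1*0' => 'a', 'param1*1' => 'b', 'param2*' => "charset'lang'%63");
print rfc2231_param(\%raw, 'param1'), "\n";
print rfc2231_param(\%raw, 'param2'), "\n";
```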
mimetype($mail, $parent)
Returns the declared or default mimetype of the given mail message or body part. Returns application/octet-stream when the encoding is invalid.
Returns the declared or implied encoding of the given mail message or body part.
Returns the RFC2183 filename of the given body part. Uses param() to perform any decoding that might be necessary. Also removes any directory component of the filename and replaces any unfriendly characters with dash characters.
Returns the decoded body of the given mail message or body part. Must not be called on a multipart mail message or a mail message whose mimetype is message/rfc822.
Returns the message inside the given mail message whose mimetype is message/rfc822. Must not be called on a multipart message or a mail message whose mimetype is not message/rfc822.
parts($mail[, $parts])
When no $parts parameter is given, returns a reference to an array of body parts in the given multipart message. When the $parts parameter is given, it is a reference to an array of body parts, and it will replace the existing body parts. Must not be called on a singlepart mail message.
newparam($name, $value[, $language[, $charset]])
Creates a MIME header parameter, possibly split and encoded in accordance with RFC2231. Returns a string that looks like "; name=value" which can be used as part of the $header argument in functions like append_header() and as part of any header value in the function newmail(). If the value contains non-ascii characters, and the $language and $charset parameters are not supplied, they default to en and iso-8859-1, respectively.
Creates a new mail message based on the given arguments (which take the form of a hash). It is not necessary to supply all information. Anything that needs to be added will be added automatically. The important parameters are:
 [A-Z]*      - Arbitrary mail headers: e.g. From To Subject
 type        - Content-Type: e.g. image/png
 charset     - Content-Type's charset parameter: e.g. iso-8859-1
 encoding    - Content-Transfer-Encoding: e.g. base64
 filename    - Content-Disposition's filename parameter
 body        - body of the message (don't use with parts or message)
 parts       - array-ref of parts (don't use with body or message)
 message     - body of message/rfc822 message (don't use with body or parts)
 mbox        - Mbox From_ header

Supplying body implies text/plain. Supplying parts implies multipart/mixed. Supplying message implies message/rfc822. The default disposition is inline for text/* and message/rfc822, and attachment for all other types. The default charset is iso-8859-1 when body contains non-ascii characters, and us-ascii otherwise. The default encoding is determined from the type and nature of the mail message and its data. You shouldn't have to supply encoding unless you want to create messages with 8bit encoding. If the mail message really is a mail message, and not just a body part, Date, MIME-Version and Message-ID headers are automatically included if they have not been supplied by the caller.

Less important parameters are:

 disposition - Content-Disposition: i.e. inline or attachment
 created     - Content-Disposition's creation-date parameter
 modified    - Content-Disposition's modification-date parameter
 read        - Content-Disposition's read-date parameter
 size        - Content-Disposition's size parameter
 description - Content-Description
 language    - Content-Language
 duration    - Content-Duration
 location    - Content-Location
 base        - Content-Base
 features    - Content-Features
 alternative - Content-Alternative
 id          - Content-ID
 md5         - Content-MD5

Note: If you supply filename but not body (or message or parts), and the filename refers to a readable file, then the following parameters will be determined automatically: body, modified, read, size.

The rest of the less important parameters are just shortcuts for standard MIME headers. There is no support beyond that for any of them.
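The disposition and charset defaults described for newmail() can be restated as two one-line rules. The following is only a sketch of the rules as documented, not minimail's own code; default_disposition and default_charset are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helpers restating the documented defaults; not minimail code.

# inline for text/* and message/rfc822, attachment for everything else
sub default_disposition
{
    my ($type) = @_;
    return $type =~ m{^(?:text/|message/rfc822$)}i ? 'inline' : 'attachment';
}

# iso-8859-1 when the body contains non-ascii characters, us-ascii otherwise
sub default_charset
{
    my ($body) = @_;
    return $body =~ /[^\x00-\x7f]/ ? 'iso-8859-1' : 'us-ascii';
}

print default_disposition('image/png'), "\n";
print default_charset("hi\n"), "\n";
```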


A mail message (or body part) is a hash containing some of the following entries:

 mbox          - mailbox From_ header
 warn          - parser errors in the form: X-Warning: ...
 headers       - arrayref of mail headers in order of appearance
 header        - hashref by name of arrayrefs of mail headers
 body          - text of singlepart mail message
 mime_type     - mimetype of the mail message or body part
 mime_parts    - arrayref of mail messages (body parts)
 mime_message  - message of a message/rfc822 mail message
 mime_boundary - boundary for a multipart mail message
 mime_preamble - any text before the first multipart boundary
 mime_epilogue - any text after the last multipart boundary
 mime_prev_boundary - saved boundary of message after mail2singlepart
 mime_prev_preamble - saved preamble of message after mail2singlepart
 mime_prev_epilogue - saved epilogue of message after mail2singlepart

Note that body, mime_parts and mime_message are mutually exclusive and that mime_type only exists when mime_parts or mime_message exist.
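To make the mutual exclusion of body, mime_parts and mime_message concrete, here is a small sketch that walks a hand-built structure shaped like the description above. The values are made up for illustration (including the exact form of the header strings and of mime_type), real parsed messages carry more entries, and leaf_bodies is a hypothetical helper, not a minimail function.

```perl
use strict;
use warnings;

# A hand-built structure shaped like the description above; all values are
# illustrative assumptions, not the output of formail() or newmail().
my $mail =
{
    headers => ["Subject: test\n"],
    mime_type => 'multipart/mixed',
    mime_boundary => 'xxx',
    mime_parts =>
    [
        { headers => [], body => "hi\n" },
        {
            headers => ["Content-Type: message/rfc822\n"],
            mime_type => 'message/rfc822',
            mime_message => { headers => ["Subject: inner\n"], body => "inner\n" },
        },
    ],
};

# Hypothetical helper: collect the bodies of all singlepart leaves.
# Exactly one of mime_parts, mime_message and body is expected per hash.
sub leaf_bodies
{
    my ($mail) = @_;
    return map { leaf_bodies($_) } @{$mail->{mime_parts}} if exists $mail->{mime_parts};
    return leaf_bodies($mail->{mime_message}) if exists $mail->{mime_message};
    return exists $mail->{body} ? ($mail->{body}) : ();
}

print for leaf_bodies($mail);
```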


Parsing example: Repair mailbox files

 formail(sub { <> }, sub { print mail2str(shift) });

Building example: A mail message with attachments

 print mail2str(newmail(
  To => '', From => '', Subject => 'test',
  parts => [
        newmail(body => "hi\n"),
        newmail(body => $png, type => 'image/png', filename => 'hi.png'),
        newmail(message => newmail(qw(To to@you From from@me body hi)))
  ]
 ));


The header() and headers() functions automatically decode rfc2047 encoded headers when the character set is us-ascii or any of the iso-8859-* character sets. This is an attempt to satisfy the following requirement in rfc2047:

 The program must be able to display the unencoded text if the
 character set is "US-ASCII".  For the ISO-8859-* character sets,
 the mail reading program must at least be able to display the
 characters which are also in the ASCII set.

The problem is that, rather than discarding iso-8859-* characters that are not also us-ascii, minimail decodes and "displays" them. This is arguably more useful, but knowledge of the character set is lost and the characters will be interpreted as being in your character set. This could be the wrong thing if your character set is very different from that used by the originators of the mail messages being parsed.

If this is likely to cause you problems, don't use header() or headers(). Use $mail->{headers} instead, which is a reference to an array of raw encoded headers. Or delete or transform any high bit characters in the results of these functions. Or change the code so that it doesn't automatically decode character sets that you don't want it to.
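Transforming high bit characters in a decoded header value takes one line. A minimal sketch, assuming the value is a byte string and that replacing each high bit character with "?" is acceptable; ascii_only is a hypothetical name, not a minimail function.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every high bit character with "?".
sub ascii_only
{
    my ($text) = @_;
    $text =~ tr/\x80-\xff/?/;
    return $text;
}

print ascii_only("Andr\xe9 Pirard"), "\n";   # prints "Andr? Pirard"
```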


rfc2822, rfc2045, rfc2046, rfc2047, rfc2231, rfc2183 (also rfc3282, rfc3066, rfc2424, rfc2557, rfc2110, rfc3297, rfc2912, rfc2533, rfc1864, rfc2387, rfc2076, rfc4012).

The mailbox format used is the mboxrd format described in


20070803 raf <>