Finally, we can try to use our new language in TB. First of all we have to know which tools are available to work with regular expressions. These tools are TB’s macros.
Macros
Not all of TB’s macros support the use of regex. Most of the macros have
nothing to do with regex, but you can apply a regex to the text they
return in order to extract or modify the information.
And that is one feature of TB that makes it so powerful.
The first macro we will look at is: %REGEXPTEXT="regex"
What does it do? It searches for the pattern "regex" within the
original text of a mail and returns the matched characters. The syntax
is quite straightforward. Look at the following example:
%REGEXPTEXT="[\d\.]+"
Used in a quick template and applied to a mail, this macro returns the first sequence of digits and dots found in the text.
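The pattern can also be tried outside TB. The following is only a minimal sketch in Python, whose `re` module serves as a stand-in for TB’s regex engine. The helper name `regexp_text` and the sample text are made up for illustration, and returning an empty string when nothing matches is an assumption, not documented TB behaviour.

```python
import re

def regexp_text(pattern: str, mail_text: str) -> str:
    """Mimic %REGEXPTEXT: return the first match, or '' if none (assumed)."""
    match = re.search(pattern, mail_text)
    return match.group(0) if match else ""

# Hypothetical mail text, not taken from a real message.
sample = "Your order 4711 ships with version 2.5.1 of the tool."
first = regexp_text(r"[\d\.]+", sample)
print(first)  # the first run of digits and dots in the sample
```

Note that the character class also accepts a lone full stop, so in a text such as "Done. See 42" the first match is the full stop, not the number.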
Let’s have a look at a fairly similar macro: %REGEXPQUOTES="regex". This macro does exactly the same as the first one, except that the returned text is quoted text rather than plain text.
That was nice and easy. But when it comes to the extraction of text from
the header of a mail (kludges) or address book entries we need to
combine some macros:
The first one we will need for that is %SETPATTREGEXP.
It is used to define the search pattern in the way
%SETPATTREGEXP="regex". "regex" is the regular expression you created to
match the text.
The second one is %REGEXPMATCH. Again, this is easily
defined: %REGEXPMATCH="string" with "string" being any text. It can be a
template, which means that any generic text can be used, so almost any
TB macro can be used to provide the text here.
The definition of a regex through %SETPATTREGEXP remains valid until it
is overwritten by a later %SETPATTREGEXP. This means you can use the
same pattern on several different generic texts in one go.
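The idea of one "current pattern" that is applied to several texts until it is replaced can be modelled roughly as follows. This is a sketch in Python, not TB’s implementation; the class `RegexState` and its method names are hypothetical, and the sample strings are invented.

```python
import re

class RegexState:
    """Hypothetical model of the pattern that %SETPATTREGEXP sets."""

    def __init__(self) -> None:
        self.pattern = None

    def set_patt_regexp(self, regex: str) -> None:
        # A later call simply replaces the earlier pattern.
        self.pattern = re.compile(regex)

    def regexp_match(self, text: str) -> str:
        # Apply the current pattern to any text; '' if no match (assumed).
        if self.pattern is None:
            return ""
        found = self.pattern.search(text)
        return found.group(0) if found else ""

state = RegexState()
state.set_patt_regexp(r"\d{4}")
# The same pattern is used on two different texts.
year_in_date = state.regexp_match("Sonntag, 7. April 2002")
year_in_subject = state.regexp_match("Minutes of the 1999 meeting")
# A second definition overrides the first one.
state.set_patt_regexp(r"[A-Z][a-z]+")
word = state.regexp_match("Sonntag, 7. April 2002")
print(year_in_date, year_in_subject, word)
```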
Before we have a look at another example I have to correct something.
Did I say the syntax is quite easy earlier in this chapter? Well, that’s
true as long as one only looks at one macro. But let’s see how this
changes when we let the macro parse some text:
We already know the macro %REGEXPQUOTES. This could be written in a
different way. Let’s assume that we receive mails from a feedback form.
Part of the content is "newsletter: yes" or "newsletter: no". We would
like to create an autoresponder that uses exactly this information in a
reply template, for example:
"Thank you for filling out our feedback form. You entered 'newsletter: yes/no'. Are you sure?"
You can create more sophisticated text and a better filter to use
different templates for the reply, but for the moment let’s stick to
this example;-).
The macro %QUOTES defines what text is to be used as quoted text in a
reply. The only problem is that we have to tell %QUOTES which text
should be used. After that we can copy it to the reply template, add our
standard text and save it.
Ok, first the regex: "^newsletter:\s*(yes|no)". This has to be defined by %SETPATTREGEXP="^newsletter:\s*(yes|no)".
We already know that %REGEXPMATCH applies the search pattern to any
generic text, so we need a macro that provides the original text of the
mail, and that is %TEXT. Now we have to put it all together and create a
template that uses the macros in the correct order.
The only thing that makes these macros difficult to use is the
"-character, which serves as the delimiter for the definition part. In
%SETPATTREGEXP the search pattern is defined between the delimiters, and
in %QUOTES the text that will be inserted as quoted text. Once you
start to combine the macros you have to tell TB which "-character is the
delimiter of which macro: the first macro must know whether the second
"-character is the end of the macro or the beginning of the second
macro. The same applies at the end of the second macro and so on. This
can be achieved by doubling the "-character (escaping) or by using
different delimiters.
Put simply, this looks like:
%M1="%M2=""Def2""%M3=""Def3"""
This is getting a bit confusing and hard to follow, so we could instead say:
%M1="%M2='Def2'%M3='Def3'"
The example above would look like:
%QUOTES="%SETPATTREGEXP='^newsletter:\s*(yes|no)'%REGEXPMATCH='%TEXT'"
This example could be written in a simpler way:
%REGEXPQUOTES="^newsletter:\s*(yes|no)"
But this only works because the text we searched was the original text of the mail, provided by %TEXT.
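To see what the newsletter pattern picks out of a form mail, here is a small sketch in Python. The mail body is invented, and the `re.MULTILINE` flag is an assumption: it lets `^` match at the start of every line, which the pattern needs whenever the "newsletter:" line is not the very first line of the text.

```python
import re

# Hypothetical body of a feedback-form mail.
body = "name: Jane Doe\nnewsletter: yes\ncomment: none\n"

# re.MULTILINE makes ^ match at each line start (assumed to be needed here).
match = re.search(r"^newsletter:\s*(yes|no)", body, re.MULTILINE)
quoted = match.group(0) if match else ""   # the whole matched text
answer = match.group(1) if match else ""   # only the subpattern (yes|no)
print(quoted)
print(answer)
```

The whole match corresponds to what the macro combination above would insert as quoted text; the subpattern on its own becomes useful in the macros described next.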
Next comes a macro combination that allows the extraction of several
parts of the text. We know that we could define subpatterns in the regex
by grouping sections with parentheses. We must now find a way to
address them within TB.
TB provides a macro for this: %REGEXPBLINDMATCH="string". But this macro
does not return anything by itself. After all, we wanted to extract
parts of the text, not the whole text. So we still need a macro that
allows us to tell TB which of the subpatterns is to be used. And this is
%SUBPATT="n". 'n' denotes the n-th subpattern in the regex.
Now this combination will be quite difficult to read and understand. So I
will explain it using an example and will generate the whole macro
combination bit by bit. After that I will combine everything.
From the original date of a mail we want to extract the year, two digits
only, and use it as quoted text. The date is provided by %ODATE. The
regex is "\d{2}(\d{2})\b". That means we want to extract only two digits
if they are preceded by two digits and followed by a word boundary. Thus
the first macro is: %SETPATTREGEXP="\d{2}(\d{2})\b".
The text that is used to find the date is defined using the macro
%REGEXPBLINDMATCH="%ODATE". We are looking for the first subpattern, so
%SUBPATT="1".
Now we put it all together, not forgetting to use the alternative '-characters:
%QUOTES="%SETPATTREGEXP='\d{2}(\d{2})\b'%-
%REGEXPBLINDMATCH='%ODATE'%SUBPATT='1'"
[Note: the macro combination is split using the %- macro and can be entered as two lines!]
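The effect of picking one subpattern can be checked with a short Python sketch. The helper `sub_patt` is a hypothetical stand-in for the %REGEXPBLINDMATCH and %SUBPATT pair, and the date string is only an assumed example of what %ODATE might contain, since its format depends on the local settings.

```python
import re

def sub_patt(regex: str, text: str, n: int) -> str:
    """Match silently, then return only the n-th subpattern ('' if no match)."""
    match = re.search(regex, text)
    return match.group(n) if match else ""

odate = "Sonntag, 7. April 2002"  # assumed sample value for %ODATE
short_year = sub_patt(r"\d{2}(\d{2})\b", odate, 1)
print(short_year)  # the last two digits of the year
```

The single digit of the day cannot satisfy the leading `\d{2}`, so the match can only start at the four-digit year.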
Another example? There is a regex for reply templates that modifies the
name of the recipient. Instead of 'Gerd Ewald' we would like to have
'Gerd Ewald at TBUDL…..' Well, we could download this regex somewhere,
but let us try to create it ourselves.
%OFROMNAME will give us the name.
The reply address is given by %OREPLYADDR. We will extract the list’s
name with a regex. Usually the name of the list precedes the
@-character: %SETPATTREGEXP="(.*?)\@"
This is used in combination with %REGEXPBLINDMATCH="%OREPLYADDR", of which we only want subpattern one: %SUBPATT="1"
The result then becomes the contents of the TO field. Watch out: before
you can enter text, this field has to be cleared. This is done by an
initial empty assignment.
%TO=""%TO='"%OFROMNAME at %-
%SETPATTREGEXP=_(.*?)\@_%-
%REGEXPBLINDMATCH=_%OREPLYADDR_%-
%SUBPATT=_1_" <%OREPLYADDR>'
[Note 1: the macro combination is split using the %- macro and can be
entered as seen! Note 2: the template makes use of a feature of recent
versions of TB where any character may be used as a quoting delimiter,
in this case the underscore and single quote as well as the double
quote. Users of earlier versions will have to resort to the clumsier
double delimiter syntax]
The original reply address has to be added at the end, enclosed in "<>" characters.
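What the finished TO field should look like can be sketched in Python. The function `build_to` and the address `tbudl@example.org` are hypothetical; the sketch only mirrors the three steps above: take the name, extract the part before the @-character as subpattern one, and append the full address in angle brackets.

```python
import re

def build_to(from_name: str, reply_addr: str) -> str:
    """Compose '"Name at list" <address>' from a name and a reply address."""
    match = re.search(r"(.*?)\@", reply_addr)
    list_name = match.group(1) if match else ""  # subpattern one
    return f'"{from_name} at {list_name}" <{reply_addr}>'

to_field = build_to("Gerd Ewald", "tbudl@example.org")
print(to_field)
```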
As you can see, the syntax is quite easy and stereotypical. The only
difficult thing is to find out which macro provides the necessary
information and how to extract it with the regex.
Here is another example, available on Marck’s FAQ page at www.silverstones.com.
%WRAPPED='Historians believe that on %ODATE%-
%SETPATTREGEXP="(?m-s)Date\:\s*?((.*?[\d]{4})\s*?([\d]{0,2}\:%-
[\d]{0,2}\:[\d]{0,2})\s*?(.*))"%-
%REGEXPBLINDMATCH="%HEADERS" , at %SUBPATT="3"[GMT%SUBPATT="4"]%-
(which was %OTIME where I live) you wrote:'%-
Here, once again, the %- macro is used to make the whole combination easier to read.
This has no special meaning except that it tells TB that the following
line should be treated as a continuation of the first line. The %WRAPPED
means that the result of the macro combination will be word wrapped at
the defined column in TB.
What does the macro do?
The first part "%WRAPPED='Historians believe that on %ODATE%-" is just
some kind of a link up: on every reply the date of the original mail
should be added to the text 'Historians believe that on '.
The second part contains the regex that is much more interesting to us (I deleted the %- macro to show the regex in one line):
"(?m-s)Date\:\s*?((.*?[\d]{4})\s*?([\d]{0,2}\:[\d]{0,2}\:[\d]{0,2})\s*?(.*))"
The multiline option is switched on and DotAll is switched off: (?m-s)
Then the regex looks for 'Date:', which may be followed by any amount of
whitespace. Because the star is greedy by default, a question mark
follows to make it lazy. The author escaped the colon with a backslash,
which isn’t necessary. I don’t know why he did that, but it won’t cause
problems, so we’ll leave it alone.
Now the first parenthesis follows. There is no need to group this part
and I assume it is done for easier reading. You may delete it but then
bear in mind that the total number of subpatterns has changed.
The second parenthesis looks for anything that ends with four digits.
We know that the regex will look in the kludges (%HEADERS) for the date,
so we can guess that the author is looking for the year. This may be
followed by whitespace.
Now we come to the third parenthesis. This is the one the author needs.
He searches for three numbers with zero, one or two digits. These
numbers are separated by colons. That is obviously the time.
Whitespace may follow, and with the fourth subpattern all of the rest is
matched: this is no more than the GMT information.
A closer look at the regex shows that it is applied to the header lines
and that only subpatterns three and four are really needed.
The result could be:
'Historians believe that on Sonntag, 7. April 2002 , at 11:22:59[GMT
+0200](which was 11:22 where I live) you wrote:'
It works, although the layout would need a bit of DIY.
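To check which part of a Date header ends up in which subpattern, the regex can be run against a sample header block. The sketch below uses Python as a stand-in for TB. The header lines are invented, and the leading (?m-s) is replaced by the `re.MULTILINE` flag (DotAll is off by default), because Python does not accept that option group in this form.

```python
import re

# The regex from the template without the leading (?m-s); see the flag below.
DATE_REGEX = (
    r"Date\:\s*?((.*?[\d]{4})\s*?"
    r"([\d]{0,2}\:[\d]{0,2}\:[\d]{0,2})\s*?(.*))"
)

# Hypothetical header lines, standing in for %HEADERS.
headers = (
    "From: Gerd Ewald <gerd@example.org>\n"
    "Date: Sun, 7 Apr 2002 11:22:59 +0200\n"
    "Subject: Regex tutorial\n"
)

match = re.search(DATE_REGEX, headers, re.MULTILINE)
time_part = match.group(3)   # subpattern three: the time
zone_part = match.group(4)   # subpattern four: the rest of the line
print(f"at {time_part}[GMT{zone_part}]")
```

Because the whitespace before the fourth subpattern is matched lazily, the offset keeps its leading space, which explains the blank after 'GMT' in the result shown above.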
Other Possibilities to Use Regular Expressions in TB
There are other possibilities for using regex in TB than macros.
One example is the text search option in the mail editor. It is
especially useful for searching long mails with the special
features that regex offers.
This window may be opened with Ctrl-F or using the 'Edit Find' menu
entry in the mail editor. Just enter the regex in the text line. Don’t
forget to check the 'regular expressions' box in the Options section.
In almost the same way, I can search the mails stored in folders using
regex. Just press F7 while in folder view. This opens a search window,
which offers the facility to search for text in mails. In the 'Options'
tab panel you can enter the regex in the 'Search for' field.
Go to the 'Advanced' tab panel and check "Regular Expressions".
You can use regex in filter conditions to optimise the organisation of your inbox. This is a field where regex are as efficient as in macros. Go to the ‚Account, Sorting Office/Filters‘ menu item. Open the filter definition, go to the ‚Options‘ tab panel and check ‚Regular Expressions‘.
Overview and Summary
What did we learn in this final chapter?
There are several ways to use regular expressions in TB, which are:
- TB offers macros that can use regular expressions to find, extract and modify mail text:
- %REGEXPTEXT="regex": returns the matched string within a mail as text
- %REGEXPQUOTES="regex": returns the matched string within a mail as quoted text
- %REGEXPMATCH="string": applies the regex to the generic text "string" and returns the matched text. Any macro or text may be used for 'string'
- %REGEXPBLINDMATCH="string": applies the regex to the generic text "string" but does not return any text. %SUBPATT is needed to return the text. Any macro may be used for 'string'.
- %SETPATTREGEXP="regex": defines the regex for %REGEXPMATCH and %REGEXPBLINDMATCH. The definition remains valid until it is overridden by a subsequent %SETPATTREGEXP
- %SUBPATT="n": returns the n-th subpattern when used with %SETPATTREGEXP and %REGEXPBLINDMATCH
- You can use regular expressions to look for specific messages as well as to search strings within mails. Furthermore you can use them for defining filters.
Exercises
1. You remember the regex we wrote to clean the subject line?
"^Re(.*?):\s*(.*?)\s*(\(was:.*\))*$" . Try to improve this one: instead
of '(was:xyz)' PGP-users will find '(PGP Decrypted)'. The regex should
find these kinds of subject as well. Furthermore the regex should be
available within a reply template.
2. In the last chapter I described a macro that modifies the TO-address for mailing lists:
%TO=""%TO='"%OFROMNAME at %-
%SETPATTREGEXP=_(.*?)\@_%-
%REGEXPBLINDMATCH=_%OREPLYADDR_%-
%SUBPATT=_1_" <%OREPLYADDR>'
Try to change it in such a way that it is no longer necessary to use
%REGEXPBLINDMATCH and %SUBPATT but %REGEXPMATCH. You will need to modify
the regex. Hint wanted? Ok: The subpattern was created because
otherwise the @-character would have been included in the match. The
only thing you have to do is to find a regex that does not match the
@-character and has no subpattern.
1. Example
Well, that is not too difficult. You only expand the last part of the regex with an alternative:
"^Re(.*?):\s*(.*?)\s*(\(was:.*\)|\(PGP Decrypted\))*$"
Now let’s have a look at the template. We would like to create a new
subject, so the macro we need is %SUBJECT. Because we use it when we
reply to a message and want a proper subject line, it should start
with:
%SUBJECT="Re
Then we add the regex:
%SUBJECT="Re: %SETPATTREGEXP=""^Re(.*?):\s*(.*?)\s*
(\(was:.*\)|\(PGP Decrypted\))*$""
[Note: the regex is wrapped for layout reasons. Everything must be entered as a single long line!]
We will apply it to the original subject %OFULLSUBJ and need the second subpattern: %SUBPATT="2"
%SUBJECT="Re: %SETPATTREGEXP=""
^Re(.*?):\s*(.*?)\s*(\(was:.*\)|\(PGP Decrypted\))*$""
%REGEXPBLINDMATCH=""%OFULLSUBJ""%SUBPATT=""2"""
[Note: the regex is wrapped for layout reasons. Everything must be entered as a single long line!]
Ok, that’s it. Additional exercise: what if the subject does not have
any ‚Re‘ but ‚AW‘, ‚FWD‘ or anything else? Go, try to add further
alternatives at the start of the regex.
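The subject-cleaning regex can be tested on a few subjects before it goes into a template. The following Python sketch assumes the alternation is written with both opening parentheses escaped, as in "\(was:.*\)|\(PGP Decrypted\)". The function `clean_subject` is hypothetical, and leaving a non-matching subject unchanged is a choice made for this sketch only.

```python
import re

SUBJECT_REGEX = r"^Re(.*?):\s*(.*?)\s*(\(was:.*\)|\(PGP Decrypted\))*$"

def clean_subject(full_subject: str) -> str:
    """Return 'Re: ' plus subpattern two; leave other subjects unchanged."""
    match = re.search(SUBJECT_REGEX, full_subject)
    return "Re: " + match.group(2) if match else full_subject

# Invented sample subjects.
for subject in (
    "Re[2]: Regex question (was: old topic)",
    "Re: Regex question (PGP Decrypted)",
    "Re: Plain subject",
):
    print(clean_subject(subject))
```

Subjects that start with 'AW' or 'FWD' fall through unchanged here, which is exactly the gap the additional exercise asks you to close.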
2. Example
A positive lookahead assertion will help: ".*?(?=\@)". The assertion
will look for the @-character but won’t include it in the match.
Therefore, the template is easier to write:
%TO=""%TO='"%OFROMNAME at %-
%SETPATTREGEXP=_.*?(?=\@)_%-
%REGEXPMATCH=_%-
%OREPLYADDR_" <%OREPLYADDR>'
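That the lookahead really leaves the @-character out of the match can be verified with a few lines of Python, again only as a stand-in for TB. The address is a made-up example.

```python
import re

reply_addr = "tbudl@example.org"  # hypothetical reply address

# The lookahead (?=\@) requires an @ after the match without consuming it.
match = re.search(r".*?(?=\@)", reply_addr)
list_name = match.group(0) if match else ""
to_field = f'"Gerd Ewald at {list_name}" <{reply_addr}>'
print(list_name)
print(to_field)
```

Since the whole match is already the list name, no subpattern is needed, which is why %REGEXPMATCH is sufficient in the template above.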