Regular expression matches and substitutions accept a set of options that you switch on by appending one or more of the i, m, s, g, e or x modifiers to the end of the operation. See Programming Perl, page 153, for more information. Some examples:
$string = 'Big Bad WOLF!';
print "There's a wolf in the closet!" if $string =~ /wolf/i; # i makes the match case insensitive
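As a further sketch of how modifiers combine (this snippet is not from the original text, and the variable names are only illustrative), the x modifier allows whitespace and comments inside a pattern, while g together with e on a substitution replaces every match with the result of evaluating the replacement as Perl code:

```perl
use strict;
use warnings;

my $string = 'Big Bad WOLF!';

# x: whitespace and comments inside the pattern are ignored;
# i: the match is case insensitive.
my $found = $string =~ / bad \s+ wolf   # "bad", whitespace, "wolf"
                       /xi;
print "Found the wolf\n" if $found;

# g: replace every match; e: evaluate the replacement as an expression.
(my $shout = $string) =~ s/(\w+)/uc($1)/ge;
print "$shout\n";   # BIG BAD WOLF!
```

Note that the e modifier only makes sense on a substitution, since a plain match has no replacement part to evaluate.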
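Adding the g modifier to the pattern causes the match to be global. In a scalar context (such as the condition of an if or while statement), each evaluation finds the next match, continuing from where the previous one ended, and returns false once no further match exists.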
This will match all codons in a DNA sequence, printing them out on separate lines:
Code:
$sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';
while ( $sequence =~ /(.{3})/g ) {
    print $1, "\n";
}
Output:
GTT
GCC
TGA
AAT
GGC
GGA
ACC
TTG
If you perform a global match in a list context (e.g. assign its result to an array), then you get a list of all the subpatterns that matched from left to right. This code fragment gets arrays of codons in three reading frames:
@frame1 = $sequence =~ /(.{3})/g;
@frame2 = substr($sequence,1) =~ /(.{3})/g;
@frame3 = substr($sequence,2) =~ /(.{3})/g;
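The three assignments above can also be written as a loop over the frame offsets. The following is a minimal sketch, assuming the same example sequence; the @frames array and the printing loop are illustrative additions rather than part of the original fragment:

```perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# Collect the complete codons of each forward reading frame (offsets 0, 1, 2).
my @frames;
for my $offset (0 .. 2) {
    # List context: the global match returns every captured codon.
    my @codons = substr($sequence, $offset) =~ /(.{3})/g;
    push @frames, \@codons;
}

for my $i (0 .. $#frames) {
    print "Frame ", $i + 1, ": @{ $frames[$i] }\n";
}
```

Trailing bases that do not fill a whole codon are simply left out, because .{3} only matches complete triplets.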
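The position of the most recent match can be determined by using the pos function. After a successful match with the g modifier, pos returns the 0-based offset just past the end of that match, which is where the next global match on the same string would start.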
Code:
#file:pos.pl
my $seq = "XXGGATCCXX";
if ( $seq =~ /(GGATCC)/gi ) {
    my $pos = pos($seq);
    print "Our Sequence: $seq\n";
    print '$pos = ', "1st position after the match: $pos\n";
    print '$pos - length($1) = 1st position of the match: ', ($pos - length($1)), "\n";
    print '($pos - length($1))-1 = 1st position before the match: ', ($pos - length($1) - 1), "\n";
}
Output:
~]$ ./pos.pl
Our Sequence: XXGGATCCXX
$pos = 1st position after the match: 8
$pos - length($1) = 1st position of the match: 2
($pos - length($1))-1 = 1st position before the match: 1
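Combining pos with a while loop gives the start of every occurrence of a site, not just the first. The sketch below assumes a made-up sequence that contains GGATCC twice; the @starts array is an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical sequence containing the GGATCC site twice.
my $seq = 'XXGGATCCXXXGGATCCX';

my @starts;
while ( $seq =~ /(GGATCC)/gi ) {
    # pos() is the offset just past the match; subtracting the match
    # length gives the 0-based offset where the match begins.
    push @starts, pos($seq) - length($1);
}
print "Match starts at: @starts\n";   # Match starts at: 2 11
```

When the loop ends because no further match is found, pos for the string is reset, so a later global match starts again from the beginning.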
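If you use a variable inside a pattern, as in /$pattern/, Perl has to interpolate the variable and may have to recompile the pattern each time the match is evaluated, which carries a small performance penalty. If $pattern doesn't change over the life of the program, the o ("once") modifier tells Perl to interpolate and compile the pattern only the first time it is used. Be aware that with o any later change to $pattern is silently ignored, and that recent versions of Perl already skip recompilation when the interpolated string is unchanged, so the speed-up is usually small: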
$codon = '.{3}';
@frame1 = $sequence =~ /($codon)/og;
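One consequence of the o modifier is worth seeing in action: once the pattern has been compiled, later changes to the variable have no effect. The following sketch uses hypothetical patterns and an illustrative @hits array to show this:

```perl
use strict;
use warnings;

my $sequence = 'GTTGCC';

my @hits;
for my $pattern ('GTT', 'AAA') {
    # With /o the pattern is compiled on the first pass ('GTT') and
    # reused afterwards, so the second pass does NOT test for 'AAA'.
    push @hits, ( $sequence =~ /$pattern/o ? 1 : 0 );
}
print "@hits\n";   # 1 1
```

Without the o modifier the second test would look for AAA and fail, so o is only safe when the variable really is constant.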
To check that a pattern matched what you intended, you can use Perl's "magic" automatic match variables: $& holds the text that matched, $` the text before the match, and $' the text after it.
Code:
#file:matchTest.pl
if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That actually matched '$&'.\n";
    print "That was ($`) ($&) ($').\n";
}
Output:
That actually matched ' there,'.
That was (Hello) ( there,) ( neighbor).