A regular expression is a string template against which you can match a piece of text. Regular expressions are something like shell wildcard expressions, but much more powerful.
This bit of code loops through each line of a file, finds all lines containing an EcoRI site, and bumps up a counter:
Code:
#!/usr/bin/perl -w
#file: EcoRI1.pl
use strict;
my $filename = "example.fasta";
# stop right away if the file cannot be opened
open (FASTA, "<", $filename) or die "Cannot open $filename: $!\n";
my $sites = 0;
while (my $line = <FASTA>) {
    chomp $line;
    if ($line =~ /GAATTC/){
        print "Found an EcoRI site!\n";
        $sites++;
    }
}
close FASTA;
if ($sites){
    print "$sites EcoRI sites total\n";
}else{
    print "No EcoRI sites were found\n";
}
#note: if $sites is declared inside while loop you would not be able to
#print it outside the loop
Output:
~]$ ./EcoRI1.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total
Code:
#file:EcoRI2.pl
while ( <FASTA> ) {
    chomp;
    if ($_ =~ /GAATTC/){    # =~ binds the pattern to $_; a plain = would assign to it
        print "Found an EcoRI site!\n";
        $sites++;
    }
}
Output:
~]$ ./EcoRI1.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total
Code:
#file:EcoRI.pl
while ( <FASTA> ) {
    chomp;
    if (/GAATTC/){
        print "Found an EcoRI site!\n";
        $sites++;
    }
}
Output:
~]$ ./EcoRI1.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total
Code:
#file:methy.pl
while (<FASTA>) {
    chomp;
    if (/[GA]C.?G/){ #What Happens If Your File Is Not All In CAPS
        #print "Found a Methylation Site!\n";
        $sites++;
    }
}
if ($sites){
    print "$sites Methylation Sites total\n";
}else{
    print "No Methylation Sites were found\n";
}
Output:
~]$ ./methy.pl
723 Methylation Sites total
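The comment in methy.pl asks what happens if the file is not all in caps. As written, the pattern only matches upper-case letters, so lower-case sequence would be skipped. One common way to handle this is the i modifier, which makes a match case-insensitive. The following is a minimal sketch, not part of methy.pl: count_matching_lines is a hypothetical helper and the three sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines that contain the methylation
# pattern, ignoring upper/lower case.
sub count_matching_lines {
    my @lines = @_;
    my $sites = 0;
    for my $line (@lines) {
        # the trailing i makes the whole pattern case-insensitive
        $sites++ if $line =~ /[GA]C.?G/i;
    }
    return $sites;
}

# made-up sample data: lower case, upper case, and no site at all
my @lines = ("ttacgtt", "TTGCAGAA", "ttttaaaa");
print count_matching_lines(@lines), " lines matched\n";
```

Without the i modifier only the second sample line would be counted.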
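A regular expression is normally delimited by two slashes ("/"). Everything between the slashes is a pattern to match. Patterns are built from atoms: ordinary characters, which match themselves; a dot (.), which matches any single character except a newline; a bracketed character class such as [gatc], which matches any one of the listed characters; \d, which matches a digit; and the anchors ^, $ and \b, which match the start of the string, the end of the string, and a word boundary.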
Examples:
/g..t/ matches "gaat", "goat", and "gotta get a goat" (twice)
/g[gatc][gatc]t/ matches "gaat", "gttt", "gatt", and
"gotta get an agatt" (once)
/\d\d\d-\d\d\d\d/ matches 376-8380, and 5128-8181, but not
055-98-2818.
/^\d\d\d-\d\d\d\d/ matches 376-8380 and 376-83801, but not
5128-8181.
/^\d\d\d-\d\d\d\d$/ only matches text that is exactly a seven-digit telephone
number such as 376-8380, with nothing before or after it.
/\bcat/ matches "cat", "catsup" and "more catsup please"
but not "scat".
/\bcat\b/ only matches text containing the word "cat".
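To check the examples above for yourself, you can count how many times a pattern matches a string. The sketch below assumes a hypothetical helper, count_matches, and uses the g modifier (match repeatedly), which is not covered in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of non-overlapping matches of $re in $text.
sub count_matches {
    my ($text, $re) = @_;
    # in list context, /g returns every match; the empty list assignment
    # in the middle turns that list into a count
    my $count = () = $text =~ /$re/g;
    return $count;
}

print count_matches("gotta get a goat", qr/g..t/), "\n";               # prints 2
print count_matches("gotta get an agatt", qr/g[gatc][gatc]t/), "\n";   # prints 1

# anchors: ^ ties the match to the start, \b to a word boundary
print "5128-8181 does not start with ddd-dddd\n"
    unless "5128-8181" =~ /^\d\d\d-\d\d\d\d/;
print "scat has no word starting with cat\n"
    unless "scat" =~ /\bcat/;
```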
By default, an atom matches once. This can be modified by following the atom with a quantifier: ? (zero or one time), * (zero or more times), + (one or more times), or {n} (exactly n times).
Examples:
/goa?t/ matches "goat" and "got", and also any text that
contains these words.
/g.+t/ matches "goat", "goot", and "grant", among others.
/g.*t/ matches "gt", "goat", "goot", and "grant", among others.
/^\d{3}-\d{4}$/ matches US telephone numbers (no extra
text allowed).
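The quantifier examples can be tried out on a handful of words. This is only a sketch: matching_patterns and is_phone are hypothetical helpers written for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the names of the quantifier patterns
# that match $word.
sub matching_patterns {
    my ($word) = @_;
    my @hits;
    push @hits, 'goa?t' if $word =~ /goa?t/;   # the "a" is optional
    push @hits, 'g.+t'  if $word =~ /g.+t/;    # one or more characters between g and t
    push @hits, 'g.*t'  if $word =~ /g.*t/;    # zero or more characters between g and t
    return @hits;
}

# Hypothetical helper: true if $text is exactly ddd-dddd.
sub is_phone {
    my ($text) = @_;
    return $text =~ /^\d{3}-\d{4}$/ ? 1 : 0;
}

for my $word (qw(gt got goat grant)) {
    print "$word: ", join(", ", matching_patterns($word)), "\n";
}
print "376-8380 looks like a phone number\n" if is_phone("376-8380");
print "376-83801 does not\n" unless is_phone("376-83801");
```

Note that "gt" is matched only by /g.*t/, because the other two patterns need at least one character between the g and the t.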
A set of alternative patterns can be specified with the | symbol:
/wolf|sheep/; # matches "wolf" or "sheep"
/big bad (wolf|sheep)/; # matches "big bad wolf" or "big bad sheep"
You can combine parentheses and quantifiers to quantify entire subpatterns:
/Who's afraid of the big (bad )?wolf\?/;
# matches "Who's afraid of the big bad wolf?" and
# "Who's afraid of the big wolf?"
This also shows how to literally match the special characters -- put a backslash (\) in front of them.
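Here is a short sketch of alternation, an optional group, and an escaped question mark working together. The helper names asks_about_wolf and names_big_bad_animal are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: "(bad )?" makes the whole group optional and
# "\?" matches a literal question mark.
sub asks_about_wolf {
    my ($text) = @_;
    return $text =~ /Who's afraid of the big (bad )?wolf\?/ ? 1 : 0;
}

# Hypothetical helper: the parentheses limit the alternation to the
# last word only.
sub names_big_bad_animal {
    my ($text) = @_;
    return $text =~ /big bad (wolf|sheep)/ ? 1 : 0;
}

for my $text ("Who's afraid of the big bad wolf?",
              "Who's afraid of the big wolf?",
              "Who's afraid of the big wolf") {
    print "$text => ", (asks_about_wolf($text) ? "match" : "no match"), "\n";
}
print "sheep counts too\n" if names_big_bad_animal("the big bad sheep");
```

The third string does not match because it lacks the final question mark.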
Regular expressions will attempt to match $_ by default. To specify another string variable, use the =~ (binding) operator:
$h = "Who's afraid of Virginia Woolf?";
print "I'm afraid!\n" if $h =~ /Woo?lf/;
There's also an equivalent "not match" operator !~, which reverses the sense of the match:
$h = "Who's afraid of Virginia Woolf?";
print "I'm not afraid!\n" if $h !~ /Woo?lf/;
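The sketch below puts the three forms side by side: a bare pattern that tests $_, an explicit =~ binding, and the negated !~ form. The titles are sample data and mentions_wolf is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: bind the pattern to a named variable with =~.
sub mentions_wolf {
    my ($text) = @_;
    return $text =~ /Woo?lf/ ? 1 : 0;
}

my @titles = ("Who's afraid of Virginia Woolf?",
              "Peter and the Wolf",
              "Black Sheep");

for (@titles) {
    # inside this loop $_ holds the current title,
    # so a bare /.../ is tested against it
    print "$_: afraid\n"     if /Woo?lf/;
    print "$_: not afraid\n" if $_ !~ /Woo?lf/;
}

print "explicit binding works too\n" if mentions_wolf($titles[0]);
```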
If you want to match slashes in the pattern, you can backslash them:
$file = '/usr/local/blast/cosmids.fasta';
print "local file" if $file =~ /^\/usr\/local/;
This is ugly, so you can specify any match delimiter with the m (match) operator:
$file = '/usr/local/blast/cosmids.fasta';
print "local file" if $file =~ m!^/usr/local!;
The punctuation character that follows the m becomes the delimiter. In fact // is just an abbreviation for m//. Almost any punctuation character will work:
m!^/usr/local!
m#^/usr/local#
m@^/usr/local@
m,^/usr/local,
m{^/usr/local}
m[^/usr/local]
The last two examples show that you can use left-right bracket pairs as well.
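As a quick check that the choice of delimiter does not change what is matched, this sketch runs the same test with four different delimiters. The helper is_local is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: braces as delimiters, so the slashes in the
# path need no backslashes.
sub is_local {
    my ($path) = @_;
    return $path =~ m{^/usr/local} ? 1 : 0;
}

my $file = '/usr/local/blast/cosmids.fasta';
my @results = (
    ($file =~ /^\/usr\/local/ ? 1 : 0),   # default slashes, escaped
    ($file =~ m!^/usr/local!  ? 1 : 0),
    ($file =~ m#^/usr/local#  ? 1 : 0),
    ($file =~ m{^/usr/local}  ? 1 : 0),
);
print "@results\n";   # prints "1 1 1 1"
print "local file\n" if is_local($file);
```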
You can use a scalar variable for all or part of a regular expression. For example:
$pattern = '/usr/local';
print "matches" if $file =~ /^$pattern/;
See the o flag for important information about using variables inside patterns.
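One thing to watch for: any special characters inside the variable are treated as part of the pattern, so a dot in $pattern would match any character. If you want the variable's contents matched literally, Perl's \Q...\E escape can be wrapped around it. This is an addition to the text above, shown as a sketch with a hypothetical helper, starts_with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $text begins with the literal string $prefix.
sub starts_with {
    my ($text, $prefix) = @_;
    # \Q...\E switches off the special meaning of characters in $prefix
    return $text =~ /^\Q$prefix\E/ ? 1 : 0;
}

my $file = '/usr/local/blast/cosmids.fasta';
print starts_with($file, '/usr/local'), "\n";           # prints 1
print starts_with('/usrxlocal', '/usr.local'), "\n";    # prints 0: the dot is literal

# without \Q...\E the dot acts as a wildcard
my $pattern = '/usr.local';
print "wildcard match\n" if '/usrxlocal' =~ /^$pattern/;
```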