Chapters 4, 6, 7 & 8 of Learning Perl.
These problems are to be done over the course of several workshops, depending on time. The lecturer will tell you which problems to attempt during the workshop!
(~) wordcount.pl word_list.txt
word appears 14 times
the appears 10 times
...
Your results will be substantially different from this. This is just an example.
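If you are unsure where to start, here is a minimal sketch of the counting step. It assumes that words are separated by whitespace and that the most frequent word should be printed first; the sub name count_words and the sample lines are made up for illustration. In your own script you would read the lines from the file named on the command line (for example with <>).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each whitespace-separated word occurs in a list of lines.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # split ' ' skips leading whitespace and splits on runs of whitespace
        $count{$_}++ for split ' ', $line;
    }
    return %count;
}

# Sample input (hypothetical); a real script would use the lines read from <>.
my %count = count_words("the word the", "word word");

# Most frequent first; ties are broken alphabetically.
for my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$word appears $count{$word} times\n";
}
```

A hash keyed by word is the natural fit here, because you do not know in advance which words will turn up.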
A test file named example1.fasta is available in the directory /net/share/perl_problems.
% count_lines example1.fasta
TOTAL_LINES = 1392
TOTAL_CHARACTERS = 97441
Note, your results will differ from this -- this is just an example for output format.
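One possible way to get these two totals is sketched below. It assumes that the newline at the end of each line does not count as a character; if you decide otherwise, leave out the chomp. The sub name count_lines and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (number of lines, number of characters excluding newlines).
sub count_lines {
    my @lines = @_;
    my ($total_lines, $total_chars) = (0, 0);
    for my $line (@lines) {
        chomp(my $copy = $line);      # work on a copy, drop the newline
        $total_lines++;
        $total_chars += length $copy;
    }
    return ($total_lines, $total_chars);
}

# Sample input (hypothetical); a real script would pass in the lines from <>.
my ($lines, $chars) = count_lines(">seq1\n", "ACGT\n", "AC\n");
print "TOTAL_LINES = $lines\n";
print "TOTAL_CHARACTERS = $chars\n";
```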
% line_distribution example1.fasta
TOTAL_LINES = 1392
Length Count
12 1
20 2
28 1
36 1
40 3
60 89
Note, your results will differ from this -- this is just an example for output format.
% line_distribution2 example1.fasta
TOTAL_LINES = 1392
Length Count
60 89
40 3
20 2
12 1
28 1
36 1
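The two listings differ only in how the table is sorted: the first by line length, the second by count. A rough sketch of both orderings follows. It assumes line length is measured without the trailing newline, and that ties in the second ordering are broken by length; the sub name length_histogram and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash mapping line length => number of lines with that length.
sub length_histogram {
    my %hist;
    for my $line (@_) {
        chomp(my $copy = $line);
        $hist{ length $copy }++;
    }
    return %hist;
}

# Sample input (hypothetical); a real script would pass in the lines from <>.
my %hist = length_histogram("ACGT\n", "AC\n", "GGCC\n");

print "Length Count\n";
# First ordering: numerically by length, shortest first.
print "$_ $hist{$_}\n" for sort { $a <=> $b } keys %hist;

print "Length Count\n";
# Second ordering: by count, largest first; equal counts fall back to length.
print "$_ $hist{$_}\n"
    for sort { $hist{$b} <=> $hist{$a} || $a <=> $b } keys %hist;
```

Note the <=> operator in both sort blocks: the default sort compares strings, which would put 12 before 2.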
% unwrap example1.fasta
M43911 GATTCCGATCCCCCCCCCAGTTTGACCAAAGTTCAGAGGAAATCCCAGACCAAC....
L54931 GGGTGGTGGTGAGAGAGAGCGATTGAAAGCTATATATATGACCGATTCACAGGT....
L54932 TAGTTGATTCAGTCCGATTTCAATTGATTTCCCGTATATCCTTAAGGGTTTAAA....
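A sketch of the unwrapping logic, in case the bookkeeping is not obvious. It assumes the usual FASTA layout: a header line starting with > whose first word is the sequence ID, followed by one or more lines of sequence. The sub name unwrap_fasta and the sample records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join the sequence lines of each FASTA record into one "ID SEQUENCE" string.
sub unwrap_fasta {
    my (@records, $id, $seq);
    for my $line (@_) {
        chomp(my $copy = $line);
        if ($copy =~ /^>(\S+)/) {
            # New header: store the previous record, if there was one.
            push @records, "$id $seq" if defined $id;
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            $seq .= $copy;            # sequence line: append to current record
        }
    }
    push @records, "$id $seq" if defined $id;   # do not forget the last record
    return @records;
}

# Sample input (hypothetical); a real script would pass in the lines from <>.
print "$_\n" for unwrap_fasta(
    ">seq1 first example\n", "GATTCC\n", "GATC\n",
    ">seq2\n",               "GGGTGG\n",
);
```

The easy mistake is to forget the final push after the loop, which silently drops the last sequence in the file.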
% unwrap example1.fasta | reverse_complement
M43911 GTTGGTCTGGGATTTCCTCTGAACTTTGGTCAAACTGGGGGGGGGATCGGAATC...
L54931 ACCTGTGAATCGGTCATATATATAGCTTTCAATCGCTCTCTCTCACCACCACCC...
L54932 TTTAAACCCTTAAGGATATACGGGAAATCAATTGAAATCGGACTGAATCAACTA...
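Reverse-complementing takes two steps: reverse the string, then swap each base for its partner. A minimal sketch, assuming the sequence contains only A, C, G and T (upper or lower case); the sub name reverse_complement and the sample line are for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

# Sample "ID SEQUENCE" line (hypothetical), in the format unwrap prints.
my ($id, $seq) = split ' ', "seq1 GATTCC";
print "$id ", reverse_complement($seq), "\n";   # prints "seq1 GGAATC"
```

Be careful with reverse: in list context it reverses the order of a list, not the characters of a string, so print reverse $seq would not do what you expect.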
% unwrap example1.fasta | codons
M43911 GAT TCC GAT CCC CCC CCC AGT TTG ACC AAA GTT CAG AGG AAA...
L54931 GGG TGG TGG TGA GAG AGA GCG ATT GAA AGC TAT ATA TAT GAC...
L54932 TAG TTG ATT CAG TCC GAT TTC AAT TGA TTT CCC GTA TAT CCT...
% unwrap example1.fasta | codons_threeframe
M43911.1 GAT TCC GAT CCC CCC CCC AGT TTG ACC AAA GTT CAG AGG AAA...
M43911.2 ATT CCG ATC CCC CCC CCA GTT TGA CCA AAG TTC AGA GGA AAC...
M43911.3 TTC CGA TCC CCC CCC CAG TTT GAC CAA AGT TCA GAG GAA ACC...
L54931.1 GGG TGG TGG TGA GAG AGA GCG ATT GAA AGC TAT ATA TAT GAC...
L54931.2 GGT GGT GGT GAG AGA GAG CGA TTG AAA GCT ATA TAT ATG ACT...
...
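Splitting into codons and shifting the reading frame can share one helper. The sketch below assumes that leftover bases at the end (fewer than three) are simply dropped; the sub name codons, its offset argument and the sample line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a sequence into codons, starting $offset bases in (0, 1 or 2).
# Trailing bases that do not fill a whole codon are dropped.
sub codons {
    my ($seq, $offset) = @_;
    $offset = 0 unless defined $offset;
    return () if $offset > length $seq;
    my @codons = (substr($seq, $offset) =~ /(.{3})/g);
    return @codons;
}

# Sample "ID SEQUENCE" line (hypothetical), in the format unwrap prints.
my ($id, $seq) = split ' ', "seq1 GATTCCGATC";

# Frame 1 starts at the first base, frame 2 at the second, and so on.
for my $frame (1 .. 3) {
    print join(' ', "$id.$frame", codons($seq, $frame - 1)), "\n";
}
```

For the sample sequence this prints GAT TCC GAT for frame 1, ATT CCG ATC for frame 2 and TTC CGA for frame 3.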
% unwrap example1.fasta | codons_threeframe | ribosome
M43911.1 P G G * U L L M X X X X X X
M43911.2 ....
M43911.3
L54931.1
L54931.2
To help you, cut and paste this translation table:
%CODON_TABLE = (
    TCA => 'S', TCG => 'S', TCC => 'S', TCT => 'S',
    TTT => 'F', TTC => 'F', TTA => 'L', TTG => 'L',
    TAT => 'Y', TAC => 'Y', TAA => '*', TAG => '*',
    TGT => 'C', TGC => 'C', TGA => '*', TGG => 'W',
    CTA => 'L', CTG => 'L', CTC => 'L', CTT => 'L',
    CCA => 'P', CCG => 'P', CCC => 'P', CCT => 'P',
    CAT => 'H', CAC => 'H', CAA => 'Q', CAG => 'Q',
    CGA => 'R', CGG => 'R', CGC => 'R', CGT => 'R',
    ATT => 'I', ATC => 'I', ATA => 'I', ATG => 'M',
    ACA => 'T', ACG => 'T', ACC => 'T', ACT => 'T',
    AAT => 'N', AAC => 'N', AAA => 'K', AAG => 'K',
    AGT => 'S', AGC => 'S', AGA => 'R', AGG => 'R',
    GTA => 'V', GTG => 'V', GTC => 'V', GTT => 'V',
    GCA => 'A', GCG => 'A', GCC => 'A', GCT => 'A',
    GAT => 'D', GAC => 'D', GAA => 'E', GAG => 'E',
    GGA => 'G', GGG => 'G', GGC => 'G', GGT => 'G',
);
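With the table in a hash, translation is one lookup per codon. The sketch below repeats only five entries of the table to stay short (paste the full table in your own script), and it assumes that a codon missing from the table, for example one containing N, should be printed as X. The sub name translate and the sample line are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Excerpt only: five entries of the translation table, for the demo below.
my %CODON_TABLE = (
    GAT => 'D', TCC => 'S', CCC => 'P', ATG => 'M', TGA => '*',
);

# Translate a list of codons; anything not in the table becomes 'X'
# (an assumption, not something the table itself specifies).
sub translate {
    my @codons = @_;
    return map { exists $CODON_TABLE{ uc $_ } ? $CODON_TABLE{ uc $_ } : 'X' }
           @codons;
}

# Sample line (hypothetical), in the format codons_threeframe prints.
my ($id, @codons) = split ' ', "seq1.1 GAT TCC GAT CCC NNN TGA";
print join(' ', $id, translate(@codons)), "\n";   # prints "seq1.1 D S D P X *"
```

Using exists keeps the lookup from warning about undefined values when a codon is not in the table.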
% unwrap example1.fasta | codons_threeframe | ribosome | longest_orf
ID Range(nt) Length (aa)
M43911 2-480 160
L54931 31-638 202
L54932 1-1032 344
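Exactly what counts as an ORF is up to you (or your lecturer). The sketch below takes the simplest reading, which is an assumption: the longest stretch of amino acids with no stop (*) in it, without requiring a starting M. Nucleotide positions are 1-based and depend on the reading frame. The sub name longest_orf and the sample translation are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Find the longest stop-free stretch in one translated frame.
# Arguments: frame number (1, 2 or 3), then the amino acids.
# Returns (first nt, last nt, length in aa), or an empty list if none.
sub longest_orf {
    my ($frame, @aa) = @_;
    my ($best_start, $best_len) = (0, 0);
    my ($start, $len) = (0, 0);
    for my $i (0 .. $#aa) {
        if ($aa[$i] eq '*') {             # a stop ends the current stretch
            ($start, $len) = ($i + 1, 0);
            next;
        }
        $len++;
        ($best_start, $best_len) = ($start, $len) if $len > $best_len;
    }
    return () unless $best_len;
    # Amino acid number i (0-based) in frame f starts at nt (f - 1) + 3*i + 1.
    my $nt_start = ($frame - 1) + 3 * $best_start + 1;
    my $nt_end   = $nt_start + 3 * $best_len - 1;
    return ($nt_start, $nt_end, $best_len);
}

# Sample translation in frame 2 (hypothetical).
my ($from, $to, $length) = longest_orf(2, qw(M K * A L S D * E));
print "ID Range(nt) Length (aa)\n";
print "seq1 $from-$to $length\n";         # prints "seq1 11-22 4"
```

To report one line per sequence, you would run this on all three frames of an ID and keep the longest result.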
% unwrap example1.fasta | gc_content --window 50
1 48.1
2 48.2
3 48.1
...
1000 55.8
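A sliding-window calculation could look like the sketch below. It assumes that the first column is the 1-based start position of each window, that the window moves one base at a time, and that only full-sized windows are reported. The option is read with the core Getopt::Long module; the sub name gc_content, the small default window and the sample sequence are made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Return the GC percentage of every full window of $window bases,
# sliding one base at a time.
sub gc_content {
    my ($seq, $window) = @_;
    my @percent;
    for my $start (0 .. length($seq) - $window) {
        my $chunk = substr $seq, $start, $window;
        my $gc = ($chunk =~ tr/GCgc//);   # tr with an empty list just counts
        push @percent, 100 * $gc / $window;
    }
    return @percent;
}

my $window = 4;                           # small default, for the demo only
GetOptions('window=i' => \$window) or die "usage: gc_content --window N\n";
die "window must be at least 1\n" if $window < 1;

# Sample sequence (hypothetical); a real script would read "ID SEQ" lines.
my @percent = gc_content("GGCCAATT", $window);
printf "%d %.1f\n", $_ + 1, $percent[$_] for 0 .. $#percent;
```

Recounting every window from scratch is the simplest approach; for long sequences and wide windows you may prefer to update the count as the window slides.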