Retrieving Data from POST Scripts

Retrieving data from a POST script is harder than from a GET script, for two reasons:

  1. You have to use the full object-oriented LWP library to do a POST.
  2. It's harder to figure out what the magic URL is.

Doing POST with LWP

Here's the broad outline:

  1. Use the LWP module.
  2. Create a new LWP::UserAgent object. This is a "virtual browser" that knows how to contact remote sites and retrieve URLs.
  3. Create a new HTTP::Request object, most easily with the POST() function exported by HTTP::Request::Common. It contains the URL of the thing you want to retrieve, the method to retrieve it with, and the form fields to send.
  4. Send the request via the user agent's request() method, receiving an HTTP::Response object as the result.
  5. Check the error code, for example with the response's is_success() method.
  6. Call the response object's content() method to get the returned page.

This program runs Chris Burge's GenScan program on your FASTA file. It simulates submitting the fill-out form on Chris's MIT web server.

Code:

#!/usr/bin/perl -w
# file: post_gs.pl
# genscan runner

use strict;
use LWP;
use HTTP::Request::Common;   # exports POST()

my $GS_URL    = 'http://genes.mit.edu/cgi-bin/genscanw.cgi';

my $file = shift or die "Provide a FASTA file to run GenScan on\n";

my $agent   = LWP::UserAgent->new;
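my $request = POST($GS_URL,
                   Content_Type => 'form-data',
                   Content      => [
                       -o => 'Vertebrate',               # organism
                       -e => '1.00',                     # suboptimal exon cutoff
                       -n => 'MySequence',               # sequence name
                       -p => 'Predicted peptides only',  # print options
                       # Upload field: [ local path, file name reported to the server ].
                       # With the name omitted, the file's basename is sent.
                       -u => [$file],
                   ]
                  );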

my $response = $agent->request($request);
die "request failed: ", $response->status_line, "\n" unless $response->is_success;

my $record = $response->content;
print $record;
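
Before sending anything to a live server, it can be useful to build the request and look at what LWP would transmit. The following is a minimal sketch, not part of the original script: the URL http://example.com/cgi-bin/demo.cgi is a placeholder, and the sequence is supplied in memory through the Content pseudo-header so that no file on disk is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Placeholder URL (an assumption for illustration); nothing is sent over the network.
my $request = POST('http://example.com/cgi-bin/demo.cgi',
                   Content_Type => 'form-data',
                   Content      => [
                       '-n' => 'MySequence',
                       # undef path plus a Content entry: the upload data is held in memory
                       '-u' => [ undef, 'test.fa', Content => ">test\nACGTACGT\n" ],
                   ]);

# Dump the request line, the headers, and the multipart body.
print $request->as_string;
```

Each form field shows up as its own part in the body, with the field name in a Content-Disposition header. That is the same structure a browser produces when it submits a file-upload form.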

Output:

(~) 69% post_gs.pl test.fa
<HTML>
<HEAD>
<TITLE>GENSCAN output</TITLE>
<META HTTP-EQUIV="OWNER" CONTENT="GENSCAN">
<LINK REL="made" HREF="mailto:cburge@mit.edu">
<BASE HREF="http://genes.mit.edu/GS/gs_out.html">
</HEAD>
<BODY>
<BODY BGCOLOR="#00336677" link="#FFFF00" vlink="#77FFFF77" alink="#FFFF00" text="#FFFFFF">
<BR>
<CENTER><H3>GENSCANW output for sequence MySequence</H3></CENTER>
<BR>
<BR>
<pre>
<EM>
GENSCAN 1.0	Date run: 24-Oct-102	Time: 15:57:15

Sequence MySequence : 100000 bp : 34.63% C+G : Isochore 1 ( 0 - 43 C+G%)

Parameter matrix: HumanIso.smat

</EM><b>Predicted genes/exons:
</b><STRONG>

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 1.01 Intr +    158    227   70  0  1   46   92    94 0.403   2.92
 1.02 Term +    883   1210  328  1  1   34   38   400 0.918  22.80
 1.03 PlyA +   1301   1306    6                               1.05

 2.00 Prom +   1496   1535   40                             -10.94
 2.01 Init +   1599   1653   55  2  1   65   34    87 0.921   0.53
 2.02 Intr +   1711   1851  141  2  0   54   81   263 0.999  21.60
 2.03 Intr +   1908   2035  128  1  2   52   92   198 0.999  16.08
 2.04 Term +   2104   2553  450  0  0   52   48   654 0.999  52.00
 2.05 PlyA +   2821   2826    6                               1.05

 3.00 Prom +   2986   3025   40                             -14.06
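
The returned page is HTML wrapped around a fixed-width table, so the exon lines can be picked out with a regular expression. This is a rough sketch under the assumption that the table keeps the column layout shown above; the sample lines are copied from that output, and in a real script the text would come from $response->content.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines taken from the output above (stand-in for $response->content).
my $record = <<'END';
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 1.01 Intr +    158    227   70  0  1   46   92    94 0.403   2.92
 1.03 PlyA +   1301   1306    6                               1.05
 2.00 Prom +   1496   1535   40                             -10.94
END

my @features;
for my $line (split /\n/, $record) {
    # gene.exon, type, strand, begin, end, length; later columns are optional
    next unless $line =~ /^\s*(\d+)\.(\d+)\s+(\w+)\s+([+-])\s+(\d+)\s+(\d+)\s+(\d+)/;
    push @features, { gene => $1, exon => $2, type => $3,
                      strand => $4, begin => $5, end => $6, length => $7 };
}

printf "gene %d %s %s %d..%d\n", @{$_}{qw(gene type strand begin end)} for @features;
```

The header and separator lines are skipped because they do not begin with a gene.exon number.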

How Do You Figure out the Magic POST URL?

For POST scripts this can be hard. You can either:

  1. Reverse engineer the field names by examining the HTML.
  2. Download the source code for the site's fill-out form and change the <form> tag's action attribute to point to a CGI test script, such as http://stein.cshl.org/cgi-bin/test-cgi.pl. Submitting the modified form then shows which field names and values the browser sends.
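
The first approach, reading the field names out of the HTML, can be partly automated. The sketch below assumes the HTML::Form module from CPAN is installed. The form in it is made up for illustration and is not the real GENSCAN page source, and http://example.com/ is a placeholder base URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Made-up form for illustration; in practice this would be the saved page source.
my $html = <<'END';
<form method="post" action="/cgi-bin/demo.cgi" enctype="multipart/form-data">
  <select name="-o"><option>Vertebrate</option><option>Maize</option></select>
  <input type="text" name="-n">
  <input type="file" name="-u">
  <input type="submit" value="Run">
</form>
END

# The second argument is the base URL used to resolve a relative action.
my ($form) = HTML::Form->parse($html, 'http://example.com/');

print $form->method, ' ', $form->action, "\n";
my @names;
for my $input ($form->inputs) {
    next unless defined $input->name;   # skip unnamed controls such as the submit button
    push @names, $input->name;
    printf "%-8s %s\n", $input->type, $input->name;
}
```

The printed action is the URL to hand to POST(), and the listed names are the keys for its Content list.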

A Modified Version of Chris Burge's GENSCAN Page:


The New GENSCAN Web Server at MIT


Identification of complete gene structures in genomic DNA


 
               \\|//          
               (o o)
-. .-.   .-oOOo~(_)~oOOo-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-. 
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-'


For information about Genscan, click here



This server provides access to the program Genscan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.

News:

This server can accept sequences up to 1 million base pairs (1 Mbp) in length. If you have trouble with the web server or if you have a large number of sequences to process, request a local copy of the program (see instructions at the bottom of this page) or use the GENSCAN email server. If your browser (e.g., Lynx) does not support file upload or multipart forms, use the older version.

Organism:

Suboptimal exon cutoff (optional):

Sequence name (optional):

Print options:

Upload your DNA sequence file (one-letter code, upper or lower case, spaces/numbers ignored):

Or paste your DNA sequence here (one-letter code, upper or lower case, spaces/numbers ignored):

To have the results mailed to you, enter your email address here (optional):






This server was kindly donated by COMPAQ



GENSCAN was developed by Chris Burge in the research group of Samuel Karlin, Department of Mathematics, Stanford University. The program and the model that underlies it are described in:

Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94.

The splice site models used are described in more detail in:

Burge, C. B. (1998) Modeling dependencies in pre-mRNA splicing signals. In Salzberg, S., Searls, D. and Kasif, S., eds. Computational Methods in Molecular Biology, Elsevier Science, Amsterdam, pp. 127-163.

See also:

Burge, C. B. and Karlin, S. (1998) Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346-354.

This web server is located in the Burge laboratory at the MIT Department of Biology.

Address any comments/questions/suggestions to: Chris Burge (cburge@mit.edu)

Please notify me by email (cburge@mit.edu) if: 1) the web/email server is not working; 2) you find a bug in GENSCAN; or 3) you have a suggestion for how to make the program more "user friendly". In your email, please specify which Genscan server you had trouble with (e.g., new MIT web server, MIT email server) and the nature of the problem.

NOTE. This server is for the program "GENSCAN", developed by Chris Burge at Stanford University, not to be confused with the Applied Biosystems sequencing software called "GENESCAN".


Copyright © 1997-2002 Christopher Burge

GENSCAN is freely available for academic use. Executables are currently available for the following Unix platforms: Sun/Solaris, SGI/Irix, DEC/Tru64, Intel/Linux, and Intel/Solaris. Platforms not listed are not currently supported. To obtain a copy of GENSCAN for academic use, go to the academic license agreement web site and download the executable after completing the form.

For commercial use, contact Imelda Oropeza at the Stanford University Office of Technology Licensing.



Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Fri Oct 22 12:12:36 EDT 1999