Lincoln Stein, How to Set Up and Maintain a Web Site, 2nd Ed.,
HTML Quick Reference (next to last page).
Followup Reading
Lincoln Stein, How to Set Up and Maintain a Web Site, 2nd Ed., Chapter 5 (Creating Hypertext Documents), and Chapter 2 (Unraveling the Web: How it All Works) through the subsection "MIME Types and File Extensions".
This lecture assumes that you can use a working web server that is already available on your system.
From the Unix side a simple web site is a set of Unix directories
containing documents that the web server sends to browsers on request.
A more sophisticated web site can allow browsers to send information back to the server. (This is what happens when you fill out a web form and then press the "submit" button.) The web server passes this information to programs that the web site designers wrote, and then returns any results produced by the program to the browser. (We will learn to write such server-side or CGI programs in a subsequent lecture.) A sophisticated web site can also send whole programs to the browser to be executed there. Often these are Java, JavaScript, or ActiveX programs.
Web Servers
The most commonly used Unix web server is Apache (www.apache.org). Windows and Mac systems have their own web servers.
Where to Create Your Site
To create a simple web site, ask your system administrator where you should place
your web documents (and to make sure you have Unix permissions to put
documents there). These documents are likely to be visible to the whole
Internet (at least to people who know where to look for them) unless
they are behind a firewall. You also have to ask your system
administrator what the URLs of your documents will be.
For our course create the directory ~/public_html/. Make sure that both your home directory and ~/public_html are world readable and executable (so that the web server, which runs as a separate Unix user in a separate Unix group, can read them):
chmod a+rx ~
chmod a+rx ~/public_html
A document X created in your public_html directory will have the URL
http://bush1/~your_user_name/X (e.g.
http://bush1/~srozen/example.html), and will
be behind a firewall, so only your fellow classmates and others
at CSHL will be able to see it.
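If you want to check the permissions from a script rather than by eye, the following is a minimal sketch in Python. The function names are made up for this illustration, and it only looks at the "other" permission bits, on the assumption that the web server is neither you nor in your group.

```python
import os
import stat

def web_server_can_read_dir(path):
    """Return True if 'other' users have both read and execute permission on path."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed

def make_world_readable(path):
    """Rough equivalent of 'chmod a+rx path': add r and x for user, group, and other."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    add = (stat.S_IRUSR | stat.S_IXUSR |
           stat.S_IRGRP | stat.S_IXGRP |
           stat.S_IROTH | stat.S_IXOTH)
    os.chmod(path, mode | add)

if __name__ == "__main__":
    home = os.path.expanduser("~")
    for directory in (home, os.path.join(home, "public_html")):
        if os.path.isdir(directory):
            print(directory, "readable by web server:",
                  web_server_can_read_dir(directory))
```

The check is deliberately read-only when run as a script; call make_world_readable yourself only on directories you really mean to open up.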
A URL (e.g. http://bush1/~srozen/example.html) is composed of three parts:
The protocol, in this case http.
(Netscape assumes your protocol is http
unless you specify one.)
The host, in this case bush1.
The path, in this case ~srozen/example.html.
The path does not correspond exactly to a Unix directory path, but
they are usually related by reasonable rules. For example, if
you create a subdirectory problem1 of
~/public_html and put file example2.html there
you can access the URL http://bush1/~your_user_name/problem1/example2.html.
(Make sure to set the permissions so that the web server can read and
execute the directory and read the file, as shown above.)
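The three parts of a URL, and the rule relating the path to a Unix directory, can be sketched in Python as follows. The mapping in guess_unix_path is an assumption for illustration only: it supposes home directories live under /home and that ~user maps to that user's public_html directory. Your administrator may have configured the server differently.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into (protocol, host, path), dropping the leading slash of the path."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")

def guess_unix_path(url, home_root="/home"):
    """Hypothetical rule: ~user/rest maps to <home_root>/user/public_html/rest.

    Returns None for paths that do not start with ~user.
    """
    _protocol, _host, path = url_parts(url)
    if not path.startswith("~"):
        return None
    user, _slash, rest = path[1:].partition("/")
    return f"{home_root}/{user}/public_html/{rest}"

if __name__ == "__main__":
    example = "http://bush1/~srozen/problem1/example2.html"
    print(url_parts(example))
    print(guess_unix_path(example))
```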
MIME Types
Each web document has a MIME type. The simplest is text/plain.
To create a text/plain file simply create a file in your web
directory with the extension .txt and put some text in it.
The browser presents the text from such a file without any modification or formatting.
The web server or the browser often determines the MIME type of
a document from its extension (called a "suffix" in Netscape
preferences). For example, JPEG images have
MIME type image/jpeg
and have the extensions jpeg, jpg, etc.
Some of the other MIME types that you are likely to encounter as a web
user (and as a web site designer) are
text/html, image/gif, and application/pdf (the
format that Acrobat Reader uses, and which is
popular among electronic journal publishers).
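To see the extension-to-MIME-type rule in action, here is a small sketch using Python's standard mimetypes module. The file names are invented for the example; the table a particular web server or browser uses may differ from Python's.

```python
import mimetypes

def mime_type_for(filename):
    """Guess a MIME type from the file extension alone; None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type

if __name__ == "__main__":
    # Hypothetical file names covering the types mentioned in the text.
    for name in ["EUREKA.txt", "example.html", "photo.jpg",
                 "photo.jpeg", "logo.gif", "paper.pdf"]:
        print(name, "->", mime_type_for(name))
```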
Some MIME types can be handled by the browser itself.
Others must be sent to a plug-in or helper application
(plug-ins are somewhat more integrated than helper applications).
For example, you need Acrobat Reader to view
application/pdf files.
To make your web site as hassle-free as possible, stick to MIME types
that are built in to Netscape. (For example, many biology lab Macs do not
even have Netscape configured to read PDF files.)
HTML (HyperText Markup Language) (MIME type text/html,
file extensions html, htm)
is still the central MIME type for web documents.
Core HTML provides formatting and hyperlinks (and "forms", covered
in another lecture). You can create very complex sites (and many commercial
web sites are very complex), but core HTML allows you to provide a huge amount
of functionality for very little effort.
This is (slightly simplified) a fragment of the HTML from the beginning of this document:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html><head>
<title>Providing Web Information Services I</title>
</head>
<body>
<center>
<h1> Providing Web Information Services I</h1>
<h3> Steve Rozen</h3>
<h3><a href="../index.html">Genome Informatics</a></h3>
</center>
<a name="reference_materials"><h2>Reference Materials</h2></a>
<p>
Lincoln Stein, <i>How to Set Up and Maintain a Web Site, 2nd Ed.</i>,
HTML Quick Reference (next to last ...
</body>
</html>
Note the following:
| Comments | <!...> |
| Document "outline" | <html><head><title>...</title></head><body>...</body></html> |
| Pairs of tags | E.g. <title>...</title> |
| Paired tags must nest | E.g. <h3><a href="../index.html">Genome Informatics</a></h3> |
| Heading format instructions | E.g. <h1>...</h1>, <h3>...</h3> |
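The rule that paired tags must nest can be checked mechanically. The sketch below uses Python's standard html.parser module with a simple stack. It is a teaching aid rather than a real HTML validator: the sets of tags that are skipped (tags with no closing tag, and tags whose closing tag is commonly omitted) are a simplifying assumption of this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, and tags whose closing tag is often omitted.
VOID_TAGS = {"br", "hr", "img", "link", "meta", "area"}
OPTIONAL_CLOSE = {"p", "li"}

class NestingChecker(HTMLParser):
    """Track open paired tags on a stack and record closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS and tag not in OPTIONAL_CLOSE:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag in OPTIONAL_CLOSE:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def tags_nest_properly(html_text):
    """Return True if every paired tag is closed in the reverse order it was opened."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return not checker.errors and not checker.stack

if __name__ == "__main__":
    print(tags_nest_properly('<h3><a href="../index.html">Genome Informatics</a></h3>'))
    print(tags_nest_properly("<b><i>crossed tags</b></i>"))
```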
The example above contains a hyperlink:
<a href="../index.html">Genome Informatics</a>
In this example the URL refers to the file index.html
in the parent directory of the
current directory. (Files named index.html often
have a special role; web servers are often configured so that
e.g. the URL http://bush1/
refers to http://bush1/index.html.)
The text Genome Informatics between
<a href="../index.html"> and </a>
is called an anchor, and gives the reader something to
click on.
Hyperlinks can also refer to an entirely different web site. For example,
this is a link to the Apache web site:
<a href="http://www.apache.org">Apache</a> (displayed as Apache).
The href can
be either relative (i.e. a path relative to the
protocol, host, and path of the current document), or can specify a full protocol,
host, and (optional) path.
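How a browser turns a relative href into a full URL can be tried out with urljoin from Python's standard library. The URL of the "current document" below is hypothetical; only the hrefs come from the examples in this lecture.

```python
from urllib.parse import urljoin

def resolve_href(current_document_url, href):
    """Resolve an href the way a browser would, relative to the current document."""
    return urljoin(current_document_url, href)

if __name__ == "__main__":
    # Hypothetical location of the page that contains the links.
    current = "http://bush1/~srozen/lectures/web1.html"
    print(resolve_href(current, "../index.html"))          # relative href
    print(resolve_href(current, "http://www.apache.org"))  # full URL is left unchanged
```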
Some additional important basic HTML tags:
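<p>
Starts a new paragraph. (You do not have to close it with
</p>.)
<hr>
Draws a horizontal rule. (There is no closing
</hr>.)
<strong>...</strong>
Strongly emphasized text, usually displayed in bold.
<pre>...</pre>
Preformatted text, displayed with its spacing and line breaks exactly as typed. For example,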
<pre>
These
line breaks
are not wrapped.
</pre>
gets presented like this:
These
line breaks
are not wrapped.
To include an image such as the freehand web architecture diagram above, use, for example
<img src="webarch-vsmall.jpg" alt="[Freehand Sketch of Web Architecture]">
The image is just a file, in this case a "JPEG" image. You can even make a hyperlink from an image, for example, the HTML
<a href="webarch-med.jpg">
<img src="webarch-vsmall.jpg" align="middle"
height=75 alt="[Freehand Sketch of Web Architecture]">
</a>
produces a small, clickable version of the sketch.
If you click on the image you get a bigger version of the image; the small image is the anchor.
The target of the hyperlink can be anything, of course. For example
<a href="webarch.html">
<img src="webarch-vsmall.jpg" align="middle"
height=75 alt="[Freehand Sketch of Web Architecture]">
</a>
produces the same small image,
and if you click on it you get a new HTML document with some text on web architecture.
Note that an image to be displayed in-line in the web page and a link to an image are different things in HTML. Of course you can also create a text anchor for a hyperlink to an image. For example, the link "Click here for a really big picture" is coded by
<a href="webarch.jpg">Click here for a really big picture</a>
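The difference between an in-line image and a link to an image shows up clearly if you list them separately. The following sketch uses Python's standard html.parser module; the class and function names are invented for this illustration.

```python
from html.parser import HTMLParser

class ImageAndLinkCollector(HTMLParser):
    """Collect in-line image sources (<img src>) and hyperlink targets (<a href>)."""

    def __init__(self):
        super().__init__()
        self.inline_images = []
        self.link_targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "src" in attributes:
            self.inline_images.append(attributes["src"])
        elif tag == "a" and "href" in attributes:
            self.link_targets.append(attributes["href"])

def collect_images_and_links(html_text):
    """Return (inline image sources, hyperlink targets) in document order."""
    collector = ImageAndLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.inline_images, collector.link_targets

if __name__ == "__main__":
    sample = (
        '<a href="webarch-med.jpg">'
        '<img src="webarch-vsmall.jpg" align="middle" height=75 '
        'alt="[Freehand Sketch of Web Architecture]"></a>'
        '<a href="webarch.jpg">Click here for a really big picture</a>'
    )
    images, links = collect_images_and_links(sample)
    print("displayed in-line:", images)
    print("reached by clicking:", links)
```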
It is even possible to associate different hyperlinks with different parts of an
image. See the discussion of clickable image maps in How to Set Up and Maintain a Web Site.
(The HTML tags used for clickable image maps are <img>,
<map>...</map> and <area>; <img> and <area> have no closing tags.)
As a first rule of thumb,
do not get carried away with images. Some pages are basically a mosaic
of images (and other byte-rich do-dads).
(An example of
what you might want to avoid is Fox Kids:
http://www.foxkids.com/index.asp.)
Images can be big and take a long time to download.
Lots of
people still need to use slow Internet connections.
Developers, sitting at machines on the same LAN
as the web server, often seem to forget this.
(Actually the Fox Kids page is clever. It really
starts with an eye-catching animated graphic at
http://www.foxkids.com
that loads pretty quickly, then
automatically goes to the slower-loading page,
http://www.foxkids.com/index.asp.)
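To get a feel for why image size matters, here is a back-of-the-envelope calculation in Python. The image size and the two connection speeds are made-up round numbers chosen for illustration, and the formula ignores protocol overhead and latency, so real downloads will be somewhat slower.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: bytes times 8 bits, divided by the link speed."""
    return size_bytes * 8 / link_bits_per_second

if __name__ == "__main__":
    image_size = 200_000        # hypothetical 200,000-byte image
    modem = 56_000              # assumed dial-up modem speed, bits per second
    lan = 10_000_000            # assumed LAN speed, bits per second
    print(f"modem: {download_seconds(image_size, modem):.1f} s")
    print(f"LAN:   {download_seconds(image_size, lan):.2f} s")
```

Under these assumptions the same image that appears almost instantly for a developer on the LAN takes roughly half a minute over the modem.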
A subsequent lecture will cover other HTML formatting capabilities, including lists, tables, and forms.
We include a brief overview here because controlling access to your web data is sometimes essential. There are several approaches (which can be combined). The approaches that will be available to you will depend on how your site is administered, and all require you to get help from your site administrator.
In addition, allowing web browsers to launch programs on your home machine via the web server can introduce the risk of outsiders viewing data they are not supposed to, or even of having them execute programs on the server that can do damage. We will discuss these risks in a subsequent lecture when we discuss "CGI" programming.
Create a text/plain document called EUREKA.txt in your
public_html directory and view it in your browser as a file. (In
Netscape, go to "File", "Open Page...", "Choose File...", and then
choose EUREKA.txt in the file chooser.) When
prompted, choose "Open in Navigator".
Then view the same document as text/plain via
the web server (http://bush1/~your_user_name/EUREKA.txt).
Save a copy of this page in a file (e.g. MY_info_svcs.html.) Edit it to remove
the line
<link rel="stylesheet" href="./standard.css">
(near the top).
Also delete this problem.
Then view the new file in your browser (as a file) to make sure it looks OK.
Once it looks OK put the modified page in your web site
and view it via the web server
(http://bush1/~your_user_name/MY_info_svcs.html).
What happens to the images? Why?
Create a hyperlink to http://www.cshl.org.
Insert it into the top of http://bush1/~your_user_name/MY_info_svcs.html.
Create your own file http://bush1/~your_user_name/index.html.
If you use xemacs it will likely prompt you for the
document title and then create
an HTML document template including
<html><head><title>...</title></head><body>...</body></html>
as well as other stuff for you.
Otherwise, type that yourself.
Try the various HTML constructs discussed in this lecture.
Create a link to the main course page and to ~/public_html/MY_info_svcs.html.
Create one link that uses a full URL (with protocol and host portions), and one "relative" URL
that refers to a file in your public_html directory or a subdirectory.