Anatomy of an HTML Document

 

Appearances: Backend vs. Frontend

An HTML document looks quite different in an editor than it does in the browser.  You can use tabs, returns, spaces, and leading spaces in the source; the browser collapses this extra white space when it displays the page.  You will also notice a lot of text in the HTML source code that never appears on the page: display tags, interactivity markers, and their parameters.

 

It is important to adopt a clear presentation style for your source code, so that you and others can read it and, above all, so that it is easier for you to repair and edit the code.

 

 

Structure of an HTML Document:

The document consists of text, which defines the content, and tags, which format that content.  The code consists of an outer tag, <html>, enclosing the head and body of the document.

 

Head: gives your document a title and indicates parameters the browser may use when displaying the document, as well as metadata so that search engines can find and properly catalog your site.

 

Body: where you put the contents of the document; it is the body of the document that is presented in the browser.

 

Tags: a tag consists of a tag name that begins a formatting or structural instruction, encases the content text, and is followed by an instruction that ends the tag.  Tags are enclosed by angle brackets - <tag>.  Attributes are characteristics which you add to the tag to further define its operation - <tag attribute="value">.  Attributes are separated from the tag name by a space.  An attribute is separated from its value by an = sign, and the value sits within quotation marks.  It does not matter if there are additional spaces around the = sign.  Tags and attributes should always be in lower case, as should URLs and file names: all locations are in lower case!
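
As a minimal sketch of the structure described above, the page below shows the outer html tag enclosing a head and a body, plus one tag that carries an attribute.  The title, the text, and the file name portfolio.html are placeholders chosen for illustration.

```html
<html>
  <head>
    <!-- the title is shown by the browser, not in the page itself -->
    <title>My First Page</title>
  </head>
  <body>
    <!-- href is an attribute of the a tag; its value sits in quotation marks -->
    <p>Visit the <a href="portfolio.html">portfolio</a> page.</p>
  </body>
</html>
```

The indentation is only for the human reader; the browser displays the same page without it.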

 

 

 

 

 

 

 

 

 

<html> tag

delimits the beginning and end of an html document

contains the document's head and body

            head identifies the document and its place within the document collection

            body contains the content of the document structured by tags

 

<head> tag has no commonly used attributes of its own; it only encapsulates other header tags such as the title, meta tags, or a language declaration

           

<title> gives the web page its title; a descriptive title is important for search engines

 

<link> and <base> define the document's base location and its relationship to other documents

 

<isindex> creates automatic document indexing forms, allowing users to search databases of information using the current document as a querying tool.

 

<nextid> makes creation of unique document labels easier when using document automation tools

 

<meta> provides additional document data not supplied by any other head tags

 

<style> lets you create cascading style sheet properties to control the display characteristics of body content across the entire document.

 

Dublin Core Metadata Standard - dublincore.org

<meta name="title" lang="en" content="Title">

<meta name="creator" lang="en" content="Name">

<meta name="subject" lang="en" content="Photography Portfolio">

<meta name="description" lang="en" content="Search engine description.">

<meta name="keywords" content="photographer, artist">

<meta name="DC.Date.Modified" content="2007-05-05">

<meta name="DC.Format" content="text/html; charset=iso-8859-1">

<meta name="DC.Format.extent" content="3 Kbytes">

<meta name="DC.Identifier" content="URL">

<meta name="DC.Language" content="en">

 

<body> tag

where you put the contents of your document

has a number of attributes defining global parameters of document content

 

appearance attributes: alink, background, bgcolor, bgproperties, leftmargin, link, text, topmargin, vlink

 

programmatic attributes: onblur, onfocus, onload, onunload
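
A short sketch of how a few of these attributes might be combined on one body tag.  The color values, the file name next.html, and the small script in onload are illustrative assumptions only.

```html
<html>
  <head>
    <title>Body attributes</title>
  </head>
  <!-- appearance attributes set page-wide colors; onload runs once the page has loaded -->
  <body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#660099" alink="#cc0000"
        onload="document.title = 'Page loaded'">
    <p>Black text on a white background, with a <a href="next.html">blue link</a>.</p>
  </body>
</html>
```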

 

 

Text Basics in HTML

 

I. Divisions and Paragraphs

<div> tag

divides document into separate, distinct sections

can be used strictly as an organizational tool, without any formatting associated with it, or can be combined with the align attribute

      <div align="left/center/right">

 

<p></p> tag

signals the start and stop of a paragraph

browsers ignore repeated, empty <p> tags

 

browsers automatically generate line breaks relative to window size unless the <br />, <nobr></nobr>, or <wbr> tags are used within the paragraph

alternatively, with <pre>, all the spacing and line breaks you set in the editor are observed

 

<p align="left/center/right"></p>
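
A small sketch of divisions, paragraphs, and a forced line break working together.  The wording of the text is made up for illustration.

```html
<div align="center">
  <!-- this paragraph inherits the centering from the division -->
  <p>This paragraph is centered because it sits inside a centered division.</p>
  <!-- the paragraph's own align attribute overrides the division -->
  <p align="right">This paragraph is right-aligned.</p>
</div>
<p>The browser wraps this line wherever the window requires,<br />
but always starts a new line after the break tag.</p>
```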

 

II. Headings

HTML gives you six levels of headings

<h1></h1> to <h6></h6> with 6 being the smallest

headers are used as they are in outlines, with size reflecting the importance of the material being "headed", i.e. heads, subheads and sub-subheads

 

alignment: <h1 align="left/center/right">

headers also add extra leading space below their line

 

headers add structure and scannability to the document, so the reader can capture meaning easily and quickly

 

Headings are not for changing font sizes; use the size attribute of the <font> tag for that.  Use them only to create headings.  A heading also signifies the end of the preceding paragraph.

 

You can add small images to a heading, such as bullets or icons.
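
A brief sketch of headings used as an outline, including an aligned heading and a heading with a small image.  The heading text and the image file bullet.gif are hypothetical.

```html
<h1>Photography Portfolio</h1>
<p>An introductory paragraph under the main heading.</p>

<h2 align="center">Landscapes</h2>
<!-- a small icon placed inside the heading itself -->
<h3><img src="bullet.gif" alt=""> Mountains</h3>
<p>Text about the mountain pictures.</p>

<h3><img src="bullet.gif" alt=""> Coastlines</h3>
<p>Text about the coastline pictures.</p>
```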

 

III. Changing Text Appearance

Content-based text styles

Content-based tags inform the browser that the enclosed text has a specific meaning, context or usage.  You may wonder why you should use content-based tags when they often repeat physical styles.  One reason is that future browsers are expected to treat content-based styles with more and more specific characteristics.  Another is that automated systems can extract information according to these tags, which act as content indicators.

 

<cite></cite> bibliographical citation

<code></code> computer code, usually shown in a monospaced teletype font such as Courier

<dfn></dfn> definitions for special terms or phrases

<em></em> gives emphasis to text, usually with italics

<kbd></kbd> indicates text that is typed on a keyboard; used for computer-related documentation and manuals.

<samp></samp> indicates a sequence of literal characters that should have no other interpretation by the user.  This tag is most often used when a small sequence of characters is taken out of its normal context and needs special emphasis.

<strong></strong> for emphasizing text with more gusto, usually made bold by browsers

<var></var> indicates a variable name or user-supplied value.  Most often used in conjunction with the <code> and <pre> tags for displaying particular elements of computer programming code samples and the like.  Typically rendered in italics; inside <code> or <pre> it also takes on the monospaced font.
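
A sketch of several content-based tags in running text.  The sentences and the handbook title are invented for illustration; how each tag is rendered depends on the browser.

```html
<p>A <dfn>tag</dfn> is a formatting instruction enclosed in angle brackets.
You <em>should</em> indent your source code, and you <strong>must</strong> save it in lower case.</p>

<p>Type <kbd>index.html</kbd> as the file name.  In the line
<code>&lt;font size="<var>n</var>"&gt;</code>, replace <var>n</var> with a number from 1 to 7.
The characters <samp>&lt;p&gt;</samp> mark the start of a paragraph.</p>

<p>Source: <cite>Course Handbook, chapter 2</cite>.</p>
```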

 

Physical text styles

<b></b> bold

<big></big> increases size of font, one size larger than surrounding text up to size 7

<blink></blink> makes text blink on and off

<i></i> creates italicized text

<small></small> creates text one size smaller than the surrounding text

<strike></strike> creates text with a strike through

<sub></sub> displays the character half a height lower, but at the same size

<sup></sup> displays the character half a height higher, but at the same size

<tt></tt> displays teletype monospaced text

<u></u> underlines the text
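
A compact sketch showing most of the physical styles side by side; the sample words are placeholders.

```html
<p><b>Bold</b>, <i>italic</i>, <u>underlined</u>, <strike>struck through</strike>,
and <tt>teletype</tt> text.</p>
<p><big>Bigger</big> and <small>smaller</small> than the surrounding text.</p>
<!-- subscript and superscript shift characters below and above the line -->
<p>H<sub>2</sub>O and E = mc<sup>2</sup></p>
```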

 

Font Size

<basefont size="n"> determines the basic size of the font that the browser will use to render normal document text.  It is a single tag with no closing tag.  With IE you can also indicate color and face as attributes.  Usually placed in the head of the document.

 

<font></font> has the attributes color, face, and size

size can be relative, i.e. +1 or -1 relative to the basefont size, which is 3 by default

size can be absolute, size="3"

the size is limited to the range 1 to 7 in either case
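
A sketch of relative and absolute font sizes, assuming the default base size of 3.  The color value and the font names are illustrative choices.

```html
<p>Normal text at the default base size of 3.</p>
<!-- relative size: 3 + 2 gives size 5 -->
<p><font size="+2" color="#cc0000" face="Arial, Helvetica">Two sizes larger.</font></p>
<!-- relative size: 3 - 1 gives size 2 -->
<p><font size="-1">One size smaller.</font></p>
<!-- absolute size: 7 is the largest allowed -->
<p><font size="7">The largest size.</font></p>
```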

 

IV. Precise Spacing and Layout

<br> tag inserts a line break into text

normally you leave word wrapping, character and line spacing and other presentation details up to the browser

 

<br clear="left/right/all">

used with text that flows around images.  When the specified margin is clear of images, the browser resumes text flow. For example,

<img src="dog.gif" align="left">

This text will wrap around the dog image, flowing between the image and the right margin of the document.

 

<br clear="left">

This text will resume when the image no longer takes up the left margin, putting the text below the image extending across the full width of the page, leaving white space above this text and to the right of the image.

 

<nobr></nobr> tag

Keeps text unbroken, like using nowrap in tables.

Suppresses the automatic line break when the current line reaches the right margin.

 

<wbr> tag

Used within the <nobr></nobr> tags, <wbr> marks a place where the line may break; the break happens only if the current line extends beyond the margins of the browser's display window.
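
A one-line sketch of the two tags together; the sentence is a placeholder, and browser support for these tags may vary.

```html
<!-- the line stays unbroken, except that it may break at the wbr tag in a narrow window -->
<nobr>This long line stays on one line,<wbr> but it may break here if the window is too narrow.</nobr>
```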

 

<pre></pre> tag

defines a segment inside which the browser renders text in exactly the character and line spacing defined in the source HTML.  Used to create ASCII art.  Used to retain the integrity of columns and rows of characters in tables of numbers that must line up correctly, and to insert blank segments in the document display.  Tab characters are also honored within the <pre> tag.  Normal word wrapping and paragraph filling are disabled, and extraneous leading and trailing spaces are honored.  Text is displayed in a monospaced font.

<pre> has the width attribute to determine the number of characters to fit on a single line within the <pre> block.  The browser may choose to ignore this direction and very often does.
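
A small sketch of a table of numbers kept in line with the pre tag.  The items and prices are made up; the columns line up only because every space in the source is preserved and the font is monospaced.

```html
<pre>
Item        Qty    Price
Film          3     4.50
Batteries    12     0.75
</pre>
```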

 

<div align="center"></div> tag

centers whatever text, images, or tables are contained within.

 

<blockquote></blockquote> tag

for quotes of three lines or longer; the browser puts the quote into an indented block set in from the margins of the page.  You can nest this tag to indent your text further in.
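
A sketch of a block quote with a second, nested block quote that is indented further.  The quoted text is a placeholder.

```html
<p>The reviewer wrote:</p>
<blockquote>
  <p>This is a long quotation that runs to three lines or more, so it is set apart
  from the surrounding text in its own indented block.</p>
  <!-- a nested blockquote is indented one more step -->
  <blockquote>
    <p>This inner passage is indented further than the quotation around it.</p>
  </blockquote>
</blockquote>
```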

 

V. Addresses

<address></address>

tells the browser that the enclosed text is an address.

You can use style sheets to define its style.
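
A short sketch of an address block; the name, street, and e-mail address are fictitious.

```html
<address>
Jane Doe<br />
12 Example Street<br />
<a href="mailto:jane@example.com">jane@example.com</a>
</address>
```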

 

VI. Special Character Encoding

Special characters need either a special name or a numeric character encoding for inclusion in an HTML document.

 

Character entities are for special characters such as ~ in HTML.

You enclose either the character's standard entity name, or a pound sign (#) followed by its numeric position in the ISO Latin-1 standard character set, between an ampersand (&) and a semicolon (;), without any spaces in between.

Windows and Unix systems will automatically display the proper character, but Macintosh and DOS will not, so you must enter the special characters with either name or number codes.
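
A sketch of both forms of character entity, by name and by number, using a few common characters.  The sample sentences are placeholders.

```html
<!-- named and numeric forms of the same character -->
<p>Caf&eacute; and Caf&#233; both show an e with an acute accent.</p>
<p>&copy; and &#169; both show the copyright sign.</p>
<!-- characters that HTML itself uses must be written as entities to be displayed -->
<p>Write &lt;p&gt; to display a tag, &amp; to display an ampersand, and &#126; to display a tilde.</p>
```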