Last updated: March 03, 2003

HTML Tidy

Introduction

HTML Tidy is a utility that cleans up and pretty-prints HTML, XHTML, and XML files. This version was downloaded and installed on the SENS Solaris/UBiquity systems on March 3, 2003. A version for Microsoft Windows is also available, as is a separate graphical user interface.

Usage and Examples

Use this program with care: it can overwrite your web files, and its formatting conventions may differ considerably from your own.

To check a page without changing anything, use this command:

tidy -errors mypage.html

To convert a file in-place:

tidy -modify mypage.html
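
Because -modify overwrites the file, a cautious workflow could keep a backup copy first. The following is only a sketch: the function name tidy_in_place and the TIDY variable are hypothetical, and it assumes tidy exits with status 0 when the page is clean, 1 when there are warnings, and 2 when there are errors.

```shell
#!/bin/sh
# Hypothetical override so a different tidy binary can be substituted.
TIDY=${TIDY:-tidy}

# tidy_in_place PAGE
# Copy PAGE to PAGE.bak, then let tidy rewrite PAGE in place.
tidy_in_place() {
    page=$1

    # Keep the original; give up if the backup cannot be written.
    cp -p "$page" "$page.bak" || return 1

    status=0
    "$TIDY" -quiet -modify "$page" || status=$?

    # Assumption: status 2 or higher means tidy found errors
    # (or could not be run at all).
    if [ "$status" -ge 2 ]; then
        echo "tidy failed on $page; original kept in $page.bak" >&2
        return "$status"
    fi
    return 0
}
```

After sourcing the file, "tidy_in_place mypage.html" leaves the untouched original in mypage.html.bak, so the page can be restored with cp if the new formatting is not wanted.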

To convert a file from HTML to XHTML, placing the output in a new file:

tidy -o mynewpage.html -asxhtml mypage.html

Documentation

The following online documents are available:

Options

Here is the full list of command-line options, obtained by running the command "tidy -help":

tidy [option...] [file...] [option...] [file...]
Utility to clean up and pretty print HTML/XHTML/XML
see http://tidy.sourceforge.net/

Options for HTML Tidy for Solaris released on 1st March 2003:

File manipulation
-----------------
  -out or -o <file> specify the output markup file
  -config <file>    set configuration options from the specified <file>
  -f      <file>    write errors to the specified <file>
  -modify or -m     modify the original input files

Processing directives
---------------------
  -indent  or -i    indent element content
  -wrap <column>    wrap text at the specified <column> (default is 68)
  -upper   or -u    force tags to upper case (default is lower case)
  -clean   or -c    replace FONT, NOBR and CENTER tags by CSS
  -bare    or -b    strip out smart quotes and em dashes, etc.
  -numeric or -n    output numeric rather than named entities
  -errors  or -e    only show errors
  -quiet   or -q    suppress nonessential output
  -omit             omit optional end tags
  -xml              specify the input is well formed XML
  -asxml            convert HTML to well formed XHTML
  -asxhtml          convert HTML to well formed XHTML
  -ashtml           force XHTML to well formed HTML
  -access <level>   do additional accessibility checks (<level> = 1, 2, 3)

Character encodings
-------------------
  -raw              output values above 127 without conversion to entities
  -ascii            use US-ASCII for output, ISO-8859-1 for input
  -latin0           use US-ASCII for output, ISO-8859-1 for input
  -latin1           use ISO-8859-1 for both input and output
  -iso2022          use ISO-2022 for both input and output
  -utf8             use UTF-8 for both input and output
  -mac              use MacRoman for input, US-ASCII for output
  -win1252          use Windows-1252 for input, US-ASCII for output
  -ibm858           use IBM-858 (CP850+Euro) for input, US-ASCII for output
  -utf16le          use UTF-16LE for both input and output
  -utf16be          use UTF-16BE for both input and output
  -utf16            use UTF-16 for both input and output
  -big5             use Big5 for both input and output
  -shiftjis         use Shift_JIS for both input and output
  -language <lang>  set the two-letter language code <lang> (for future use)

Miscellaneous
-------------
  -version  or -v   show the version of Tidy
  -help, -h or -?   list the command line options
  -help-config      list all configuration options
  -show-config      list the current configuration settings

Use --blah blarg for any configuration option "blah" with argument "blarg"

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
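
As an illustration of combining single-letter options, the sketch below checks every .html file in a directory without modifying any of them, writing each report to a file next to the page. The function name check_pages and the TIDY variable are hypothetical, and the script assumes an exit status of 2 or higher means tidy reported errors.

```shell
#!/bin/sh
# Hypothetical override so a different tidy binary can be substituted.
TIDY=${TIDY:-tidy}

# check_pages [DIR]
# Run tidy in errors-only mode on each DIR/*.html (default: current dir).
# Returns 1 if any page had errors, 0 otherwise.
check_pages() {
    dir=${1:-.}
    failed=0

    if ! command -v "$TIDY" >/dev/null 2>&1; then
        echo "cannot find $TIDY" >&2
        return 127
    fi

    for page in "$dir"/*.html; do
        [ -f "$page" ] || continue    # no .html files matched

        # -q (quiet) and -e (errors only) combine as -qe;
        # -f takes an argument, so it is given on its own.
        status=0
        "$TIDY" -f "$page.err" -qe "$page" || status=$?

        if [ "$status" -ge 2 ]; then
            echo "errors: $page (see $page.err)"
            failed=1
        fi
    done
    return "$failed"
}
```

Running "check_pages ." from a web directory would leave one .err report per page and list only the pages that need attention.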

tab2space

Another command, "tab2space", was distributed with tidy. This command converts all tab characters in a file to a corresponding number of spaces, preserving indentation. Its usage is very simple:

tab2space: [options] [infile [outfile]] ...
Utility to expand tabs and ensure consistent line endings
options for tab2space vers: 6th February 2003
  -help or -h     display this help message
  -dos  or -crlf  set line ends to CRLF (PC-DOS/Windows - default)
  -mac  or -cr    set line ends to CR (classic Mac OS)
  -unix or -lf    set line ends to LF (Unix)
  -tabs           preserve tabs, e.g. for Makefile
  -t<n>           set tabs to <n> (default is 4) spaces

Note this utility doesn't map spaces to tabs!
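
Since the help text lists CRLF as the default line ending, Unix users will probably want to pass -unix explicitly. The wrapper below is a sketch only: the name expand_tabs and the TAB2SPACE variable are hypothetical, and it assumes the tab width is written directly after -t with no space (for example -t8).

```shell
#!/bin/sh
# Hypothetical override so a different tab2space binary can be substituted.
TAB2SPACE=${TAB2SPACE:-tab2space}

# expand_tabs INFILE OUTFILE [WIDTH]
# Expand tabs from INFILE into a new OUTFILE with Unix (LF) line endings.
expand_tabs() {
    src=$1
    dest=$2
    width=${3:-4}    # 4 is the default given in the help text

    # Never clobber an existing file.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    "$TAB2SPACE" -unix "-t$width" "$src" "$dest"
}
```

For example, "expand_tabs notes.txt notes-spaces.txt 8" would write an 8-column expansion to a new file and leave notes.txt untouched.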

