HTML Tidy is a utility to clean up and pretty print HTML, XHTML, and XML files. This version was downloaded and installed on the SENS Solaris/UBiquity systems on March 3, 2003. A version for Microsoft Windows is also available, and a separate site offers a user interface for it.
Use this program with care: it can overwrite your web files, and its formatting style may differ considerably from your own.
To check a page without changing anything, use this command:
tidy -errors mypage.html
To convert a file in-place:
tidy -modify mypage.html
To convert a file from HTML to XHTML, placing the output in a new file:
tidy -o mynewpage.html -asxhtml mypage.html
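Checking many pages one at a time can be tedious. The following is a minimal Python sketch, assuming Python 3.9 or later and a tidy executable on your PATH, that runs the same read-only check ("tidy -errors", plus "-quiet") on every .html file in one directory. The function name check_pages is hypothetical, and the exit-status meanings in the comments (0 clean, 1 warnings, 2 errors) are an assumption based on Tidy's usual convention; confirm them with "man tidy" on your system.

```python
import subprocess
import sys
from pathlib import Path


def check_pages(directory: str) -> dict[str, int]:
    """Hypothetical helper: run "tidy -errors -quiet" on each .html file.

    Returns a mapping from file name to Tidy's exit status. Assumed meaning
    (verify locally): 0 = no problems, 1 = warnings, 2 = errors.
    The files themselves are never modified.
    """
    results: dict[str, int] = {}
    for page in sorted(Path(directory).glob("*.html")):
        # -errors: only report problems, never write cleaned markup.
        # -quiet: suppress nonessential output such as the summary.
        proc = subprocess.run(
            ["tidy", "-errors", "-quiet", str(page)],
            capture_output=True,
            text=True,
        )
        results[page.name] = proc.returncode
        if proc.stderr:
            # Tidy prints its warnings and errors on standard error.
            print(f"--- {page.name}\n{proc.stderr}", file=sys.stderr)
    return results
```

For example, check_pages("public_html") (a hypothetical directory name) returns one entry per page; any page with a non-zero status is worth opening in an editor before running "tidy -modify" on it.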
Here is the full list of command-line options, obtained by running the command "tidy -help":
tidy [option...] [file...] [option...] [file...]
Utility to clean up and pretty print HTML/XHTML/XML
see http://tidy.sourceforge.net/

Options for HTML Tidy for Solaris released on 1st March 2003:

File manipulation
-----------------
  -out or -o <file>   specify the output markup file
  -config <file>      set configuration options from the specified <file>
  -f <file>           write errors to the specified <file>
  -modify or -m       modify the original input files

Processing directives
---------------------
  -indent or -i       indent element content
  -wrap <column>      wrap text at the specified <column> (default is 68)
  -upper or -u        force tags to upper case (default is lower case)
  -clean or -c        replace FONT, NOBR and CENTER tags by CSS
  -bare or -b         strip out smart quotes and em dashes, etc.
  -numeric or -n      output numeric rather than named entities
  -errors or -e       only show errors
  -quiet or -q        suppress nonessential output
  -omit               omit optional end tags
  -xml                specify the input is well formed XML
  -asxml              convert HTML to well formed XHTML
  -asxhtml            convert HTML to well formed XHTML
  -ashtml             force XHTML to well formed HTML
  -access <level>     do additional accessibility checks (<level> = 1, 2, 3)

Character encodings
-------------------
  -raw                output values above 127 without conversion to entities
  -ascii              use US-ASCII for output, ISO-8859-1 for input
  -latin0             use US-ASCII for output, ISO-8859-1 for input
  -latin1             use ISO-8859-1 for both input and output
  -iso2022            use ISO-2022 for both input and output
  -utf8               use UTF-8 for both input and output
  -mac                use MacRoman for input, US-ASCII for output
  -win1252            use Windows-1252 for input, US-ASCII for output
  -ibm858             use IBM-858 (CP850+Euro) for input, US-ASCII for output
  -utf16le            use UTF-16LE for both input and output
  -utf16be            use UTF-16BE for both input and output
  -utf16              use UTF-16 for both input and output
  -big5               use Big5 for both input and output
  -shiftjis           use Shift_JIS for both input and output
  -language <lang>    set the two-letter language code <lang> (for future use)

Miscellaneous
-------------
  -version or -v      show the version of Tidy
  -help, -h or -?     list the command line options
  -help-config        list all configuration options
  -show-config        list the current configuration settings

Use --blah blarg for any configuration option "blah" with argument "blarg"

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
Another command, "tab2space", is distributed with Tidy. It converts every tab character in a file to the corresponding number of spaces, preserving indentation, and can also make line endings consistent. Its usage is simple:
tab2space: [options] [infile [outfile]] ...
Utility to expand tabs and ensure consistent line endings
options for tab2space vers: 6th February 2003
  -help or -h     display this help message
  -dos or -crlf   set line ends to CRLF (PC-DOS/Windows - default)
  -mac or -cr     set line ends to CR (classic Mac OS)
  -unix or -lf    set line ends to LF (Unix)
  -tabs           preserve tabs, e.g. for Makefile
  -t<n>           set tabs to <n> (default is 4) spaces
Note this utility doesn't map spaces to tabs!
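The tab expansion that tab2space performs can be sketched in a few lines of Python. This is a hypothetical illustration, not tab2space's actual source: each tab advances to the next multiple of tab_size columns (4 by default, as with -t<n>), every CRLF, CR, or LF line ending is rewritten as line_end (CRLF by default, matching -dos), and keep_tabs mimics -tabs. Like tab2space, it never turns spaces back into tabs.

```python
import re


def tab2space(text: str, tab_size: int = 4, line_end: str = "\r\n",
              keep_tabs: bool = False) -> str:
    """Hypothetical sketch of tab2space's behavior (not its actual source)."""

    def expand(line: str) -> str:
        out = []
        column = 0
        for ch in line:
            if ch == "\t":
                # Pad to the next tab stop: a multiple of tab_size columns.
                pad = tab_size - column % tab_size
                out.append(" " * pad)
                column += pad
            else:
                out.append(ch)
                column += 1
        return "".join(out)

    # Split on CRLF, CR or LF; CRLF is listed first so it counts as one break.
    lines = re.split(r"\r\n|\r|\n", text)
    if not keep_tabs:
        lines = [expand(line) for line in lines]
    # A trailing line break yields a final empty string, so joining keeps it.
    return line_end.join(lines)
```

For example, tab2space("ab\tc\n", line_end="\n") returns "ab  c\n": the tab sits in column 2, so two spaces bring the text to the tab stop at column 4. Python's built-in str.expandtabs() applies the same rule to a single line.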