Tamil Unicode FAQ
K. Kalyanasundaram, Ph.D.
This is an attempt to provide brief answers to frequently asked questions
relating to Tamil Unicode.
Please feel free to send in your comments and any additional queries that
could be listed here.
1. What is Unicode Encoding?
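Unicode is a universal character encoding scheme, designed to cover all
the world's languages. It was originally designed as a 16-bit scheme with
65,536 slots to assign to the various languages, and has since been
extended well beyond that. Many scripts, including each of the Indic
scripts, are given a 128-slot block; a few, such as Chinese, need far more.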
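As a rough illustration of the block idea, the following Python sketch lists the slots of the Tamil block that currently have a character assigned. The block boundaries U+0B80 to U+0BFF are an assumption taken from the Unicode code charts rather than from this FAQ, and the number of assigned slots depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Assumption: the Tamil block runs from U+0B80 to U+0BFF (Unicode code charts).
TAMIL_BLOCK_START = 0x0B80
TAMIL_BLOCK_END = 0x0BFF

def tamil_block_slots():
    """Return (code point, character name) pairs for assigned Tamil slots."""
    slots = []
    for cp in range(TAMIL_BLOCK_START, TAMIL_BLOCK_END + 1):
        name = unicodedata.name(chr(cp), None)  # None for unassigned slots
        if name is not None:
            slots.append((cp, name))
    return slots

# Total number of slots in the block, assigned or not.
block_size = TAMIL_BLOCK_END - TAMIL_BLOCK_START + 1

if __name__ == "__main__":
    print(f"Block size: {block_size} slots")
    for cp, name in tamil_block_slots()[:5]:
        print(f"U+{cp:04X}  {name}")
```

Not every slot in the block is filled, which is why the sketch skips code points that have no name.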
2. What is meant by Character encoding?
Tamil is a language in which, in addition to the basic vowels (uyir) and
consonants (mei), the compound (uyirmei) characters all have unique
glyph forms. Popular Tamil font encoding schemes such as TAB, TSCII and
TAM are glyph-based: many of the uyirmeis with distinct glyph forms are
encoded directly in the scheme. Thus uyirmeis like ku, pU, etc. each have
their own slot.
Unicode, by contrast, encodes only the basic uyir and mei characters and
a set of modifiers (vowel signs) to represent cases where a mei/uyir pair
appears as a combination (uyirmei). A Unicode file stores textual
information solely at this "character" level. It does not care about the
actual form of the glyphs. Rendering the glyphs that correspond to the
stored characters is left to software.
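To make the character-level storage concrete, here is a minimal Python sketch that lists what is actually stored for the two uyirmeis mentioned above, ku and pU. The helper name `describe` is made up for this example; the code points come from the Unicode Tamil block.

```python
import unicodedata

def describe(text):
    """List the code point and Unicode name of each stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

ku = "\u0b95\u0bc1"   # கு: consonant KA followed by vowel sign U
puu = "\u0baa\u0bc2"  # பூ: consonant PA followed by vowel sign UU

for label, text in (("ku", ku), ("pU", puu)):
    # Print only code points and names so the output is plain ASCII.
    print(label, describe(text))
```

Each uyirmei is a single shape on screen but two characters in the file: a consonant plus a vowel sign.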
3. How is Tamil Language encoded in Unicode?
All Indic scripts are allocated 128 slots each. The assignment of
characters to specific slots within each block is based on the ISCII
(Indian Script Code for Information Interchange) scheme, which uses
Devanagari as the basic reference script. Thus the corresponding vowels,
consonants and modifiers of each Indic script appear at the same slot
within their block. For example, the Tamil and Telugu blocks are adjacent,
so the "ka" of Tamil and the "ka" of Telugu are separated by exactly 128
slots, which greatly facilitates programming.
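The fixed offset between blocks can be sketched in a few lines of Python. The block start values below are assumptions taken from the Unicode code charts, and `shift_script` is a hypothetical helper written for this example, not part of any library.

```python
import unicodedata

# Assumption: block start points taken from the Unicode code charts.
BLOCK_START = {
    "devanagari": 0x0900,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
}

def shift_script(ch, source, target):
    """Map a character to the same slot in another Indic block.

    The result is only meaningful when the target script actually
    assigns a character to that slot; not every slot is filled in
    every script.
    """
    offset = ord(ch) - BLOCK_START[source]
    if not 0 <= offset < 128:
        raise ValueError(f"{ch!r} is not in the {source} block")
    return chr(BLOCK_START[target] + offset)

tamil_ka = "\u0b95"  # க
telugu_ka = shift_script(tamil_ka, "tamil", "telugu")

print(unicodedata.name(tamil_ka), "->", unicodedata.name(telugu_ka))
print("distance in slots:", ord(telugu_ka) - ord(tamil_ka))
```

Because Tamil has fewer consonants than Devanagari or Telugu, shifting in the other direction can land on a slot that Tamil leaves empty, so a real transliterator would need a fallback for those cases.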
4. How do Unicode fonts work?
As stated in (2), a Unicode text file stores only the basic characters.
The unique glyph forms of the uyirmeis are stored separately, in the font,
and are "rendered" on the screen when a Unicode-based text file is
displayed by software.
The process of picking these unique glyph forms of uyirmeis out of the
font and rendering them on the screen is called "glyph substitution
(GSUB)". A new font technology called "OpenType" (referred to in this FAQ
as OTT, for OpenType fonts with TrueType outlines) has been developed for
use with Unicode.
Different platforms and operating systems use different font-rendering
engines to handle these Unicode OTT-type fonts (that is, to make use of
their GSUB and GPOS tables).
To use Unicode Tamil text, you need a Unicode OTT-type font that includes
the Tamil block (many Unicode fonts cover only a few languages) and also
the font-rendering engine (a DLL) of the respective platform.
5. What Operating System do I need to use Tamil Unicode?
On the Windows platform, only Windows 2000 and Windows XP come with the
required .dll file to handle Tamil characters. Windows ME and 98, though
"Unicode-intelligent", do not have the specific .dll support required for
Tamil. On these systems Unicode Tamil text is therefore rendered in a
"linear" fashion, in the order in which the characters are stored, without
glyph substitution. Latha, Arial Unicode MS and Code2000 are some of the
Unicode fonts that carry the Tamil block.
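The gap between "linear" display and properly shaped text can be illustrated with a toy reordering step in Python. This is only a sketch of the idea and not how the Windows rendering DLL or any real shaping engine is implemented. It assumes that the vowel signs E, EE and AI are drawn to the left of their consonant, and that the two-part signs O, OO and AU split under NFD normalization into a left part and a right part. The function name `visual_order` is made up for this example, and conjuncts and other special cases are ignored.

```python
import unicodedata

# Vowel signs drawn to the LEFT of their consonant: E, EE, AI.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}

def visual_order(text):
    """Toy reordering: return code points in the order they are drawn.

    Two-part vowel signs (O, OO, AU) are first split with NFD into a
    left part and a right part, then each left-side sign is moved in
    front of the character it follows in storage order.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in PREFIX_SIGNS and out:
            out.insert(len(out) - 1, ch)  # place before the preceding character
        else:
            out.append(ch)
    return "".join(out)

ke = "\u0b95\u0bc6"  # கெ: stored as KA + vowel sign E
ko = "\u0b95\u0bca"  # கொ: stored as KA + vowel sign O

for label, text in (("ke", ke), ("ko", ko)):
    stored = [f"U+{ord(c):04X}" for c in text]
    drawn = [f"U+{ord(c):04X}" for c in visual_order(text)]
    print(label, "stored:", stored, "drawn:", drawn)
```

A system without Tamil shaping support shows the characters in the "stored" order, so the vowel sign appears after the consonant instead of in front of it or around it.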
Apple uses a different font-rendering engine, called ATSUI, to handle the
GSUB and GPOS tables of Unicode OTT fonts. Though Mac OS 9.x and X fully
support Devanagari, Gujarati and Gurmukhi, their ATSUI does not support
Tamil.
The Tamil Linux group has developed the necessary tools to enable Unicode
Tamil on that platform.
6. What application software do I need for Tamil Unicode in Windows?
Even if you use Windows 2000/XP, you need "compatible" application
software to handle Tamil Unicode. MS Office 2000 appeared before the
Windows 2000 release and hence displays Unicode Tamil text in linear
fashion even when used on Windows 2000! So you need to use the more recent
Office XP package with Windows 2000.
An alternative is to use a simple text editor such as Notepad or WordPad.
7. What keyboard do I need to input Tamil Unicode in Windows?
Windows 2000 and Windows XP come with a special "on-screen keyboard"
(available under "Accessories") that allows Unicode Tamil text input.
This keyboard is based on the "Inscript" keyboard layout widely used in
India with ISCII-based software. Thus the key to type "ka" is the same
whether you type Tamil, Hindi or Telugu Unicode text.
8. How do I know whether a given Tamil font is a Unicode font and
includes the Tamil block?
On Windows 2000/XP, you can use the "Character Map" utility, available
under Accessories / System Tools, to look at the contents of all fonts
installed on your computer. Select the font you are interested in at the
top, and "Unicode Tamil" as the block you want to look at. If your font is
based on Unicode and has Tamil support, you should see the set of basic
characters defined in the Unicode Tamil block and also the additional
Tamil glyphs stored in the font.
9. If I prepare the text in TSCII or TAB, is there a text converter
to convert it to Unicode format?
Yes. Murasu Anjal 2000SE comes with a text converter that can open text
in TSCII, TAB or other popular text encodings (it even incorporates an
automatic encoding-detection tool) and generate the equivalent Unicode
text.
Remember that, for use in web pages, Unicode-based text must be stored in
UTF-8 format.
10. How are unicode-based Tamil texts handled on the Web?
For use on the Web, Unicode-based text must be stored in the source HTML
files in UTF-8 format. (Notepad allows you to save Unicode text in UTF-8,
so if you know how to add HTML tags yourself, you can prepare a Tamil web
page using Notepad alone.)
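The same thing can be done from a script instead of Notepad. The Python sketch below writes a minimal Tamil web page in UTF-8 and declares the encoding in a meta tag. The file name `tamil_test.html` and the page content are made up for this example.

```python
import tempfile
from pathlib import Path

tamil_word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ் ("Tamil")

html = f"""<!DOCTYPE html>
<html lang="ta">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Tamil Unicode test page</title>
</head>
<body>
<p>{tamil_word}</p>
</body>
</html>
"""

# Hypothetical output location, chosen only for this example.
page = Path(tempfile.gettempdir()) / "tamil_test.html"
page.write_text(html, encoding="utf-8")  # store the page as UTF-8

utf8_bytes = tamil_word.encode("utf-8")
print(len(tamil_word), "characters ->", len(utf8_bytes), "bytes in UTF-8")
print("written to", page)
```

Each Tamil character takes three bytes in UTF-8, which is why the byte count is three times the character count.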
11. What browsers do I need to view Unicode Tamil webpages?
As stated under (5) and (6), only a few operating systems and application
software packages currently "fully" support Tamil Unicode.
Netscape 4.6 onwards and Internet Explorer 4 onwards are Unicode-intelligent.
Hence, when used in conjunction with Windows 2000/XP, they will display
Tamil web pages correctly.
Because Unicode-based text is stored in UTF-8 format, you also need to
set up the browser accordingly before viewing Unicode Tamil web pages.
There are two things to do: a) select a Unicode font that carries the
Tamil block as the default font for the Unicode encoding/character set,
and b) set the browser to display the web page in UTF-8 format. Once you
have done (a), reload the page if necessary.
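Why the browser's encoding setting matters can be seen in a short Python sketch: the same stored bytes give Tamil text when read as UTF-8 and unrelated symbols when read with a different encoding. Latin-1 stands in here for a wrongly chosen browser encoding, which is an assumption made for the example.

```python
tamil_word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ் ("Tamil")
stored = tamil_word.encode("utf-8")  # the bytes as saved in the HTML file

as_utf8 = stored.decode("utf-8")      # browser set to UTF-8
as_latin1 = stored.decode("latin-1")  # browser set to the wrong encoding

print("read as UTF-8 matches the original:", as_utf8 == tamil_word)
print("read as Latin-1:", len(as_latin1), "unrelated characters")
```

No font can repair the second case, because the bytes have already been interpreted as the wrong characters before any glyph is chosen.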
For the reasons indicated under (5) and (6), the same Netscape or IE
browsers used on Windows ME or 98 will display Unicode Tamil text in
"linear form" only. It should be possible to use "dynamic font" technology
with EOT-type fonts to render Unicode Tamil text correctly on these
platforms, but this is yet to be demonstrated.
12. What is the current support for Unicode Tamil text in Adobe PDF?
Adobe Acrobat 4 allows you to prepare PDF files of Unicode Tamil text
without any problem. With the "font embedding" option, the PDF files are
fully readable on Windows 2000/XP and also on Macintosh OS 9 and X (even
though, in the latter case, the ATSUI engine does not yet support Tamil).
13. What tools do I need to prepare a Unicode font with support for
the Tamil block?
Unicode fonts are of a special kind, OTT (OpenType with TrueType
outlines), unlike the 8-bit bilingual TrueType fonts used for TAB and
TSCII. Preparation of an OTT font proceeds in two distinct stages:
Stage i) Prepare a TT font with all the glyphs you want to include, using
one of the font-editing programs that support Unicode encoding. Currently
these are Font Creator, FontEdit and Fontographer. With these you can give
the glyphs Unicode-based names and numbers.
Stage ii) Prepare the glyph positioning (GPOS) and glyph substitution
(GSUB) tables and bundle them with the glyph outlines to create an OTT
font for use in Windows. The best software for this purpose is MS VOLT,
distributed free by Microsoft to registered software professionals.
Courtesy: மின்மஞ்சரி (Minmanjari)