MozillaZine

Are all HTML character entities valid in unicode?

Discuss how to use and promote Web standards with the Mozilla Gecko engine.
chimerical
Posts: 215
Joined: November 15th, 2003, 2:50 am
Location: SF, CA
November 6th, 2009, 12:42 pm

Are all HTML character entities valid in Unicode? Are they ASCII only?

Additionally, how would they fare with encodings like these:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />

trolly
Moderator
Posts: 33196
Joined: August 22nd, 2005, 7:25 am
November 6th, 2009, 1:12 pm

Yes.
UTF-8 is a Unicode encoding. ISO-8859-1 is not, so it can represent only a small subset of Unicode directly; everything else has to be written as HTML entities. HTML entities are an ASCII encoding of Unicode characters.
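A minimal Python sketch of this, using only the standard library (the sample string is an arbitrary choice): encoding text to ISO-8859-1 or ASCII with the `xmlcharrefreplace` error handler writes every character the charset cannot hold as a numeric character reference, which is plain ASCII, while UTF-8 stores every character directly.

```python
import html

text = "«€»"  # « and » exist in ISO-8859-1; € (U+20AC) does not

# Characters the target charset cannot hold become &#NNNN; references.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
utf8_bytes = text.encode("utf-8")  # UTF-8 can store every character directly

print(latin1_bytes)  # b'\xab&#8364;\xbb'
print(ascii_bytes)   # b'&#171;&#8364;&#187;'
print(utf8_bytes)    # b'\xc2\xab\xe2\x82\xac\xc2\xbb'

# Decoding each byte string with its own charset and resolving the
# references gives back the same Unicode text in every case.
for data, charset in [(latin1_bytes, "iso-8859-1"),
                      (ascii_bytes, "ascii"),
                      (utf8_bytes, "utf-8")]:
    assert html.unescape(data.decode(charset)) == text
```

All three byte strings carry the same text; only the spelling of the characters differs.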
Think for yourself.
Otherwise you have to believe what other people tell you.

chimerical
Posts: 215
Joined: November 15th, 2003, 2:50 am
Location: SF, CA
November 6th, 2009, 3:34 pm

When you say ISO-8859-1 can only represent a subset of Unicode "except" using HTML entities, do you mean that ISO-8859-1 can represent that subset plus anything written as an HTML entity? And is using HTML entities acceptable practice in both cases?

trolly
Moderator
Posts: 33196
Joined: August 22nd, 2005, 7:25 am
November 7th, 2009, 7:08 am

HTML entities are special codes like &amp; for &. There are lots (or rather, most) of Unicode characters which can only be written as entities in ISO-8859-1. In UTF-8 you can (in principle) write them directly without using HTML entities. ISO-8859-1 has a range of 256 characters, while Unicode has a range of 1,114,112 code points (U+0000 to U+10FFFF).
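The sizes above can be checked with a short Python sketch (standard library only). Python's `iso-8859-1` codec is strict Latin-1, so its 256 byte values map one-to-one onto U+0000 to U+00FF, and `sys.maxunicode` gives the last Unicode code point.

```python
import html
import sys

# ISO-8859-1 maps each of its 256 byte values to the code point with the same number.
latin1_chars = bytes(range(256)).decode("iso-8859-1")
assert [ord(c) for c in latin1_chars] == list(range(256))

# Unicode's code space ends at U+10FFFF, i.e. 0x110000 code points in total.
unicode_size = sys.maxunicode + 1
print(len(latin1_chars), unicode_size)  # 256 1114112

# Named entities such as &amp; are just ASCII spellings of a character.
print(html.escape("Tom & Jerry"))        # Tom &amp; Jerry
print(html.unescape("Tom &amp; Jerry"))  # Tom & Jerry
```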

daniel219
Posts: 1
Joined: November 11th, 2009, 1:55 am
November 11th, 2009, 2:03 am

HTML codes are valid only in Unicode; they don't have any relationship with ASCII.

peter.reisio
Posts: 3029
Joined: March 3rd, 2004, 6:57 pm
November 11th, 2009, 8:01 am

Just to clarify, since daniel219's post is misleading...

Many are defined in HTML.

You don't need any (apart from escaping markup characters such as &lt; and &amp;) if you use UTF-8 and serve your pages as UTF-8 in the HTTP Content-Type header (meta elements are a truly gimpy alternative), though in certain situations you may well save time by typing &copy; rather than figuring out how to input ©.
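A small Python sketch of the trade-off, assuming the standard `html` module as a stand-in for a browser's entity handling: `&copy;` and a literal `©` yield the same character, but the literal takes two bytes in UTF-8 while the entity is six ASCII bytes. `<` and `&` still need escaping in markup regardless of the encoding.

```python
import html

# The named reference and the literal character denote the same code point.
from_entity = html.unescape("&copy; 2009")
literal = "© 2009"
assert from_entity == literal
print(hex(ord(literal[0])))  # 0xa9

# In a UTF-8 page the literal © is stored as two bytes; the entity as six ASCII bytes.
print(literal.encode("utf-8"))         # b'\xc2\xa9 2009'
print("&copy; 2009".encode("ascii"))   # b'&copy; 2009'

# Even in UTF-8, markup-significant characters still need escaping.
print(html.escape("a < b & c"))  # a &lt; b &amp; c
```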

Pim
Posts: 1542
Joined: May 17th, 2004, 2:04 pm
Location: Netherlands
November 13th, 2009, 2:34 am

I don't think daniel219's post is misleading at all in relation to the original question.
chimerical wrote: Are all HTML character entities valid in unicode? Are they ASCII only?

daniel219 wrote: HTML codes are valid only in unicode they dont have any relationship with ASCII

He is simply saying that you can use entities in any HTML file (whether the file's charset is US-ASCII or something else), and that the results are always Unicode characters, even for numeric entities whose value is also a valid code in the file's charset.
For instance, even if the charset is x-ebcdic, &#80; should always yield 'P' (U+0050) rather than '&' (the character at 0x50 in EBCDIC).
And by 'valid in Unicode', he means that a code like &#55555; (U+D903, a surrogate code point) is not a valid character, even if the charset is, um, well, some encoding where 55555 is a valid character code.
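A Python sketch of both points, assuming the stdlib `cp037` codec as a stand-in for "x-ebcdic" (EBCDIC exists in several code pages; 037 is one common one) and `html.unescape` as a stand-in for a browser's reference resolution. In a real browser the file's bytes are decoded first and references resolved afterwards; here the two steps are shown separately.

```python
import html

# In EBCDIC code page 037, the byte 0x50 is '&' ...
print(b"\x50".decode("cp037"))  # &
# ... but the numeric reference &#80; always means code point 80 (U+0050), i.e. 'P'.
print(html.unescape("&#80;"))   # P

# 55555 = 0xD903 lies in the surrogate range U+D800..U+DFFF, which never names a
# character; html.unescape substitutes U+FFFD (the replacement character) for it.
print(hex(55555))                             # 0xd903
print(html.unescape("&#55555;") == "\ufffd")  # True
```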
Greetings, Pim

peter.reisio
Posts: 3029
Joined: March 3rd, 2004, 6:57 pm
November 13th, 2009, 9:19 am

By misleading I meant wrong. :p

chimerical
Posts: 215
Joined: November 15th, 2003, 2:50 am
Location: SF, CA
November 13th, 2009, 11:47 am

I see. Good to know, guys! Yeah, I was mainly asking because I wanted to know whether something like &raquo; would display nothing if I declared a random encoding in the meta tag, but it seems browsers resolve entities regardless of the declared encoding and display them properly.
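A Python sketch of why &raquo; survives a "random" declared encoding, assuming that encoding is ASCII-compatible (UTF-16 or EBCDIC would not be): the entity is spelled entirely in ASCII, so it decodes identically under each charset tried here, whereas a literal » written as UTF-8 turns into mojibake when the page is decoded as ISO-8859-1. The charsets in the list are just examples.

```python
import html

source = "Next &raquo;"  # markup as typed in the HTML file

# The entity's bytes are plain ASCII, so any ASCII-compatible charset
# decodes them to the same text, which resolves to the same character.
for charset in ["utf-8", "iso-8859-1", "cp1252", "ascii"]:  # cp1252 = windows-1252
    decoded = source.encode(charset).decode(charset)
    assert html.unescape(decoded) == "Next »"

# A literal » is different: its bytes depend on the encoding, so a wrong
# declaration produces mojibake.
raw = "Next »".encode("utf-8")
print(raw.decode("utf-8"))       # Next »
print(raw.decode("iso-8859-1"))  # Next Â»
```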

peter.reisio
Posts: 3029
Joined: March 3rd, 2004, 6:57 pm
November 13th, 2009, 12:38 pm

Yes.

dtobias
Posts: 1842
Joined: November 9th, 2002, 3:35 pm
Location: Boca Raton, FL
November 13th, 2009, 1:21 pm

Some old broken browsers may have treated entities and numeric references differently based on the character encoding, but that's not what the standards call for, and not what current browsers such as Firefox do.
Dan's Web Tips: http://webtips.dan.info/
Dan's Domain Site: http://domains.dan.info/
Dan's Mail Format Site: http://mailformat.dan.info/

Franadora
Posts: 992
Joined: July 18th, 2004, 8:52 pm
Location: NC
November 13th, 2009, 11:15 pm

By "old broken browsers", do you mean IE6? I only ask because it's still around and sometimes doesn't act like modern browsers.
If it isn't fun, it isn't done right.
--Windows XP: Thunderbird 2.0.0.19; Asus Linux (Xandros)

Pim
Posts: 1542
Joined: May 17th, 2004, 2:04 pm
Location: Netherlands
November 14th, 2009, 1:38 am

I don't have IE6 here, so someone correct me if I'm wrong, but I'm fairly sure it treated *most* entities correctly. The exceptions are the ones between &#x80; and &#x9F;: for those, it acts as if they were in the current charset (e.g. &#x9F; displays 'Ÿ' when the current charset is windows-1252).
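For comparison, a Python sketch using `html.unescape`, which implements the HTML5 character-reference rules: HTML5 standardized the windows-1252 reading of &#x80; to &#x9F; as a fixed table applied whatever charset the page declares, so modern browsers show 'Ÿ' for &#x9F; even on a UTF-8 page. Only two values are checked here; treat it as an illustration, not a full comparison of the table.

```python
import html

# HTML5 remaps numeric references in 0x80-0x9F to the windows-1252
# characters, independent of the page's declared charset.
for n in (0x80, 0x9F):
    ref = f"&#x{n:X};"
    from_reference = html.unescape(ref)
    from_cp1252_byte = bytes([n]).decode("cp1252")
    print(ref, from_reference, from_cp1252_byte, from_reference == from_cp1252_byte)
# &#x80; € € True
# &#x9F; Ÿ Ÿ True

# Without the remap, code point 0x9F would be an invisible C1 control character.
print(repr(chr(0x9F)))  # '\x9f'
```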

dtobias
Posts: 1842
Joined: November 9th, 2002, 3:35 pm
Location: Boca Raton, FL
November 16th, 2009, 11:34 am

You might have to go back to really ancient browsers like Netscape 3.0 to find really broken behavior in this area, but there was a time when it was still a problem.
