in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

I break the chain. If I get an email that makes me think I need to log in to a site, I open a separate tab in the browser and log in directly, without clicking the link.
in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

Yes, directly is good advice.

But I'm wondering whether apps could restrict text to ASCII. This substitution trick seems like such an obvious attack vector that you'd think app standards would forbid non-ASCII lookalike substitutions.

in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

That would discriminate against other languages and cultures, where other characters are necessary and appropriate. The link destination should show the actual URL, in which any domain label containing non-ASCII characters appears in Punycode form with an "xn--" prefix, and there are also browser plugins that will intercept links leading to suspicious destinations.
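A minimal sketch of what that "xn--" form looks like, using Python's built-in "idna" codec. Assumptions: the lookalike hostname is a hypothetical example whose first letter is CYRILLIC SMALL LETTER A (U+0430), and the stdlib codec follows the older IDNA 2003 rules, whereas browsers use IDNA 2008 / UTS #46 processing, so results can differ for some characters.

```python
# Sketch only: Python's "idna" codec implements IDNA 2003, not what browsers use.

def ascii_form(hostname: str) -> str:
    """Return the ASCII-compatible (A-label) form that actually gets resolved."""
    return hostname.encode("idna").decode("ascii")

def looks_encoded(hostname: str) -> bool:
    """True if any label needed Punycode, i.e. carries the "xn--" prefix."""
    return any(label.startswith("xn--") for label in ascii_form(hostname).split("."))

spoof = "\u0430pple.com"  # hypothetical: Cyrillic "а" + "pple", renders like "apple.com"
real = "apple.com"

print(ascii_form(spoof))                          # an "xn--..." label, then ".com"
print(looks_encoded(spoof), looks_encoded(real))  # True False
```

A link checker or plugin could show the `ascii_form` result next to the displayed text, so a lookalike name stands out as gibberish rather than as a familiar brand.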
in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

We wrestled with this at ICANN when trying to establish internationalized domain names. For a while there were a lot of scams in which lookalike characters from different alphabets were used to confuse and mislead victims. For instance, pretty much every alphabet has a round character that looks like, but is not the same as, the letter "O" in ASCII.
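To make the "round character" point concrete, here is a small sketch using Python's standard `unicodedata` module: three characters that render almost identically are distinct code points, so a string built from them is simply a different name.

```python
import unicodedata

# Three characters that render almost identically as a round lowercase "o".
lookalikes = ["o", "\u043e", "\u03bf"]  # Latin, Cyrillic, Greek

for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+006F  LATIN SMALL LETTER O
# U+043E  CYRILLIC SMALL LETTER O
# U+03BF  GREEK SMALL LETTER OMICRON

# Same appearance on screen, different string (and a different domain name).
print("g\u043e\u043egle" == "google")  # False
```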

We ended up essentially requiring registrars and registries to refuse second-level domain names (i.e. the label just before the top-level domain, e.g. the "2ndLevel" in abc.2ndLevel.com) that mix "scripts". (A script is a particular writing system, such as Latin, the script of the ASCII letters, or Cyrillic.) In other words, the names in the examples shown ought not to have been permitted by an ICANN-regulated registry, and those are the only registries allowed to put names into a given top-level domain.
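A rough sketch of a single-script check on one label. This is not the registries' actual rule set: it approximates each letter's script from the first word of its Unicode character name, because Python's standard library does not expose the Unicode Script property. Real policies also permit some legitimate combinations (e.g. Japanese mixing Han, Hiragana and Katakana) that this heuristic would flag.

```python
import unicodedata
from typing import Optional

def rough_script(ch: str) -> Optional[str]:
    """Approximate a character's script from the first word of its Unicode
    name, e.g. "LATIN", "CYRILLIC", "GREEK". Heuristic only."""
    if not ch.isalpha():
        return None  # digits and hyphens are treated as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one (rough) script."""
    scripts = {s for s in map(rough_script, label) if s is not None}
    return len(scripts) > 1

print(is_mixed_script("paypal"))       # False: all Latin
print(is_mixed_script("p\u0430ypal"))  # True: Cyrillic "а" among Latin letters
print(is_mixed_script("\u043f\u0440\u0438\u043c\u0435\u0440"))  # False: all Cyrillic
```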

Basically, if you register a name in .com, .org, etc. (which are regulated by ICANN), that name must be in a single script to be allowed into the DNS. (But all bets are off in the country-code spaces, such as .co, .me, .us, which are not regulated by ICANN.)

Mixing scripts is still possible, though, in the deeper name spaces that ICANN does not regulate, i.e. third-level and more deeply nested names, e.g. fourth-level.third-level.second-level.top-level.
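Because the registry rule only covers the registered (second-level) label, a client-side check would have to look at every label. A minimal sketch under the same rough-script assumption as before (first word of the Unicode character name, not the official Script property); the hostname is hypothetical.

```python
import unicodedata

def label_scripts(label: str) -> set:
    """Rough script set for a label, from the first word of each letter's
    Unicode name (heuristic; not the official Script property)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def mixed_script_labels(hostname: str) -> list:
    """Return (level, label) pairs for every mixed-script label.
    Level 1 is the top-level domain, level 2 the registered name, and so on."""
    labels = hostname.rstrip(".").split(".")
    flagged = []
    for level, label in enumerate(reversed(labels), start=1):
        if len(label_scripts(label)) > 1:
            flagged.append((level, label))
    return flagged

# Hypothetical name: the second-level label is single-script, as registry
# policy would require, but the third-level label mixes Latin and Cyrillic.
host = "l\u043egin.example.com"
print(mixed_script_labels(host))  # one entry, at level 3
```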

in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

@Will - there is a move in the ICANN universe toward "universal acceptance": its proponents argue that every system must be able to accept all alphabets. Because I have done work in the IoT space (tiny machines), I find that a potentially burdensome obligation to impose on small machines that often operate in very restricted, closed environments in which pure ASCII is just fine.
in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

Oh, I forgot to add: the encoding of URLs is such that what our eyeballs perceive as the destination domain name may be overridden deeper into the URL, potentially in what looks like gibberish sequences of characters. I think, but do not know for a fact, that the address bars of modern browsers can help dig through that to show the true target domain name. (BTW, because few clients walk up the chain of TLS certificates, the fact that the target is "secure"/HTTPS may be misleading: so many of them lead to Let's Encrypt rather than to one of the better, and far from free, kinds of certificates that have strong authentication of server identity.)
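One well-known instance of "the real domain is somewhere else in the URL" is the userinfo part: everything before an "@" in the authority is a user name, not the host. A small sketch with Python's `urllib.parse.urlsplit`; the URL is hypothetical, and this is only one of several such tricks.

```python
from urllib.parse import urlsplit

# Hypothetical phishing-style URL: the part that looks like the bank's
# domain is userinfo, and the real host comes after the "@".
url = "https://www.mybank.example@evil.example/login?session=8f3a9c"

parts = urlsplit(url)
print(parts.username)  # www.mybank.example
print(parts.hostname)  # evil.example  (where the request actually goes)
```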
in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

Non-ASCII characters were prohibited in RFC 1738:

URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.


This post makes it clear why non-ASCII was prohibited.

As long as security is not the highest priority, we will be victimized.
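For the path and query parts of a URL, the "must be encoded" rule quoted above means %-escaping each octet outside printable ASCII. A sketch with Python's `urllib.parse.quote`, which first converts text to UTF-8 bytes; that UTF-8 convention comes from later practice (RFC 3986/3987), not from RFC 1738 itself. Hostnames are handled differently, via Punycode.

```python
from urllib.parse import quote, unquote

# Octets 80-FF must be %-encoded. quote() encodes the text as UTF-8 first;
# "/" is in its default safe set, so the path separator is left alone.
path = "/caf\u00e9"          # "/café"
encoded = quote(path)
print(encoded)                   # /caf%C3%A9
print(unquote(encoded) == path)  # True
```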

in reply to 𝕕𝕚𝕒𝕟𝕒 🏳️‍⚧️🦋

@David - The DNS query/response protocol is fully 8-bit transparent: it can carry any of the 256 possible patterns in a byte/octet. That's different from the "hostname" rule that restricts the character set.
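The 8-bit transparency follows from how names are carried on the wire (RFC 1035 section 3.1): each label is a length octet followed by that many raw octets, so nothing in the encoding itself restricts which byte values appear. A minimal encoder sketch (a hypothetical helper, not a library API):

```python
def encode_dns_name(labels: list) -> bytes:
    """Encode labels in DNS wire format: a length octet, then the raw octets,
    ending with a zero octet (the root label). No character set is imposed."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1-63 octets")
        out.append(len(label))
        out += label
    out.append(0)  # the root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name must not exceed 255 octets")
    return bytes(out)

wire = encode_dns_name([b"a\x00b", b"example", b"com"])
print(wire)  # b'\x03a\x00b\x07example\x03com\x00'
```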

A lot of DNS is used not for host names, and thus is technically not subject to the "hostname" character subset. DNS is used, for example, for cryptographic lookups or for service location discovery. I even once had the entire text of the Magna Carta under my cavebear.com domain.
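A concrete case: service location (SRV, RFC 2782) and DANE/TLSA (RFC 6698) lookups query underscore-prefixed names, which are perfectly valid DNS names but fail the hostname letter-digit-hyphen rule (RFC 952 as relaxed by RFC 1123). A sketch of that hostname check in Python:

```python
import re

# Hostname label rule: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_valid_hostname(name: str) -> bool:
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)

print(is_valid_hostname("www.example.com"))        # True
print(is_valid_hostname("_sip._tcp.example.com"))  # False: legal DNS name, not a hostname
```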

I have a domain zone that I use for DNS testing. It is a zone that has records that are 100% compliant with the DNS protocols but can cause errors, or worse, on lookups.

(For instance, the null character is completely allowed by the DNS protocols, so I've got names with nulls in the middle, or with CRLF in them. The nulls cause clients that use C-library string-manipulation routines to terminate early and produce errors.)
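The early termination is easy to reproduce: C strings end at the first NUL, so anything after it silently disappears. A sketch using Python's `ctypes`, whose `char` buffers follow C semantics; the name itself is hypothetical.

```python
import ctypes

# A name with an embedded NUL, as in the test zone described above
# (this particular name is made up for illustration).
raw = b"www\x00.evil.example"

buf = ctypes.create_string_buffer(raw)
print(buf.raw)    # b'www\x00.evil.example\x00': all bytes plus a terminator
print(buf.value)  # b'www': what a C-string routine such as strlen() would see
```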

The CNAME machinery is a wonderful way to turn nice-looking domain names into ugly ones. For instance, I have one with the revealing CNAME of "maps-to-non-ascii" that, as expected, leads to a name with all kinds of strange content in it.
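The practical lesson is that checking only the name someone clicked is not enough; every hop of the CNAME chain matters. A toy sketch of following a chain and flagging odd targets. The zone contents and names here are hypothetical stand-ins, held in a dictionary instead of real DNS queries.

```python
# Hypothetical in-memory zone; a real client would query DNS for each hop.
CNAMES = {
    "friendly.example.test": "maps-to-non-ascii.example.test",
    "maps-to-non-ascii.example.test": "\u0430pple\x00\r\n.example.test",
}

def follow_cnames(name: str, max_hops: int = 8) -> list:
    """Return the chain of names, stopping at a name with no CNAME."""
    chain = [name]
    while name in CNAMES:
        if len(chain) > max_hops:
            raise RuntimeError("CNAME chain too long or looping")
        name = CNAMES[name]
        chain.append(name)
    return chain

def is_plain_hostname(name: str) -> bool:
    """Crude check: printable ASCII only (no NUL, CR/LF, or lookalikes)."""
    return name.isascii() and name.isprintable()

for hop in follow_cnames("friendly.example.test"):
    print(repr(hop), "ok" if is_plain_hostname(hop) else "SUSPICIOUS")
```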