(Also filed as psf/requests#5845 on GitHub.)
requests Python HTTP library currently uses the
idna library to check input URLs for IDNA2008 compliance, and rejects URLs that don’t comply. This breaks non-compliant URLs with emoji characters, like http://☃.net/, which you all said was intentional (more background), since those domains’ time is arguably limited, ie they’re effectively “dead domains walking.” Understood.
However, not all TLDs require IDNA2008 compliance. Unlike gTLDs, ccTLDs generally get to choose their own domain policies – background from Wikipedia, ICANN, a GoDaddy representative – and a handful of them have stuck with IDNA2003, UTS#46, or related variants. (Not to mention older proprietary schemes like ThaiURL 😁.) For example, .ws, .la, .ai, .to, and .fm evidently explicitly allow emoji.
Similarly, afaik domain owners can do whatever they want with their own subdomains. So thanks to Punycode, third level (and beyond) hostnames like 🌏➡➡❤🔒.ayeshious.com and 🔒🔒🔒.scotthelme.co.uk seem to not be at risk of breaking due to gTLD registries enforcing IDNA2008 on pay-level domain registrations.
Any chance you all could relax the IDNA2008 requirement so that you support both of those kinds of domains?
Right now, I’m working around this with code like this, using the
domain2idna library, to support at least IDNA2003 in addition to IDNA2008. It’d be nice not to have to.
try: resp = requests.get(url, ...) except requests.exceptions.InvalidURL: punycode = domain2idna(url) if punycode != url: # the domain is valid idna2003 but not idna2008. encode and try again. resp = requests.get(punycode, ...)
Thanks again for listening, and for maintaining requests!