• Kerb@discuss.tchncs.de
      arrow-up
      72
      ·
      1 year ago

      Unicode is great, isn't it?
      It supports almost all writing systems.

      For example, here is the transcript of one of the complaints about Ea-nasir's shitty copper:

      𒀀 𒈾 𒂍 𒀀 𒈾 𒍢 𒅕
      𒆠 𒉈 𒈠
      𒌝 𒈠 𒈾 𒀭 𒉌 𒈠
      𒀀 𒉡 𒌑 𒈠 𒋫 𒀠 𒇷 𒆪
      𒆠 𒀀 𒄠 𒋫 𒀝 𒁉 𒄠
      𒌝 𒈠 𒀜 𒋫 𒀀 𒈠
      𒄖 𒁀 𒊑 𒁕 𒄠 𒆪 𒁴
      𒀀 𒈾 𒄀 𒅖 𒀭 𒂗𒍪 𒀀 𒈾 𒀜 𒁲 𒅔
      𒋫 𒀠 𒇷 𒅅 𒈠 𒋫 𒀝 𒁉 𒀀 𒄠
      𒌑 𒆷 𒋼 𒁍 𒍑
      𒄖 𒁀 𒊑 𒆷 𒁕 𒄠 𒆪 𒁴
      𒀀 𒈾 𒈠 𒅈 𒅆 𒅁 𒊑 𒅀
      𒋫 𒀸 𒆪 𒌦 𒈠 𒌝 𒈠 𒀜 𒋫 𒈠
      𒋳 𒈠 𒋼 𒇷 𒆠 𒀀 𒇷 𒆠 𒀀
      𒋳 𒈠 [𒆷] 𒋼 𒇷 𒆠 𒀀 𒀜 𒆷 𒅗
      𒅀 𒋾 𒀀 𒈾 𒆠 𒈠 𒈠 𒀭 𒉌 𒅎
      𒌅 𒅆 𒅎 𒈠 𒉌 𒈠
      𒆠 𒀀 𒄠 𒋼 𒈨 𒊭 𒀭 𒉌
      𒈠 𒊑 𒀀 𒉿 𒇷 𒀀 𒈾 𒆠 𒈠 𒅗 𒋾
      𒀀 𒈾 𒆠 𒋛 𒅀 𒈠 𒄩 𒊑 𒅎
      𒀸 𒁍 𒊏 𒄠 𒈠
      𒌅 𒈨 𒄿 𒊭 𒄠 𒈠
      𒄿 𒈾 𒂵 𒂵 𒅈 𒈾 𒀝 𒊑 𒅎
      𒅖 𒋾 𒅖 𒋗 𒅇 𒅆 𒉌 𒋗
      𒊑 𒆪 𒋢 𒉡 𒌅 𒋼 𒅕 𒊏 𒄠
      𒄿 𒈾 𒀀 𒇷 𒅅 𒋼 𒂖 𒈬 𒌦
      𒈠 𒀭 𒉡 𒌝 𒊭 𒆠 𒀀 𒄠
      𒄿 𒁍 𒊭 𒀭 𒉌 𒄿 𒈠
      𒀜 𒋫 𒈠 𒅈 𒅆 𒅁 𒊑 𒅀 𒌅 𒈨 𒂊 𒅖
      𒀀 𒈾 𒈠 𒆷 𒅗 𒊍 𒉿 𒅎
      𒊭 𒄿 𒈾 𒂵 𒋾 𒅀 𒌅 𒊺 𒍪 𒌑
      𒆠 𒀀 𒄠 𒋫 𒁕 𒁍 𒌒
      𒅇 𒀸 𒋳 𒄿 𒅗
      𒀀 𒈾 𒂍 𒃲 𒇷
      𒌋 𒐍 𒄘 𒍏 𒀀 𒈾 𒆪 𒀜 𒁲 𒅔
      𒅇 𒋗 𒈪 𒀀 𒁍 𒌝
      𒌋 𒐍 𒄘 𒍏 𒄿 𒁲 𒅔
      𒂊 𒍣 𒅁 𒊭 𒀀 𒈾 𒂍 𒀭 𒌓
      𒆪 𒉡 𒊌 𒅗 𒄠 𒉌 𒍣 𒁍
      𒀀 𒈾 𒉿 𒊑 𒅎 𒊭 𒀀 𒋾
      𒆠 𒄿 𒋼 𒁍 𒊭 𒀭 𒉌
      𒆠 𒋛 𒄿 𒈾 𒂵 𒂵 𒅈 𒈾 𒀝 𒊑
      𒌅 𒊌 𒋾 𒅋
      𒆠 𒋛 𒀀 𒈾 𒂵 𒋾 𒅀
      𒋗 𒇻 𒈠 𒄠 𒂊 𒇷 𒅗 𒄿 𒋗
      𒆠 𒈠 𒀭 𒉌 𒆠 𒀀 𒄠
      𒉿 𒊑 𒀀 𒄠 𒆷 𒁺 𒈬 𒂵 𒄠
      𒆷 𒀀 𒈠 𒄩 𒊒 𒅗 𒋫 𒆷 𒈠 𒀜
      𒄿 𒈾 𒆠 𒊓 𒇷 𒅀
      𒅖 𒋾 𒈾 𒀀 𒌑 𒈾 𒍝 𒀝 𒈠
      𒂊 𒇷 𒆠
      𒅇 𒀀 𒈾 𒊭 𒌅 𒈨 𒄿 𒊭 𒀭 𒉌
      𒈾 𒋛 𒄴 𒋫 𒄠 𒂊 𒁍 𒍑 𒅗

      in the original cuneiform as a copypasta

    • sarmale
      arrow-up
      8
      ·
      1 year ago

      How many unicode characters could you add to the standard until it becomes unreliable?

      • Kerb@discuss.tchncs.de
        arrow-up
        28
        ·
        edit-2
        1 year ago

        Apparently Unicode supports about 1.1 million code points, and only 96,382 of them were assigned to characters as of version 4.0.

        EDIT: I just read that Unicode 4.0 is very outdated; the current version is Unicode 15.1 with 149,813 characters.
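
        For anyone who wants to check the "about 1.1 million" figure, here is a rough Python sketch of the arithmetic. The variable names are made up for illustration, and the assigned-character count is simply the Unicode 4.0 figure quoted in this comment, not something the code looks up.

```python
import sys

# The Unicode code space is split into 17 planes of 2**16 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16

code_space = PLANES * CODE_POINTS_PER_PLANE

# Python's str type covers the same range: sys.maxunicode is the
# highest valid code point (U+10FFFF).
highest_code_point = sys.maxunicode

# Figure for Unicode 4.0 as quoted in the comment (an input, not computed).
assigned_v4_0 = 96_382
share_used = assigned_v4_0 / code_space

print(f"code space: {code_space:,} code points")
print(f"highest code point: U+{highest_code_point:X}")
print(f"share assigned in 4.0: {share_used:.1%}")
```

        So even with every script added since then, most of the code space is still unassigned.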

      • A Unicode character can be up to 4 bytes, so 2^32 or 4,294,967,296 potential unique characters. And it’d be easy enough to adjust the standard to allow for an extra byte(s) if necessary – it’s been done before.

        • Turun@feddit.de
          arrow-up
          4
          ·
          edit-2
          1 year ago

          This is incorrect. While a character (actually a code point) takes 4 bytes in UTF-32 and up to 4 bytes in UTF-8, the Unicode standard is limited to 17*2^16 code points. (edit: apparently because that is the limit of UTF-16. 4-byte UTF-8 can encode 2^21 code points, but the encoding scheme is not technically limited to four bytes; the original design allowed up to six, so in total it is able to encode 2^31 code points)
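
          The numbers in that edit can be checked with a bit of arithmetic. This is only a sketch: `utf8_payload_bits` is a hypothetical helper written for this comment, and it assumes the classic UTF-8 layout where an n-byte sequence has a lead byte with (7 - n) payload bits followed by continuation bytes carrying 6 bits each.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Payload bits in an n-byte UTF-8 sequence (classic 1..6 byte layout)."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("the classic UTF-8 layout defines 1 to 6 bytes")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # The lead byte carries (7 - n) bits, each continuation byte carries 6.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


# UTF-16 reaches 2**16 values directly, plus 1024 * 1024 surrogate pairs.
utf16_limit = 2 ** 16 + 1024 * 1024

# The Unicode code space: 17 planes of 2**16 code points.
unicode_limit = 17 * 2 ** 16

print(utf8_payload_bits(4))  # bits available in 4-byte UTF-8
print(utf8_payload_bits(6))  # bits available in the old 6-byte form
print(utf16_limit == unicode_limit)
```

          So the 17*2^16 cap matches what UTF-16 can reach, while the UTF-8 bit layout on its own would leave room for far more.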

          Unicode is the standard that says “the thing we call capital A is character number 65”, literally defining a mapping from numbers to concepts.
          UTF-8 and UTF-32 are schemes for encoding a list of those numbers as bytes, more (UTF-8) or less (UTF-32) efficiently.
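
          A quick way to see the difference between the mapping and the encodings, sketched in Python. `encoded_sizes` is a hypothetical helper for this example; the built-ins `ord` and `str.encode` do the actual work.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte length of one character under three encodings."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(ch.encode(name)) for name in encodings}


# Unicode itself: a mapping from characters to numbers (code points).
capital_a = "A"
cuneiform_a = "\U00012000"  # CUNEIFORM SIGN A, the first sign in the letter

print(ord(capital_a))         # code point of capital A
print(hex(ord(cuneiform_a)))  # code point of the cuneiform sign

# The encodings: different byte layouts for the same numbers.
print(encoded_sizes(capital_a))
print(encoded_sizes(cuneiform_a))
```

          The code point stays the same whichever encoding is picked; only the number of bytes on disk changes.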