I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).

With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.

  • aesthelete@lemmy.world
    link
    fedilink
    arrow-up
    6
    ·
    edit-2
    18 hours ago

    SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.

    I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.

    A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.

    There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.

    • DahGangalang@infosec.pubOP
      link
      fedilink
      arrow-up
      1
      ·
      11 hours ago

      had to join to it

      I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?

      I feel like this is showing my own ignorance on the back end if databasing. Can you point me to references that explain this better?

      • aesthelete@lemmy.world
        link
        fedilink
        arrow-up
        2
        ·
        edit-2
        5 hours ago

        I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.

        https://www.w3schools.com/sql/sql_join.asp

        Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.

        DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.

        https://en.m.wikipedia.org/wiki/Database_normalization

        In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.

        All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)