I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.
SQL is damn near everywhere where data sets are found.
It’s entirely possible that the database is pre SQL.
Aha Airforce one likely uses SQL
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.
I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.
A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.
I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?
I feel like this is showing my own ignorance on the back end if databasing. Can you point me to references that explain this better?
I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.
https://www.w3schools.com/sql/sql_join.asp
Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.
DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.
https://en.m.wikipedia.org/wiki/Database_normalization
In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.
All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)
Oh, well another user pointed out that SSN’s are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as unique identifiers for our populace, and if it were it would be more comprehensive than an unsecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSN’s for identity purposes. Poop.
I’m pretty sure there is a federal statute that says ONLY the SSA may collect or use SSNs, as to federal agencies. I argued it once when a federal agency court tried to tell me that it couldn’t process part of my client’s case without it. I didn’t care but my client was crotchety and would only even give me the last four.
Edit. It’s a regulation:
https://www.law.cornell.edu/cfr/text/28/802.23
An agency cannot require disclosure of an SSN for any right or benefit unless a specific federal statute requires it or the agency required the disclosure prior to 1975.
In my case the agency got back to me with some federal statute that didn’t say what they said it said, and eventually they had to admit they were wrong.