I think this is more a matter of knowing when a specific tool should be used. Most people familiar with Hadoop are probably aware of all the overhead it creates. At the same time, datasets can reach a size (I'd guess even more so with "real-time" data processing) where a single machine simply isn't feasible anymore. (That said, I'm not too knowledgeable about Hadoop and big data, so anyone else should feel free to chime in.)
Very well put comment, so much so that I realised we'll probably need a "best of" community here on Lemmy.