I think this is more a problem of knowing when a specific tool should be used. Most people familiar with Hadoop are probably aware of all the overhead it creates. At the same time, datasets can reach a size (even more so with "real-time" data processing) where processing on a single machine is simply no longer feasible. (That said, I'm not too knowledgeable about Hadoop and big data myself, so anyone else feel free to chime in.)
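To make the "when does a single machine stop being feasible" point concrete, here is a toy back-of-envelope check. The function name, the safety factor, and the example numbers are all made up for illustration; real capacity planning depends on the workload, not just dataset size.

```python
def fits_on_one_machine(dataset_gb: float, ram_gb: float, safety_factor: float = 0.5) -> bool:
    """Rough heuristic: a dataset is single-machine friendly if its
    working set fits in a fraction of available RAM. The safety_factor
    (an arbitrary 0.5 here) leaves headroom for copies and intermediate
    results produced during processing."""
    return dataset_gb <= ram_gb * safety_factor

# A 100 GB dataset on a 256 GB RAM box: no cluster needed.
print(fits_on_one_machine(100, 256))      # True
# A 10 TB dataset on the same box: time to consider distribution.
print(fits_on_one_machine(10_000, 256))   # False
```

Of course, streaming or chunked processing can push the single-machine limit well past RAM size, which is part of why reaching for Hadoop early is often premature.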
Very well-put comment, so much so that I realised we will probably need a best-of community here on Lemmy.