This is exactly the sort of tradeoff I was wondering about, thank you so much for mentioning it. I think ultimately I'd align with you in prioritizing answer quality over context length (but it sure would be nice to have both!!). Based on some of the other comments, my plan for now is to go ahead with the NAS build and keep my eyes peeled for any GPU deals in the meantime (though honestly I'm not holding my breath). Once I've proved to myself I can run something stable without burning the house down, I'll move on to something more powerful for the local LLM. Thanks again for sharing!
Thanks for sharing! I'll probably try going this route once I get the NAS squared away and turn back to local LLMs. Out of curiosity, are you using the q4_k_m quantization type?