Hello, I have some letters handwritten by my great-grandfather from the Mauthausen concentration camp in 1943/1944. A few of them have been transcribed by hand. There are quite a lot of them and they are really not easy to read (you can understand the situation), even though the pen strokes are clear and well preserved.
I am wondering if some of these new AI tools can help me transcribe them. I don’t expect an automatic transcription, but any help would be welcome 😊
What’s your level of tech savviness?
It sounds like the letters are difficult enough to read by eye, and that provides certain challenges for transcription. However, some pre-processing of the image could sharpen that text and doesn’t even require AI.
I can’t really speak for the off-the-shelf solutions available online today because I don’t really use them. They’re virtually all privacy monsters. That doesn’t mean they won’t work or meet your needs, but be aware that if you use them then your great-grandparent’s most intimate accounts would be put on some corpo computer and later used to sell something.
If you do feel comfortable with a little scripting, then I’d recommend OpenCV or similar to sharpen your images and some open source OCR library to do the transcription. I haven’t done much OCR myself, but a quick search suggests maybe Tesseract would be easy to use.
If you’re not into scripting yourself, but that sounds appealing, I’d be happy to take a crack at it. Those documents are important and worth preserving. Note that I only speak/read English though, so I might not be great at assessing the quality of the transcription.
If you don’t feel comfortable sending a stranger your great-grandfather’s letters then that’s completely fair and understandable. But if that’s the case then I definitely wouldn’t be using an online tool for it.
Hello, thanks for your suggestion. The letters are good quality and well preserved, except for some of them. It’s not difficult to read them, but they are handwritten in cursive and I have some problems understanding the handwriting at some points. I’ll try some open source library for sure, it’s worth a try. Thanks! :-)
No direct help from my side but I wanted to wish you good luck!
It would perhaps help others to help you if you briefly outlined what you’ve tried already and where it struggles. I.e., in theory ChatGPT can transcribe, and I think so can Claude. If you can give an example page, even better.
If you use an already manually transcribed page you can even make a quick diff to estimate quality.
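A minimal sketch of such a comparison in Python, using only the standard library’s `difflib`. The function names and the sample sentences are made up for illustration; you’d paste in your manual transcription and the machine output instead.

```python
import difflib
from typing import List


def transcription_similarity(reference: str, candidate: str) -> float:
    """Word-level similarity from 0.0 (nothing matches) to 1.0 (identical)."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    return difflib.SequenceMatcher(None, ref_words, cand_words).ratio()


def word_diff(reference: str, candidate: str) -> List[str]:
    """Words that differ: '- ' is only in the manual text, '+ ' only in the machine text."""
    diff = difflib.ndiff(reference.split(), candidate.split())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Invented sample sentences, not taken from any real letter.
manual = "Cara mamma, sto bene e lavoro ogni giorno"
machine = "Cara mamma, sto bene e lavaro ogni giorno"

print(transcription_similarity(manual, machine))  # 0.875
print(word_diff(manual, machine))  # ['- lavoro', '+ lavaro']
```

The score is only a rough indicator, but it makes it easy to compare two tools on the same page.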
Again, good luck!
Well, you could try some of the established services:
- https://chat.mistral.ai/ (https://mistral.ai/news/pixtral-large/)
- https://claude.ai/
- https://chatgpt.com/ (or https://copilot.microsoft.com/)
They all want you to sign up but offer some free service. Try one of the letters you already have a transcription for, and see how well they fare. I don’t have the slightest clue if they can read Sütterlinschrift or whatever your great-grandfather used to write in. You’d take a picture or scan, upload the file to the chatbot and add an instruction to transcribe the handwriting. And keep us posted if you do.
Thanks, it’s a good idea. I’ll try uploading the original letter and the transcription, then I’ll ask it to read other letters.
Hope it works out. I’m not certain. I’ve seen some letters from that era and I know they can be hard to decipher. If you like, you can share a sample picture here. For us to try… But don’t do it if it’s too much personal information.
Here you can find a letter. As you can see, it’s pretty good quality but it’s difficult to read (it’s in Italian). In the next few days I’ll try with ChatGPT and company 😊
Well, I fed it into ChatGPT and Le Chat (Mistral). I’d say ChatGPT does a bit better. It gets a lot more words than I do. But there are quite a few obvious errors. And I don’t speak Italian, so it’s hard for me to judge and make sense of it. I’d say this still requires a lot of manual labor. But the AI transcription attempt will help massively.
Seems he’s talking about his health which got better. And then the work and daily routine including times. And they’re 1000 men and women in service(?) and 250 people in his barracks.
Some services I didn’t try but have heard are good, too: Claude (requires sign-up with a telephone number, which I refuse), and Google Cloud Vision (part of the business cloud services by Google). My traditional OCR solution (Tesseract) outputs gibberish. I tried that, just to make sure.
I won’t post the output, since it’s not usable as is. But you’ll see for yourself. I’m certainly surprised by how well ChatGPT does in deciphering the words. Probably enough for an Italian speaker to complete the task.
Thank you very much for your test. I tried uploading another letter to ChatGPT and the initial part was so good that I was really surprised, but other parts of the transcription were nonsense. Anyway, as you said, it will be a good starting point, at least to understand some words and sentences that can make the rest of the text more understandable.