Free pdf to excel converter reddit

8/17/2023

I have to agree with music2myear's answer. With any computer to which you would have access, you can't do anything useful to go from handwritten records to Excel. There are at least three difficult tasks:

- Distinguishing "content" from non-content.
- Recognizing the layout and translating that to cell locations.
- Recognizing the handwritten characters and translating them to text.

Consumer software and online services are available and do a reasonable job of converting machine-printed text that is in clean table format to a spreadsheet file. But even the best can be far from perfect. That's just the task of assigning text to the right cell based on its position.

When you look at those images, your brain is very good at sorting out what is "preprinted form", what is content, what is noise, and what is human markings that aren't relevant. You can recognize how things are aligned, and what goes with what based on context. To the computer, everything that isn't the background color is "something". Figuring out what of that is important to you, and what could potentially be some kind of character to be translated is extremely difficult. And if the content overlaps preprinted lines, that introduces breaks and missing data that the computer can't easily handle. You would have the additional task of separating and removing the preprinted grid from the content.

In the second image, the content is mostly within the bounds of the grid, but there are lots of stray markings (slashes, underlines, etc.) that would require cleanup. The toughest part, though, is recognizing handwriting and converting that to computer text. For image 1, even humans would have trouble figuring out what some of that is, and it would involve a lot of guessing based on context and familiarity with the words. In image 2, most of the numbers aren't too bad, but the text would be a problem. If your grandparents' records are non-cursive, and neat, legible, consistent, and similar to machine printing, OCR might do a "reasonable" job on it. But you would still have a lot of cleanup.

For perspective, the US Postal Service has some of the most advanced handwriting recognition, which it uses to read addresses on mailpieces so they can be sorted with automated equipment. The only way they are able to do it is because the addresses are in a prescribed structure and format, and they know every possible address ahead of time. The objective is more to match the handwritten addresses to viable candidates than to get every character right. If you can only decipher half of the characters, there still may be only one or a few possible matches. Even with that, a substantial portion requires human intervention. When it's done and the mail gets to the carrier for delivery, the carrier knows the addresses and names on their route, and they check it all to ensure that the addresses weren't misinterpreted.

That's the level of handwriting OCR with state-of-the-art technology and an extremely controlled range of possibilities to compare against. Your task needs to translate every character. You don't have a master list of all the words that could legitimately be in those records (other than a dictionary of the entire language). OCR would require so much cleanup that it would be faster to simply read the records and type them into Excel.
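To make the point about matching against known candidates concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not how the Postal Service actually does it: the address list, the `best_candidates` helper, and the `?` placeholder for an unreadable character are all assumptions made up for this example. It uses the standard library's `difflib.get_close_matches` to compare a partly deciphered string against a closed list of valid entries.

```python
import difflib

# Hypothetical closed list of every valid address (illustrative data only).
KNOWN_ADDRESSES = [
    "12 OAK STREET",
    "14 OAK STREET",
    "120 ELM AVENUE",
    "7 MAPLE COURT",
]


def best_candidates(ocr_text, candidates, cutoff=0.5, limit=3):
    """Return up to `limit` known candidates that resemble the noisy OCR text.

    `cutoff` is the minimum similarity (0..1) a candidate needs to be kept.
    """
    return difflib.get_close_matches(
        ocr_text.upper(), candidates, n=limit, cutoff=cutoff
    )


# "?" stands in for a character the recognizer could not read (an assumption).
print(best_candidates("7 MAP?E C?U?T", KNOWN_ADDRESSES))  # several characters missing
print(best_candidates("1? OAK STREET", KNOWN_ADDRESSES))  # ambiguous house number
print(best_candidates("QZXV", KNOWN_ADDRESSES))           # nothing plausible
```

With a closed candidate list, a badly damaged reading can still narrow down to one or a few entries, and an ambiguous one can be handed to a person. Free-form family records have no such list to match against, which is why every character would have to be recognized correctly on its own.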
