On Sun, 2005-01-16 at 10:04, Erik B. Pedersen wrote:
One possible problem which this procedure does not address, I think, is what in a database is sometimes called the updating anomaly. Suppose Jörg Vetenskaper and Frederik Pedant both happen to download the same text for OCR conversion, and therefore duplicate the work. It may seem unlikely, but if unnecessary duplication of effort can be avoided, it would be best to do so.
Is it possible to add a mechanism to the http://runeberg.org/upload.pl?mode=ocrlist page that records which files have been "checked out" for OCR conversion, so that no one else will download the same work unnecessarily?
I'm considering this. I don't want to lock the work outright (everyone who wants to should be able to download the images), but perhaps there should be a way to state, when you download the images, that you plan to run OCR. If someone does this for a work, subsequent downloaders would be told so for a certain period afterwards.
Another point might be to somehow record the name and email address of the person who downloads a ZIP file of images, and then have a method to automatically send that person a friendly email periodically, say every fortnight, requesting a progress report, until such time as the corresponding OCR files are eventually uploaded.
I can imagine people downloading the TIFF images for purposes other than OCR (getting at high-resolution illustrations, perhaps), so I don't think this is a good idea.
Maybe it's a bit of trouble, but I think that, if I had the appropriate OCR software to do this sort of work, I would be reluctant to undertake it, knowing that another person was doing the same scan conversion at the same time.
I don't think this will turn out to be a problem in practice, but we'll see. For now, I'll leave this as it is and complete another related feature.
I'll get back to this feature in a little while, and hopefully it will then have seen some use so that I have some actual use cases to evaluate it with.
Hans