The past few years have witnessed unprecedented growth in the number of digitized books available through open access libraries. The benefits of the digitization campaigns that Google, among others, carries out are obvious. With good reason, historians have cheered the democratization of knowledge. A multitude of historical books, journals, and pamphlets are no longer accessible only to those affiliated with institutions that pay the hefty fees of commercially owned and operated databases. Instead, the web is full of electronic books covering a wide range of printed material, including titles and genres that could be labelled ephemeral. Thanks to this, scholars have discovered forgotten authors and works. What is more, digitization allows historians to consult books that are not held in the libraries of their home institutions without costly travel to the major research libraries around the world. Hence, it is not surprising that I give my full support to this initiative; without such sites as Open Library or the Internet Archive I could not even dream of a project that builds on paratexts in hundreds of late-Victorian histories and is based in Helsinki.
However, the excitement that all this has caused in the academic community has led many to close their eyes to the defects of the digitized material. Those who produce the copies have been equally blinded by the frenzy of digital humanities, data mining, and other buzzwords. As a result, quantity has overruled quality. As a heavy user of online Victorian histories, I address these problems with a dose of discomfort, because even with their shortcomings the e-books are vital for my project. But as a heavy user, I am also painfully aware that the digitized copies contain too many blunders that better quality control could have avoided. Hence, I trust that only by casting light on these issues is it possible to increase the quality and usability of this rich material.
The infamous pink finger and other mishaps
The first time I encountered the infamous pink finger, I was amused. Since then, I have encountered too many fingers in all the colours of the rainbow and have fallen out of love with them. The same goes for any other mishap that renders the text illegible.
The faulty copies
Then there are those copies that are missing pages. Of course, printing errors found their way into books during the nineteenth century, and books were damaged as they were read. The front matter was particularly susceptible to deterioration over time. Because of this, I do not fret about missing half-title pages; what I do fret about are the obvious mistakes that careless digitization has caused. John Richard Green’s Making of England, digitized by Google in “Oxford Libraries,” contains pages 50 and 51 four times but lacks pages 54, 55, 60, 61, and 63. It might lack even more pages, but at this point I gave up and hunted down another copy that had all the pages.
Unexpectedly, digitization can also cause books to put on extra weight. So far I have located only one online copy of the very first edition of John Richard Green’s Short History of the English People (1874), and it happens to suffer from serious swelling. First, there is the front matter, just as there should be. Then there are the first 39 pages. Then there is the front matter again, followed by 23 pages of the text. And yes, you guessed right: the front matter shows up once more before we get the entire text of 847 pages, naturally with an appropriate number of partially illegible pages.
The pink fingers, blurred text, and missing pages are a nuisance, and a lot of time is wasted searching for an alternative copy that is hopefully free of blunders. Another downside of these frequent inaccuracies is that they create problems for projects that draw quantitative data from this material. When the bottom of a page is blurred or part of a book is missing, counting and classifying footnotes for statistical purposes gets tricky.
Victorian historians were prone to writing multivolume histories, which often appeared later in revised editions. Consequently, the volume and edition are crucial bibliographical information for anyone who is researching nineteenth-century history books. This, however, is not always obvious to those who upload the digitized copies to the web. In most cases the details about volumes and editions are not readily available, and unearthing this information requires clicking through the potentially long list of hits that the search produced. The Internet Archive holds 46 different copies of William Stubbs’s three-volume Constitutional History of England. Eighteen of them mention the volume, but the edition can be discovered only by opening the individual record. Edward Freeman’s six-volume History of the Norman Conquest of England produces 57 results, of which only fifteen include the number of the volume.
Mismatched titles and copies are another source of bibliographical confusion. Just the other day I searched for Edward Freeman’s Old English History for Children, and the Internet Archive gave me five hits. I was glad that as many as five copies turned up, because two of them proved to be the Journal of the American Geographical Society of New York. When I searched for the same title through the Open Library, I got this:
I trust that this is enough said about the blunders. It should be obvious by now that we do not need any more hastily digitized books. Instead, we need a complete volte-face in the construction of online libraries: priority should be given to quality, not to quantity.