As part of my project I’ve been playing around with different ways of working with the EEBO-TCP records, and I thought I’d share what I’ve found – if you have any comments, suggestions, or if you know of any other similar tools, please let me know!
TCP is the Text Creation Partnership, and they are working on transcribing books from Early English Books Online (EEBO) to make fully-searchable text files. So far they’ve done around 40,000, and the work is ongoing. The text files are available on the EEBO website – if a book is transcribed, a little page icon appears among the other options. EEBO can also search the text itself, listing results under each book title. The EEBO database is great for lots of things – you can switch easily between the text file and the page images, and it’s simple to navigate to different parts of the book. For actually reading early modern books, it’s the best.
But if you have too many results, or not much time, EEBO can be a pain. The two websites discussed here offer alternative ways of using EEBO-TCP data.
CQPweb was built by Andrew Hardie of Lancaster University, and it includes a corpus version of the EEBO-TCP files. You can search the entire TCP record for a word, or string of words – CQP lists the results in full sentences, centred on the search term. You then have various options and tools that let you play around with those results. You can break down the results by year, decade, or century; you can look at which books have the highest or lowest frequency of results in them; and you can look at word collocations. This final feature is my favourite. It basically shows you which words appear most frequently near your search term. So when I searched for ‘addicted’, the top collocation was ‘to’, as in ‘addicted to’. Scrolling down the list of collocates, I found that people were addicted to ‘study’, to ‘hunting’, to ‘astrology’ and even ‘poetry’. You can also download all of the results to a spreadsheet, either to play around with or to make your own tables and graphs.
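To make the idea of a collocation a bit more concrete, here is a minimal Python sketch that simply counts the words appearing within a few tokens of a search term. This is not how CQPweb itself works: the function name `collocates`, the `window` parameter and the sample sentence are all my own inventions for illustration, and a real corpus tool would normally rank collocates with a statistical measure rather than a raw count.

```python
import re
from collections import Counter


def collocates(text, node, window=4):
    """Count the words that occur within `window` tokens of `node`.

    Illustrative only: a raw co-occurrence count, not a statistical
    collocation measure.
    """
    # Lower-case the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the search term itself
                counts[tokens[j]] += 1
    return counts


# Invented sample sentence, not a quotation from any EEBO-TCP text.
sample = "He was addicted to study, and his brother was addicted to hunting."
result = collocates(sample, "addicted", window=2)
print(result.most_common(3))
```

With a window of two words either side, ‘to’ and ‘was’ each turn up twice next to ‘addicted’, while ‘study’ and ‘hunting’ turn up once each.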
In CQP you can order the results by the frequency of hits in each book (useful for finding key texts to focus on); you can use wildcards to expand your search options; and you can create and compare subcorpora (smaller sample sets, built using just the texts you select).
CQP also has some issues, particularly for the historian. First, it’s not at all easy to actually read a book that you’ve identified on CQP. You basically have to go back to EEBO and search for the title. And once you’ve downloaded the results to a spreadsheet, you don’t even have the title or author to search from – just the unique code assigned to that text by the software. Second, all of the texts have been through spelling regularisation software. This means you’re more likely to get a result from your search term, but it also means you aren’t reading the original text. I’ve also noticed that, for some books, the metadata on CQP is different to that on EEBO. I found one text with a different publication date, and several with missing titles.
The second site, Early Print, is a tool that uses the same EEBO-TCP data as CQP, but with just two features. First, it lets you type in a word, or combination of words, and then displays the results on a line graph – you can also use it to compare several words or phrases at once. I only wish I’d found it earlier, because it would have saved me a lot of time downloading CQP data to make my own graphs.
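For anyone curious about what sits behind a graph like this, the underlying numbers can be sketched in a few lines of Python. The example below counts how often a word occurs per decade and divides by the total number of words in that decade, so that decades with more surviving text do not automatically look bigger. The normalisation to hits per 1,000 words is my own assumption rather than a description of what Early Print does, and the function name and the toy texts are invented.

```python
import re
from collections import defaultdict


def frequency_by_decade(texts, word):
    """Return {decade: hits per 1,000 words} for `word`.

    `texts` is a list of (year, text) pairs. The per-1,000-words
    scaling is an assumption made for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in texts:
        decade = year // 10 * 10  # e.g. 1608 -> 1600
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(word)
    return {d: 1000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}


# Invented toy data: (publication year, text).
toy_texts = [
    (1601, "addicted to study and to hunting"),
    (1608, "a man of great learning"),
    (1652, "much addicted to astrology and addicted to poetry"),
]
freq = frequency_by_decade(toy_texts, "addicted")
for decade, rate in freq.items():
    print(decade, round(rate, 1))
```

Plotting the decades on the x-axis and the rates on the y-axis would give a line graph of the same general kind.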
Second, you can search for a word or phrase, and it displays the results on a concordance page, similar to that of CQP. The difference is that Early Print sorts the results chronologically, and it shows you the title, author and date on the main results page. It also, brilliantly, gives you the option of using standardised or original spelling.
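A concordance of this kind is easy to sketch if you have the texts and their metadata to hand. The Python example below is only an illustration under my own assumptions: the `Book` class, the `concordance` function, the `width` parameter and the two sample books are all hypothetical, and it says nothing about how Early Print is actually built. It sorts the books by date, then prints each hit with a few words of context alongside the date, author and title.

```python
import re
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    year: int
    text: str


def concordance(books, term, width=3):
    """Return (year, author, title, context) rows, oldest book first."""
    rows = []
    for book in sorted(books, key=lambda b: b.year):
        tokens = re.findall(r"\w+", book.text)
        for i, tok in enumerate(tokens):
            if tok.lower() == term:
                # Keep `width` words of context on each side of the hit.
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((book.year, book.author, book.title,
                             f"{left} [{tok}] {right}"))
    return rows


# Invented placeholder books, not real EEBO-TCP records.
books = [
    Book("Second Title", "Author B", 1650, "She was addicted to poetry"),
    Book("First Title", "Author A", 1600,
         "He was much addicted to hunting and hawking"),
]
rows = concordance(books, "addicted")
for year, author, title, context in rows:
    print(year, author, title, context, sep=" | ")
```

Keeping the title, author and date in every row is what makes it possible to go straight from a hit back to the book itself.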
Finally, Early Print has more discussion of TCP as a historical tool than either of the other sites. There is a graph comparing the number of transcribed texts to the number of texts in the English Short Title Catalogue (ESTC). There is a brief discussion of some of the problems with using EEBO-TCP as a resource, with links to further articles. And there’s a discussion of the uses of N-Gram viewers, and their limitations.
Overall, Early Print seems more geared towards the historian than CQP, and it’s much easier to use. However, it doesn’t have the same range of features – there’s no option to explore collocations, for example. The EEBO website is still the best place to go if you want to actually read early English books, but CQP and Early Print extend the range of possibilities made available through the TCP data.