Creating a Digital Index of Ancient Greek Texts, Part II: Compiling TLG References

On Friday, I wrote about how to convert a list of ancient Latin references generated from the Packard Humanities Institute’s Library of Classical Latin Texts into a digital library of citations in EndNote or Zotero. Today, we turn to the parallel process of converting citation lists from the Thesaurus Linguae Graecae database into EndNote or Zotero. I’ve copied many of my instructions from the first post into this second one, so there’s some redundancy here.

One major difference: TLG’s license agreement and copyright page expressly forbid copying the Greek texts. As the license agreement explains,

“Licensee make use of the Licensed Materials as is consistent with the Fair Use Provision of United States and International copyright laws. Licensee may not under any circumstances download or print large portions of a text or entire texts.”

So, it is important to note that the following steps have nothing to do with the Greek texts compiled in TLG, only with the index of citations to particular words. Indeed, one should modify even the citation lists to keep within the Fair Use Provision.

The steps to convert a library of citations are largely the same as those we outlined for the PHI Latin texts but require a good deal more text manipulation in MS Excel or Python (if you know it) because there is more metadata, and citations are not listed in entirely standard ways. Note that this will take some time, especially with larger libraries of several thousand entries. But the time required to manipulate, say, 3,000 citation records is far less than the time needed to key in even 200-300 entries manually.

So, the steps mirror those we delineated for the PHI Latin texts.

Step 1: Select and Copy Greek Citations via the Simple Text Search of TLG

Run a simple text search on a key word in TLG. Set “Lines of Context” to ‘0’ so as to not capture any Greek texts (as noted above). Set “Results per page” to 1000 (cit). Order by date. Run the search and generate the results as follows. For my example, I have used the root ισθμ, so as to capture all Greek examples of isthmos, isthmiaka, isthmia, etc…

TLG_2

At the end of the results, click on the Printable Form button and copy the citations.

TLG_3

Paste into MS Excel (Paste to “Match Destination Formatting” to eliminate the hyperlinks and color).

TLG_4

If there are additional Greek keywords for your library (e.g., Kenchreai, Lechaion, Korinth-), repeat this step and dump into separate worksheets in Excel. You’ll want to keep them separate for now so that you can add English keywords before combining.

Remove the numbers generated during the TLG search. To do this, select Column A and take out the numbers listed before the author by using the REPLACE function. After you eliminate the numbers, you will be able to sort author names alphabetically.

You can then select column A and sort it to get rid of the blank rows between records.

TLG_9
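For readers who prefer Python to Excel, the same cleanup can be sketched in a few lines. This is only a sketch under assumptions: the function name `clean_lines` is made up for illustration, and it assumes each pasted record begins with a result number followed by an optional period or parenthesis and a space, which may not match exactly what TLG's printable form produces.

```python
import re

def clean_lines(raw_text):
    """Drop blank lines, strip leading result numbers, and sort by author."""
    records = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank rows between records
        # Assumed numbering format: digits, an optional "." or ")", then spaces.
        line = re.sub(r"^\d+[.)]?\s+", "", line)
        records.append(line)
    # Sorting puts the records in alphabetical order by author name.
    return sorted(records)
```

Calling `clean_lines` on the pasted text returns a plain list of records, one string per citation, ready for the next step.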

 

Step 2: Prepare Text for Comma Delimitation

Since you’ve pasted every record into a single cell, you’ll need to separate values via commas so that you can delimit into different cells.

In the following record, for example, we want a comma at each division so that the author is separated from the work, the TLG text number, the date, and the citation.

Homerus Epic., Odyssea. {0012.002} (8 B.C.) Book 18 line 300

You’ll want to add commas after each of these to prepare the record for delimitation. Mostly this is a straightforward process. Select Column A. Use Find – Replace All to add commas before the ‘{‘, after the ‘}’, and after the ‘)’.

TLG_11

The example given above would be changed to

Homerus Epic., Odyssea. ,{0012.002}, (8 B.C.), Book 18 line 300

Note that some records will create problems for delimiting by comma:

  • Eumelus Epic., Corinthiaca (fragmenta). {0298.004} (8/7 B.C.) Volume-Jacoby#-F 3b,451,F fragment 4 line 5. 
  • Michael Psellus Phil., Theol., Polyhist., Epist. et Hagiogr., Opuscula psychologica, theologica, daemonologica. {2702.011} (A.D. 11) Page 80 line 26. 

Using the find-replace to add a comma after the ‘)’ of the date in the first example will also add one after the ‘)’ of fragmenta. Delimiting via the comma (Step 3) will create breaks in the citation after the 3b and the 451. The solution is to CONCATENATE these problem records (Step 4).
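If you are working in Python rather than Excel, one way to sidestep the stray-comma problem is to anchor on the braces and parentheses instead of on commas. The sketch below is not the method used in this post; it is an alternative under assumptions. The names `RECORD` and `parse_record` are hypothetical, the pattern assumes every record contains a `{TLG number}` followed by a `(date)`, and it splits author from work at the first comma, which is wrong for authors like Psellus whose epithets contain commas. Those records are flagged for manual review rather than guessed at.

```python
import re

# Assumed record shape: author, work. {TLG number} (date) citation
RECORD = re.compile(
    r"^(?P<head>.*?)\s*"       # author and work, up to the first "{"
    r"\{(?P<tlg>[^}]+)\}\s*"   # TLG text number inside braces
    r"\((?P<date>[^)]*)\)\s*"  # date inside the parentheses that follow
    r"(?P<citation>.*)$"       # everything else, commas included
)

def parse_record(line):
    """Split one TLG record into five fields, or return None if it does not fit."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    head = match.group("head")
    # Split author from work at the first comma only.
    author, _, work = head.partition(", ")
    return {
        "author": author,
        "work": work.rstrip("."),
        "tlg": match.group("tlg"),
        "date": match.group("date"),
        "citation": match.group("citation").rstrip(" ."),
        # More than one comma means the author/work split is ambiguous.
        "needs_review": head.count(", ") > 1,
    }
```

Because the citation is captured as a single group, the commas in "3b,451,F" never cause a break, and only the records marked `needs_review` need attention by hand.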

Step 3: Delimit Records into Separate Columns

Select Column A. Select the Data tab and then “Text to Columns” option. Where it says “Choose the File Type,” select the Delimited Button.

Select “Next” and check the box next to Comma as your Delimiter:

Hit “Next” and then “Finish”. After adjusting for column width, you should have something like this:

TLG_12

Step 4: Concatenate and Clean Up Text

Everything should be in 5 columns; if it is not (as in the case of Hecataeus above), there were additional commas that created breaks.

You can clean this up by deleting stray cells (such as the extra periods above) and using a formula with the CONCATENATE function to combine cells. Combine in a new column (K in the example below), then copy the result and paste it as a value into column E. Fixing these glitches may take an hour or two, depending on the number of records (it took me two hours to edit 4,500 records).

TLG_13

You may also want to CONCATENATE columns B and E so that the citation shows up in the title field in EndNote and Zotero. To do this, use a formula with CONCATENATE so that you create a new column F which combines title (Column B) and citation (Column E).

TLG_14

Then copy the new column F and paste it as values into Column B, replacing the old contents of Column B. Delete columns E and F so that you have 4 columns.

TLG_15
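The same merge can be sketched in Python. The function name `merge_title` and the single-space separator are assumptions made for illustration; the row layout follows the five columns described above (author, work, TLG number, date, citation).

```python
def merge_title(row, separator=" "):
    """Fold the citation into the title, turning five columns into four."""
    author, work, tlg, date, citation = row
    # Title and citation become one field, as in the new column F above.
    title = f"{work.strip()}{separator}{citation.strip()}"
    return [author, title, tlg, date]
```

Applied to every row, this leaves the four-column layout of author, title with citation, TLG number, and date.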

Step 5: Combine Text, Edit, and Polish Text

At this point, you’ll want to clean up the text and prepare it for export into EndNote. If you have multiple worksheets consisting of different Greek keywords, add a fifth column in each of those worksheets, and insert the searchable keyword in English. Then combine all the worksheets into a single master worksheet.

After you’ve combined, you may want to clean up your data in a number of different ways:

  • Convert B.C. / A.D. values to positive and negative numbers. TLG gives dates by century, so use Find/Replace to change each one into a representative year. For example, 8 B.C. (the eighth century BC) should be replaced with ‘-750’, and 8/7 B.C. with ‘-700’. It does not really matter whether a text dates from 750, 725, or 700 BC (or was subsequently edited throughout the Classical period). You just want to be able to sort in EndNote and Zotero and separate citations from, say, the 8th century BC from those of the 2nd century AD. Of course, if you want to give precise chronological values to works and titles, now is the time to do it. It is easier to do it in batches here than manually in Zotero.

TLG_16

  • You may want to replace author names (Homerus –> Homer) and titles (Ilias –> Iliad).

TLG_17

Again, it’s easier to make batch changes with different Excel functions here than change them later in Zotero or EndNote.
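Both cleanups in the list above lend themselves to a short Python sketch. The two B.C. examples (8 B.C. to -750, 8/7 B.C. to -700) come from this post; treating them as "midpoint of the century" and "boundary between two centuries", and applying the same rule to A.D. dates, is an assumption of the sketch. The names `century_to_year`, `anglicize`, and `NAME_MAP` are hypothetical, and any date the patterns do not recognize comes back as `None` so that it can be handled by hand.

```python
import re

BC = re.compile(r"^(\d+)(?:/(\d+))?\s*B\.C\.$")
AD = re.compile(r"^A\.D\.\s*(\d+)(?:/(\d+))?$")

def century_to_year(date):
    """Turn a TLG century such as '8 B.C.' into a sortable year, else None."""
    text = date.strip().strip("()").strip()
    match = BC.match(text)
    if match:
        first, second = match.group(1), match.group(2)
        if second:  # e.g. 8/7 B.C.: use the boundary between the centuries
            return -100 * min(int(first), int(second))
        return -(100 * int(first) - 50)  # midpoint of the century
    match = AD.match(text)
    if match:
        first, second = match.group(1), match.group(2)
        if second:  # e.g. A.D. 2/3: boundary between the centuries
            return 100 * min(int(first), int(second))
        return 100 * int(first) - 50  # midpoint of the century
    return None  # unrecognized format: leave for manual editing

# Only the two replacements mentioned in the post; extend as needed.
NAME_MAP = {"Homerus": "Homer", "Ilias": "Iliad"}

def anglicize(text, mapping=NAME_MAP):
    """Replace whole-word Latin names and titles with English ones."""
    for old, new in mapping.items():
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    return text
```

For instance, `century_to_year("(8 B.C.)")` gives -750 and `anglicize("Homerus Epic.")` gives "Homer Epic.".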

Insert a new row 1 at top of spreadsheet with the key heading words shown in the image below. EndNote will use these headers to interpret where the values go during the import. The spelling of these headers must be exact or there will be problems in importing.

Insert a new column 1 and title it “Reference Type.” Beneath this, for all the records, paste the value “Ancient Text” like the following (“Book” will also work as a recognized value).

When you are finished editing, save as a Text (Tab Delimited File).
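If your records are already in Python, the tab-delimited file can be written with the standard `csv` module instead of being saved from Excel. This is a sketch only: the function name `write_endnote_table` is made up, and the header names in `HEADERS` are placeholders, since the exact headings EndNote expects appear only in the image above. Replace them with the exact spellings shown there.

```python
import csv

# Placeholder headings: substitute the exact ones EndNote expects.
HEADERS = ["Reference Type", "Author", "Title", "Label", "Year", "Keywords"]

def write_endnote_table(rows, handle, reference_type="Ancient Text"):
    """Write a header row, then one tab-delimited line per record."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    for row in rows:
        # Prepend the reference type as the new first column.
        writer.writerow([reference_type, *row])
```

A typical call would open the output file with `open("tlg_citations.txt", "w", encoding="utf-8", newline="")` (the file name is arbitrary) and pass the handle in along with the list of five-column rows.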

Step 6: Clean Up in Word

I am not sure this step is necessary, but this YouTube tutorial video suggests you need to clean up the text by eliminating or replacing all quotation marks, apostrophes, wildcards, and the word “and”. I was able to import texts successfully without this step, so if you have problems in Step 7, return to Step 6 and see if it makes a difference.

For Images for Steps 7-9, see Part 1 here.

Step 7: Import to EndNote

To import into EndNote, select “File” tab –> “Import” –> “File.” Select your tab-delimited text file. For Import Option, select “Tab Delimited.” Duplicates: “Import All”. Text Translation: “No Translation.”

Step 8: Export to Zotero

For this step, Zotero has provided documentation here.

To export from EndNote in a form Zotero can read, click on Edit –> Output Styles –> Open Style Manager. Make sure RefMan (RIS) Export is selected. Close the Style Manager. Another acceptable export style is BibTeX.

Select File –> Export. Select file. Save as type: Select “Text”  (give file a new name). Output Style: RefMan (RIS) Export. Uncheck the box “Export Selected Records,” and EndNote will assume you want to export all records. Click “Save.”
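For those comfortable with Python, an RIS file can also be written directly, which would bypass EndNote altogether. This is a different route from the one described in this post, offered only as a sketch. The function name `to_ris` and the dictionary keys are hypothetical, `BOOK` is used as a safe reference type, and I have not verified how Zotero treats negative years in the `PY` field, so test with a handful of records first.

```python
def to_ris(record, ris_type="BOOK"):
    """Format one citation as an RIS entry (tag, two spaces, hyphen, space)."""
    lines = [
        f"TY  - {ris_type}",           # every RIS entry opens with its type
        f"AU  - {record['author']}",
        f"TI  - {record['title']}",
        f"PY  - {record['year']}",
    ]
    for keyword in record.get("keywords", []):
        lines.append(f"KW  - {keyword}")  # one KW line per keyword
    lines.append("ER  - ")                 # every RIS entry closes with ER
    return "\n".join(lines) + "\n"
```

Joining the output of `to_ris` for every record and saving it with a `.ris` extension should give a file that can be tried in the import step that follows.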

Step 9: Import to Zotero

The last step is to open Zotero for Firefox or Zotero Standalone.

Click on File –> Import –> select file and click “Open”.

**************************

As I noted in the previous post, I would be interested in hearing whether others have done this a different way, or how these steps might be improved to generate a more powerful database.


Categories: Digital Corinthia, Texts
