CarLibrary.org - Importing Spreadsheet Data into Greenstone
Importing an Inventory List into Greenstone
This webpage provides a basic step-by-step process about going from a "list" into a basic Greenstone collection of records, with metadata for each record, ready to build a digital library/archive. It is the second CarLibrary.org webpage on Greenstone, each with a different method of creating an automotive-related "collection".
The following steps assumes you have some familiarity with Greenstone and Excel. The Greenstone wiki has a link to a tutorial which is very good and recommended.
The examples on this webpage are Excel files with real, working data imported into Greenstone. This import technique was done many times until a valid collection was created!
This process also assumes you have some knowledge of library classification techniques. If not, some research on this topic or consulting with an experienced librarian may be helpful. One of the official Dublin Core websites is a good starting point, especially topic 4. "Elements" at the bottom of that webpage. Basic categories such as "Title", "Description", "Subject" and "Keyword" can be confusing and tedious to correct after an import with many records. It's best to get it (mostly) right in the beginning.
However: "The Perfect is the enemy of the Good (enough)" - Voltaire So don't hesitate too long to get started!
Why Make a Digital Collection?
For a collection of objects - cars, paintings, books, photographs - which is reasonably organized, there likely is a list (inventory) of these objects and some of their characteristics: "1948 Oldsmobile 88M, green, bought in 1985 from Ted Smith, loaned to the Oakville Museum, call Fred Friday for status" or "Snowy Winter, oil on canvas, by Patricia Jones, 26"x18", bought in 1997 for $175".
If this collection ia slightly more organized, this list and its attributes may now be on a computer, in some type of database. But what is to be done with the documents about the restoration of a 1948 Oldsmobile or the hundreds of photos at various car shows? Linked references - or sometimes the actual document - can be put on a PC. But when the documents grow to hundreds or thousands and direct, repeatable access is needed to any type of digital file related to your collection, a software program that goes beyond databases or photo organizers is recommended: the Greenstone Digital Library software is a good example. The cost is right - it's open-source (free to use). And it's not just for "libraries", but applicable for collections, archives, libraries and museums.
When a conversion of a (Excel) list to Greenstone is made, the attributes of each object become "metadata" in categories which can be used for access or classifying the current collection and future acquisitions. Further, if the Greenstone's standard metadata sets are used, these classifications will be recognized and can be accessible to a wider audience. A collection on Greenstone can remain private on a home or business computer, put on a network or made accessible to the Internet.
Lists, Databases and Excel
A good example of a "list" is a simple contact list of names, addresses and other personal or business information. Such a list would look like this:
Lists such as the example above have been hand-written and typed for generations, but computers allow us to put this data into Excel (or Word) tables or more advanced programs, such as a database. For a useful table (or list) to be used as a "database", some rules apply:
Note in the example above, Record 01 uses "CA" as the standard abbreviation for "California" whereas Record 02 spells out the full state name. This is not good database practice.
Excel was the software used to make the table/list/database for gathering records and data for import into Greenstone. If you have typed lists, investigate using OCR to convert them to spreadsheet/Excel files. Or use a good typist familiar with Excel! If the database-type tables are in Word or a similar program, they usually can be copied into or imported into a spreadsheet-type program, such as Excel.
Excel provides many functions to help you review and clean-up the data. Sorting, copying, pasting and moving cells of data will speed up any need to make the data uniform and adhere to good database practices.
For more information on using Excel as a database, see Using Excel as a Database or an Excel user book.
This is another example of records and a database, more similar to data planned to bring into Greenstone:
A further example which shows how a archive catalog can be imported into Greenstone:
Finally, the following are two examples of field names (columns headings) that have been used for Excel files to record data on cars (vehicles) and their owners, in separate files. See below for further details:
Create an Excel File/Database
Using the examples above, create or check the Excel file to ensure all the data for each record is on a single line, "like kind" data is in each column, variations in each data item (spelling and abbreviations) have been made uniform, and blank lines have been eliminated. Blank cells are acceptable.
Dates can have many formats. The can be either "text" or in an Excel date formats - which can look exactly like text. A good method to fix dates is to sort the entire file on the "Date" field. Dates in text format should be at the top and should be corrected to an Excel date format.
The file intended as an import can have only a few records - probably best for initial trials - or thousands of records. Greenstone imports are very quick!
Finally, create a new first column for the Excel file with a name such as "RecordID" - the exact field name is not critical. Good practice is to make the data in this column a "name" that relates to the file data plus a number to ensure each records has a unique identifier. For the Frazer Nash car file above, this would be something like "HighSpeed05", "Highspeed06", and "LeMansReplica08". For the archive file above, this would be "Sales-Promo001". If there are many records, this can be a tedious manual process -Excel's process to create a list of consecutive numbers can be used, format the numbers into a standard format (ie. "01", "02") and then use the "Concatenate" function to combine this number field with a data item from a different column. This column of data will be very useful later in Greenstone.
Review Standard Greenstone Metadata Categories
Assuming there has been some basic Greenstone experience, there will be familiarity with the "Dublin Core Metadata Standard", which is the basic classification scheme used in Greenstone and widely recognized by digital libraries and other resources (including web pages).
The Dublin Core basically consists of these elements:
In Greenstone, each Dublin Core element is prefixed with "dc.", so they appear as dc.Title, dc.Creator, etc. Because these elements are widely accepted and recognized, it is a good idea to match the imported field names to the Dublin Core elements, insofar as that is possible.
For this example of an archive file, this mapping is used:
Note that not all Dublin Core metatags are mapped from the file scheduled to be imported; unused metatags can be added after the import directly in Greenstone, as needed. Note also that new "item" metatags have appeared. These should be added to Greenstone before the import. This process is described below.
Other metatags will be also added, such as "car.Make", "car.Model", etc., specifically because the data of this archive is from a car company and a car club.
These are the fields/categories suggested for the Greenstone automotive digital collections:
Note that the Excel file destined for Greenstone does not initially need to have entries for all of these Dublin Core metatags; metatag data can be added after the import as needed, directly in Greenstone. New metatag categories can be also created, such as "item.xxx" which could be useful to add physical attributes in an archive collections. New metatags should be added to Greenstone before the import. This process is described below.
The "mapping" of car collections Excel file to the planned Greenstone collection is very easy - rename column headings to the relevant Dublin Core element or to a new metatag category planned to be added to Greenstone.
For the the imports of the Excel example files shown above, a metadata set was created: "cars.xxx". For a collection that was primarily an archive inventory, an "item.xxx" metadata set was added.
"Exploding" (Importing) the Spreadsheet Database
In Greenstone, "importing" an Excel file/database is not very difficult. This process will create a (nul/empty) records from each record in the Excel file. Each record's data elements will become a metatag, ready to be used to identify/categorize each car, or related document, photo or other digital item. The Excel file must be converted to a "CSV" format and then "exploded" in Greenstone. This "explosion" is very safe!
A Greenstone tutorial explains this - on that page, follow on from step 15.
An Excel file cannot be "exploded" directly, but such a file is easily saved as a "comma-delimited" (Comma Separated Values, "CSV") file.