Chapter 3: Slice, Dice, and Join your Records Again

In this Chapter

  • Getting Started
  • Splitting MARC Records
  • Joining MARC Records
  • Beyond Split and Join

Getting Started

If your library is like most libraries I work with, you get a lot of your bibliographic metadata from content providers.  E-Books, E-Journals, Shelf-Ready…it seems like libraries are constantly having to deal with files of MARC records.  Some of these files are incredibly large, with thousands of records, while others may have just a couple of records but may be part of a zip file with thousands of these individual record files.  Whatever the case, catalogers have developed lots of strategies for handling vendor data, and one of those strategies is slicing, dicing, and joining record data together.  Whether we are talking about a really large file that needs to be cut into more manageable chunks, or thousands of individual files that need to be joined into one single file, catalogers need tools, and MarcEdit provides them with options.

Splitting MARC Files

Imagine the following use case:

I’m a cataloger who has just received a vendor file containing a current set of purchased e-books.  The file contains approximately 10,000 records, but they are of questionable quality.  I’d like to have the ability to break this file into 5 smaller files to simplify editing.

The above use case tends to be pretty common.  The size of the files will vary, but one of the more common questions that I receive deals with turning large files into more manageable files.  This is what the MARCSplit tool was designed to do.

Figure 1: MARCSplit Window

The MARCSplit tool can be called from the MARC Tools Window, from the Main MarcEdit Program Window (Tools menu/MARCSplit), or can be added to the Main Window as a link via the preferences.  As one might guess from the name, the purpose of this tool is to break large MARC files into groups of smaller files.

The MARCSplit utility has a couple of different modes that are worth highlighting.  First, by default, the tool breaks a file based on the number of records defined by the user.  By default, the tool sets a Records per file limit of 1000.  This means that the tool will generate files of up to 1000 records.  Users can adjust this value to meet their particular needs.
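
For readers who like to see the idea in code, here is a minimal Python sketch of the records-per-file mode.  This is not how MarcEdit itself is implemented; it only illustrates the logic, and it assumes the input is binary MARC (ISO 2709), where every record ends with the byte 0x1D.  The function names iter_records and split_records are hypothetical names made up for this illustration.

```python
from typing import Iterator, List

# In binary MARC (ISO 2709), each record ends with this terminator byte.
RECORD_TERMINATOR = b"\x1d"


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw record from the data, terminator included."""
    start = 0
    while True:
        end = data.find(RECORD_TERMINATOR, start)
        if end == -1:
            break  # no further complete record
        yield data[start:end + 1]
        start = end + 1


def split_records(data: bytes, records_per_file: int = 1000) -> List[bytes]:
    """Group records into chunks holding up to records_per_file records.

    Hypothetical helper for illustration; the default of 1000 mirrors
    the Records per file default described in the text.
    """
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    chunks: List[bytes] = []
    current: List[bytes] = []
    for record in iter_records(data):
        current.append(record)
        if len(current) == records_per_file:
            chunks.append(b"".join(current))
            current = []
    if current:
        # The last chunk may hold fewer records than the limit.
        chunks.append(b"".join(current))
    return chunks
```

Each chunk returned could then be written out as its own file.  For the use case above, a file of roughly 10,000 records split with records_per_file set to 2000 would come out as about 5 chunks.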

Secondly, users have an option to select the # of files to split.  This option allows users to only generate a specified number of files via a split operation.  What it does not do is automatically split an entire file into a set number of files.  Let me explain…

Say that we have a file with 100,000 records.  I’d like to split that into files of 1000, but I only want 10 files.  Rather than creating 100 files, with 1000 records in each file, I can check the # of files option, set the Records per file to 1000, and tell MarcEdit that I only want 10 files.  MarcEdit will then generate just 10 files, each with 1000 records.
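
The same behavior can be sketched in a few lines of Python.  Again, this is only an illustration under stated assumptions and not MarcEdit's actual code: it assumes binary MARC input where each record ends with the byte 0x1D, it assumes the limited set of files is taken from the start of the source file, and the names iter_records, split_with_limit, and max_files are hypothetical.

```python
from itertools import islice
from typing import Iterator, List, Optional

# In binary MARC (ISO 2709), each record ends with this terminator byte.
RECORD_TERMINATOR = b"\x1d"


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw record from the data, terminator included."""
    start = 0
    while True:
        end = data.find(RECORD_TERMINATOR, start)
        if end == -1:
            break
        yield data[start:end + 1]
        start = end + 1


def split_with_limit(
    data: bytes,
    records_per_file: int = 1000,
    max_files: Optional[int] = None,
) -> List[bytes]:
    """Build chunks of records_per_file records, stopping after max_files.

    With max_files=None every record is used.  With a limit, records
    beyond the last generated chunk are simply left out (assumption:
    chunks are taken from the beginning of the data).
    """
    records = iter_records(data)
    chunks: List[bytes] = []
    while max_files is None or len(chunks) < max_files:
        batch = list(islice(records, records_per_file))
        if not batch:
            break  # ran out of records before reaching the limit
        chunks.append(b"".join(batch))
    return chunks
```

Calling split_with_limit on a 100,000-record file with records_per_file set to 1000 and max_files set to 10 yields 10 chunks of 1000 records each, leaving the remaining 90,000 records untouched.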