Chapter 4: Working with MarcEdit’s command-line tools

In this Chapter

  • Getting Started
  • Configuring Your System
  • Using the Command Line Options

Getting Started

MarcEdit includes a separate command-line tool that encapsulates a wide range of functionality. The tool provides an interface without a GUI that can be used within shell scripts or other automation tools, and it is available in all versions of MarcEdit. On Windows and Linux, the terminal version is run through a separate program: cmarcedit.exe. On MacOS, the terminal is run through the MarcEdit app. In all examples below, cmarcedit.exe can be replaced with MarcEdit on MacOS.

Because MarcEdit isn’t installed into the PATH of your system, you need to enter the full path to the executable at the command prompt to access these tools. The location depends on the operating system and, on Windows, on whether your system is 32-bit or 64-bit. For each system, the program can be found at:

Windows:

  • Windows XP, Vista, Windows 7, Windows 8, Windows 10 (32-bit systems): c:\program files (x86)\MarcEdit 6\cmarcedit.exe
  • Windows XP, Vista, Windows 7, Windows 8, Windows 10 (64-bit systems): c:\program files\MarcEdit 6\cmarcedit.exe

MacOS:

  • /Applications/MarcEdit.app/Contents/MacOS/MarcEdit

Linux:

  • [path where you unzipped the file]/cmarcedit.exe
    Note: On Linux, the command needs to be prefixed with the mono command (for example, mono cmarcedit.exe -help).

If you are going to use these tools often (as I do), I recommend creating an environment variable that points to the application path.

Configuring Your System

If you are going to make use of the terminal program, I highly recommend that you either add the MarcEdit program directory to your PATH environment variable or create a new variable for the MarcEdit path. My personal preference has been to create a new variable to store the MarcEdit program path.

The process for setting up these variables differs on each system. At some point in the future, I’ll likely have the installation program do this for users by default (at least on Windows and MacOS), but for now the process is a manual one. When this process is automated, I’ll create this value as a user variable named MARCEDIT_PATH, so I recommend using that name. If you don’t know how to set up an environment variable on your system, please see one of the following links:

Once configured, you will be able to access the MarcEdit terminal using the environment variable and the program name. For example, on Windows:

>> %MARCEDIT_PATH%\cmarcedit.exe

On MacOS:

>> $MARCEDIT_PATH/MarcEdit

In the examples below, I will be using the cmarcedit.exe program to demonstrate different types of functionality. On MacOS, you can replace cmarcedit.exe with just MarcEdit. For example, given the following command:

>> cmarcedit.exe -help

The MacOS equivalent would be:

>> MarcEdit -help

Finally, these examples leave out the program path. For each of them, you need to either enter the full path to the program or prefix the command with the environment variable that stores the MarcEdit application path.
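For automation scripts, the platform differences described above can be wrapped in a small helper. The following is a minimal sketch, assuming Python 3 is available and that MARCEDIT_PATH has been set as recommended above. The function name marcedit_command and the /opt/marcedit folder are hypothetical and used only for illustration; they are not part of MarcEdit.

```python
import os
import sys
from typing import Dict, List, Optional


def marcedit_command(env: Optional[Dict[str, str]] = None,
                     platform: Optional[str] = None) -> List[str]:
    """Return the argument prefix that launches the MarcEdit terminal program.

    Hypothetical helper: it only assembles the command and runs nothing.
    """
    env = dict(os.environ) if env is None else env
    platform = sys.platform if platform is None else platform
    base = env.get("MARCEDIT_PATH", "")

    if platform == "darwin":
        # MacOS: the terminal is run through the MarcEdit app binary.
        sep, name, prefix = "/", "MarcEdit", []
    elif platform.startswith("win"):
        # Windows: cmarcedit.exe is called directly.
        sep, name, prefix = "\\", "cmarcedit.exe", []
    else:
        # Linux: cmarcedit.exe has to be prefixed with the mono command.
        sep, name, prefix = "/", "cmarcedit.exe", ["mono"]

    # Without MARCEDIT_PATH, fall back to the bare program name.
    program = base.rstrip("/\\") + sep + name if base else name
    return prefix + [program]


# Example with a hypothetical Linux install folder:
print(marcedit_command({"MARCEDIT_PATH": "/opt/marcedit"}, "linux"))
# prints ['mono', '/opt/marcedit/cmarcedit.exe']
```

Appending ["-help"] to the returned list and passing the result to subprocess.run would then be the scripted equivalent of the -help example above.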

Using the Command Line Options

The program currently provides the following options (these can be listed by entering >> cmarcedit.exe -help):

***************************************************************
* MarcEdit 6.2 Console Application
* By Terry Reese
* email: reeset@gmail.com
* Modified: 2017/5/30
***************************************************************
Arguments:
        -s:     Path to file to be processed.
                        If calling the join utility, source must be files
                        delimited by the ";" character
        -d:     Path to destination file.
                          If call the split utility, dest should specify a folder
                        where split files will be saved.
                        If this folder doesn't exist, one will be created.
        -export_delimited:      Export Tab Delimited Records function
        -D or -delimiter:       Set the delimiter.
                [format]delimiter1,[delimiter2],[delimiter3]
                [options:        = tab, [character]
        -normalize:     Sets the normalization options when exporting tab delimited data.  This is turn off by default
        -rules: Rules file for the MARC Validator or Linked Data Tool or export_delimited (depending on context).
        -view:  View the specified source file.  Default, 2K will be printed.
        -buffer:        Amount of data to be viewed.  This is an optional Parameter.
        -pretty:        When paired with the -view command, will always output data in mnemonic format if data is MARC
        -mxslt: Path to the MARCXML XSLT file.
        -xslt:  Path to the XML XSLT file.
        -batch: Specifies Batch Processing Mode
        -encoding:      Specifies the file encoding when translating to UTF8 or XML.  Must be a windows codepage
        -character:     Specifies character conversion mode.
        -break: Specifies MarcBreaker algorithm
        -make:  Specifies MarcMaker algorithm
        -marcxml:       Specifies MARCXML algorithm
        -xmlmarc:       Specifies the MARCXML to MARC algorithm
        -marc2json:     Specifies the MARC 2 JSON algorithm
        -json2marc:     Specifies the JSON 2 MARC algorithm
        -marctoxml:     Specifies MARC to XML algorithm
        -xmltomarc:     Specifies XML to MARC algorithm
        -xml:   Specifies the XML to XML algorithm
        -validate:      Specifies the MARCValidator algorithm.  If -rules options is present, validation validates against rules.  Without, it validates structure
        -dups?:         Will report if duplicate records are likely.
        -clean: If present with the -validate option, the program will clean invalid records from the file.  If not set, the program will just return structure errors.
        -task:  Path of a task file to run.  Must have a -s and -d values defined.  -silent is optional
        -silent:        Removes all dialogs when running tasks.
        -join:  Specifies join MARC File algorithm
        -split: Specifies split MARC File algorithm
        -records:       Specifies number of records per file [used with split command].
        -raw:   [Optional] Turns of mnemonic processing (returns raw data)
        -utf8:  [Optional] Turns on UTF-8 processing
        -marc8: [Optional] Turns on MARC-8 processing
        -pd:    [Optional] When a Malformed record is encountered, it will modify the process from a stop process to one where an error is simply noted and a stub note is added to the result file.
        -bibframe:      Specifies the current bibframe transformation.  Currently, bibframe2.
                Example: >>cmarcedit.exe -s [yourfile] -d [output file] -bibframe -idfield 001 -baseuri http://www.example.com -output rdfxml
        -idfield:       [Optional] Pairs with the bibframe conversion, sets the id field.  Defaults to 001.  Can be a field and subfield combination: i.e.: 035$a
        -baseuri:       [Optional] Pairs with the bibframe conversion, sets the base uri.  Defaults to example.com
        -output:        [Optional] Pairs with the bibframe conversion, sets schema output. Valid values: rdfxml
        -collections:   Print out all Linked Data Collections currently profiled.
        -buildlinks:    Specifies the Semantic Linking algorithm
                This function needs to be paired with the -options parameter
        -options        Specifies linking options to use: example: lcid,viaf:lc,oclcworkid:001,autodetect,3xx
                lcid: autodetects main entry
                autodetect: autodetects subjects and links to know values
                oclcworkid: inserts link to oclc work id if present
                3xx: autodetects values in 3xx fields and links to known values
                limit: limits values to specific defined collections.  Example: limit:mesh (limits to just mesh headings)
                rules: path to the linked data rules file.
                viaf: linking 1xx/7xx using viaf.  Specify index after colon. If no index is provided, lc is assumed.
                        VIAF Index Values:
                        all -- all of viaf
                        nla -- Australia's national index
                        vlacc -- Belgium's Flemish file
                        lac -- Canadian national file
                        bnc -- Catalunya
                        nsk -- Croatia
                        nkc -- Czech.
                        dbc -- Denmark (dbc)
                        egaxa -- Egypt
                        bnf -- France (BNF)
                        sudoc -- France (SUDOC)
                        dnb -- Germany
                        jpg -- Getty (ULAN)
                        bnc+bne -- Hispanica
                        nszl -- Hungary
                        isni -- ISNI
                        ndl -- Japan (NDL)
                        nli -- Israel
                        iccu -- Italy
                        LNB -- Latvia
                        LNL -- Lebannon
                        lc -- LC (NACO)
                        nta -- Netherlands
                        bibsys -- Norway
                        perseus -- Perseus
                        nlp -- Polish National Library
                        nukat -- Poland (Nukat)
                        ptbnp -- Portugal
                        nlb -- Singapore
                        bne -- Spain
                        selibr -- Sweden
                        swnl -- Swiss National Library
                        srp -- Syriac
                        rero -- Swiss RERO
                        rsl -- Russian
                        bav -- Vatican
                        wkp -- Wikipedia

        -help:  Returns usage information


Common options:

Translating data from MARC to mnemonic:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -break [-utf8|-marc8|-raw|-pd]

The above command has a number of optional values. The -utf8, -marc8, and -raw flags set the encoding properties for the MARC file. If -utf8 is used, the program will assume that the data is in MARC8 or UTF8 and ensure that all output is encoded in UTF8. The same is true for the -marc8 option, though the final output is MARC8. The -raw option doesn’t do any encoding; this is what is used by default. The -pd option controls what happens when a malformed record is encountered: as described in the help listing above, rather than stopping, MarcEdit notes the error and adds a stub note to the result file.

Translating data from mnemonic to MARC:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -make [-utf8|-marc8|-raw]
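When these two conversions are driven from a script, it can help to assemble the argument list in one place. The sketch below assumes Python 3; the function name convert_command and the file names are hypothetical, and only the -s, -d, -break, -make, -utf8, -marc8, and -raw flags shown above are used.

```python
from typing import List, Optional

# Encoding flags as documented for the -break and -make commands.
ENCODING_FLAGS = {"utf8": "-utf8", "marc8": "-marc8", "raw": "-raw"}


def convert_command(source: str, dest: str, mode: str,
                    encoding: Optional[str] = None,
                    program: str = "cmarcedit.exe") -> List[str]:
    """Build a MarcBreaker ("break") or MarcMaker ("make") command line.

    Hypothetical helper: it returns the argument list and runs nothing.
    """
    if mode not in ("break", "make"):
        raise ValueError("mode must be 'break' or 'make'")
    args = [program, "-s", source, "-d", dest, "-" + mode]
    if encoding is not None:
        # An unknown encoding name raises KeyError here.
        args.append(ENCODING_FLAGS[encoding])
    return args


# Hypothetical file names, MARC to mnemonic with UTF-8 output:
print(" ".join(convert_command("records.mrc", "records.mrk", "break", "utf8")))
# prints cmarcedit.exe -s records.mrc -d records.mrk -break -utf8
```

The returned list can be handed to subprocess.run once the program name has been replaced with the full path to the executable on your system.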

Working with Linked Data:

The linked data option uses the following pattern:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]

As noted in the list above, -options is a comma-delimited list of the values that the linking tool should query. For example, to generate work IDs and URIs on the 1xx and 7xx fields using id.loc.gov, the command would look like:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid

Users interested in building all available linkages (using viaf, autodetecting subjects, etc.) would use:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option, viaf. This tells the tool to use viaf as a linking option in the 1xx and 7xx fields; the value after the colon identifies the index to use when building links. The available indexes are listed in the help (see above).
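Because the -options value is a single comma-delimited string, a script can build it from individual choices and check the viaf index before calling the tool. This is a minimal sketch, assuming Python 3. The function names link_options and buildlinks_command are hypothetical, and VIAF_INDEXES holds only a subset of the index codes from the help listing.

```python
from typing import List, Optional

# A subset of the VIAF index codes from the help listing; extend as needed.
VIAF_INDEXES = {"all", "lc", "dnb", "bnf", "sudoc", "nla", "isni", "wkp"}


def link_options(oclc_work_id: bool = False, lcid: bool = False,
                 autodetect: bool = False, viaf: Optional[str] = None) -> str:
    """Join the selected linking options into the comma-delimited -options value."""
    parts: List[str] = []
    if oclc_work_id:
        parts.append("oclcworkid")
    if lcid:
        parts.append("lcid")
    if autodetect:
        parts.append("autodetect")
    if viaf is not None:
        if viaf not in VIAF_INDEXES:
            raise ValueError(f"unknown VIAF index: {viaf}")
        # The index follows the colon, as in viaf:lc.
        parts.append("viaf:" + viaf)
    if not parts:
        raise ValueError("at least one linking option is required")
    return ",".join(parts)


def buildlinks_command(source: str, dest: str, options: str,
                       program: str = "cmarcedit.exe") -> List[str]:
    """Build the -buildlinks command line (hypothetical helper, runs nothing)."""
    return [program, "-s", source, "-d", dest, "-buildlinks", "-options", options]


print(link_options(oclc_work_id=True, lcid=True, autodetect=True, viaf="lc"))
# prints oclcworkid,lcid,autodetect,viaf:lc
```

Other options from the help listing, such as 3xx, limit, and rules, are left out of this sketch and would need to be added in the same way.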

Translating Data to Bibframe

MarcEdit’s console includes the ability to leverage the U.S. Library of Congress’s bibframe translation profile. The tool has three optional parameters:

  • -idfield: default is 001 but can be a field + subfield (e.g. 035$a)
  • -baseuri: default is example.com
  • -output: Serialization, only rdfxml is currently supported

To use the function, you call the -bibframe argument. On Windows, the command would look like:

>> cmarcedit.exe -s [sourcefile] -d [destfile] -bibframe

On MacOS, the command would look like:

>> MarcEdit -s [sourcefile] -d [destfile] -bibframe
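The optional parameters can be added to the command only when they are needed. The following is a minimal sketch, assuming Python 3; the function name bibframe_command and the file names are hypothetical, while the flags and the rdfxml value come from the help listing above.

```python
from typing import List, Optional


def bibframe_command(source: str, dest: str,
                     idfield: Optional[str] = None,
                     baseuri: Optional[str] = None,
                     output: Optional[str] = None,
                     program: str = "cmarcedit.exe") -> List[str]:
    """Build a -bibframe command line (hypothetical helper, runs nothing).

    Parameters left as None are omitted so the tool falls back to its defaults.
    """
    if output is not None and output != "rdfxml":
        # The help listing names rdfxml as the only valid value.
        raise ValueError("only rdfxml output is currently supported")
    args = [program, "-s", source, "-d", dest, "-bibframe"]
    for flag, value in (("-idfield", idfield),
                        ("-baseuri", baseuri),
                        ("-output", output)):
        if value is not None:
            args += [flag, value]
    return args


# Mirrors the example in the help listing, with hypothetical file names:
print(" ".join(bibframe_command("records.mrc", "records.xml", idfield="001",
                                baseuri="http://www.example.com",
                                output="rdfxml")))
```

On MacOS, passing program="MarcEdit" produces the equivalent command.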

Additional Examples

Running tasks via the terminal. The -experimental flag is optional, but it runs tasks natively and is significantly faster. This functionality will eventually be standard.

>> cmarcedit.exe -s [sourcefile] -d [destfile] -task [fullpath to task file] -experimental

Validating Records:

>> cmarcedit.exe -s [sourcefile] -validate

Add -clean to remove invalid records:

>> cmarcedit.exe -s [sourcefile] -validate -clean

Check whether duplicate records are likely in a file:

>> cmarcedit.exe -s [sourcefile] -dups?
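These checks lend themselves to batch use across many files. The sketch below assumes Python 3.7 or later. The function names are hypothetical, and nothing is assumed about the format of the program's output or its exit codes: whatever cmarcedit.exe prints is simply echoed.

```python
import subprocess
from typing import Iterable, List


def validate_command(source: str, clean: bool = False,
                     program: str = "cmarcedit.exe") -> List[str]:
    """Build a -validate command line, optionally with -clean."""
    args = [program, "-s", source, "-validate"]
    if clean:
        args.append("-clean")
    return args


def dups_command(source: str, program: str = "cmarcedit.exe") -> List[str]:
    """Build a -dups? command line."""
    # Passed as a list element, "-dups?" reaches the program untouched; typed
    # unquoted in some Unix shells, the "?" may be treated as a wildcard.
    return [program, "-s", source, "-dups?"]


def run_checks(sources: Iterable[str]) -> None:
    """Run both checks on every source file and echo the program's output."""
    for source in sources:
        for args in (validate_command(source), dups_command(source)):
            result = subprocess.run(args, capture_output=True, text=True)
            print(" ".join(args))
            print(result.stdout)
```

Calling run_checks with a list of file paths runs both checks on each file in turn; the program name would need to be replaced with the full path to the executable, as in the other examples.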

Export delimited values:

>> cmarcedit.exe -s 100_a_coded_test.mrk -d 001_100.txt -rules UNIMARC_bib_001_100_coded.txt -export_delimited -D \t

Want more information?

You can find additional examples of how the command-line version of MarcEdit works by searching for cmarcedit at http://blog.reeset.net/?s=cmarcedit&searchsubmit=