Chapter 3: Understanding the MarcEdit Preferences

[/table]

MARCEngine Settings

Figure 5: MARCEngine Settings

The MARCEngine is one of the key components of the MarcEdit application.  The MARCEngine processing library controls all aspects of how the application processes both MARC and non-MARC data.  The MARCEngine options are broken into four groups: General Options, XML Options, XSLT Engine options and Unicode Normalization.  Users working with data that is not in MARC21, and international users working with diacritical data, should carefully consider which of these options will best meet their needs.  Likewise, users who process XML-encoded data with MarcEdit, or who use MarcEdit to output XML-encoded data, should evaluate these options to ensure that MarcEdit best meets their needs.

General Options

  • Use Diacritics when breaking:  Recommended Value: Checked.  When this option is selected, MarcEdit will replace non-UTF-8 encoded diacritics with mnemonics.  For example, in a file encoded in MARC-8, an a with an acute accent would be represented as {acute}a or {aacute} when rendered in the MarcEditor.  Once compiled, this data will be converted back to the byte equivalents of the mnemonic values.  Nearly all users should leave this option selected.  Only users who create and edit data in an alternative character set (like Big-5) should consider unchecking it.  However, if you uncheck this option, you must also change the MarcEditor Preference, Default encoding, to match your record encoding.  For example, if your data is encoded in Big-5, you wish to render it in Big-5, and your keyboard is configured to enter Big-5, you would configure MarcEdit to treat record data as Big-5 by unchecking Use Diacritics when breaking and setting MarcEditor’s Default Encoding to Big-5.  Outside of this narrow use case, users are encouraged to always select this option.
  • Records in MARC21 format:  Recommended Value: Checked.  Since most national libraries are moving towards a more universal acceptance of MARC21 as the privileged flavor of MARC, I recommend that this value remain selected.  The purpose of this configuration option is to signal to the MARCEngine when it is OK to manipulate the MARC leader when setting character encodings.  MarcEdit has the ability to convert between multiple character encodings, and in MARC21 there is a specific byte in the leader that must be set to identify whether a record is in UTF-8 or in something else.  If you know that your data is not in MARC21, and you will be converting data from another character encoding to UTF-8, you should uncheck this option to prevent MarcEdit from incorrectly flipping the leader byte.  For example, if a user’s bibliographic data is in UNIMARC, encoded in MARC-8, and the user is going to use MarcEdit to convert the data from MARC-8 to UTF-8, the user should uncheck this value prior to running the conversion.
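To make the leader byte concrete, here is a minimal Python sketch of the kind of change the Records in MARC21 format option guards.  In MARC21, leader position 09 is the character coding scheme: a blank means MARC-8 and "a" means UCS/Unicode.  The function name set_coding_scheme and the is_marc21 flag are hypothetical names used only for illustration; this is not MarcEdit’s own code, and the sample leader is made up.

```python
LEADER_LENGTH = 24        # a MARC leader is always 24 characters long
CODING_SCHEME_POS = 9     # MARC21 leader/09: character coding scheme


def set_coding_scheme(leader: str, utf8: bool, is_marc21: bool = True) -> str:
    """Return the leader with position 09 updated for the target encoding.

    is_marc21 is a hypothetical stand-in for the "Records in MARC21 format"
    option: when it is False the leader is returned untouched, because the
    byte does not carry this meaning in other MARC flavors.
    """
    if len(leader) != LEADER_LENGTH:
        raise ValueError("a MARC leader must be exactly 24 characters long")
    if not is_marc21:
        return leader
    flag = "a" if utf8 else " "   # "a" = UCS/Unicode, blank = MARC-8
    return leader[:CODING_SCHEME_POS] + flag + leader[CODING_SCHEME_POS + 1:]


# Illustrative leader with a blank at position 09 (MARC-8).
marc8_leader = "00714cam  2200205 a 4500"

as_utf8 = set_coding_scheme(marc8_leader, utf8=True)
untouched = set_coding_scheme(marc8_leader, utf8=True, is_marc21=False)
```

With the flag left at its default, only position 09 changes; with is_marc21 set to False the leader comes back unchanged, which mirrors the UNIMARC case described above.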

XML Options

  • MARCXML XSLT:  As discussed in subsequent chapters, MarcEdit’s XML framework is designed around using MARCXML as a mediator schema when moving metadata between schemas.  In previous versions of MarcEdit, all XML transformations were handled via XSLT.  While this is no longer true in the case of MARCXML conversions, users still have the option of forgoing MarcEdit’s native MARCXML processing algorithms in favor of an XSLT approach.  While MarcEdit’s native processing algorithm is orders of magnitude faster and has no size limitations when processing MARCXML data, the XSLT approach allows the user greater flexibility in customizing the output generated by the transformation.  Users wanting to utilize this approach should unselect the Use Native Option (Non-XSLT Process) and enter the path to the custom XSLT file that MarcEdit should utilize when processing MARCXML data. [table]tip[attr style=”width:90px”], “Please note, users wishing to modify the MARCXML stylesheet should not modify the default stylesheet provided by the application.  MarcEdit treats that file as protected, and will overwrite the source file on update.  Users should create their own MARCXML stylesheet if they wish to make modifications.”[/table]
  • Use Native Option (Non-XSLT Process): Recommended Value: Checked.  MarcEdit provides two methods for translating MARCXML formatted data into MARC: an XSLT option and a Native option.  If this value is checked, MarcEdit will utilize the Native option, which takes a SAX processing approach to translation.  If the user unselects this value, MarcEdit will utilize the XSLT file defined in the MARCXML XSLT option and run an XSLT process when transforming MARCXML data to MARC. [table]tip[attr style=”width:90px”], “MarcEdit’s Native Option is capable of processing MARCXML data at high speeds and with no size limitations.”[/table] [table]tip[attr style=”width:90px”], “The largest MARCXML file I have processed utilizing the Native Option is ~1 TB.”[/table]
  • Use Namespaces: Recommended Value: Checked.  Namespaces in XML allow for disambiguation.  However, they also significantly increase the file size of a document.  When this option is unselected, the MARCEngine will not output namespaces when generating MARCXML data.
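As a rough illustration of why a SAX approach has no practical size limit, the Python sketch below streams a MARCXML document and counts its records without ever building the whole tree in memory.  It uses Python’s standard xml.sax module rather than anything from MarcEdit, and the names RecordCounter and count_records are hypothetical.  The sample also assumes that output without namespaces simply omits the xmlns declaration, which is an assumption about the shape of the data, not a statement about MarcEdit’s exact output.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

MARCXML_NS = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace


class RecordCounter(ContentHandler):
    """Counts <record> elements as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.records = 0

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        # Accept records with the MARCXML namespace or with no namespace.
        if local == "record" and uri in (MARCXML_NS, None):
            self.records += 1


def count_records(stream) -> int:
    """Parse a byte stream incrementally and return the number of records."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = RecordCounter()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler.records


# Tiny illustrative document; real files would be opened from disk instead.
WITH_NS = (
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
    b"<record><leader>00714cam a2200205 a 4500</leader></record>"
    b"<record><leader>00714cam a2200205 a 4500</leader></record>"
    b"</collection>"
)
WITHOUT_NS = WITH_NS.replace(b' xmlns="http://www.loc.gov/MARC21/slim"', b"")

count_with_ns = count_records(io.BytesIO(WITH_NS))
count_without_ns = count_records(io.BytesIO(WITHOUT_NS))
```

Both variants yield the same record count, while the version without the namespace declaration is smaller.  An XSLT processor, by contrast, generally needs the full document in memory before it can transform it, which is where the size limitations of the XSLT approach come from.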

XSLT Engine
