Chapter 3: Understanding the MarcEdit Preferences

MarcEdit includes two XSLT processing engines as part of the application: the MSXML engine, which is a part of the .NET framework, and Saxon.NET, developed by Michael Kay [ref]Saxon XSLT and XQuery Project, http://sourceforge.net/projects/saxon/[/ref].  Both engines have their strengths and weaknesses.  By default, MarcEdit selects the MSXML engine, since this XSLT engine has been optimized for speed, but that speed does come with a cost.  Users are encouraged to evaluate the differences between the two options and make the choice that best suits their circumstances.

  • MSXML (Recommended Value: Selected):  The MSXML engine provides support for XSLT 1.0 and XQuery 1.0.  Users wishing to utilize XSLT 2.0 or XQuery 2.0 should select the SAXON.NET processor.  The MSXML engine is selected by default for two reasons: 1) the engine has a smaller footprint due to its integration with the .NET framework, and 2) the engine is noticeably more responsive when processing small to medium-sized XML documents.
  • SAXON.NET:  The SAXON.NET engine is a .NET implementation of the SAXON XSLT engine.  This XSLT engine supports XSLT 2.0 and XQuery 2.0.  This engine tends to have a noticeably slower loading time, but provides better performance on medium-large to large XML documents.

[table]tip[attr style=”width:90px”], “The XSLT Engine option sets the default XSLT engine to be used by MarcEdit.  However, users can override this default value and specify a specific XSLT engine at the transformation level.”[/table]
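One way to think about the per-transformation choice is to look at the XSLT version a stylesheet declares.  The Python sketch below is illustrative only: `suggest_engine` is a hypothetical helper, not part of MarcEdit, and it assumes the stylesheet's root element carries a numeric `version` attribute.  It simply maps version 1.0 stylesheets to the MSXML option and anything newer to SAXON.NET, following the version support described above.

```python
import xml.etree.ElementTree as ET

# Namespace used by XSLT stylesheets.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def suggest_engine(stylesheet_text):
    """Suggest an engine name from the stylesheet's declared XSLT version.

    Hypothetical helper for illustration; MarcEdit does not expose this function.
    """
    root = ET.fromstring(stylesheet_text)
    # The root of a stylesheet is xsl:stylesheet or its synonym xsl:transform.
    if root.tag not in (f"{{{XSL_NS}}}stylesheet", f"{{{XSL_NS}}}transform"):
        raise ValueError("root element is not xsl:stylesheet or xsl:transform")
    version = root.get("version")
    if version is None:
        raise ValueError("stylesheet does not declare a version")
    # Assumption: version 1.0 goes to MSXML, anything newer to SAXON.NET.
    return "MSXML" if float(version) < 2.0 else "SAXON.NET"


v1 = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
v2 = '<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
print(suggest_engine(v1))
print(suggest_engine(v2))
```

A check like this only covers version support; document size and loading time, as noted in the list above, may still favor one engine over the other.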

 

Unicode Normalization

With more and more library metadata being encoded in or translated into UTF-8, issues related to Unicode normalization are becoming more common.  While many metadata librarians believe that simply moving metadata into UTF-8 will fix the indexing issues that libraries currently face when working with diacritical data, the solution isn’t that simple.  What many people do not realize is that not all Unicode data is created the same.

One of the issues with MARC-8 encoded data is that a character with a diacritic isn’t represented as a single value.  For example, an ‘a’ with an acute is represented by the letter ‘a’ and a separate value representing the acute mark.  When indexed, the diacritical information is lost.  The same thing can occur with UTF-8 data, depending on the normalization in use.

Within the library community, the U.S. Library of Congress recommends the use of the Compatibility Decomposition or KD notation (Unicode Normalization Form KD, or NFKD).  This notation emulates the MARC-8 encoding structure, in that a character with a diacritic is represented as two separate codepoints: a codepoint representing the ‘a’ and a codepoint representing the diacritical mark.  The two are combined at the display level, but for indexing purposes, the diacritical information is lost.  For users outside of the United States, the preferred Unicode normalization is the Canonical Decomposition or C notation (Unicode Normalization Form C, or NFC, which applies canonical decomposition followed by canonical composition).  In this case, a character with a diacritic is represented as a single codepoint, so the ‘a’ with an acute is represented by a single value representing that character.  The diacritical information is then represented correctly at the display level and is indexed correctly.  Presently, the U.S. Library of Congress specifies the use of the KD notation for all records encoded in MARC21 to ensure the ability to round-trip data between MARC-8 and UTF-8.

  • Compatibility Decomposition (KD) (Recommended Value: Selected):  My personal opinion is that this notation should be dropped in favor of the Canonical Decomposition, but for users working with MARC21 and sharing data with libraries within North America, OCLC, or the Library of Congress, the KD notation should be used.
  • Canonical Decomposition (C):  This option will convert all UTF-8 diacritical pairs to their single codepoint representation.

[table]tip[attr style=”width:90px”],”While the KD notation is recommended for purposes of compatibility with MARC-8, MarcEdit has the ability to maintain that compatibility between UTF-8 data encoded using the C notation and MARC-8.  The process occurs seamlessly for the user.”[/table]
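The difference between the two notations can be seen outside of MarcEdit with a few lines of Python.  This is a minimal sketch using the standard `unicodedata` module, where "NFKD" and "NFC" are the Unicode names for the KD and C forms; the `codepoints` helper and the mark-stripping step are illustrative assumptions about how a naive indexer might behave, not a description of MarcEdit or of any particular ILS.

```python
import unicodedata


def codepoints(text):
    """Return the codepoints of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u00e1"      # 'a' with acute as one precomposed codepoint
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# KD notation: the letter and the diacritic become separate codepoints.
kd = unicodedata.normalize("NFKD", composed)
print(codepoints(kd))    # ['U+0061', 'U+0301']

# C notation: the pair is combined into a single codepoint.
c = unicodedata.normalize("NFC", decomposed)
print(codepoints(c))     # ['U+00E1']

# The two strings display identically but do not compare equal
# until both are normalized to the same form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", composed) == c)           # True

# Assumption: a naive indexer that drops combining marks loses the diacritic
# in decomposed data.
indexed = "".join(ch for ch in kd if not unicodedata.combining(ch))
print(indexed)           # a
```

Because both strings render as the same glyph, the mismatch is usually only noticed when a search or comparison fails.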

Update Settings

Figure 6: Update Settings

