Characterset support has long been a thorn in the side of many technical services librarians, because MARC data comes in so many different flavors and local character encodings. For catalogers in North America, two character encodings are dominant: MARC8 and UTF8. MARC8, of course, is an imaginary characterset that libraries invented in the 1970s to enable extended Latinate characters to be represented. As additional characters were added, the MARC8 format was extended to provide access to Greek, Hebrew, CJK, etc. These extensions turned MARC8 into an escape-based character encoding, which means that it’s virtually useless and unreadable by anyone outside of the library.<\/p>\n
Sadly, things don’t get much better when dealing with UTF8 data. In order to preserve round-trip ability with MARC8, most library specifications require UTF8 data to use the KD notation (Unicode normalization form KD, a compatibility decomposition). This notation separates characters from their diacritics, continuing the trend of making non-English language materials difficult to search and find.<\/p>\n
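To make the effect of the KD notation concrete, here is a minimal sketch using Python's standard `unicodedata` module. It is not how MarcEdit does the work internally; the helper names `to_kd` and `same_text` are hypothetical and exist only for illustration. The sketch shows a precomposed character being split into a base letter plus a combining diacritic, and why a naive string comparison between the two forms then fails.

```python
import unicodedata


def to_kd(text: str) -> str:
    """Return text in Unicode normalization form KD (decomposed)."""
    return unicodedata.normalize("NFKD", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after putting both in the same normalization form."""
    return to_kd(a) == to_kd(b)


# "Café" typed with a single precomposed character (U+00E9).
composed = "Caf\u00e9"
decomposed = to_kd(composed)

# The accented letter is now two code points: "e" followed by U+0301.
print([f"U+{ord(ch):04X}" for ch in decomposed])

# A naive comparison treats the two forms as different strings ...
print(composed == decomposed)

# ... while a normalization-aware comparison treats them as the same text.
print(same_text(composed, decomposed))
```

A search index that does not normalize both the stored data and the query to the same form will miss matches like this one, which is the findability problem described above.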
MarcEdit gives users a way to move data between various charactersets and character encodings. Character encoding options (like the UTF8 notation) are set in the MARCEngine Preferences, as noted in Book 1: Chapter 3: Understanding the MarcEdit Preferences[ref]http:\/\/marcedit.reeset.net\/learning_marcedit\/welcome-to-marcedit\/chapter-3-understanding-the-marcedit-preferences\/3\/<\/a>[\/ref]. For characterset conversions, users should use the Characterset Conversion tool, found in the MARC Tools window.<\/p>\nWorking with Charactersets<\/h3>\n