Tuesday 10 September 2013

Linked Data: from Aleph/Primo to the Dictionary of Luxembourgish Authors

Roxana Popistasu, IT staff, Bibliotheque nationale de Luxembourg

The project started last year, at the same time as going live with Primo. The NLL also manages Aleph and Primo for the network of libraries in Luxembourg. There is also a partnership with the Centre national de littérature, which manages the content of the Autorenlexikon (dictionnaire des auteurs luxembourgeois). The idea is to link all this information. Other partners: Magic moving pixel, an IT management company for Autorenlexikon. The goal of the project was to evaluate the work involved.

Questions to be answered:
- How to create a link between authors? String matching or id's?
- How to deal with identical names?
- How to deal with the authority records?
- How to find (and save?) the matches?

The initial results were unsatisfactory. Connecting the authors between AutorenL and the bib database based on IDs produced a low number of matches (60%), and the rate was even lower with the authority database (35%), so we had to use string matching. But even 60% was better than nothing, so that was used first. The link was added in the bib record.

The actual project for setting up the linking started in March 2013. It began with adjusting the matching algorithm and creating a database of matches. Then came the need to create a web service to be used for the display in the catalogue and in AutorenL, then to display matches in the Aleph OPAC, and to create the validation service. Matches were made on author and title.

The algorithm: created normalisation rules, e.g. eliminating special characters, ignoring upper/lower case differences etc. Worked on standardising the cataloguing rules, which were different in Aleph (MARC21) and AutorenL. Levels of matching needed to be checked. Created a database of matches, regularly and automatically updated using exports from Aleph that are imported into that DB.
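The talk did not show the actual rules, so the following is only a minimal sketch of what such name normalisation could look like. The function name `normalise_name` and the specific rules (stripping accents, dropping punctuation, lower-casing, sorting name parts so that "Surname, Forename" and "Forename Surname" compare equal) are assumptions for illustration, not the NLL implementation.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Reduce a name to a comparison key (hypothetical rules, for illustration)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(c if c.isalnum() else " " for c in without_marks)
    # Ignore case and sort the name parts so their order does not matter.
    return " ".join(sorted(cleaned.casefold().split()))


# Two differently catalogued forms of the same made-up name give the same key.
print(normalise_name("Müller, Hélène") == normalise_name("HELENE MULLER"))
```

Sorting the name parts is a blunt way to hide word-order differences between cataloguing rules; it can also merge distinct names, which is one reason a later validation step is useful.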

Choices were made for the matches, e.g. using pseudonyms and alternative names: looking at those made matching possible when one person was represented differently in the different databases. A validation service was set up for the Centre national de littérature, to assist them in doing the matching on their side. This was based on levels of accuracy. They could find the Aleph system ID where relevant. This has also helped them to find small mistakes in their data which they wouldn't otherwise have found.
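As a rough illustration of matching on author and title with pseudonyms taken into account, here is a small sketch. The `Author` record, the level names ("high", "low", "none") and the rule that a shared title raises the level are all assumptions; the talk only said that matches were graded by levels of accuracy.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    """Hypothetical author record; the field names are illustrative only."""
    record_id: str
    name: str
    alt_names: list[str] = field(default_factory=list)  # pseudonyms, variants
    titles: list[str] = field(default_factory=list)


def key(text: str) -> str:
    # Simplified comparison key: ignore case and collapse whitespace.
    return " ".join(text.casefold().split())


def match_level(a: Author, b: Author) -> str:
    """Return an assumed accuracy level for a candidate pair of records."""
    names_a = {key(n) for n in [a.name, *a.alt_names]}
    names_b = {key(n) for n in [b.name, *b.alt_names]}
    if not names_a & names_b:
        return "none"  # no main name, pseudonym or variant in common
    titles_a = {key(t) for t in a.titles}
    titles_b = {key(t) for t in b.titles}
    # A shared title makes the name match more trustworthy.
    return "high" if titles_a & titles_b else "low"


# Made-up records: the second database only knows the pseudonym.
lexicon = Author("lex-1", "Jean Exemple", ["J. E. Pseudo"], ["Mon livre"])
catalogue = Author("cat-9", "j. e. pseudo", titles=["mon  livre"])
print(match_level(lexicon, catalogue))
```

Pairs graded "low" are the natural candidates to hand to a human validator, which fits the role of the validation service described above.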

Phase 2 of the project covers how the links display in Primo and linking the authority database with AutorenL, because at the moment this is only done with the Aleph bib database. We are also going to work with VIAF to publish our authority data, but first we need to improve it. This project can be part of that process. We will investigate further how to link on IDs and see how to integrate with other systems, such as DigiTool.
