Artificial Intelligence to give new access to 500-year-old manuscripts
A team of researchers will use ground-breaking computational methods to improve access to historical collections and to study the history of Early Colonial Mexico.
Grants of £250,000 from the Arts and Humanities Research Council (AHRC) and $150,000 from the National Endowment for the Humanities (NEH) will enable the team from Liverpool John Moores University (LJMU) and Lancaster University to carry out a three-year project using artificial intelligence.
The project ‘Unlocking the Colonial Archive: Harnessing Artificial Intelligence for Indigenous and Spanish American Historical Collections’ is a joint UK-US initiative (AHRC/NEH) under the call New Directions for Digital Scholarship in Cultural Institutions and will involve working with industry partners.
The project brings together a strong team to work with some of the most important collections for the study of colonial Latin America.
These include the main collections from the LLILAS Benson Library, such as the iconic 16th-century Geographic Reports of New Spain and the collection of the Primeros Libros de las Americas at the University of Texas at Austin, the Royal Archive of Cholula in Mexico, and some of the historical collections from the General Archive of the Nation in Mexico.
The main objective of the project will be to solve some of the most important challenges that research with historical collections usually poses.
Even when digitally accessible, historical texts are difficult to work with because of their calligraphy, changes in spelling over time, and the need to distinguish between different types of ‘hands’.
The compilation of information from large historical collections usually takes years. Furthermore, when working with image collections such as historical maps and paintings, comparative studies are usually slow because identifying iconography and other elements is a long process.
Working with industry partners, the project will use machine learning methods in three distinct research areas to:
· Create automated transcriptions of handwritten 16th- and 17th-century Indigenous and Spanish American historical documents. This will be done by using and experimenting with state-of-the-art Handwritten Text Recognition.
· Identify and mine information about historical topics of interest from large multilingual historical text collections, using a combination of techniques from the fields of Natural Language Processing (NLP), Linked Data (LD), and Corpus Linguistics.
· Facilitate the automated identification of iconographic elements as well as the search and analysis of pictorial features in Mexican Indigenous maps and printed books, using Computer Vision techniques in combination with Linked Data.
In addition to the technological component, through the analysis of specific collections, the project will advance understanding of the colonial situation of the Viceroyalty of New Spain (in the area that is now Mexico) immediately after the fall of the Aztec Empire.
UK Project Principal Investigator and a Senior Lecturer in Digital Humanities at Lancaster University Dr Patricia Murrieta-Flores said: “This work will create a step change in the ways we do research with textual and image collections. The creation of techniques to, for instance, create automatic transcriptions from historical documents will make these accessible to more researchers worldwide, and as part of the project, we will also train humanists in the use of all these methods.”
Dr Murrieta-Flores will work with Co-Investigator Dr Javier Pereda from Liverpool John Moores University, and with US PI Kelly McDonough and US Co-I Albert Palacios of the University of Texas at Austin and the LLILAS Benson Library.
Senior Lecturer in Graphic Design and Illustration at Liverpool John Moores University Dr Javier Pereda said: “This is a very exciting research opportunity since it enables us to engage with a very complex and large dataset from the Spanish colonial archive. Most importantly it will open new doors of collaboration between world leading institutions and researchers around the world. I am particularly interested in how heritage organisations, researchers and general users can benefit from the role of Semantic Web and Artificial Intelligence technologies.”
A large team of international collaborators from Mexico will include the National Autonomous University of Mexico (IIE-UNAM), the Meritorious Autonomous University of Puebla (BUAP), the National School of Anthropology and History (ENAH), the National Institute of Anthropology and History (INAH), and the You-I Lab (IPICYT).
European partner institutions include the University of Alicante, the University of Lisbon, Innsbruck University, and Zurich University of the Arts.
Industry partners are the Lucentia Lab, Tagtog, and Transkribus.