Archive Spotlight: Historic Arabic Newspaper Digitization Project

In addition to our family collections, one of the Khayrallah Center Archive’s core collections is our growing trove of historic Arabic newspapers, literary journals, and magazines. Between 1890 and 1950, migrants from Greater Syria established a remarkably active and diverse immigrant press. Khayrallah Center researchers have identified more than 140 newspapers, literary journals, and magazines published in North and South America during this period. The Khayrallah Center intends to locate and digitize the surviving Arabic newspapers published in North and South America before 1950 and make them available to researchers around the world. This ambitious and challenging project will benefit both scholars and descendant communities by opening access to the words and lives of the early Lebanese immigrants, as depicted through their own writings.

Mastheads for North and South American Arabic newspapers, advertised in Al-Majalla al-Tijariyya [The Syrian-American Commercial Magazine].

These early newspapers contain a wealth of information about members of the Mahjar, or diaspora. Though they were primarily intended for educated, literate readers, many of whom belonged to the middle and upper echelons of society, they nonetheless reveal rich details and historic trends about early Arabic-speaking immigrants. One important lesson the newspapers teach us is how immigrants remained connected to their homelands even as they developed new identities in their new countries. This is evident in the first North American Arabic newspaper, Kawkab Amirka [Star of America], founded in 1892 by the pioneering Arbeely family, which carried news about events in the Ottoman Empire alongside reporting on American politics and community issues.

The newspapers, even the many that lasted only a few years before succumbing to competition and financial difficulties, also reflect the diversity within early immigrant communities. In New York City alone, there were newspapers representing numerous religious groups and political viewpoints, including Maronite and Orthodox communities, a range of attitudes toward the Ottoman Empire, and, after World War I, competing positions on national independence and identity.

The newspapers additionally reveal how culture and identity both changed and were preserved across generations. Though most were published in Arabic, some were fully or partially bilingual. In the 1920s, some papers were published in the languages of the new countries, including English, Spanish, and Portuguese, in order to engage the children and grandchildren of first-generation immigrants, many of whom were not fluent or literate in Arabic. An important example is The Syrian World, which explicitly sought younger audiences.

For genealogical researchers, historic Arabic newspapers offer ways to learn more about particular individuals and families, because most of the newspapers had local news sections containing notices of marriages, deaths, and travel.

The masthead of the rare newspaper Al-Kown [The Universe], published between 1907 and 1910 in the Syrian Colony of Manhattan.

The Historic Arabic Newspaper Digitization Project builds upon the work of pioneering scholars of the Arab diaspora, such as Dr. Alixa Naff, who helped the Library of Congress collect physical newspapers to be preserved on microfilm. While this effort saved many newspapers that might otherwise have been lost, microfilm is an increasingly difficult technology to access, requiring both the rolls of film and a specialized machine to read them. The first stage of our Arabic newspaper project has been to digitize the microfilms in the Arab American collection at the Library of Congress. In the fifteen months since beginning this work, we have acquired full or partial runs of 19 unique newspapers. In fall 2017, we completed digitization of more than 40 years of Al-Hoda [Guidance] and three decades of Mira’at al-Gharb [Mirror of the West]. We have also digitized the complete available microfilms of several lesser-known publications, including the important literary journal Al-Funun [The Arts], the English-language journal and newspaper The Syrian World, the lesser-studied newspapers Al-Wafa [Fidelity] and Al-Kown [The Universe], and more. Digitization of the other long-running newspapers is in progress. In the coming years, we plan to expand our scope to include the newspapers published in Brazil, Argentina, and Mexico, all of which were home to significant communities of Syrian and Lebanese immigrants.

The physical pages of an early-1920s issue of Al-Iqbal, currently undergoing professional digitization.

An ongoing element of this project depends on collaboration with institutions and individuals. Many of the microfilmed titles are incomplete, containing only the issues that were known at the time of microfilming. For example, though Kawkab Amirka was published well into the 1900s, it is available on microfilm only up to 1896. Similarly, though Mira’at al-Gharb began publication in 1899, the available microfilms begin in 1910. Newspapers were printed on low-quality paper and were often discarded after reading, and, sadly, many publishers’ archives were lost, often to disasters such as fire. As a result, some of these newspapers may be lost to time. However, we hope to locate any and all physical copies that survive, and to this end we depend on collaborators. Historic newspapers may be preserved by private collectors, held in local archives, sold in antique shops, or kept among family papers. For example, a community historian and collector in Lawrence, Massachusetts, owns copies of Al-Iqbal, a newspaper published there in the early twentieth century that was previously thought to be lost. A collaboration with the Lawrence Public Library Special Collections led us to digitize its holdings of Al-Wafa, another Massachusetts newspaper. Descendants of the poet and journalist Elia Madi preserved the entire run of his newspaper, As Sameer, in their family collection. While these are long runs, even single issues are treasures that enrich the historical record. The Khayrallah Center is working with these collectors and families to digitize these important publications, and is actively seeking further partnerships.

One of the Khayrallah Center’s long-term goals in this project is to expand the accessibility of these newspapers by developing OCR capable of reading historic Arabic type. OCR, or optical character recognition, is the technology that converts images of printed text into machine-readable text. It is what makes keyword searches in digitized books and newspapers possible, and it already supports searchable databases of material printed in English and other languages that use the Latin alphabet. Arabic OCR, however, lags behind OCR for Latin-alphabet languages and currently cannot accurately read historic Arabic type. This is due in part to several characteristics of Arabic script. Unlike printed Latin text, Arabic is a cursive script in which most letters connect to their neighbors, and many letters take different forms depending on their position within a word.

The Mergenthaler Linotype, a typesetting machine that many newspapers used to set the type for their issues.

Additionally, Arabic has many diacritical marks (dots, lines, and other marks above or below letters) that affect computers’ ability to interpret texts. While these features present considerable challenges to developing accurate Arabic OCR, the Khayrallah Center has been testing different approaches to developing this technology. This endeavor requires a large corpus of digitized material on which OCR can be trained and tested. To this end, the Khayrallah Center has partnered with other organizations, such as the Center for Research Libraries, which has generously shared previously digitized material to help build the robust, centralized corpus that OCR development requires.

Now that we have amassed a foundational collection, in the coming year we will seek funding to progress to the next stage of development. Making the early newspapers searchable will open up further avenues of research, and will give even non-Arabic speakers a broad idea of the papers’ contents by making it easier to use online translation tools. The complexity of this project means that it will take time, but its importance ensures that the Khayrallah Center is pursuing this goal avidly.

The Historic Arabic Newspaper Digitization Project is ongoing, and we are constantly adding new publications. Many are freely available on our digital archive; others are restricted to researchers for various reasons, including copyright. Our complete holdings are listed in our online finding aid, and we provide regular updates about our progress on social media. We welcome researchers and seek partners to help us fulfill our goal of creating a centralized, fully searchable database of historic Arabic newspapers from North and South America. This project depends on collaboration and support. If you are an individual or institution that owns copies of a historic Arabic-language newspaper, or know someone who may, contact our archivist to discuss digitization. If you do not own a newspaper but would like to contribute to our project, consider adopting a newspaper to help us cover the costs of digitizing these rare historic sources so they can be preserved for future generations and made accessible to researchers around the world.