Collaborative Approaches to Integrate Repositories within the Research Information Ecosystem: Creating Bridges for Common Goals

Abstract
Open Access (OA) has many tendrils running across the wider research information landscape. There are more researchers, organizations, and systems than ever before engaging with (or being asked to engage with) OA throughout the research ecosystem. However, too often OA activities and processes within repositories remain siloed from research information management systems (RIMS) and tasks, creating an undue burden of time and duplicating effort, thereby undermining the overall effectiveness of OA. By investing in interoperable metadata standards and practices, and creating a networked landscape of systems and community, technology ecosystems can be created that encourage researchers to make even more of their research open while streamlining research information management activities. By unifying the community around a more sustainable, systems-agnostic approach focused on flexible interoperability, it is possible to create an environment in which organizations can choose the tools relevant to their needs, bring those tools together in a complementary dynamic, and maximize data reuse.


Introduction
Founded in 1900, Carnegie Mellon University (CMU) is a comprehensive private research university located in Pittsburgh, Pennsylvania. In 2019, CMU and vendor partners Figshare and Symplectic (part of Digital Science) collaborated to further develop an interoperable, networked research information ecosystem built between its Institutional Repository (IR) and its Research Information Management System (RIMS). This was further developed to include strategies to bring together communities, content, systems and processes within the interconnected ecosystem. Drawing examples from the usage of Figshare for Institutions as the repository system and Symplectic Elements as the RIMS, this collaboration specifically explored where repositories sit within the research information management landscape, and how they can be connected to RIMS. This further included outlining opportunities for interconnectivity within multiple connective points and highlighted the successes of vendors and institutions working collaboratively toward common goals and achievements.

Institutional repositories and RIMS
There are many possible options today when it comes to repositories and RIMS, including solutions that are built upon open-source or proprietary platforms, which can be locally hosted or cloud-based. While developed for different primary purposes, institutional repositories (IRs) and RIMS have shared very similar histories. Both systems emerged in the 1990s, first as more specialist solutions, before the shift to rapid growth and wider adoption during the early 2000s. In the last decade, as both systems have become even more commonplace, they have both seen widening usage and roles within the information infrastructures of institutions. 1

Institutional repositories
In 2017, CMU implemented a new repository service built upon the Figshare for Institutions platform. Combining the features and capabilities of a traditional Institutional Repository and Data Repository, CMU's new "Comprehensive Repository" would be known as KiltHub. 2 The KiltHub repository provides researchers the means to make the broader scope of their scholarly output and research data published Open Access, thereby allowing the researcher to expand the reach of their research narrative. Like many IRs, CMU's KiltHub repository serves a dual role for scholarly outputs as a repository both for items that have never been formally published before, as well as items that have been published previously. For those items not formally published before, such as technical reports, white papers, etc., the version published within the repository is recognized as the version of record. While other versions of the work may exist in other venues, the version published within the repository is the version that is recognized as the official version.
For items first-published in other venues, such as published journal articles, the repository serves in its more traditional role as a mechanism for facilitating Green Open Access. In this role, the repository provides the means and mechanisms so that authors can exercise their rights as defined within their copyright transfer agreements. This allows an author to disseminate a particular version of their publication via Open Access in certain defined venues, such as their institutional repository. In most cases, the version that authors are permitted to share is the version most commonly referred to as the "Author's Accepted Manuscript (AAM)" version. The AAM version of the publication is the final version of the publication submitted to the publisher prior to publication. This version includes all of the requested revisions from the peer-review process. While missing the branding, final typesetting, and layout from the publisher's final processes, intellectually, the AAM is recognized as the final accepted version of the article.

Research information management systems
A RIMS is defined as a system that "aggregates, curates, and utilizes metadata about a wide range of research activities." 3 This includes drawing in metadata (see Figure 1) from sources that are both external and internal to the institution, such as publication databases, Human Resources (HR) data, and grants management systems, as well as personal information contained within a faculty member's curriculum vitae. There is a diversity of names used to describe RIMS. For example, RIMS are most commonly referred to as Current Research Information Systems (CRIS), Faculty Activity Reporting Systems (FARS), and Faculty Information Systems (FIS). While known by different names, these systems are used for very similar purposes, including faculty annual reviews/reports, Open Access compliance monitoring, curating information to populate public profiles or website content, and central research analytics.
While RIMS began as systems that were developed and maintained internally to individual institutions, many RIM managers began meeting to share code and processes in the early 1990s. By 2002, these informal meetings were formalized with the creation of the European standard for CRIS data, the Common European Research Information Format (CERIF), which, in turn, led to the creation of EuroCRIS, a European organization built around RIMS management and the standardization of their information. 5 As RIMS have become increasingly common in universities and research institutions around the world, additional communities have developed to support the growing number of research managers and administrators, including a number of national and international professional associations. 6 To date, there is no such body, oversight organization, or set of standardized policies for RIMS within the United States. Because of the diverse array of information, and the various ways these systems may be used, there are many different units within a university that may claim oversight over the system. For many institutions, this also includes their academic libraries.
Since 2017, OCLC has studied the adoption of RIMS by libraries and its implications for service delivery. Most notably, in their 2017 report, "Research Information Management: Defining RIM and the Library's Role," OCLC noted the particular roles libraries may play within RIMS management. 7 Today, libraries seek to be aligned with their institutions' strategic research focus. By building upon their expertise with publications and scholarship through discoverability, access, and training support, libraries can expand their overall focus on stewarding the scholarly record. This further extends the libraries' role and mission of providing additional research-focused support. It gives libraries the mechanisms and services to extend their visibility and expand their role, which presents them as logical advocates and supporters of RIM system management.

Scholarly communication ecosystem
Since first defined by Bosman and Kramer in 2015 in their work, "101 Innovations in Scholarly Communication," the Scholarly Communications Ecosystem has been represented as a classification of tools and services that a researcher could utilize as they progress through the lifecycle of their research. 8 Beginning with 101 tools in 2015, the current publicly-compiled list has grown to over 680 different tools, and continues to grow as more are developed. These tools have been classified into the six recognized research lifecycle stages defined by Bosman and Kramer: Discover, Analysis, Writing, Publication, Outreach, and Assessment. 9 By classifying the growing list of possible scholarly communication tools, one could now develop individual scholarly communication ecosystems and workflows around common themes. The themes recognized by Bosman and Kramer included ecosystems that utilize both traditional and modern tools, as well as ecosystems built around open source/community-driven tools, and proprietary tools from providers such as Elsevier and Digital Science.
Leaning upon the findings of Bosman and Kramer, CMU began developing its own Scholarly Communication Ecosystem in 2016. The CMU ecosystem is divided into five stages: Discovery, Organize, Create, Share, and Impact. With these five stages, the CMU Libraries could organize the plethora of tools and services it provides in a way that highlighted the usage and usefulness of each service and its potential connectivity to researchers as they progressed through their own research lifecycle. This has also included the usage (e.g., views and downloads) and impact (e.g., citations and altmetric attention) that could be made by the institutional repository and RIM system.

Repositories and the SCE
Institutional Repositories (IRs) are tied to the Open Access movement as a facilitator of OA paper publishing and OA paper archiving. By supporting a number of Open Access publishing methods and scholarly communication initiatives, institutional repository infrastructure provides a long-term commitment to safeguarding, preserving, and making accessible the digitized intellectual output of one's institution. 10 The growth of institutional repositories in the last fifteen years has been staggering. The Directory of Open Access Repositories (OpenDOAR) dataset shows a jump from eighty-seven indexed institutional repositories in December 2005 to over 4,300 global IRs as of October 2019. 11 Some of the most popular IR infrastructures globally are still some of the original platforms created back in the early 2000s: DSpace (developed in 2002 at MIT), EPrints (developed in 2000 at the University of Southampton), and Bepress (founded in Berkeley, California in 1999). In recent years, however, there have been major developments in OA repository technology and the larger repository environment. 12 As the research enterprise becomes increasingly digitized and reliant on large scale datasets, computing results, and data analysis, the technology associated with scholarly communication has advanced in step. Repositories were initially developed as an open archive for research papers, though new repository platforms have emerged as a way to manage not only publications but also research data. Research data is loosely defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings. 13 Publishing standalone or underlying research data alongside an article provides additional context to the research conducted.
It also offers an opportunity to increase efficiencies across the research process, enabling reproduction and verification of results, making the results of publicly funded research available, enabling others to ask new questions about the data, and advancing the state of research and innovation. 14 In response to research becoming more computational and driven by data, Figshare was founded in 2011 to fill that void in the scholarly communication workflow by offering a platform for researchers to publish their research data in a persistently shareable, discoverable, citable manner. What set the platform apart was the ability to accept all file types, assign metadata to those files, and, in turn, index them and share the published content throughout the scholarly communication ecosystem. Initially created as a generalist repository for end users, Figshare expanded its offerings in 2013 to include Figshare for Institutions. 15 Figshare for Institutions provides a way to leverage the Figshare platform to provide a suite of institutional repository services. With Figshare for Institutions, universities can highlight and publish all of the school's research generated throughout the research lifecycle, from grant and data management planning to data collection and raw outputs to article publication and supplemental data, all via a single platform. 16 True to the Open Access mission, Figshare ensures all published content receives metadata for context, a university-branded Digital Object Identifier (DOI) to aid in citability and persistence of content, and an appropriate copyright license to enable the use, reuse, and potential reproducibility of published content. Figshare also adheres to and follows a number of community-led principles, namely the FAIR Principles (Findable, Accessible, Interoperable, and Reusable), a set of guidelines to ensure the maximum findability, accessibility, interoperability, and reuse of published research data. 17
Launched in 2017 using the Figshare for Institutions platform, the KiltHub repository at Carnegie Mellon University offers researchers support for open research data and Open Access publications, essentially a comprehensive repository to serve all levels of the publication stage of the scholarly communication ecosystem.

RIMS and the SCE
RIM systems were created to help research institutions collect and store structured data about faculty research and scholarly activities, allowing those organisations to build up a collection of information about their research outputs and activities which can be repurposed in a variety of ways. "RIMS benefit academic institutions through both their efficiency and their effectiveness. Providing a central repository of information about faculty scholarship and research activities, from which multiple outputs may be exported, allows for efficient capture and reuse of faculty data." 18 RIMS frequently become an enterprise system within research institutions, providing a trusted space for consolidating and linking datasets from across the institution by interfacing with a wide variety of systems. As a result, the ways in which a RIMS can be positioned and the key use cases and drivers which emerge for the RIMS can vary from organisation to organisation.
Interest in integrating institutional repositories with CRIS systems has grown steadily over the years, and while it is far from a new initiative (Symplectic built their first repository integration over a decade ago), work in this space has continued to evolve and grow over time. A key driver for this appears to be linked to a growing awareness that the rich collections of interlinked metadata collected in RIMS can be a significant asset and make a substantial contribution to FAIR activities and open science infrastructure. 19 Studies have demonstrated growth in the number of institutions opting to integrate their RIMS with IRs, an experience echoed within the Symplectic community. 20

Building the network

Integrating RIMS and IRs
Integrating with IRs has become one of the primary use cases of Symplectic Elements, and they have extensive experience building sustainable and scalable repository integrations and Open Access monitoring functionality, supporting over eighty such integrations at time of writing. Back in 2016, building on advancements in repository technology, Symplectic opted to completely redesign their repository integration platform from the ground up to create a flexible framework to power the next generation of repository integrations for Elements, called Repository Tools 2 (RT2). 21 RT2 integrations are rich bi-directional integrations, built on a foundation of harvest functionality which establishes the repository as a data source for Elements. This allows Elements to provide authoritative Open Access monitoring and minimizes duplicates. It also enables academics and administrators to view and interact with all of their OA outputs in a single system, saving time and effort. Because Elements already has a picture of the organisation's publishing activity, it can use the data collected to actively encourage researchers and administrators to make their works openly available. It can also act as an additional deposit interface, increasing the likelihood a researcher will deposit their publications by reducing the time and effort required and removing the need to re-type metadata into forms when depositing to the repository.
Elements integrates with a number of repository platforms including Figshare for Institutions (as used by CMU) as well as others such as DSpace, EPrints and Hyrax, all through a common integration framework. The RT2 framework allows Elements to be connected to the native repository Application Programming Interfaces (APIs) for the relevant platform, and to programmatically (and with appropriate authorisations) read metadata from and write deposits to the repository. In order to build this kind of rich integration, it is very important that all systems involved are investing in the underpinning architecture required to maximize interoperability. Stable and standardized APIs are an essential building block for integrating systems, providing the basis for communication and thus for integration. However, on their own, they are not enough to really create an ecosystem, as they will often stop short of the complexity necessary to link the systems together on a deeper level.
Both RIMS and IRs capture complex metadata describing research outputs or activities; however, each will do so in slightly different ways, working towards slightly different purposes and slightly different metadata schemas. Whilst it can be tempting to attempt to create one metadata schema that will remove much of this complexity, in reality, the diverse needs of research institutions and their researchers around the world will always mean they require flexibility to align metadata with their local needs. Instead, by going beyond APIs to create translation layers and administrative tool sets, we can ensure that the metadata captured within these systems can be meaningfully mapped from one system to another, even being transformed as necessary along the way. Within Symplectic's RT2 framework, the crosswalk toolkit defines how each institution's data will be mapped between Elements and the repository. Defined using crosswalk map files, for both harvest and deposit, these incredibly flexible crosswalks allow institutions to retain control over how to structure their data in each system and ensure they are transferred and transformed accurately. To ensure that the repository integration is easy to implement and maintain, all configuration of the integration, including writing and updating crosswalks, can be completed from within the Elements User Interface (UI). Translation layers and toolkits such as these remove barriers for transferring data between systems, creating a deeper and more meaningful interconnection between them.
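To make the idea of a translation layer concrete, the following is a minimal sketch of a crosswalk expressed as a mapping from target fields to source fields and transforms. It does not reproduce the RT2 crosswalk map file format, which is not described here; the field names, the `DEPOSIT_CROSSWALK` table, and the `apply_crosswalk` helper are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical crosswalk: target field -> (source field, transform).
# These field names are illustrative and are not taken from Elements or Figshare.
Crosswalk = Dict[str, Tuple[str, Callable[[Any], Any]]]


def split_keywords(value: str) -> List[str]:
    """Turn a semicolon-delimited string into a clean list of keywords."""
    return [part.strip() for part in value.split(";") if part.strip()]


def map_item_type(value: str) -> str:
    """Translate a source publication type into an assumed repository item type."""
    return {"journal-article": "Journal contribution"}.get(value, "Other")


DEPOSIT_CROSSWALK: Crosswalk = {
    "title": ("title", str.strip),
    "description": ("abstract", str.strip),
    "tags": ("keywords", split_keywords),
    "item_type": ("publication_type", map_item_type),
}


def apply_crosswalk(record: Dict[str, Any], crosswalk: Crosswalk) -> Dict[str, Any]:
    """Map and transform a source record; source fields that are absent are skipped."""
    mapped: Dict[str, Any] = {}
    for target_field, (source_field, transform) in crosswalk.items():
        if source_field in record:
            mapped[target_field] = transform(record[source_field])
    return mapped


rims_record = {
    "title": "  An Example Article ",
    "abstract": "A short summary.",
    "keywords": "open access; repositories ;",
    "publication_type": "journal-article",
}
repository_record = apply_crosswalk(rims_record, DEPOSIT_CROSSWALK)
```

Keeping the mapping as data rather than code is the design point: an institution could adjust which fields map where, or add a transform, without changing the logic that applies the crosswalk. A second table pointing in the opposite direction would cover harvest.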

Institutional benefits to integration
At CMU, the bi-directional integration between the repository and the RIMS has enriched both systems. From the repository to the RIMS, the repository serves as a data source for metadata and content matching. While users could create manual records within their Elements profiles instead, the integration auto-feeds information about their publications and other materials from Figshare into Elements, thereby saving time and effort to provide this information. Because both systems utilize a common user feed, the metadata records are auto-claimed to the users' Elements profile.
Additionally, from Elements to KiltHub, the RIMS is able to provide a means to verify the deposit status of publications, and deposit publications to Figshare via Elements. Using the citation information provided by the publication feeds, Elements draws on an integration with Jisc's SHERPA RoMEO service to analyze the potential opportunity to make that publication available in the user's KiltHub profile. If a publication is identified as being eligible for repository deposit, and a record is not already found within KiltHub, a user is presented with the opportunity to deposit that item into KiltHub. Elements presents the user with this information, as well as a workflow that mirrors the submission workflow of the repository, including providing the user the means to agree to the repository's deposit agreement.
Once the user agrees to the terms of deposit, the publication's metadata record and the file supplied by the user are submitted to the repository to be verified by the repository specialist, as any direct deposit to the repository would be. Through these two pathways, the ecosystem between the repository and the RIMS can be used to monitor the overall Open Access status of the institution. Additionally, the ecosystem can be utilized to assist in improving the level of involvement the faculty have with the repository. By developing this bi-directional integration, faculty do not always have to directly interact with the repository to make their content Open Access. The integration provides the user the choice to work within whichever of the two environments best suits them, whilst continuing to help them make their work Open Access.
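The deposit pathway described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the logic Elements actually runs: the `Publication` class and the `next_step` function are hypothetical, the outcome of the publisher policy lookup is passed in as a plain value rather than fetched from any real service, and the set of DOIs stands in for a check against the repository.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Publication:
    doi: str
    title: str


def next_step(
    pub: Publication,
    archiving_permitted: Optional[bool],
    repository_dois: Set[str],
) -> str:
    """Decide what to show the researcher for one publication.

    archiving_permitted is the assumed outcome of a publisher policy lookup:
    True, False, or None when no policy could be found.
    repository_dois stands in for the DOIs of works already in the repository.
    """
    known = {doi.lower() for doi in repository_dois}
    if pub.doi.lower() in known:
        # A record already exists, so there is nothing to deposit.
        return "already_in_repository"
    if archiving_permitted is None:
        return "policy_unknown"
    if not archiving_permitted:
        return "not_eligible"
    # Eligible and not yet deposited: offer the deposit workflow.
    return "prompt_deposit"


example = Publication(doi="10.1000/Example.1", title="An Example Article")
decision = next_step(example, archiving_permitted=True, repository_dois=set())
```

In this sketch, `prompt_deposit` corresponds to the point at which the user is shown the deposit agreement; the review by the repository specialist happens afterwards and is outside the function.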
The flexibility provided by the RT2 crosswalk framework has allowed CMU to adjust the default workflows to better align with local requirements and ensure that all metadata are being accurately represented in both systems. For example, an early issue identified with the bi-directional integration between Elements and KiltHub related to the way in which outputs harvested from the repository were matched within Elements. Because of the identifier features of Figshare, all records published in KiltHub receive their own repository-based DOI, including content that was previously published. Beyond providing a means to cite the content within the repository, the DOI was utilized as an identifier for the repository content. At the time, the integration from KiltHub to Elements was designed to identify the repository DOI as the official DOI of that item. While a previously published item may also have had the version-of-record (VOR) DOI from the publisher listed in its metadata, that DOI was not recorded in a way that allowed Elements to distinguish it from the repository DOI. This prevented the content from KiltHub from being matched accurately against the records of the publication already found and claimed by Elements.
The integration between Elements and KiltHub needed to be revised to account for these different kinds of DOIs, and logic needed to be added to inform how Elements would proceed after finding two possible DOIs listed within a single record. The solution developed produced two paths for content to be added from KiltHub to Elements. The first path, for content that may have a VOR DOI present, required revisions to the KiltHub metadata structure. A new field would be created that would account for the VOR DOI. The integration crosswalk map could then also be revised to reflect the addition of this new metadata field and ensure it was mapped to the correct place in Elements.
This meant that Elements would now search within a citation record for this particular metadata field. If this field is utilized, Elements would know that this record should match to an existing record it may already possess via the VOR DOI from the publisher. This ensures that the citation record from KiltHub can be consolidated as an additional source where that citation was found. Because KiltHub was recognized as the CMU repository, Elements would know that if a citation record was found within KiltHub, it would mean that the publication was already made Open Access, and could be listed within the Open Access monitoring mechanisms built into Elements.
For the scholarly outputs not previously published, the DOI provided by the repository would be the VOR DOI. The official version of that record would be recognized by Elements as the version found within the repository and would cause Elements to create a unique record directly within the user's profile for that item identifying its source as the repository. By developing a two-tiered solution, the integration between Elements and KiltHub now better represents the unique features and capabilities of each system to the greater benefit and usage of the entire ecosystem.
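The two-path matching logic can be summarized in a short sketch. It is an illustration under stated assumptions rather than the Elements implementation: the field names `vor_doi` and `repository_doi`, the in-memory `records_by_doi` dictionary, and the `reconcile` function are hypothetical. Creating a new record when a VOR DOI is present but no existing record matches is also an assumption, since that case is not described above.

```python
from typing import Dict


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def reconcile(harvested: Dict[str, str], records_by_doi: Dict[str, dict]) -> dict:
    """Attach a harvested repository item to the right RIMS record.

    Path 1: the dedicated VOR DOI field is populated, so match on the
    publisher DOI and add the repository as a further source.
    Path 2: no VOR DOI, so the repository DOI is the item's own DOI and
    a record is created for it if none exists.
    """
    vor_doi = harvested.get("vor_doi")
    doi = normalize_doi(vor_doi or harvested["repository_doi"])
    record = records_by_doi.get(doi)
    if record is None:
        record = {"doi": doi, "title": harvested["title"], "sources": []}
        records_by_doi[doi] = record
    if "repository" not in record["sources"]:
        record["sources"].append("repository")
    # Presence in the institutional repository implies the work is openly available.
    record["open_access"] = True
    return record


records = {
    "10.1000/journal.123": {
        "doi": "10.1000/journal.123",
        "title": "Published Article",
        "sources": ["publisher-feed"],
    }
}
published = reconcile(
    {
        "repository_doi": "10.9999/repo.1",
        "vor_doi": "https://doi.org/10.1000/JOURNAL.123",
        "title": "Published Article",
    },
    records,
)
report = reconcile(
    {"repository_doi": "10.9999/repo.2", "title": "Technical Report"},
    records,
)
```

In the first call the harvested item is consolidated into the record already held for the publisher DOI; in the second, no VOR DOI is supplied, so a new record is created under the repository DOI.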

Why this matters
The development and implementation of the ecosystem is important to all stakeholders. For institutions, this provides the widest level of management and ownership of their information. Regardless of whether the tools and mechanisms are local or external, proprietary or open source, the institution can maintain leverage when it comes to the disparate internal and external locations where the information about their activities and community resides. In this way, the repository and RIMS serve as individual data warehouses for research outputs and research information across the scholarly landscape.
By connecting and displaying information from diverse locations into a central view, the ecosystem can begin to solve practical issues when it comes to the academic infrastructure, including faculty annual reviews and web-site content management and development. This ensures that the old adage of "enter once and use exponentially" can be realized. Additionally, because the wide range of information about one's research narrative can be consolidated into this single view, different analyses and examinations of the research endeavor can be explored. Through the research ecosystem, researchers and administrators can examine themselves to better understand their current practice and explore future opportunities, such as potential research collaborations, grant applications, and publishing venues. From a library perspective, the ecosystem could be used to examine publishing trends versus library holdings and subscriptions, and explore the question of "How Open Access are we?" By working with internal and external partners that share the common goal of interoperability and data accessibility, each institution is able to develop a scholarly communication ecosystem that provides them the greatest level of choice. All this is dependent upon working with partners and services that understand that their value is not just in their own offerings, but in their collective capability to work with all potential players when such integrations present real, practical solutions. Institutions should be able to choose the partners and services that work best for their needs and budgetary constraints regardless of source or provider. Different institutions will require different solutions, and institutions should have the power to choose what will work best for their users and their institutional needs.

The future of repositories and RIMS
One of the key aspects of the FAIR Principles is the ability for interoperability between systems and this speaks well to the future of IRs and RIMS alike. Figshare was founded on the principles of the open data and Open Access movements and promotes a vision to change how academic publishing operates by improving the dissemination and discoverability of all scholarly research and data. Since it began in 2011, the Figshare team has focused on using new technologies to aid all members of the scholarly communication ecosystem, from researchers to publishers to institutions, in their attempts to better manage and disseminate academic research.
While the journal article isn't going anywhere any time soon, the team at Figshare has been working hard with the rest of the scholarly community to help recognize research data as a first-class research object. The value of openly-available research data cannot be overstated and the methods needed to capture, curate, and disseminate this information alongside publications can speed up how research is communicated. It is for this reason that one of the tenets of Figshare's mission and core beliefs is that academic researchers should never have to put the same information into multiple systems at the same institution. 22 Interoperability, efficiency, and ease of use allow academics to build on the ever-growing global corpus of research and speed up how that research is communicated to the benefit of society.
The goal and future of repository technology is for it to fit into different workflow processes seamlessly, making responsible data management and data sharing an uninterrupted and unobtrusive part of the research lifecycle and scholarly communication alike. Supporting interoperability in sustainable and scalable ways will remain a core focus for RIMS, allowing them to be more deeply integrated with not just IRs but also a wide range of systems from across the scholarly ecosystem. This creates opportunities for a whole network of systems to be connected in a variety of ways. As the development of stable, standards-based APIs and specialized translation layers and administrative toolkits continues to grow, the barriers to integration should continue to decrease, moving ever closer to a plug-and-play world of integrations. Symplectic continues to grow and enhance its API and to add to its suite of integration technologies, allowing it to create new integrations which are smarter and more efficient, whilst continuing to maximize the opportunities for data reuse.

Building community
In North America today, communities are beginning to form around RIM platforms. These are similar to the communities that exist in Europe and Australia in that they connect the institutions that use a common platform. Within these communities, institutions can share their experiences, both the good and the bad, in administering and utilizing a RIM system. Because RIMS are administered by varying organizational units within institutions, these communities bring together a diverse set of perspectives and stakeholders.
Where the North American communities differ from other global use cases is that there is no single unifying body that brings together the use of RIMS across the vendor/platform landscape. In Australia and the UK, respectively, the Australasian Research Management Society (ARMS) and the Association of Research Managers and Administrators (ARMA) serve as national-based organizations that bring together institutions across the RIM landscape both to discuss their common priorities and goals, as well as to learn from one another's unique goals and focuses. 23 Likewise, European institutions can turn towards EuroCRIS as a multinational organization that promotes cooperation and the sharing of knowledge focused on the use and management of CRIS systems in Europe. 24 There is currently no organization or representative body in North America that brings together the breadth and diversity of RIM systems in use there. This is not to say that one could not be created. Given recent research about the adoption of RIMS, one could argue that OCLC would be a logical organization to serve as an initial driver for a broader and more cohesive North American RIM community. While OCLC may be a logical first step and key stakeholder community, there may be a need to think beyond the involvement of academic libraries. As many RIM systems are managed and led by other organizational units and groups, the body of stakeholders for RIMS continues to grow. This leads to multiple organizations having a potential claim to the building of a new cohesive community that goes beyond any single platform, stakeholder, or use case. Developing such an organization will require continued examination of the use of RIMS, IRs, and other related services, tools, and platforms as their adoption and use continues to grow in diversity, expertise, and potential use cases within North America.