Scope and intended audience
This document is for IT staff at an organisation that is considering purchasing, or has purchased, RSpace Team or Enterprise.
For Enterprise, this document aims to provide an overview of deployment options to help guide your decision on how to deploy RSpace.
Which Edition should I choose and where will RSpace be deployed?
This is a fundamental decision for you to make, and there are various options, all of which can be supported and have proven to be workable with existing customers.
Customers with fewer than 15 users who have selected RSpace Team Edition can only use option 1. Even so, this document contains information about the SaaS offering that may be useful to the IT staff of an organization that has opted for the Team Edition.
Option 1: Research Space operates RSpace as a SaaS offering installed on an AWS server that we manage for you.
In this scenario, RSpace is deployed on your own private AWS instance in the cloud. Installation, backup, updates and maintenance are all performed by Research Space. For Team edition, ResearchSpace will select a data center location for you from the following list:
Organizations that purchase RSpace Enterprise edition may request any AWS regional data center necessary to meet their data storage and regulatory requirements. Hosting on AWS is a good option if you:
- Are unable or unwilling to dedicate staff time to installing and maintaining RSpace.
- Want complete convenience and to get up and running quickly.
- Anticipate expanding usage over time but do not have suitable resources of your own to accommodate this. AWS is essentially unlimited in terms of data storage capacity.
Option 2. (Requires RSpace Enterprise Edition). You install and operate RSpace within a suitable on-premises environment that you provide.
In this scenario, RSpace is installed on your institutional system - either physical machines, virtual hosts or your private cloud. You can learn more about how to install RSpace. This scenario is good if:
- Your data-compliance guidelines require you to store research data on-premises.
- You are able to dedicate some staff time to installing and maintaining RSpace.
- You have staff with IT experience in managing web applications in a Linux environment.
- You want absolute ownership and control over all aspects of the data life cycle, for example backup and recovery.
Option 3. (Requires RSpace Enterprise Edition). Research Space remotely installs and manages RSpace within a suitable on-premises environment that you provide.
This is a hybrid solution, for when you want RSpace on-premises but don't want to dedicate IT staff time to maintaining the system. In this case, you set up the infrastructure and Research Space installs and maintains the software. Responsibility for backup and disaster recovery is agreed before installation. This scenario is good if:
- Your data-compliance guidelines require you to store research data on-premises.
- You have limited desire or ability to dedicate staff time to managing RSpace.
Here, the process would typically be:
- A kick-off meeting to meet each other and establish a procedure and timeline for the installation, and to confirm infrastructure requirements.
- Customer sets up infrastructure (e.g. virtual servers) with a vanilla Ubuntu / Debian OS and grants SSH access to the ResearchSpace installation technician.
- ResearchSpace performs the basic installation.
- Single SignOn / integrations are set up as required.
- RSpace is made available to users.
RSpace does not operate in isolation from your institutional data; in fact it shines when connecting and linking your research work together. In this section we review how the different deployment options described above affect these aspects of RSpace functionality.
Single Sign On
If you want your users to access and log in to RSpace using Single Sign On, RSpace supports this for all deployment options, using the SAML2 protocol. Most Identity Providers (IdPs), such as Okta and Azure AD, support this protocol. For more details, see Setting up SingleSignOn authentication.
Connecting to your existing data storage
RSpace can store and manage all sorts of data files, but there are occasions when your researchers will want to link to data files on an institutional file server rather than bringing the files into RSpace. This might be the case if:
- The data files are huge, e.g. large images or sequencing files.
- Your data has to be stored on a particular file server for compliance reasons.
RSpace can talk to these servers using either the Samba or SFTP protocol. It only needs read access, so that it can list the files available for linking.
This can be easier to set up for an on-prem installation. Connecting from RSpace on AWS is entirely possible technically, but requires network access from RSpace to your file server. Please read Configuring Institutional File Systems for more details.
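Before configuring a file-system link, it can help to confirm that the RSpace host can reach the file server at all. The Python sketch below is a minimal illustration of such a check; it is not part of RSpace. It assumes the standard default ports (22 for SFTP, 445 for Samba/SMB), and the host name `files.example.org` in the usage comment is hypothetical.

```python
import socket

# Standard default ports; your file server may be configured differently.
DEFAULT_PORTS = {"sftp": 22, "samba": 445}

def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False

# Example usage, run from the RSpace server (hypothetical host name):
#   for name, port in DEFAULT_PORTS.items():
#       print(name, port_is_reachable("files.example.org", port))
```

A successful TCP connection only shows that firewalls and routing allow the traffic; credentials and read permissions still need to be verified separately.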
RSpace has integrations with many popular applications including Dropbox, Google Drive, OneDrive, Microsoft Teams, Slack, Office 365, protocols.io, GitHub, Figshare and Dataverse; see Integrations for a full list. The setup required varies by integration. If you are running RSpace as SaaS (option 1 above), ResearchSpace can set up these integrations for you. If you are running RSpace on-prem (or, more specifically, the RSpace URL is not a researchspace.com URL), then you will have to configure these integrations yourself, as they often require proof of domain ownership (e.g. Google Drive).
Branding image and custom links
RSpace Enterprise customers can customize the interface by adding an organizational branding / company logo image to the top right corner of the interface (replacing the standard RSpace image), and / or by adding up to 2 other custom text links in the page footer (e.g., pointing to a web page you maintain with information about data privacy policies, legal disclaimers, or other important information about using RSpace at your specific organization).
Getting data out of RSpace
RSpace supports export to standard formats: HTML, XML, PDF, Microsoft Word and JSON (via the RSpace API). Users can export their data themselves, at any level of granularity from a single document to their entire body of work, at any time, and download the export to their own machines. Exports can also be scheduled using the API, e.g. by running a cron job that invokes an export once a week.
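As a hedged illustration of a scheduled export, the Python sketch below builds the HTTP request that a cron job could send. The endpoint path (`/api/v1/export/{format}/{scope}`), the `apiKey` header name, the list of accepted formats, the host `rspace.example.org` and the script path in the crontab comment are all assumptions for illustration; check them against the API documentation for your RSpace version before use.

```python
from urllib.request import Request

# Assumed formats accepted by the export endpoint; verify in your API docs.
SUPPORTED_FORMATS = ("html", "xml")

def build_export_request(base_url: str, api_key: str,
                         export_format: str = "xml",
                         scope: str = "user") -> Request:
    """Build (but do not send) a POST request asking RSpace to start an export."""
    if export_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported export format: {export_format}")
    # Assumed endpoint layout: /api/v1/export/{format}/{scope}
    url = f"{base_url.rstrip('/')}/api/v1/export/{export_format}/{scope}"
    return Request(url, method="POST", headers={"apiKey": api_key})

# To send the request: urllib.request.urlopen(build_export_request(...))
# Example crontab entry (hypothetical script path), Sundays at 02:00:
#   0 2 * * 0 /usr/bin/python3 /opt/scripts/rspace_export.py
```

Keeping request construction separate from sending makes the script easy to check without touching a live server.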
If, as a server administrator, you want to do low-level data export, this is easily accomplished using standard, free tools. ELN metadata can be exported from the MySQL database using `mysqldump` or `Percona XtraBackup`, and files can be copied from the internal file store using tools such as `rsync`.
No data is stored in a binary format proprietary to RSpace.
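The low-level export described above can be sketched as two commands. The Python helper below only assembles the command lines so they can be reviewed before being run; the database name, user name and paths shown in the usage comment are hypothetical placeholders, not RSpace defaults.

```python
import shlex

def mysqldump_command(database: str, user: str, out_file: str) -> list[str]:
    """Command for a logical dump of one database to a SQL file."""
    return [
        "mysqldump",
        "--single-transaction",       # consistent snapshot for InnoDB tables
        f"--user={user}",             # supply the password via an option file
        f"--result-file={out_file}",
        database,
    ]

def rsync_command(file_store: str, destination: str) -> list[str]:
    """Command to mirror the file store directory to another location."""
    source = file_store.rstrip("/") + "/"   # trailing slash: copy the contents
    return ["rsync", "--archive", source, destination]

# Example usage with hypothetical names and paths:
#   print(shlex.join(mysqldump_command("rspace", "backup", "/backups/rspace.sql")))
#   print(shlex.join(rsync_command("/data/file_store", "/backups/file_store/")))
# Run a reviewed command with subprocess.run(cmd, check=True).
```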
Standard on-prem and hosted RSpace deployments are not appropriate for entry of sensitive data (e.g., patient information subject to HIPAA or similar regulatory rules). It is certainly possible, however, to deploy RSpace so that entry of sensitive data is supported and compliant. Often, this issue comes up where usage in a medical school is planned. In these situations, a solution is to deploy RSpace within a validated compliant environment you already use or that you create with assistance from ResearchSpace. Because of the increased cost of data storage and processing in these environments, it may even make sense to deploy a second instance of RSpace specifically for researchers who handle sensitive data, and reserve your standard RSpace deployment for use by the majority of users, who don't need the extra 'compliance wrap'.
In the USA, AWS GovCloud offers a compliant computing environment for organisations bound by federal data-handling regulations. RSpace has been installed successfully in this environment.
Migration after a pilot
Customers often run a pilot of RSpace on AWS before deciding to purchase an ongoing license. In that case you can decide whether to continue using the cloud instance as a production instance or to switch to an on-premises deployment. If you choose to move to an on-premises deployment, it's possible to migrate data that researchers entered into the cloud instance to the on-premises instance of RSpace.
For on-premises deployments of RSpace, backup is solely the customer’s responsibility. We will consult with your IT personnel at the time of deployment. For backing up AWS-based RSpace instances, ResearchSpace uses scripts to automate the backup process, and we are happy to share these with customers on request.
When deployed as SaaS (software as a service) onto an Amazon Web Services (AWS) private instance that we manage for you, ResearchSpace and Amazon take care of backups for you.
Data is stored in a MySQL 5.7 or MariaDB 10.3 database; files are stored unmodified on EBS volumes in a directory structure.
- We make hourly file syncs to S3 using the AWS CLI tool.
- Nightly and weekly snapshots of instances and data volumes are stored as machine images (AMIs). These are fast to make, and support RTOs in the order of minutes.
- Logical database backups are made nightly and stored on S3. Data files, logs, configuration files and search indices are additionally synced to S3 hourly.
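To illustrate the hourly sync step in the list above, the Python sketch below assembles an `aws s3 sync` command for a local directory. It is a generic example, not ResearchSpace's actual backup script; the bucket name, prefix and local path in the usage comment are hypothetical.

```python
def s3_sync_command(local_dir: str, bucket: str, prefix: str) -> list[str]:
    """Command to sync a local directory to a prefix in an S3 bucket."""
    target = f"s3://{bucket}/{prefix.strip('/')}"
    return ["aws", "s3", "sync", local_dir, target]

# Example usage with hypothetical names:
#   cmd = s3_sync_command("/data/file_store", "example-rspace-backups", "file_store")
#   subprocess.run(cmd, check=True)   # requires the AWS CLI and credentials
# An hourly crontab entry (hypothetical script path) could look like:
#   0 * * * * /usr/bin/python3 /opt/scripts/s3_sync.py
```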
For an in-depth description, please read the SaaS Backup document.
In addition, you can use the export API endpoint to make scheduled bulk data exports to any destination you like, as an extra redundant data backup.