The JupiterOne Contingency Plan establishes procedures to recover JupiterOne following a disruption resulting from a disaster. This Disaster Recovery Policy is maintained by the JupiterOne Security Officer.
NIST: This JupiterOne Contingency Plan is created under the legislative requirements set forth in the Federal Information Security Modernization Act (FISMA) of 2014 and the guidelines established by the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-34.
JupiterOne policy requires that:
(a) A plan and process for business continuity and disaster recovery (BCDR), including the backup and recovery of systems and data, must be defined and documented.
(b) BCDR shall be simulated and tested at least once a year. Metrics shall be measured and identified recovery enhancements shall be filed to improve the BCDR process.
(c) Security controls and requirements must be maintained during all BCDR activities.
The following objectives have been established for this plan:
Maximize the effectiveness of contingency operations through an established plan that consists of the following phases:
Identify the activities, resources, and procedures needed to carry out JupiterOne processing requirements during prolonged interruptions to normal operations.
Identify and define the impact of interruptions to JupiterOne systems.
Assign responsibilities to designated personnel and provide guidance for recovering JupiterOne during prolonged periods of interruption to normal operations.
Ensure coordination with other JupiterOne staff who will participate in the contingency planning strategies.
Ensure coordination with external points of contact and vendors who will participate in the contingency planning strategies.
Examples of the types of disasters that would initiate this plan are natural disasters, political disturbances, man-made disasters, external human threats, and internal malicious activities.
JupiterOne defines two categories of systems from a disaster recovery perspective.
The following order of succession ensures that decision-making authority for the JupiterOne Contingency Plan is uninterrupted. The Chief Operating Officer (COO) is responsible for ensuring the safety of personnel and the execution of procedures documented within this JupiterOne Contingency Plan. The Director of Engineering is responsible for the recovery of JupiterOne technical environments. If the COO or Director of Engineering is unable to function as the overall authority or chooses to delegate this responsibility to a successor, the CEO shall function as that authority or choose an alternative delegate. If the contingency plan needs to be initiated, use the contact list below to reach the responsible personnel.
The following teams have been developed and trained to respond to a contingency event affecting JupiterOne infrastructure and systems.
IT is responsible for recovery of the JupiterOne hosted environment, network devices, and all servers. The team includes personnel responsible for the daily IT operations and maintenance. The team leader is the IT Manager who reports to the COO.
HR & Facilities is responsible for ensuring the physical safety of all JupiterOne personnel and environmental safety at each JupiterOne physical location. The team members also include site leads at each JupiterOne work site. The team leader is the Facilities Manager who reports to the COO.
DevOps is responsible for assuring all applications, web services, the platform, and their supporting infrastructure in the cloud. The team is also responsible for testing re-deployments and assessing damage to the environment. The team leader is the Director of Engineering.
Security is responsible for assessing and responding to all cybersecurity related incidents according to JupiterOne Incident Response policy and procedures. The security team shall assist the above teams in recovery as needed in non-cybersecurity events. The team leader is the Security Officer.
Members of the above teams must maintain local copies of the contact information of the BCDR succession team. Additionally, the team leads must maintain a local copy of this policy in case Internet access is not available during a disaster scenario.
All executive leadership shall be informed of any and all contingency events. Current members of the JupiterOne leadership team include the Security Officer, Director of Engineering, and Director of Security Engineering.
This phase addresses the initial actions taken to detect and assess damage inflicted by a disruption to JupiterOne. Based on the assessment of the event, which in some cases follows the JupiterOne Incident Response Policy, the Contingency Plan may be activated by either the CEO or Head of Engineering. The Contingency Plan may also be activated by the Security Officer in the event of a cyber disaster.
The notification sequence is listed below:
The CEO, or their delegate tasked with assessment proceedings, is referred to below as the DR Coordinator.
The DR Coordinator is to logically assess damage, gain insight into whether the infrastructure is salvageable, and work with the Response Team to follow any pre-existing procedures for dealing with the damage. If no pre-existing procedures exist, the DR Coordinator is to begin to formulate a plan for recovery.
The JupiterOne Contingency Plan is to be activated if one or more of the following criteria are met:
The DR Coordinator makes the final call whether or not to proceed with the next phase of Disaster Recovery.
If the Disaster Recovery plan is activated, the activation sequence below is followed:
The DR Coordinator notifies and informs group leaders and management of the details of the event and whether relocation is required.
Upon notification from the DR Coordinator, group leaders and managers are to notify their respective teams. Team members are to be informed of all applicable information and prepared to respond and relocate if necessary.
The DR Coordinator notifies all cloud-service and/or hosting facility partners that a contingency event has been declared and instructs them to ship the necessary materials (as determined by damage assessment) to the alternate site.
The DR Coordinator notifies remaining personnel and executive leadership on the general status of the incident.
Notifications can be in-person or via message, email, or phone. Given the seriousness of the proceedings, multiple notification channels should be utilized to confirm authenticity of the notification.
This section provides procedures for recovering JupiterOne infrastructure and operations at an alternate site, while other efforts are directed at repairing damage to the original system and capabilities.
Procedures are outlined per team required. Each procedure should be executed in the sequence it is presented to maintain efficient operations.
Recovery Goal: Rebuild JupiterOne infrastructure to a production state in 24 hours.
The tasks outlined below are not sequential and some can be run in parallel.
For detailed recovery instructions, please consult the “Backup and Recovery” Engineering Wiki entry.
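As a rough illustration of how the 24-hour recovery goal could be checked during an annual BCDR test, the sketch below compares the time a disaster was declared with the time production was restored. The function names and the sample timestamps are hypothetical and are not taken from JupiterOne tooling; only the 24-hour goal comes from this plan.

```python
from datetime import datetime, timedelta, timezone

# Recovery goal stated in this plan: production state within 24 hours.
RECOVERY_TIME_OBJECTIVE = timedelta(hours=24)


def recovery_elapsed(declared_at: datetime, restored_at: datetime) -> timedelta:
    """Return the time between disaster declaration and restored production."""
    if restored_at < declared_at:
        raise ValueError("restored_at must not precede declared_at")
    return restored_at - declared_at


def met_recovery_goal(declared_at: datetime, restored_at: datetime) -> bool:
    """Return True when the recovery finished within the 24-hour goal."""
    return recovery_elapsed(declared_at, restored_at) <= RECOVERY_TIME_OBJECTIVE


if __name__ == "__main__":
    # Hypothetical timestamps from a simulated recovery exercise.
    declared = datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)
    restored = declared + timedelta(hours=19, minutes=30)
    print("elapsed:", recovery_elapsed(declared, restored))
    print("goal met:", met_recovery_goal(declared, restored))
```

Recording the elapsed time for each exercise gives one concrete metric that can be tracked from year to year.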
This section discusses activities necessary for restoring full JupiterOne operations at the original or new site, once the disaster or outage has been mitigated during the Recovery phase.
If necessary, when the availability zone or services utilized by the Cloud Service Provider have been restored, JupiterOne operations that have shifted to alternate zones or providers may be transitioned back. The goal is to provide a seamless transition of operations from the alternate site or zones to the primary mode of operation as soon as safely possible.
Original or New Site Restoration
The Director of Engineering shall establish criteria for validation/testing of a Contingency Plan, set an annual test schedule, and ensure implementation of the test. BCDR1 This process will also serve as training for personnel involved in the plan’s execution. The types of validation/testing exercises include tabletop and technical testing. Contingency Plans for all application systems must be tested at a minimum using the tabletop testing process. However, if the application system Contingency Plan is included in the technical testing of its respective support systems, that technical test will satisfy the annual requirement.
Tabletop Testing is conducted in accordance with the CMS Risk Management Handbook, Volume 2. The primary objective of the tabletop test is to ensure designated personnel are knowledgeable and capable of performing the notification/activation requirements and procedures as outlined in the CP, in a timely manner. The exercises include, but are not limited to:
The primary objective of the technical test is to ensure the communication processes and data storage and recovery processes can function at an alternate site to perform the functions and capabilities of the system within the designated requirements. Technical testing shall include, but is not limited to:
In the event a JupiterOne facility is not functioning due to a disaster, employees will work from home or relocate to a secondary site with Internet access until the physical recovery of the impacted facility is complete. The recovery shall be performed by the facility management firm under contract with JupiterOne, and coordinated by the Facilities Manager and/or the Site Lead.
JupiterOne’s software development organization has the ability to work from any location with Internet access and does not require an office network to perform business duties.
The Director of Engineering will develop a status notification system to provide real-time updates and inform our customers of the status of each service. The notification system is updated with details about any event that may cause service interruption or downtime. The Director of Engineering works with the Security Team to test this mechanism on a quarterly basis to make sure that all processes and automation associated with status are working correctly. BCDR2
A follow-up root-cause analysis (RCA) will be available to customers upon request after the event, with further details on the cause and the remediation plan for the future.
Production data that cannot be trivially reproduced is to be synchronized across multiple data stores in AWS. For example, clustering technology or configurations may be used to provide additional availability guarantees.
Additionally, Confidential or higher data stored in S3 buckets is backed up to AWS Glacier for long-term storage and recovery.
In an event that requires data to be recovered, it will be retrieved from Glacier (if S3), or via the automated restoration options available for many high-level services like AWS DynamoDB.
JupiterOne assumes that in the worst-case scenario (e.g. one of the production environments suffers a complete data loss) the account will be reconstructed from code, and the data restored from Glacier backups hosted within a different AWS account and geolocation.
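The two restore paths described above can be sketched as request builders for the AWS APIs. This is a minimal illustration under stated assumptions, not the documented JupiterOne procedure: it assumes the Glacier backups are S3 objects in a Glacier storage class (so the S3 RestoreObject operation applies) and that point-in-time recovery is enabled on the DynamoDB tables. The bucket, key, and table names are hypothetical placeholders.

```python
from typing import Any

# Retrieval tiers accepted by the S3 RestoreObject operation.
GLACIER_TIERS = {"Expedited", "Standard", "Bulk"}


def glacier_restore_request(bucket: str, key: str, days: int = 7,
                            tier: str = "Standard") -> dict[str, Any]:
    """Build keyword arguments for an S3 RestoreObject call.

    Assumption: the backup is an S3 object archived in a Glacier storage class.
    """
    if tier not in GLACIER_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Days": days,  # how long the restored copy stays available
            "GlacierJobParameters": {"Tier": tier},
        },
    }


def dynamodb_pitr_request(source_table: str, target_table: str) -> dict[str, Any]:
    """Build keyword arguments for a DynamoDB point-in-time restore.

    Assumption: point-in-time recovery is enabled on the source table.
    """
    return {
        "SourceTableName": source_table,
        "TargetTableName": target_table,
        "UseLatestRestorableTime": True,
    }


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    print(glacier_restore_request("example-backup-bucket", "exports/data.json"))
    print(dynamodb_pitr_request("example-table", "example-table-restored"))
```

With boto3 installed and credentials for the backup account, these dictionaries could be passed as `boto3.client("s3").restore_object(**request)` and `boto3.client("dynamodb").restore_table_to_point_in_time(**request)`. Keeping the request construction separate from the API call lets the parameters be reviewed in a tabletop exercise without touching production.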
Recovery of production environments and data should follow the procedures listed above and in Data Management - Backup and Recovery.