Protect Your Database in Oracle Cloud with Fault Domains, RAC & Data Guard

    Bill Rehm - Database Manager

    Before the cloud, companies hosted their servers on premises or at a data center. The machines could be individual bare-metal servers or virtual machines, but however you set them up, you could physically touch the boxes and manage them separately. If you needed to upgrade one server, change out a network card, or patch the software, you only had to take that one server out of commission. For mission-critical servers, you could build a cluster so that if one machine went down for maintenance or patching, the business could keep working.

    The cloud is different. Although it’s still a pile of machines in a building somewhere, you have no physical access and little say in how servers are provisioned. When the cloud provider needs to fix the hardware your application is using, or patch your DBaaS database, they don’t coordinate that with you. They tell you that it’s going to happen and, if it’s not an emergency, they may give you the option to defer the event.

    How do you protect your database and keep your company running 24/7 if you have limited control over server and software patching? With Oracle Database and Oracle Cloud Infrastructure, you can use Real Application Clusters (RAC) and Data Guard to spread your data across OCI Fault Domains, Availability Domains, and Regions.

    First, here’s an overview of Oracle Cloud Infrastructure (OCI). At the highest level are Regions, which are geographically isolated locations around the world. Each Region contains one to three Availability Domains. Each Availability Domain is physically isolated from the others, and they don’t share power, cooling, or network resources. Within each Availability Domain are three Fault Domains: physically separate hardware stacks with independent and redundant power supplies.
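
    To make the hierarchy concrete, here is a minimal Python sketch that models it as plain data classes. The Region, Availability Domain, and Fault Domain names are hypothetical placeholders, and the classes only capture the counts described above (one to three Availability Domains per Region, three Fault Domains per Availability Domain); they are not part of any OCI SDK.

```python
from dataclasses import dataclass

# Hypothetical model of the OCI hierarchy described above. Region, AD, and
# Fault Domain names here are placeholders, not real OCI identifiers.


@dataclass
class AvailabilityDomain:
    name: str
    # Every Availability Domain contains three Fault Domains.
    fault_domains: tuple = ("FD-1", "FD-2", "FD-3")


@dataclass
class Region:
    name: str
    availability_domains: list

    def __post_init__(self):
        # A Region contains one to three Availability Domains.
        if not 1 <= len(self.availability_domains) <= 3:
            raise ValueError("a Region has one to three Availability Domains")

    def fault_domain_count(self) -> int:
        return sum(len(ad.fault_domains) for ad in self.availability_domains)


single_ad_region = Region("region-a", [AvailabilityDomain("AD-1")])
three_ad_region = Region("region-b", [AvailabilityDomain(f"AD-{i}") for i in (1, 2, 3)])
print(single_ad_region.fault_domain_count())  # 3
print(three_ad_region.fault_domain_count())   # 9
```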

    When Oracle carries out maintenance on its hardware inside an Availability Domain, it works on only one Fault Domain at a time. And because the Fault Domains are physically separate, a hardware failure in one of them won’t bring down everything in the Availability Domain.

    Now you can build an Oracle RAC database that spans Fault Domains. This not only provides load balancing and instance-level High Availability, but also protects against hardware failures and maintenance outages. You don’t have to use all three Fault Domains and create a three-node RAC database, though; a two-node RAC across two of the Fault Domains works too. How about a six-node RAC with two nodes in each Fault Domain? That might be overkill for most companies, but you still have the option.
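
    The placement options above can be sketched as a round-robin assignment of RAC nodes to Fault Domains, plus a check that losing any single Fault Domain still leaves a node running. This is a simplified model under stated assumptions: the node and Fault Domain names are hypothetical, and round-robin is just one illustrative placement policy, not necessarily how OCI places nodes.

```python
from collections import Counter

# Placeholder Fault Domain names; the round-robin policy is an illustrative
# assumption, not how OCI necessarily assigns RAC nodes.
FAULT_DOMAINS = ["FD-1", "FD-2", "FD-3"]


def place_rac_nodes(node_count, fault_domains=FAULT_DOMAINS):
    """Spread RAC nodes across Fault Domains round-robin."""
    return {f"node{i + 1}": fault_domains[i % len(fault_domains)] for i in range(node_count)}


def survivors(placement, down_fd):
    """Nodes still serving requests while one Fault Domain is down."""
    return [node for node, fd in placement.items() if fd != down_fd]


def tolerates_one_fd_outage(placement):
    """True if losing any single Fault Domain still leaves at least one node running."""
    return all(survivors(placement, fd) for fd in set(placement.values()))


two_node = place_rac_nodes(2, FAULT_DOMAINS[:2])  # one node in each of two FDs
six_node = place_rac_nodes(6)                     # two nodes in each FD
same_fd = place_rac_nodes(2, ["FD-1"])            # both nodes in one FD

print(tolerates_one_fd_outage(two_node))  # True
print(Counter(six_node.values()))         # two nodes per Fault Domain
print(tolerates_one_fd_outage(same_fd))   # False
```

    The last case shows why spanning Fault Domains matters: two nodes in the same Fault Domain give you load balancing, but not protection from that Fault Domain’s maintenance.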

    Oracle RAC across Fault Domains also protects against downtime from some patching. If a patch is listed as “Oracle RAC Rolling Installable”, it is applied one node at a time, leaving the other node(s) active and servicing database requests. Unfortunately, not all patches are “RAC rolling”, so you’ll still have to contend with downtime even with this HA configuration.
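
    A minimal sketch of why rolling patches matter, assuming a simplified model in which a “RAC rolling” patch takes exactly one node offline at a time and a non-rolling patch takes every node offline together. The function names are hypothetical.

```python
# Hypothetical model: a "RAC rolling" patch takes one node offline at a
# time, while a non-rolling patch needs every node down together.


def serving_nodes_per_step(nodes, rolling):
    """List the nodes still serving requests at each step of a patch run."""
    if rolling:
        # Patch each node in turn; the remaining nodes keep servicing requests.
        return [[n for n in nodes if n != patched] for patched in nodes]
    # Non-rolling: one step in which the whole database is unavailable.
    return [[]]


def causes_downtime(nodes, rolling):
    """True if any step of the patch run leaves no node serving requests."""
    return any(not serving for serving in serving_nodes_per_step(nodes, rolling))


rac = ["node1", "node2"]
print(causes_downtime(rac, rolling=True))        # False
print(causes_downtime(rac, rolling=False))       # True
print(causes_downtime(["node1"], rolling=True))  # True: a single node has no partner
```

    The last case shows that a rolling patch only avoids downtime when there is at least one other node to carry the load.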

    To maintain business continuity during ALL patching events, you can use Active Data Guard to replicate your entire database to another Availability Domain within the Region. This allows you to switch over to an exact secondary copy of your database and all its data while the primary database is undergoing maintenance. This also protects you against a (very unlikely) failure of all the servers in an Availability Domain.

    Something to keep in mind about having data and applications in separate Availability Domains is latency. Applications like JD Edwards are far too chatty between the database and the application servers to be separated even by subnets, much less by physically separate networks. Your primary application should point to the secondary replica only during maintenance windows and emergencies. What can work well is opening the secondary replica for read-only reporting, where latency isn’t an issue. Third-party apps and even JDE UBEs can point to that replica and take load off the primary database instance.
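
    The connection policy described above might look like the following sketch. The database names `primary_ad1` and `standby_ad2` and the workload categories are hypothetical placeholders chosen for illustration.

```python
from enum import Enum

# Hypothetical routing policy; "primary_ad1" and "standby_ad2" are placeholder
# names for the primary database and its Active Data Guard standby.
PRIMARY = "primary_ad1"
STANDBY = "standby_ad2"


class Workload(Enum):
    ONLINE_APP = "online_app"  # chatty, latency-sensitive (e.g. JD Edwards users)
    REPORTING = "reporting"    # read-only reports, third-party apps, read-only UBEs


def choose_database(workload, primary_in_maintenance):
    """Pick the database a workload should connect to."""
    if workload is Workload.REPORTING:
        # Read-only work tolerates cross-AD latency, so offload it to the standby.
        return STANDBY
    # The primary application stays next to the primary database and only
    # moves to the standby during maintenance windows or emergencies.
    return STANDBY if primary_in_maintenance else PRIMARY


print(choose_database(Workload.ONLINE_APP, primary_in_maintenance=False))  # primary_ad1
print(choose_database(Workload.REPORTING, primary_in_maintenance=False))   # standby_ad2
```

    In a real deployment you would more likely express this routing with Oracle Net connect descriptors and role-based database services than in application code; the sketch only captures the decision logic.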

    Now, this secondary Data Guard instance in the other Availability Domain is still technically just High Availability, not Disaster Recovery. It’s still in the same geographic Region, so you aren’t protected against large-scale outages that affect the whole area, such as fires, explosions, and natural disasters. To have true Disaster Recovery, you need a third Data Guard replica in another OCI Region. If a whole city blows up or gets hit by a hurricane, that additional exact copy of your data will still be live and online. Ideally, you’ll also have your application servers duplicated in the Disaster Recovery Region. Otherwise, the DR database instance is of little more use than a regular backup.
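
    One way to see the difference between High Availability and Disaster Recovery is to ask, for each failure scope, whether the database copies sit in at least two different locations at that level. The sketch below does that for a hypothetical two-copy topology (primary plus a standby in another Availability Domain) and a three-copy topology that adds a standby in a second Region; all names are placeholders.

```python
from dataclasses import dataclass

# Hypothetical topology check; region, AD, and FD names are placeholders.


@dataclass(frozen=True)
class DatabaseCopy:
    """Where one copy of the database (primary or Data Guard standby) runs."""
    region: str
    ad: str
    fd: str


def location(copy, scope):
    """Identify a copy's location at a given failure scope, including its parents."""
    if scope == "fault_domain":
        return (copy.region, copy.ad, copy.fd)
    if scope == "availability_domain":
        return (copy.region, copy.ad)
    if scope == "region":
        return (copy.region,)
    raise ValueError(f"unknown scope: {scope}")


def survives_outage(copies, scope):
    """True if losing any one location at this scope still leaves a copy."""
    return len({location(c, scope) for c in copies}) >= 2


ha = [DatabaseCopy("region-a", "AD-1", "FD-1"), DatabaseCopy("region-a", "AD-2", "FD-1")]
dr = ha + [DatabaseCopy("region-b", "AD-1", "FD-1")]

for scope in ("fault_domain", "availability_domain", "region"):
    print(scope, survives_outage(ha, scope), survives_outage(dr, scope))
```

    The two-copy layout survives Fault Domain and Availability Domain outages but not a Region-wide one, while the three-copy layout survives all three. The same reasoning applies to the application servers, which is why they belong in the Disaster Recovery Region too.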

    I bet all this looks and sounds expensive, and you’d be right. But how much does it cost your business to sit idle? Consider this:

    • Salaries of people doing nothing while waiting for the fix.
    • Money paid to consultants to try to fix whatever problem is causing the outage.
    • Online orders being missed.
    • Trucks waiting at loading docks for orders that can’t be filled.
    • Contracts with suppliers that will be breached.
    • Perishable goods in danger of going bad.
    • Paying workers not only to do their jobs, but also to somehow make up for time lost during the outage.
    • Recreating any lost data.
    • Repairing incomplete transactions.
    • Damage to your brand due to the outage.
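
    To put rough numbers on a few of these items, here is a minimal cost model: per-hour losses multiplied by the length of the outage, plus one-time recovery costs. Every dollar figure is a made-up placeholder, and the break-even helper (how many hours of avoided downtime per year would cover a hypothetical annual HA/DR bill) is an illustration, not a pricing tool.

```python
# All dollar figures below are made-up placeholders; substitute your own.
HOURLY_COSTS = {
    "idle_salaries": 5_000,
    "consultants": 1_500,
    "missed_online_orders": 12_000,
}
ONE_TIME_COSTS = {
    "recreating_lost_data": 8_000,
    "repairing_transactions": 4_000,
}


def outage_cost(hours, hourly=HOURLY_COSTS, one_time=ONE_TIME_COSTS):
    """Per-hour losses times outage length, plus one-time recovery costs."""
    return hours * sum(hourly.values()) + sum(one_time.values())


def break_even_hours(annual_ha_dr_cost, hourly=HOURLY_COSTS):
    """Hours of avoided downtime per year that would pay for the HA/DR setup."""
    return annual_ha_dr_cost / sum(hourly.values())


print(outage_cost(1))   # 30500
print(outage_cost(2))   # 49000
print(outage_cost(24))  # 456000
print(round(break_even_hours(200_000), 1))  # 10.8
```

    The break-even figure counts only per-hour losses, so it overstates the hours needed; costs such as brand damage are hard to price at all and are left out.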

    How much does that cost for just one hour? Two hours? 24 hours? At some point, the numbers will show that a complex and comprehensive DR infrastructure is worth every penny. Let GSI help today.