Wednesday, May 17, 2023

What is Scalability and How can affect on overall system performance?

A review of all scalability & HA plus connection pooling techniques on popular RDBMS.

Scalability is a system's ability to process more workload, with a proportional increase in system resource usage.

In a scalable system, if you double the workload, then the system uses twice as many system resources. This sounds obvious, but due to conflicts within the system, the resource usage might exceed twice the original workload.


Examples of poor scalability due to resource conflicts include the following:

  • Applications requiring significant concurrency management as user populations increase
  • Increased locking activities
  • Increased data consistency workload
  • Increased operating system workload
  • Transactions requiring increases in data access as data volumes increase
  • Poor SQL and index design resulting in a higher number of logical I/Os for the same number of rows returned
  • Reduced availability, because database objects take longer to maintain


An application is said to be unscalable if it exhausts a system resource to the point where no more throughput is possible when its workload is increased. 

Such applications result in fixed throughputs and poor response times.


Examples of resource exhaustion include the following:

  • Hardware exhaustion
  • Table scans in high-volume transactions causing inevitable disk I/O shortages
  • Excessive network requests, resulting in network and scheduling bottlenecks
  • Memory allocation causing paging and swapping
  • Excessive process and thread allocation causing operating system thrashing


This means that application designers must create a design that uses the same resources, regardless of user populations and data volumes, and does not put loads on the system resources beyond their limits.


There are two types of system scaling:


Vertical scaling involves increasing a database’s resources. Typically, this involves moving the database to a more powerful computer or to a larger instance type.

“More” is the key word. 

As with any hardware choice, you consider more powerful processors, more memory and/or more network bandwidth. You have to find a balance between them that optimally improves the database’s performance and the number of simultaneous users it can support, not to mention optimizing your hardware budget.


Horizontal scaling involves adding additional computer nodes to a cluster of instances that operate the database, without changing the size or capacity of any individual node. 

Horizontal scaling is also called scaling out (when you add nodes) or scaling in (when you decrease the number of nodes).

Depending on how it’s implemented, horizontal scaling can also improve the database’s overall reliability. It eliminates a single point of failure because you are increasing the number of nodes that can be used in failover situations. 


However, horizontal scaling also increases time and effort (and thus costs), because you need more nodes (and hence more failure points) to keep the database functional.

In other words, vertical scaling increases the size and computing power of a single instance or node, while horizontal scaling changes the number of nodes or instances.

Vertical scaling is an easy way to improve database performance, assuming that you have or can acquire a larger computer or instance. It typically can be implemented easily in the cloud, with no impact on the application or database architecture.


Another method of scaling database systems that is going to popular is sharding.


The Basics of Sharding

Sharding is a technique for improving a database’s overall performance, as well as increasing its storage and resource limits. It’s a relatively simple horizontal scaling technique.

With sharding, data is distributed across various partitions, or nodes. Each node holds only a portion of the data stored in the entire database. For example in Redis ,A key/value input is processed, and the data is stored in a shard.

When a request is made to the database, it is sent to a shard selector, which chooses the appropriate shard to send the request. In Redis, shard selection is often implemented by a proxy that looks at the key for the requested data, and based on the key, the proxy sends the request to the appropriate shard instance.

The shard selection algorithm is deterministic, which means every request for a given key always goes to the same shard. Only that shard has information for a given data key, as illustrated by below figures in Redis database.


Horizontal scaling via sharding

Sharding is a relatively easy way to scale out a database’s capacity. By adding three shards to a Redis OSS implementation, for instance, you can nearly triple the database’s performance and triple the storage limits.


How to increase performance by scaling application and database?


Oracle: 

Use Real Application Cluster - RAC that give you load balancing, scalability, high availability features.


PostgreSQL: 

Use cluster and replication techniques such as :Pgpool, Patroni, repmgr.


For connection pooling and load balancing in enterprise environments you can use:

Pgbouncer, Pgpool, Heimdall.


Pgpool-ii  gives you all of needed features as 'All in One' package but when need more and more loading challenges you can integrate it with Pgbouncer.


MariaDB:

Use Galera cluster integrated with HaProxy.


MySQL: 

Use cluster NDB.


For load balancing : Use haproxy + Keepalive

Or MySQL Maxscale.


SQLServer:

Use AlwaysOn technology or replication.


Sqlserver also available on Linux and can be integrated by Pacemaker/ etcd/ Samba & NFS to handle cluster management.


Nowadays, all aspects of RDBMS try to enhance availability and scalability features such as Sharding. 

So on Oracle this feature is available.


Although Nosql databases such as Mongodb , Redis ,... also provides sharding techniques and can use by specific situations by considering application requirements.


Note:

Migration to cluster databases is not always a good solution on overall, that means it is very depends on you application's behaviors, work load, DBA experience, available resources.

Also clusters database always leads to more challenges in troubleshooting and more maintenance cost.


Regards,

Alireza Kamrani,

Senior RDBMS Consultant.

Fleet Patching & Provisioning (FPP) Concepts and cloning methods for Oracle/Grid Home

  Fleet Patching & Provisioning (FPP) Concepts and cloning methods for Oracle/Grid Home:                           ♠️Alireza Kamrani♠️  ...