By Dan Hotka
Sharding is a method of having a widely distributed database across various hardware platforms utilizing a mix of database technologies. Sharding extends partitioning to various distributed databases that don’t even have to be the same database product.
Sharding is an Oracle12.2 new feature.
I’ve also noticed that Oracle 12.2 is available for Windows (64 bit), Linux (x86-64 bit), Solaris Spark (64 bit), and Solaris x86 (64 bit) systems. Click here to review the Oracle database downloads. The documentation is only available online, though (click here for Oracle Online Documentation). I recently set up a Windows 10 VMware image and I run my Oracle 12.2 there. In my world of writing articles and training, I find VMware to be an essential tool! I can easily set up test environments now and not have to worry about how and what software needs to be installed in the images. The downside of VMware images to me is purchased software.
Oracle calls sharding extreme performance, extreme scalability, and extreme availability.
I call it a great new feature.
To me, sharding is nothing more than table partitioning across platforms and across independent databases. The platforms can be of a variety of types and operating systems and databases, depending on the remote needs of the data. Data can now be distributed to where it needs to be, but it can also be globally queried without knowing or caring where the data is.
Oracle includes sharding in their backup and recovery, including Data Guard, standby database, and hot backups. The shards can be in on-premises databases, in databases located in the cloud, and can utilize Oracle RAC instances as well, making up a hybrid computing solution protected by Oracle’s security and high availability features. You can utilize other Oracle features for shards as well, including compression, online changes, complex data types, etc.
Oracle supports 1,000 shards in this first release of sharding in Oracle126.96.36.199.
The data can be distributed for a variety of reasons, including application requirements (the extreme performance part)…the data is located close to where it is needed, such as branch banking applications where they have their customer accounts and balances. Another reason is regulations, both company mandates and in some cases government mandates. Say, the data has to be in the country where it is used. The sharded data can use low-cost hardware, perhaps hardware that the organization already has for other applications, along with remote storage devices. The high availability comes into play because if one location is unavailable, the missing shard has no effect on the other parts of the application.
Sharding utilizes a ‘shared nothing’ approach. If a SQL query is started across the entire sharded data set, the database connection and CUP that initiated the query becomes the query coordinator, collecting the necessary data from the various shards, assembling the result set, and presenting it to the application. This feature can also be referred to as ‘horizontal partitioning’, using the local hardware’s CPU, memory, and disk storage for local query processing.
A sharded database/table makes up a single logical database/table.
Sharding is part of Oracle’s partitioning feature. Sharding is really an extension of Oracle’s existing partitioning environment, including subparititioning. This might be useful if some of the shards will have a very minimal amount of data.
Oracle’s architecture supports user-defined sharding, composite sharding, and system-managed sharding.
User-defined sharding gives you (the DBA or developer) a very fine-grained approach to how the data will be distributed and accessed. User-defined sharding utilizes range or list partitioning. The downfall here is that the DBA has to monitor and balance the data access load themselves.
CREATE SHARDED TABLE emp (
empno number not null,
deptno number not null,
PRIMARY KEY pk_empno (empno))
PARTITION BY LIST (state)
(PARTITION p_midwest VALUE ‘IA’,IL’,’MN’,’OH’),
(PARTITION p_east VALUES (‘PA’,NY’,’MA’,’RI’),
(PARTITION p_south VALUES (‘TX’,FL’,’GA’,’SC’),
(PARTITION p_west VALUES (‘CA’,AZ’,’NM’,’OR’);
System-managed sharding does not require specific mapping of data but uses hot-spot data to maximize performance across the shards. Oracle sharding then automatically maintains and balances the distribution of chunks of data. System-managed sharding utilizes a consistent-hash partitioning technology.
SQL> CREATE SHARDED TABLE emp (
TABLESPACE SET SHARD_SET_1
PARTITION BY CONSISTENT HASH (empno) PARTITIONS AUTO;
Composite sharding is a combination of the above two sharding types. This type of sharding first partitions based on a range or list key then utilizes composite-hash to balance the data load across partitions.
PARTITIONSET BY LIST (state)
PARTITION BY CONSISTENT HASH (empno) PARTITIONS AUTO
(PARTITIONSET p_midwest VALUE ‘IA’,IL’,’MN’)
TABLESPACE SET t_midwest,
(PARTITIONSET p_east VALUES (‘PA’,NY’,’MA’,’RI’)
TABLESPACE SET t_east,
(PARTITIONSET p_south VALUES (‘TX’,FL’,’GA’,’SC’)
TABLESPACE SET t_south,
(PARTITIONSET p_west VALUES (‘CA’,AZ’,’NM’,’OR’)
TABLESPACE SET t_west);
Sharding Architecture Illustration from Oracle12.2 Database Administrators Guide Chapter 17.
In the above illustration, the unsharded table can be partitioned or not, but it all resides on one computing environment. A sharded database is spread across multiple computing environments, as illustrated on the right. These sharded databases can be on different hardware platforms, operating systems, and even across different database types.
Oracle Sharding Architecture from Oracle Overview Document
Components of an Oracle sharded environment include:
The Oracle documentation has nice step-by-step instructions for implementing sharding. Click here to review these deployment steps.
Sharding is a great way to distribute processing across your application that is already hosted on various computing environments and probably in various locations as well.