Azure SQL Data Warehouse
Single Database
Usually SQL data warehouse contains more than one database to distinguish each layer such as staging, ods, audit/error and data mart. But in ADW it is a single database which support all layers separated by schema instead of DB. To get maximum performance gain, whole data warehouse objects should contains with in a single database even though it supports to create multiple data warehouse databases in the same instance.
Cost Effective
ADW instance can be scale up and down on demand. This will help user to manage the cost because costing will be calculated based on DWU and service running time (rate per hour). When data manipulation is started user can scale up the service to upper tier which has higher computation power and maximum parallelism to complete the ELT/ETL process as mush as faster. Then scale down to usual tier with lower compute power or pause it if it is not required to be up and running once analytics cube is processed. All these operation can be automated with in the Azure.
Massively Parallel Processing
Azure SQL Data Warehouse (so called ADW) is Azure PaaS service which is specifically designed for data warehousing solutions. This is not just a SQL database as Azure SQL PaaS. It supports MPP stands for Massively Parallel Processing. This is like a Apache hadoop file system. Having a control node and multiple compute nodes to process user query in parallel with distributed data in 60 distributions per database. The query processing and concurrency is handled by resource classes and concurrency slots in each resources classes. The number of concurrent queries and concurrency slots are vary by the Data Warehouse Unit selected in the service.
Distribution
How data of a table need to be distributed in the data warehouse is defined when creating the table. Any way there are 60 parallel distributions as explained above. Data can be distributed among those 60 based on defined distribution method such as Hash, Round Robin and Replicate.
- Hash Distribution
- Round Robin
- Replicate
Data Warehouse Unit
Unit of measurements to measure ADW computation/processing power or also known as ADW performance capacity. This defines the memory availability, size of database, number of compute nodes, number of distributions per compute nodes, number of concurrent queries, number of concurrences for each resource classes. This can be adjusted in the portal and it will take few minutes to get reflected the changes into ADW instance. You can read more about DWU and relevant capacity in below Microsoft docs.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/memory-and-concurrency-limits
Workload Management
In ADW, some query like data transformation needs more processing power where as some query needs more parallelism like data loads into ADW. In order to do this in a optimal way you need to know about ADW workload management. This is not a simple task as this need more hands on and optimize based on the experience and following best practices to get maximum query performance from ADW. You can read more on this in below article.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/resource-classes-for-workload-management
Concurrency
Concurrency is nothing but the number of parallel query execution for resource class based on selected DWU. If the prioritized query is running on ADW with more concurrency slots, the rest of low prioritized queries will wait in the queue to be executed. Concurrency slot is the allocated memory slot use to execute query based on logged in user assigned resource class.
Resource Class
Resource Class (RC) is the pre-defined cloud resource allocation for query execution on ADW. This is like a ADW role in terms of resources such as I/O, memory and processor. Resource class defines the number of concurrency slot to be allocated to query execution. There are two types of RC as dynamic and static.
In dynamic RC there are four classes as small, medium, large and extra large. By it name itself giving the definition for the RC. Dynamic RC change the allocated resources when ADW scale in out. But Static does not change it unless few exceptions in staticrc 50,60,70,80 RCs. When you create a user login and db user, you have to assign a resource class either dynamic or static. By default it is dynamic smallrc. When you login by the user and execute some query on ADW, that query will utilize ADW resources that allocated to user assigned RC.
Read Concurrency Maximums get broader idea of RC resource allocation.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/memory-and-concurrency-limits
Workload Management
In ADW, some query like data transformation needs more processing power where as some query needs more parallelism like data loads into ADW. In order to do this in a optimal way you need to know about ADW workload management. This is not a simple task as this need more hands on and optimize based on the experience and following best practices to get maximum query performance from ADW. You can read more on this in below article.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/resource-classes-for-workload-management
Concurrency
Concurrency is nothing but the number of parallel query execution for resource class based on selected DWU. If the prioritized query is running on ADW with more concurrency slots, the rest of low prioritized queries will wait in the queue to be executed. Concurrency slot is the allocated memory slot use to execute query based on logged in user assigned resource class.
Resource Class
Resource Class (RC) is the pre-defined cloud resource allocation for query execution on ADW. This is like a ADW role in terms of resources such as I/O, memory and processor. Resource class defines the number of concurrency slot to be allocated to query execution. There are two types of RC as dynamic and static.
In dynamic RC there are four classes as small, medium, large and extra large. By it name itself giving the definition for the RC. Dynamic RC change the allocated resources when ADW scale in out. But Static does not change it unless few exceptions in staticrc 50,60,70,80 RCs. When you create a user login and db user, you have to assign a resource class either dynamic or static. By default it is dynamic smallrc. When you login by the user and execute some query on ADW, that query will utilize ADW resources that allocated to user assigned RC.
Read Concurrency Maximums get broader idea of RC resource allocation.
Comments
Post a Comment