This blog will discuss about the essentials of Azure Data Warehouse database queries. First of all you have to create a ADW database in Azure which I already posted in Creating Azure Data Warehouse post. Then you can connect to the ADW database using the user credentials given when you create the DB. But you will end up with connectivity issue as given below.
- Change Firewall settings
- Connecting to ADW instance using SSMS
- Create a user - first you have to create a login and then create a user for that login as given below
--Select master db to create a login
CREATE LOGIN <yourusername> WITH PASSWORD = 'Password';
--Select ADW db to create the user for the login
CREATE USER <yourusername> FOR LOGIN <yourusername>;
- Change user RC - once you have user created you can change resource class that user assigned and the db role as well. Database role is something for authorization but RC is something classified how much resource such as memory, compute power is assigned for the user.
--To
assign RC
EXEC sp_addrolemember 'xlargerc', '<yourusername>';
--To assign db role
EXEC sp_addrolemember 'db_owner', '<yourusername>';
--Check user
SELECT *
FROM sys.database_principals
WHERE name LIKE '%yourusername%'
--AND type_desc = 'db_owner';
EXEC sp_addrolemember 'xlargerc', '<yourusername>';
--To assign db role
EXEC sp_addrolemember 'db_owner', '<yourusername>';
--Check user
SELECT *
FROM sys.database_principals
WHERE name LIKE '%yourusername%'
--AND type_desc = 'db_owner';
- Create table and insert sample data into the table
CREATE TABLE dbo.MyTable_8000264
(
Id
INT IDENTITY (0,1)
,FirstName VARCHAR(100)
,LastName VARCHAR(100)
,DOB DATE
)
--Try below insert query
INSERT INTO dbo.MyTable_8000264
VALUES ('Jhon','Smith','1990-10-19')
,('Kane','Loyd','1991-02-14')
,('Julia','Robertson','1989-04-22')
,('Ane','Jakob','1995-03-04')
,('Ane','Peterson','1993-05-08')
,('Kane','Loyd','1991-02-14')
,('Julia','Robertson','1989-04-22')
,('Ane','Jakob','1995-03-04')
,('Ane','Peterson','1993-05-08')
you will see that you cannot records like that but, try below
--Try below insert query
INSERT INTO dbo.MyTable_8000264
VALUES ('Jhon','Smith','1990-10-19')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Kane','Loyd','1991-02-14')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Julia','Robertson','1989-04-22')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Ane','Jakob','1995-03-04')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Ane','Peterson','1993-05-08')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Jhon','Smith','1990-10-19')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Kane','Loyd','1991-02-14')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Julia','Robertson','1989-04-22')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Ane','Jakob','1995-03-04')
GO
INSERT INTO dbo.MyTable_8000264
VALUES ('Ane','Peterson','1993-05-08')
GO
- Table distributes into 60 distributions - in ADW each table is distributed in to 60 distributions and data is spread among these. This is like hadoop file system. You can see the data distribution using below query.
-- Find data skew for a distributed
table
DBCC PDW_SHOWSPACEUSED('dbo.MyTable_8000264');
- Create table with different distributions and insert into distributed table
- Round robin – this is the default if you don't mention it specifically. Data will be distributed randomly with in 60 distributions.
CREATE TABLE dbo.MyTable_8000264_rr
(
Id
INT IDENTITY (0,1)
,FirstName VARCHAR(100)
,LastName VARCHAR(100)
,DOB DATE
)
--OR automatically it creates round robin
as below
CREATE TABLE [dbo].[MyTable_8000264_rr]
(
[Id]
[int] IDENTITY(0,1) NOT NULL,
[FirstName]
[varchar](100) NULL,
[LastName]
[varchar](100) NULL,
[DOB]
[date] NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
GO
-- Find data
skew for a distributed table
DBCC
PDW_SHOWSPACEUSED('dbo.MyTable_8000264_rr');
- Hash - data will be distributed based on hash key given when creating the table. usually to select the hash key, the distinct data count should be more than 60. otherwise data will be distributed un-evenly.
CREATE TABLE dbo.MyTable_8000264_h
(
Id INT IDENTITY (0,1)
,FirstName VARCHAR(100)
,LastName VARCHAR(100)
,DOB DATE
)
WITH
(
DISTRIBUTION = HASH(FirstName),
HEAP
)
GO
-- Find data
skew for a distributed table
DBCC
PDW_SHOWSPACEUSED('dbo.MyTable_8000264_h');
- Replicate - this will duplicate the data in each distribution. This option is best for master data tables and small scale tables.
CREATE TABLE dbo.MyTable_8000264_r
(
Id
INT IDENTITY (0,1)
,FirstName VARCHAR(100)
,LastName VARCHAR(100)
,DOB DATE
)
WITH
(
DISTRIBUTION = REPLICATE,
HEAP
)
GO
-- Find data
skew for a distributed table
DBCC
PDW_SHOWSPACEUSED('dbo.MyTable_8000264_r');
But we cannot see that duplicate in the data using above. I also don't know why it is so.
- Identity insert
GO
INSERT INTO dbo.MyTable_8000264 (Id,FirstName,LastName,DOB)
VALUES (123,'Jhon','Smith','1990-10-19')
GO
SELECT * FROM dbo.MyTable_8000264
-- Do the same insert with set identity insert off for a hash distributed table
SET IDENTITY_INSERT dbo.MyTable_8000264 OFF
GO
INSERT INTO dbo.MyTable_8000264 (FirstName,LastName,DOB)
VALUES ('Jhon','Smith','1990-10-19')
GO
SELECT * FROM dbo.MyTable_8000264
- Concurrency and slot usage - this query will give you the statistics of the query execution with concurrency, and slots used by the execution and so on.
SELECT
SUM(CASE WHEN r.[status] = 'Running' THEN 1 ELSE 0 END) [running_queries],
SUM(CASE WHEN r.[status] = 'Running' THEN rw.concurrency_slots_used ELSE 0 END) [running_queries_slots],
SUM(CASE WHEN r.[status] = 'Suspended' THEN 1 ELSE 0 END) [queued_queries],
SUM(CASE WHEN rw.[state] = 'Queued' THEN rw.concurrency_slots_used ELSE 0 END) [queued_queries_slots]
FROM
sys.dm_pdw_exec_requests r
JOIN
sys.dm_pdw_resource_waits rw ON rw.request_id = r.request_id
WHERE
(
(r.[status] = 'Running' AND r.resource_class IS NOT NULL ) OR r.[status] ='Suspended' )
AND
rw.[type] = 'UserConcurrencyResourceType';
SELECT *
FROM sys.dm_pdw_exec_requests AS r
WHERE STATUS != 'Completed'
AND STATUS != 'Failed'
AND STATUS != 'Cancelled';- ADW scaling - the following query can be used to scale ADW in and out with given DWU at query level which can be done in portal itself also.
ALTER DATABASE analyticsbidwdev MODIFY (SERVICE_OBJECTIVE = 'DW200c');
WHILE
(
SELECT TOP 1 state_desc
FROM
sys.dm_operation_status
WHERE
1=1
AND
resource_type_desc = 'Database'
AND
major_resource_id = 'analyticsbidwdev'
AND
operation = 'ALTER DATABASE'
ORDER
BY
start_time DESC
) = 'IN_PROGRESS'
BEGIN
RAISERROR('Scale operation in
progress',0,0) WITH NOWAIT;
WAITFOR DELAY '00:00:05';
END
PRINT 'Complete';- #Table scope - this will support you to see the difference of the # table scope between normal SQL db and ADW. Do the following query execution in both SQL db and ADW db.
CREATE PROC dbo.Proc_1
AS
BEGIN
SELECT 123 AS Id
INTO #TEST
END
GO
CREATE PROC dbo.Proc_2
AS
BEGIN
SELECT 456 AS Id
INTO #TEST
END
GO
CREATE PROC dbo.Proc_Master
AS
BEGIN
EXEC dbo.Proc_1
EXEC dbo.Proc_2
END
GO
--Run main proc
EXEC dbo.Proc_Master
--Drop procs
DROP PROCEDURE dbo.Proc_1
DROP PROCEDURE dbo.Proc_2
DROP PROCEDURE dbo.Proc_Master
Usually the # table is scoped in to the session in both SQL db and ADW db. But the scope of inner stored procedure is bit different. In SQL db, if you execute SP with in another SP, two SPs will execute in two different sessions. But in ADW, second SP will execute in the same session of the first SP. That's why when you have the same # table in second SP gets an error saying # table already exists.
Usually the # table is scoped in to the session in both SQL db and ADW db. But the scope of inner stored procedure is bit different. In SQL db, if you execute SP with in another SP, two SPs will execute in two different sessions. But in ADW, second SP will execute in the same session of the first SP. That's why when you have the same # table in second SP gets an error saying # table already exists.
- CTAS Table - means Create Table As Select which will create a table using the output of the select statement and given index and distribution option.
CREATE TABLE dbo.MyTable_8000264_CTAS
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = ROUND_ROBIN
)
AS SELECT *
FROM dbo.MyTable_8000264;
That is about the querying ADW db for now and have more practice and find the optimal way to write queries in ADW with resource class and concurrency.
That is about the querying ADW db for now and have more practice and find the optimal way to write queries in ADW with resource class and concurrency.
Comments
Post a Comment