Azure for Data Warehouse and Business Intelligence


The purpose of this blog is, mainly discuss on Azure resources or services which are engage with data warehousing and business intelligence. So there are many services Microsoft provided to do BI works. Let's look at each services in brief.

Computing

  • Azure VMs
Azure VM is the cloud service for Virtual Machine. You can have different type of VMs by Microsoft Azure. They mainly provide VM with their own Windows OS. But there are some open source OS such as Obuntu, Red Hat, SUSE, Sent OS and Debain. VM can be created with pre-defined set of hardware and software resources. And cost will be based on selected package and selected pricing mode whether Pay as you Go or Reserved. Any time you can scale out the resources. These VMs can be added to the on-premises network by using a Virtual Network (Vnet).


Storage

  • Blob Storage
Blob storage is a primary storage service provided by Microsoft Azure  for data store in different formats such as Files, Tables, Queues and Blobs. Storage access authentication is by the Access Key. If anyone knows the storage account details with a key, that storage can be accessed from internet. This is one of the drawback of this service but there is a way to regenerate the key to mitigate that security risk. Costing is mainly based on uploads and downloads. There is  a client tool called Storage Explorer to manage data in storage account easily from your own PC. That is possible to do the same using web browser.
  • Data Lake Store
Data Lake Store is the enhanced service to cater data store in cloud.There are two versions of data lake store as Gen1 and Gen2 with some enhanced feature in Gen2. This service also facilitate any type of files to store there. Can be accessed using Storage Explorer but there is no way to use this through web browser as it does not support to upload/down load files using browser like blob storage service. The background framework of data lake is blob storage but with multi layer access control. That means you can grant access at the folder level unlike blob storage which supports only at top level  or container level access. And this is tightly integrated with Azure Active Directory. Hence in data lake store gen 2 has more security than gen 1 and blob store.

Analytics

  • Analysis Service
This is the cloud version of SQL Server Analysis Services (SSAS) but supports only Tabular model. This does not support for Multi Dimensional Model of analysis service. There are three pricing categories in AAS like developer, basic and standards. Each category has tiers which contain the memory available and QPU stands for Query Processing Unit which use to measure analysis service processing power. More QPU means that has more processing power and can run multiple queries in parallel. Tabular model in AAS is always in memory. Costing is based on tier which AAS running and number of hours running. Any time you can scale in and scale out. But it supports if you cannot go further down tiers if you creates the AAS in higher category like standard.
  • Data Bricks
Data bricks is an analytical service newly introduced in Azure platform which is based on Apache Spark. Apache Spark is an open source project. Data bricks use to ingest significant amount of data. Hence this is a new platform for big data analysis and machine learning. Python, Scala, R, and SQL languages are used in Data Bricks analysis code and the place holder for those coding snippet called note books. Another component called Jobs are running and ingesting data like ETL. You can see more about Data Bricks in this blog.

Data Store

  • SQL Server
SQL Server is PaaS service for data base management in Azure world. This  service can be created as a single instance or managed instance which support for vertically and horizontally scalable. That means when you want to create multiple databases in the same service you need to create managed instance. SQL server is nothing but similar data base as on-premises which can be connected using SSMS.
  • SQL Data Warehouse
Azure Data Warehouse (ADW) is specifically designed for data warehousing solution with massively parallel processing. This is not just SQL database. When you create ADW  instance, you will have SQL server with ADW database and master database. You can have multiple SQL database and ADW database with in the same instance. But preferred to have one ADW database per instance to get max performance. ADW designed utilize maximum resources by proper workload management with resource classes. This service also can be scaled in and out on demand. Can be paused if data manipulation is done or keep it running in lover DWU stands for Data Warehouse Unit which measures the processing power of ADW.  DWU defines the number of concurrency and concurrency slots available for each resource class. Also it defines the memory available for ADW. There were two generation of ADW and now only Gen-2 is exists and no Gen-1 is available anymore. Pricing of ADW is based on the DWU and number of hours running. Azure Data Warehouse service is vast subject which cannot be discussed in single paragraph. Let's discuss each ADW functionality in separate post.

Integration

  • Data Factory
Azure Data Factory (ADF) is the service that privilege for the integration between different data sources. There is a misunderstanding that ADF is the cloud version of SQL Server Integration Service (SSIS). But SSIS is much more advanced and ADF is simple and capable of integrating data between source and destination using hundreds of connectors in build for on-premises as well as cloud. These connectors are used to inter connect Databases, File systems, Web services and other cloud services together. Salesforce, Oracle Service Cloud are some of the cloud connectors where as SQL Server and Microsoft Access are the on-premises connectors. ADF has basic SSIS components which are called activity and activities are executing in pipeline which is similar to the SSIS package. Pipeline can be scheduled in given time line. These schedules called triggers. Unlike SSIS, in ADF you cannot do complex transformations. Pricing mechanism is little complicated in this service than the other Azure services as it calculates based on different activity execution and execution time as well as data transfers. Not only that but also if there are pipelines created but not executing for a month, there is a cost per in active pipeline 😉.
  • Logic Apps
This  service also a part of integration services but little different than the ADF. Logic apps contains the workflow system and can be executed only sequentially. And there are some limitations in both ADF and Logic Apps with respect to the data load and execution time out. Logic Apps also has it's own connectors to connect different source and destinations. Using logic apps, you can implement the integration workflow with conditional logic. There are many built in functions which can be used for many logic implementation and data transformation with in the workflow. Workflow creation/modification can be done using control panel through the web browser. There is a Visual Studio project template also for Logic Apps workflow implementation. Cost is based on what you use. Logic Apps is a most popular cloud integration service in industry.

Server less computing


  • Functions
  • Durable Functions
  • Web Apps

Comments