Getting started with Azure Synapse data explorer

Telemetry data is everywhere – we find it in IoT sensors, app/web/infra logs, security logs, metrics, click streams, time series, and more – it drives advances in the connected devices we find in our homes, smart buildings, smart factories, and in the devices that consumers and businesses rely on. This telemetry generates terabytes and petabytes of data every day that needs to be analyzed in near real time to improve operations and drive innovation.

 

At Ignite, we announced the public preview of Azure Synapse data explorer, which makes it possible to query huge amounts of structured, semi-structured, and free-text telemetry and time-series data. The following are some of the key capabilities that make this possible:

  • Powerful distributed query engine that indexes all data, including free text and semi-structured data. The data is automatically compressed, indexed, auto-optimized, and cached on local SSDs, and persisted on storage. Compute and storage are decoupled, giving you full elasticity to auto-scale in/out without downtime.
  • Intuitive Kusto Query Language (KQL) that is highly optimized for exploring raw telemetry and time series data, using Synapse data explorer’s best-in-class text indexing for efficient free-text search, regex, and parsing of trace/text data.
  • Comprehensive JSON parsing capabilities for querying semi-structured data, including arrays and nested structures.
  • Native, advanced time series support for creation, manipulation, and analysis of multiple time series with in-engine Python and R execution support for model scoring.
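To make these capabilities concrete, here is a small illustrative KQL query against the StormEvents sample table created later in this post. It combines an indexed free-text term search, access to a nested JSON (dynamic) column, and a simple time-series aggregation; the StormSummary.Details.Location path is an assumption about the sample data's JSON shape, so treat this as a sketch rather than a guaranteed schema.

// Illustrative only: the StormEvents table is created in section 3 below.
StormEvents
| where EventNarrative has "tornado"                          // indexed free-text term search
| where isnotempty(StormSummary.Details.Location)             // JSON/dynamic column access (path assumed)
| summarize TornadoEvents = count() by bin(StartTime, 7d)     // time-series bucketing by week
| render timechart                                            // quick built-in visualization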

These capabilities make Azure Synapse data explorer a perfect fit for ingesting, storing, querying, and visualizing append-only telemetry data such as logs, events, and time series. If you are interested in the internals of the Data Explorer engine, see this whitepaper.

 

The following example will help you learn how to get started with provisioning, ingesting sample data, and running your first query.

 

1. Prerequisites

  1. Provision an Azure Synapse Analytics workspace in your Azure subscription. Click here for instructions.
  2. Ensure you have the Synapse Administrator role on your workspace. Synapse RBAC roles are documented here.

2. Create a Data Explorer pool

  1. Go to https://web.azuresynapse.net and sign in to your workspace.
  2. Go to the Manage tab and select New to create a new Data Explorer pool.
    (Screenshot: CreatePool.png)
  3. Follow the wizard to create the new Data Explorer pool. Under Additional settings, you can enable or disable the following settings per your requirements:
    • Autoscale – Optimized autoscaling is the recommended autoscale method; it balances pool performance and cost. If the pool approaches a state of under-utilization, it is scaled in, which lowers cost while maintaining performance. If it approaches a state of over-utilization, it is scaled out to maintain optimal performance.
    • Streaming ingestion – Enabling streaming ingestion configures an internal row store so that data arriving via the APIs, IoT Hub, or Event Hub can be queried with latencies under 10 seconds (see the streaming ingestion note after this list).
    • Enable purge – Enables the ability to delete data in a way that can’t be recovered (to comply with requirements such as GDPR).
    • Note: It may take 10-15 minutes to create the pool; once it’s created, you will see the pool with its status.
  4. Click on the pool name to view the properties. Make a note of the query and data ingestion endpoints. You will need them for accessing the pool via other Azure Data Explorer tools and SDKs.
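One note on the Streaming ingestion setting above: enabling it at the pool level is only half of the setup; the target database or table also needs a streaming ingestion policy before low-latency streaming will work. A minimal sketch, assuming the StormEvents table created in the next section (substitute your own table or database name):

// Enable the streaming ingestion policy on a specific table.
.alter table StormEvents policy streamingingestion enable
// Alternatively, enable it for a whole database:
// .alter database <YourDatabase> policy streamingingestion enable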

3. Create a database, table, and ingest sample data

  1. Go to the Data tab and select New to create a new Data Explorer database.
    (Screenshot: CreateDB.png)
  2. Select the pool, enter the database name, and choose the retention and cache period options carefully, based on your business requirements.
    Note: All data older than the retention period will be deleted by the system. (You can adjust the retention and cache policies later; see the example after the sample queries below.)
  3. Expand the newly created database, select the action menu (three dots), and select New KQL script.
  4. Run the following commands (one at a time) to create a sample table, ingest sample data, and query.

.create table StormEvents (StartTime: datetime, EndTime: datetime, EpisodeId: int, EventId: int, State: string, EventType: string, InjuriesDirect: int, InjuriesIndirect: int, DeathsDirect: int, DeathsIndirect: int, DamageProperty: int, DamageCrops: int, Source: string, BeginLocation: string, EndLocation: string, BeginLat: real, BeginLon: real, EndLat: real, EndLon: real, EpisodeNarrative: string, EventNarrative: string, StormSummary: dynamic)

 

.ingest into table StormEvents 'https://kustosamplefiles.blob.core.windows.net/samples/StormEvents.csv' with (ignoreFirstRecord=true)

 

StormEvents
| sort by StartTime desc
| take 10
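A quick row count is an easy way to confirm the ingestion completed; the sample CSV contains tens of thousands of storm records, so expect a count in that range.

// Sanity check: how many rows landed in the sample table?
StormEvents
| count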

 

Congratulations on setting up your first Data Explorer pool and running a sample query. Please refer to this tutorial for information on running additional sample queries and learning KQL.
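As noted when creating the database, the retention and cache periods control how long data is kept and how much of it stays hot on local SSD cache. If you need to adjust them after the fact, you can do so from a KQL script with management commands; a minimal sketch, assuming a database named MyDatabase (run each command separately and adjust the values to your requirements):

// Keep data for one year; keep the most recent 31 days hot on local SSD cache.
.alter-merge database MyDatabase policy retention softdelete = 365d

.alter database MyDatabase policy caching hot = 31d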

 

4. Ingest your data

Now let’s try to ingest your data. There are many ways to ingest data into Azure Synapse data explorer; for this blog, let’s focus on ingesting a file from your local machine using the one-click ingestion experience provided by the Data Explorer web interface.

 

  1. Go to https://dataexplorer.azure.com/oneclick/ingest?sourceType=file and sign in with your Azure Active Directory credentials.
  2. Select "Add cluster connection".
  3. Enter the URL of your Data Explorer pool (the query endpoint you made a note of earlier) in the following format:
                       https://{data explorer pool name}.{synapse workspace name}.kusto.azuresynapse.net
  4. Follow the wizard to create a new table and ingest your sample file.
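If you want to double-check that you connected to the right pool before ingesting anything, a trivial management command against the target database is enough; for example, listing its tables:

// Connectivity check: list the tables in the currently selected database.
.show tables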

Other ways of ingesting data are documented here.
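If you only need to push a handful of test records, KQL also supports inline ingestion straight from a script, which skips the wizard entirely. This is suited to quick tests rather than production loads; a minimal sketch against a hypothetical two-column table (run the commands one at a time):

// 1) Create a small test table (hypothetical schema, for illustration only).
.create table DeviceTemps (DeviceId: string, TempC: real)

// 2) Paste a few CSV rows inline – everything after <| is treated as data.
.ingest inline into table DeviceTemps <|
dev-001,21.5
dev-002,23.1
dev-003,19.8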


5. Visualize your data

Now that you have your data in your Data Explorer pool, you can create a Power BI dashboard using the Azure Data Explorer Power BI connector. Instructions are documented here. Ensure you use the cluster URL in the following format (the query endpoint you made a note of earlier):
               https://{data explorer pool name}.{synapse workspace name}.kusto.azuresynapse.net

 

Other supported visualization tools include ADX Dashboards, Grafana, Redash, Kibana, and Tableau.
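Before (or instead of) wiring up an external tool, the render operator lets you prototype a visual directly in a KQL script or the Data Explorer web UI; for example, a column chart of the top 10 states by storm-event count from the sample table:

// Top 10 states by number of storm events, rendered as a column chart.
StormEvents
| summarize EventCount = count() by State
| top 10 by EventCount
| render columnchart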



Go ahead, try out this tutorial yourself today!



