
Sunday, 24 July 2022

Real-time video analytics with OpenVINO on Azure IoT Edge

 

In this blog post, I will go through how we can build real-time video analytics on IoT Edge devices with OpenVINO.

Before I start, let me explain what OpenVINO is and what it is good for.

OpenVINO stands for Open Visual Inference and Neural network Optimization.

As its name suggests, it optimizes models and lets you host them on a runtime that matches your processor architecture.

There are two main steps given below:

1.       Convert your favorite model to IR format as shown below:



2.       Host your IR (intermediate representation) on the OpenVINO model server. The runtime depends on the processor architecture.

With this approach, you can develop your own model with your favorite framework or download the prebuilt models from Model Zoo and host them in OpenVINO.

You can read more at https://docs.openvino.ai/latest/index.html

Now that we know what OpenVINO is, let’s see how we can use it for IoT Edge. We can host an OpenVINO model in a Docker container, which makes it a good fit for IoT Edge devices. The diagram below shows the general architecture of how OpenVINO can be used:



Now let’s see how we can host the model and how we can consume it to run inference on a given video frame. For this example, I have chosen a prebuilt model called “Vehicle Detection” from Model Zoo.

Run the command below to host the model in a Docker container:

$ docker run --rm -d -v "path/to/your/models":/models:ro -p 9000:9000 -p 9001:9001 openvino/model_server:latest --config_path /models/config.json --port 9000 --rest_port 9001

Once the model is hosted, you can query its metadata by navigating to http://iotedgedevice-ip:9001/v1/models/vehicle-detection-0202/versions/1/metadata. The response will be as shown below:



The response shows the shape of the input and output, which means our vehicle detection model is hosted.

Now let’s consume this model from a Python script (which will later be converted into an IoT Edge module to process frames).

The first step is to load the frame with OpenCV as shown below:





The endpoint for model prediction is at http://iotedgedevice-ip:9001/v1/models/vehicle-detection-0202/versions/1:predict

We need to submit this image as JSON data and get the result back as JSON, which contains the output array (loadable into NumPy) as per the specification at https://docs.openvino.ai/latest/omz_models_model_vehicle_detection_0202.html

The below method is used to convert an image to JSON input for the model:
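As a minimal sketch of such a conversion (not the exact method from my script), the function below builds the JSON body. It assumes the frame has already been resized to the model's input size, that the model expects a [1, 3, H, W] BGR tensor as the model page describes, and that the server accepts the TensorFlow-Serving-style column format with an "inputs" key. The name frame_to_request_body is hypothetical. To stay dependency-free it works on a nested list; with a NumPy frame from OpenCV the same reordering is usually a transpose followed by tolist().

```python
import json


def frame_to_request_body(frame_hwc):
    """Build a REST request body from one frame.

    frame_hwc is a nested list shaped [height][width][3] holding BGR pixel
    values, assumed to be already resized to the model's input size.
    """
    height, width = len(frame_hwc), len(frame_hwc[0])
    # Reorder HWC (how OpenCV stores frames) into CHW (what the model expects).
    chw = [
        [[float(frame_hwc[y][x][c]) for x in range(width)] for y in range(height)]
        for c in range(3)
    ]
    # Wrap in a batch dimension so the tensor becomes [1, 3, height, width].
    return json.dumps({"inputs": [chw]})
```

The returned string is what would be sent in the POST request to the :predict endpoint mentioned above.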


The code flow is shown below:



The model predicts the vehicle bounding boxes, which are then drawn on the image as shown below:



We can see the model predicted all the vehicles shown in the image above.
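To give an idea of how the JSON response can be turned into boxes, here is a hedged sketch. It assumes the output tensor has the shape [1, 1, N, 7] described on the model page, where each row is [image_id, label, confidence, x_min, y_min, x_max, y_max] with normalized coordinates, and that the response carries it under "outputs" (column format) or "predictions" (row format). The function name and the 0.5 confidence cut-off are my own choices for illustration.

```python
import json


def parse_detections(response_text, frame_width, frame_height, min_confidence=0.5):
    """Return (x_min, y_min, x_max, y_max, confidence) tuples in pixel units."""
    body = json.loads(response_text)
    # Column-format responses use "outputs"; row-format ones use "predictions".
    tensor = body.get("outputs", body.get("predictions"))
    # Assumed shape [1, 1, N, 7]: drop the two leading singleton dimensions.
    rows = tensor[0][0]
    boxes = []
    for _image_id, _label, confidence, x_min, y_min, x_max, y_max in rows:
        if confidence < min_confidence:
            continue  # skip weak detections
        boxes.append((
            round(x_min * frame_width), round(y_min * frame_height),
            round(x_max * frame_width), round(y_max * frame_height),
            confidence,
        ))
    return boxes
```

Each returned tuple can then be passed to a rectangle-drawing call to overlay the box on the frame.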

Contact us @MusoftTech if you need assistance on a Real-time video analytics project.

That’s it for now and thanks for reading my blog.

 

 

Thursday, 2 June 2022

Exploring Databricks SQL endpoint and Databricks Dashboard

In this blog post, I will be exploring how we can use SQL Endpoint to expose Delta tables to the outer world and build a Dashboard in the Databricks environment which uses delta tables from the Silver and/or the Gold layer.

What is SQL Endpoint?

"A SQL endpoint is a computation resource that lets you run SQL commands on data objects within Databricks SQL." From Microsoft.

As you might know, Databricks now supports the three modes listed below:

  • Data Science and Engineering
  • Machine Learning
  • SQL

To create an SQL endpoint and a dashboard, you have to use the SQL workspace.

Follow https://docs.microsoft.com/en-us/azure/databricks/sql/admin/sql-endpoints to create an SQL endpoint.

Scenario 1: I have sales data in the source system and want to view a summary of sales and returns in a Databricks Dashboard.

Scenario 2: I have aggregated data in Delta format and want to expose it to a custom application (for instance, a C# desktop or web application).

To cover both scenarios, I am using the reference architecture from Databricks shown below:


I have changed the above architecture slightly: I am not using Azure Synapse Analytics as my serving layer but exposing my Delta tables through the SQL Endpoint instead, and I am not using a streaming dataset.

I am loading data from the source into all layers: the raw lake and then the Delta lake (Bronze, Silver, and Gold). The Gold layer contains the aggregated results (or summary).

If you want to create the aggregated data below in the Gold layer, simply run the code below in a Python notebook in the “Data Science and Engineering” workspace:
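The notebook itself works with Spark and writes Delta tables, which I cannot reproduce here. As a rough, library-free sketch of what the Gold-layer summary boils down to, the snippet below generates random sales and return transactions and aggregates them per type. All names, value ranges and the summary columns are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def random_transactions(count, seed=42):
    """Generate (type, quantity, amount) tuples with a seeded random generator."""
    rng = random.Random(seed)
    return [
        (rng.choice(["Sale", "Return"]), rng.randint(1, 5), round(rng.uniform(5.0, 500.0), 2))
        for _ in range(count)
    ]


def summarize(transactions):
    """Aggregate transactions into one summary row per transaction type."""
    summary = defaultdict(lambda: {"transactions": 0, "items": 0, "amount": 0.0})
    for kind, quantity, amount in transactions:
        row = summary[kind]
        row["transactions"] += 1
        row["items"] += quantity
        row["amount"] += amount
    return dict(summary)


gold_summary = summarize(random_transactions(1000))
```

In a notebook the same idea would typically be a group-by on the Silver table followed by a write to the Gold table.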


What are the components of the Dashboard?

The Dashboard in Databricks is composed of the following two major components:

  • Visualizations
  • Text

The Visualization component is based on a query, and a single query can have multiple visualizations that can be used in the dashboard.

The Text component is where you can compose text using markdown, for instance to add graphics and information.

The query has a schedule component that allows it to refresh the data shown in the dashboard:



To me, the query and its associated visualizations follow a publisher-subscriber model in which the query is the publisher and each visualization is a subscriber. Any change to the query results is reflected in the visualizations as well.



For instance, if you have a streaming source that is constantly updating your aggregated results then the query should have a scheduled component to refresh it automatically as per your schedule. 

There are lots of visualizations, such as bar chart, pie chart and counter. When you add a visualization widget to the dashboard, it asks you to select your query first and then the visualization associated with that query.

 

Now that we understand the Dashboard, let’s create a query called “Sales-summary” as shown below:



As you can see, I have used a very simple query (just select * …). The reason is that the complex queries (multiple joins and/or group by, etc.) were already run when we loaded data from Bronze to Silver and from Silver to Gold, so the Gold layer already holds the aggregated results.

You can also see there are four visualizations (tabs in the above screenshot) associated with the “Sales-Summary” query mentioned below:

  • Table (shows the results as a table)
  • Comparing Sales vs Return (a bar chart)
  • Return Transaction (a counter showing the returned amount with formatting)
  • Sales Transaction (a counter)

I also created two more queries to show items sold and items returned as counter visualization.

Now that I have all my queries and associated visualizations ready, I can simply create a Dashboard named “Northwind Sales” in the Dashboard collections and arrange my visualizations as shown below:



I also used the Text component to place a MusoftTech logo and text with the help of markdown.

You may be wondering why I have more returns than sales; this is because my random number generator produced more returns than sales.

One of the cool things is that each component of the Dashboard shows how fresh your data is.

The Dashboard can be refreshed on a configured schedule or manually as required, and it can be shared via email by subscribing an email address to the Dashboard.


For scenario 2, the Databricks SQL endpoint is used to expose the Delta tables. The Delta tables can be consumed directly from PowerBI with SQL endpoint connection details.

I am going to consume the Delta table from a C# console application, which shows that it can be consumed by any .NET application. To connect your C# application to the Databricks SQL endpoint, you need to install the ODBC driver and create a System DSN or User DSN.

Below is the screenshot of my DSN configuration screen:


Where

  • Host(s) is your Databricks SQL endpoint host URL.
  • User name is your Azure AD account.
  • Password is your personal access token.

You also need to set “HTTP Options” with the HTTP path for your endpoint and tick “SSL” under “SSL Option”.

Once you have provided all the settings, you can test the connectivity by clicking on the “Test” button. The result will be as shown below:



After testing the connection, I used this DSN in my C# console app, as in the code given below:

Running the app returns the result below:


With this approach, we can stream the result to a PowerBI streaming dataset or Azure Stream Analytics.

That’s it for now.

 

 

 

 

 

Monday, 22 October 2018

Use of Power BI PushDatasets


In PowerBI, real-time streaming data can be consumed in three different ways (please visit https://docs.microsoft.com/en-us/power-bi/service-real-time-streaming ), listed below:

  • Consuming streaming data from Azure Stream Analytics
  • Consuming streaming data from PubNub
  • Consuming streaming data via PushDataset


If you have sensors that emit data and you want to visualize it in real time, you can use PowerBI in conjunction with Azure Stream Analytics to build the dashboard. But if the data is less frequent and you want a dashboard that auto-refreshes, then you can use any one of the three methods. In this post I will show how Push Datasets can be used to develop a dashboard.

Below is a simple architecture for this post:














We need the following components to build the simple end-to-end solution shown in the diagram above:

  1. Data Generator (to simulate data arriving in the database every x interval). This could be your source, which generates data less frequently.
  2. Console App (which pushes data to the PowerBI PushDataset)
  3. PowerBI Dashboard


I have created sample code that generates some data and inserts it into the database. Please see the code snippet below:

The console application will use the PowerBI REST API to push data. For this reason, the console app needs to authenticate/authorize with PowerBI, so you need to register your console app with Azure AD in order to get an OAuth token from the Azure authorization server. Please follow https://docs.microsoft.com/en-us/power-bi/developer/walkthrough-push-data-register-app-with-azure-ad to register your app.

OAuth provides four different flows based on how you want to authenticate/authorize your application with the authorization server. Please visit https://auth0.com/docs/api-auth/which-oauth-flow-to-use to find out which flow is best suited to your scenario.

I will be using the Client Credentials flow (an OAuth flow described at https://oauth.net/2/grant-types/client-credentials/ ) because the console app is treated as a trusted application and there is no human interaction: if an authorization popup appeared, the app would not be able to deal with it.

Below is the code to get an OAuth token using version 2.29.0 of the Microsoft.IdentityModel.Clients.ActiveDirectory NuGet package:
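My console app is written in C#, but the shape of a client credentials token request is the same in any language. The Python sketch below only builds the request (URL and form body) without sending it. It assumes the Azure AD v1 token endpoint and the Power BI resource URI that I believe the ADAL library targets; the tenant ID, client ID and secret are placeholders you would supply, and whether your tenant allows this flow for Power BI is something to check.

```python
from urllib.parse import urlencode

# Assumption: resource identifier used when requesting tokens for the Power BI API.
POWER_BI_RESOURCE = "https://analysis.windows.net/powerbi/api"


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for an OAuth client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form_body = urlencode({
        "grant_type": "client_credentials",  # no user interaction involved
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": POWER_BI_RESOURCE,
    })
    return url, form_body
```

The form body would be sent as an application/x-www-form-urlencoded POST, and the access token in the JSON response then goes into the Authorization header as a Bearer token.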


I treat this scenario as syncing two systems (from source to target, where the target is PowerBI). In most syncing solutions, we need to keep track of what we have synced so far so that next time the system picks up only the delta.
For this purpose, we are using the ROWVERSION data type, which is generated automatically by the database. Please visit https://www.mssqltips.com/sqlservertip/4545/synchronizing-sql-server-data-using-rowversion/ for how to use rowversion in a syncing scenario.

To keep track of what has been synced, I have created a table that stores, for each source table, the last row version the console application has sent to PowerBI, as shown below:












The first time, the last row version should be 0x000.

I also created a stored procedure that returns the delta given the last row version and the table name. Below is the stored procedure code:
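The logic of that delta query can be sketched outside the database as well. In the Python sketch below, row versions are represented as plain integers (in SQL Server they are 8-byte binary values), and the names get_delta and row_version are hypothetical, chosen only for this illustration.

```python
def get_delta(rows, last_row_version):
    """Return the rows changed since last_row_version and the new watermark.

    rows is a list of dicts that each carry an integer "row_version".
    """
    delta = sorted(
        (row for row in rows if row["row_version"] > last_row_version),
        key=lambda row: row["row_version"],
    )
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = delta[-1]["row_version"] if delta else last_row_version
    return delta, new_watermark
```

After a successful push, the new watermark is what would be written back to the tracking table for that source table.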


Now that we have the data (the delta), we need to send it to the PushDataset in PowerBI. Every PushDataset has a unique ID, and data needs to be sent to the correct ID.

I have created a dataset called “DeviceTelemetry” using the REST API. To find the dataset ID, you need to call the Power BI REST API as shown below:










And the result looks like this:









Now that we have the dataset ID as a GUID, we can use it to send data to Power BI through the PowerBI REST API. Your console app can also fetch all the datasets itself and pick the ID of the one you want to send data to; for demonstration purposes, I have simply shown how to obtain it.

Now the console app can use the dataset ID and keep pushing data to it. Again, you can leverage the Power BI REST API to send data in batches or one row at a time. Below is a snapshot of how I am sending data:
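If the app looks the ID up itself, the lookup is a small filter over the response of the datasets call. The sketch below assumes that response is a JSON object with a "value" array whose entries carry "id" and "name", which is how I understand the Power BI REST API to return datasets; find_dataset_id is a hypothetical helper name.

```python
import json


def find_dataset_id(datasets_response_text, dataset_name):
    """Return the ID of the dataset with the given name, or None if absent."""
    datasets = json.loads(datasets_response_text).get("value", [])
    for dataset in datasets:
        if dataset.get("name") == dataset_name:
            return dataset.get("id")
    return None
```

Returning None rather than raising lets the caller decide whether to create the dataset first.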
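As an illustration of the batching idea (not my C# wrapper), the Python sketch below splits rows into batches and builds the URL and JSON body for each request without sending anything. It assumes the rows endpoint has the form /datasets/{id}/tables/{table}/rows with a {"rows": [...]} body; the batch size of 1000 and the function name are arbitrary choices for the example.

```python
import json

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_push_requests(dataset_id, table_name, rows, batch_size=1000):
    """Yield (url, json_body) pairs, one per batch of rows."""
    url = f"{POWER_BI_API}/datasets/{dataset_id}/tables/{table_name}/rows"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        yield url, json.dumps({"rows": batch})
```

Each pair would be sent as a POST with the bearer token; sending one row at a time is simply the batch_size=1 case.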









Here is the code that adds rows to the PowerBI PushDataset using the API wrapper:


Here is the code for the PowerBI REST API call, wrapped in a helper method:


Once your console app starts sending data, you can go to PowerBI.com and start creating reports and dashboards as shown below:




Note that the dataset is listed as a Push dataset.
Click on the red boxed area (the create report link) to create a report. I created several reports and composed them into the single dashboard shown below:














That’s it so far.

Monday, 3 September 2018

Setting up an environment for Monte Carlo Simulation in Docker

In this blog I will walk you through installing JAGS inside a Docker container. You might be wondering why I have chosen Docker for this. The answer is very simple: when I was installing JAGS on my personal computer, the OS did not recognise it as trusted software, so I did not take the risk of installing it there.

If you want to play with JAGS and you don't want to install it on your computer, then Docker is the best option, as you can play with the package/software and then delete the container.

Now you have the idea why I have chosen a Docker container for this. Let's proceed to set up an environment for Monte Carlo simulation. Make sure you have Docker installed, then follow the steps below to set up the environment:

1. Open a command prompt with administrative privileges and issue the following command:
$ docker run --name mybox -t -p 8004:8004 opencpu/rstudio

The above command downloads the opencpu/rstudio image locally and starts a container named mybox.

2. If the container is not running (for example after a restart), issue the command below to start it:
$ docker container start mybox

3. Open a browser on your host computer, point it to http://localhost:8004/rstudio/ and enter opencpu as both the username and the password, as shown below:






4. Now you need to connect to the container, by issuing the command below in your command prompt, to install JAGS - a tool for Gibbs sampling:

$ docker exec -i -t mybox /bin/bash

You will be taken to the container's terminal as shown below:



5. Issue the commands below in the container's terminal:
$ sudo apt-get update
$ sudo apt-get install jags

6. Now go to the browser (the one you opened in step 3) and install the "rjags" and "runjags" packages as shown below, and you are done. You can now use this environment to create a simulation using Monte Carlo.


That's it so far. Stay tuned.

Wednesday, 9 May 2018

Azure IoT and IoT Edge - Part 2 (Building a Machine Learning model using generated data)


This blog is part 2 of the Azure IoT Edge series. Please see http://blog.mmasood.com/2018/03/azure-iot-and-iot-edge-part-1.html if you have not read part 1.

In this blog I will cover how we can build a logistic regression model in R using the data captured in table storage via IoT Hub.

We could run the simulated devices (all three at once) and wait for data to be generated and saved to table storage. But for simplicity I have created an R script to generate the data so that I can build the model and deploy it to IoT Edge, and hence leverage the Edge device to apply the Machine Learning model to the data it receives from the downstream devices.

I am using exactly the same minimum temperature, pressure and humidity as our simulated device was using (please see http://blog.mmasood.com/2018/03/azure-iot-and-iot-edge-part-1.html ). Here are a few lines of the R script:




Let’s plot the data and see what it looks like. There are only 3 fields/features, so I will plot Temperature vs Pressure using ggplot2:







Output of above R commands:

















We can see that as the temperature and pressure increase, the device moves away from the good devices and becomes bad. For simplicity, the simulation generates higher numbers for temperature and pressure if the device is flagged as defective.

Now let’s build a simple logistic regression model to find the probability of a device being defective.
I am using the caret package for building the model. Here is the code to split the data into training and test sets:








The proportion of good vs bad in the original data is 66% (good) / 33% (bad). So we make sure the training and test sets keep a similar proportion and are not skewed.







Now we apply the glm function to the data using the R script shown below:







Here is the summary of the model:

















We can see from the above output that pressure is not statistically significant. However, the idea of this post is just to have a model that we can use on the IoT Edge device.

Let’s test this model on the test data set and find the best threshold to separate the bad from the good. I could have used a cross-validation set to find the best threshold; a cross-validation set is used to fine-tune parameters (e.g. the threshold, or lambda if ridge regression is used).

Below is the confusion matrix when I use threshold 0.5:







Let’s construct a data frame which contains the actual value, the predicted value and the calculated probability using the code below:



And view the first and last 5 records:

















The higher the probability (the closer to 1), the more likely the device is good.

With threshold 0.65, the confusion matrix looks like this:








So we can see from the above two confusion matrices that the best threshold is 0.50, as it misclassifies only 4 instances, whereas 0.65 misclassifies 5 instances.
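The threshold comparison itself is easy to reproduce in any language. The Python sketch below is not the R code I used; it takes the actual labels and the predicted probability of a device being good, builds the confusion matrix for a given threshold and counts the misclassified instances. The function names and the tiny data set are made up for illustration.

```python
def confusion_matrix(actual, prob_good, threshold):
    """Count (actual, predicted) label pairs for the given threshold."""
    counts = {
        ("good", "good"): 0, ("good", "bad"): 0,
        ("bad", "good"): 0, ("bad", "bad"): 0,
    }
    for truth, probability in zip(actual, prob_good):
        predicted = "good" if probability >= threshold else "bad"
        counts[(truth, predicted)] += 1
    return counts


def misclassified(counts):
    """Number of instances where the prediction disagrees with the actual label."""
    return counts[("good", "bad")] + counts[("bad", "good")]


# Made-up example data: two good devices and two bad ones.
actual = ["good", "good", "bad", "bad"]
prob_good = [0.9, 0.6, 0.4, 0.1]
errors_by_threshold = {
    t: misclassified(confusion_matrix(actual, prob_good, t)) for t in (0.5, 0.65)
}
```

Looping over a finer grid of thresholds and keeping the one with the fewest errors is the same comparison I did by hand above.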

The final model is given below:










So far I have got the model built. I will use this model in an IoT Edge module, which will make the Edge intelligent. I will post about that soon, so stay tuned and happy IoTing :)

Tuesday, 13 March 2018

Azure IoT and IoT Edge (Part 1)

In this blog post I will walk you through how an IoT device (for IoT Hub and an IoT Edge gateway) can be created.

The simulated device will generate telemetry data that will be used by an IoT Edge module (e.g. clustering) to find out which devices need to be replaced or restarted.

I will be posting a few more blogs to achieve the architecture below:




We can see from the above diagram that the main components are:
  • IoT Hub
  • Configuration of IoT Edge device as gateway
  • IoT Edge Module
  • Downstream devices


I will develop a Machine Learning model (k-means clustering) in R and use it in the MachineLearningModel Edge module to find which devices need to be replaced or restarted.

For simplicity, the downstream device will generate the following telemetry data:
  • Temperature
  • Humidity
  • Pressure


Let’s develop a downstream device that generates the above data randomly. Follow https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-csharp-csharp-getstarted to set up your IoT Hub. Now that I have my IoT Hub set up, I am creating a .NET console app that acts as a device and generates some random data.
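My device is a .NET console app, but the payload it produces is easy to picture with a small Python sketch. The minimum values, ranges and JSON field names below are assumptions made for the example, not the values used by the real simulator, and the snippet only builds the message without sending it to IoT Hub.

```python
import json
import random

# Hypothetical minimum values; the real simulator's values are not listed here.
MIN_TEMPERATURE = 20.0
MIN_HUMIDITY = 60.0
MIN_PRESSURE = 1.0


def make_telemetry(device_id, rng=random):
    """Build one JSON telemetry message with random readings."""
    return json.dumps({
        "deviceId": device_id,
        "temperature": round(MIN_TEMPERATURE + rng.random() * 15, 2),
        "humidity": round(MIN_HUMIDITY + rng.random() * 20, 2),
        "pressure": round(MIN_PRESSURE + rng.random() * 4, 2),
    })


message = make_telemetry("device1")
```

A real device would send such a message in a loop, with a short delay between messages, using the device connection string.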

Here are some code snippets:

Here is the Main method:
Here is an example batch file to run as device1:


Now you need to register 3 devices in your IoT Hub in the Azure Portal. Here are the steps:
  • Navigate to the Azure Portal and then to your IoT Hub
  • Navigate to IoT Devices
  • Click on the Add button and fill in the details for the device as shown below:





  • Now go to the device you just created and copy the primary connection string to the respective .bat file.
  • Repeat these steps 3 times to create 3 devices.


Once you have created the three devices, run device1.bat, device2.bat and device3.bat. They will start sending data to IoT Hub as shown below:



And your IoT Hub will show the number of messages received, as shown below:




So far we have created/simulated 3 devices that send temperature and other data to IoT Hub. These devices will later send their data to the IoT Edge gateway (by appending GatewayHostName=<your-gateway-host-name> to the device connection string), which I will explain in the next blog, so stay tuned.

Thursday, 18 May 2017

Exploring SparkR using Databricks environment

In this exploration I will share what I have learnt so far about using R with Spark. Spark, as you all know, is a distributed computing framework. It allows you to program in Scala/Python/Java, and now in R, to perform distributed computation.

I implemented gradient descent in Hadoop to understand how to parallelize the gradient computation. Please have a read at http://blog.mmasood.com/2016/08/implementing-gradient-decent-algorithm.html to understand the mathematics behind it.

Now I am implementing the same gradient descent algorithm in SparkR using Databricks community edition. You might be wondering why I am implementing it again :)

I always start with the knowledge I have right now and then use that knowledge to learn a new language. In this instance, the Gradient Descent algorithm fits best. We also learn a couple of things while implementing GD, such as:

  • How we break a big loop up across a cluster of computers
  • How we transform data in parallel
  • How we share/send common variables/values to worker nodes
  • How we aggregate results from worker nodes
  • Finally, how we combine those results


If your algorithm has to iterate over millions of records or more, then it is worth parallelizing it. In any computation you do, you will almost always be doing the same sort of things as I outlined above. I can use the high-level tasks mentioned above to build a complex Machine Learning model, such as model ensembling or model stacking.

Please write in the comments if you have other items than the ones I have listed above, so that I can learn from you as well :)


Now I have talked too much, let's do some coding :)

You need to sign up at https://databricks.com/ first. Once you have done that, you can follow along.
Now navigate to the Databricks community edition home page as shown below:





First you need to create a cluster: click on Clusters > Create Cluster. Use Spark 2.1 (Auto-updating, Scala 2.10).

Next, upload your data to the cluster. To do this, click on Tables > Create Table and you will be presented with the screen below:



Click on the “Drop file or click here to upload” section and upload your file. Once you have uploaded the file, it will show you the path. Note that path down somewhere.



Now create a notebook by navigating to Workspace, clicking on the dropdown and selecting Create > Notebook as shown below:




And provide a name for the notebook. I called it “SparkR-GradientDescent”.



Make sure you have selected R as the language. Click on the Create button to create the notebook. Now navigate to your newly created notebook and start writing R code :)

We now need to load the data. Remember that we are running R code in Spark so we need to use read.df (from SparkR package) to load data into a SparkDataFrame (not data.frame).



Note that all the above methods are similar to their base R counterparts, but they come from the SparkR package. All these methods understand the SparkDataFrame object. Let's run the code below to see the structure of the object:





Now run the code below:



You can see that the two are different objects.

Now, I define a method that calculates partial gradient so that we can compute it on worker nodes and get the result back to driver program.

Here is the code:
 

Now we write code that makes the worker nodes calculate partial gradients on each partition, collects the calculated data and updates our thetas, using the code below:



Here is the result of above code:



A few things to note in the above code:

  1. We are caching the data in memory (using cache(data)) so that Spark does not need to load the data from storage in each iteration.
  2. We are defining a schema because dapply needs to transform an R data.frame object into a SparkDataFrame with the provided schema.
  3. We are performing some calculations (the partial gradient) on each partition using dapply. So we are telling Spark to run the given function on each partition residing on the worker nodes.
  4. Each worker node gets a shared variable/object. In Spark with Scala we had to broadcast the variable.
  5. We are collecting data from the worker nodes as an R data.frame object using the collect method.
  6. We are updating theta, which will be available to each worker in the next iteration.
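The pattern in the list above does not depend on SparkR. As a minimal single-machine sketch in Python, the code below treats each inner list as one partition, computes a partial gradient per partition with the shared theta, sums the partial results on the "driver" and updates theta. It assumes simple linear regression with one feature; the learning rate, iteration count and sample data are arbitrary choices for illustration.

```python
def partial_gradient(partition, theta):
    """Sum of (prediction - y) * x over one partition, for features [1, x]."""
    g0 = g1 = 0.0
    for x, y in partition:
        error = theta[0] + theta[1] * x - y
        g0 += error
        g1 += error * x
    return g0, g1


def gradient_descent(partitions, alpha=0.1, iterations=2000):
    """Batch gradient descent where the gradient is computed per partition."""
    m = sum(len(partition) for partition in partitions)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        # Each "worker" computes its partial gradient with the shared theta.
        partials = [partial_gradient(partition, theta) for partition in partitions]
        # The "driver" collects and sums the partial gradients.
        g0 = sum(partial[0] for partial in partials)
        g1 = sum(partial[1] for partial in partials)
        # The updated theta is what the workers see in the next iteration.
        theta = [theta[0] - alpha * g0 / m, theta[1] - alpha * g1 / m]
    return theta


# Sample data generated from y = 1 + 2x, split into two partitions.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
theta = gradient_descent(partitions)
```

In SparkR the list comprehension corresponds to dapply over the partitions and the two sums correspond to collecting the partial results on the driver.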

You can view the functions available in the SparkR package at https://docs.databricks.com/spark/latest/sparkr/index.html
Finally, we can validate our estimated coefficients using the lm function in R (running locally on my machine):





We can validate our calculation on sample data so that we can debug it easily. We can see that the estimated coefficients are close to what the lm model gave me. If we increase the number of iterations, we can get the thetas even closer.

I hope that this post will help you understand SparkR. Please provide your feedback if I missed anything.

That’s it for now. Enjoy coding :)