Call Center Analytics — No Code to Process Speech and Convert to Text and get insights
Published Nov 13 2021 06:33 AM

Using Azure Cognitive Services Speech to Text and Logic Apps

No Code — Workflow style

Please use this as a reference, since it might not match your exact use case. It shows how AI can help drive insights and automate the processing of large audio datasets.

Prerequisites

  • Azure account
  • Azure Storage account
  • Azure Cognitive Services
  • Azure Logic Apps
  • Get the connection string for the storage account
  • Get the primary key of the Cognitive Services resource to use as the subscription key
  • Audio files should be in WAV format (the request below declares 16 kHz PCM)
  • Audio files cannot be too big
  • Audio length: 10 min
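
A quick local check can catch files that do not meet the audio requirements above before they are uploaded. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the `MAX_SECONDS` constant are illustrative assumptions based on the list above; they are not part of the original workflow.

```python
import io
import wave

MAX_SECONDS = 10 * 60  # assumption: the 10 min limit from the prerequisites list


def check_wav(data: bytes, expected_rate: int = 16000) -> dict:
    """Read the WAV header and report whether the file fits the prerequisites."""
    # wave.open raises wave.Error if the bytes are not an uncompressed PCM WAV file.
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    return {
        "sample_rate": rate,
        "seconds": seconds,
        "rate_ok": rate == expected_rate,
        "length_ok": seconds <= MAX_SECONDS,
    }
```

Running this on the bytes of a file before uploading it to the input container gives an early warning when the sample rate or length is off.
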

Logic Apps

  • First, create a Blob Storage trigger
  • Create the connection using the storage account connection string
  • Now add the “Reads Blob Content from Azure Storage” action
  • Container name: audioinput
  • Choose dynamic content and select the blob name from the trigger output
  • Add an HTTP action
  • This action calls the Speech to Text API; pass the following headers
  • Accept: application/json;text/xml
  • Content-type: audio/wav; codecs=audio/pcm; samplerate=16000
  • Expect: 100-continue
  • Ocp-Apim-Subscription-Key: xxxx-xxxxxx-xxxxxx-xxxx
  • Transfer-Encoding: chunked
  • For Body, choose the content from the read blob action
  • This passes the binary audio content to the Cognitive Services API
  • Now add a Parse JSON action for the API output
  • For Content, select the body of the HTTP output
  • Provide the schema as
{
  "properties": {
    "Duration": {
      "type": "integer"
    },
    "NBest": {
      "items": {
        "properties": {
          "Confidence": {
            "type": "number"
          },
          "Display": {
            "type": "string"
          },
          "ITN": {
            "type": "string"
          },
          "Lexical": {
            "type": "string"
          },
          "MaskedITN": {
            "type": "string"
          }
        },
        "required": [
          "Confidence",
          "Lexical",
          "ITN",
          "MaskedITN",
          "Display"
        ],
        "type": "object"
      },
      "type": "array"
    },
    "Offset": {
      "type": "integer"
    },
    "RecognitionStatus": {
      "type": "string"
    }
  },
  "type": "object"
}
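
For readers who want to see what the HTTP and Parse JSON steps amount to outside the designer, here is a rough Python sketch using only the standard library. The endpoint URL is not given in the text above, so the short-audio REST endpoint, the `format=detailed` query parameter, the `en-US` language, and the region value are assumptions, and the function names are hypothetical. The `Expect` and `Transfer-Encoding` headers are left out because `urllib` sends the body with a content length instead of chunking it.

```python
import json
import urllib.parse
import urllib.request


def build_speech_request(region: str, key: str, audio: bytes,
                         language: str = "en-US") -> urllib.request.Request:
    """Build the POST request (assumed short-audio Speech to Text endpoint)."""
    query = urllib.parse.urlencode({"language": language, "format": "detailed"})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Accept": "application/json;text/xml",
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def best_transcript(result: dict) -> tuple:
    """Return (Display, Confidence) of the highest-confidence NBest entry."""
    if result.get("RecognitionStatus") != "Success":
        raise ValueError(f"recognition failed: {result.get('RecognitionStatus')}")
    best = max(result["NBest"], key=lambda entry: entry["Confidence"])
    return best["Display"], best["Confidence"]


def transcribe(region: str, key: str, audio: bytes) -> dict:
    """Send the audio and return the parsed JSON body (needs network access)."""
    with urllib.request.urlopen(build_speech_request(region, key, audio)) as resp:
        return json.load(resp)
```

`Offset` and `Duration` in the response are expressed in 100-nanosecond ticks, so dividing either value by 10,000,000 gives seconds.
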
  • Now add an action to upload the data to a blob
  • Provide the output container
  • Provide an output blob name
  • Go to Overview, click Run Trigger, then click Run
  • Upload the WAV file
  • Wait for it to process
  • Give the Speech API some time to process
  • Now go to Blob Storage and open the output blob
{
  "RecognitionStatus": "Success",
  "Offset": 300000,
  "Duration": 524000000,
  "NBest": [
    {
      "Confidence": 0.972784698009491,
      "Lexical": "the speech SDK exposes many features from the speech service but not all of them the capabilities of the speech SDK are often associated with scenarios the speech SDK is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech SDK look for a rest API alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
      "ITN": "the speech SDK exposes many features from the speech service but not all of them the capabilities of the speech SDK are often associated with scenarios the speech SDK is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech SDK look for a rest API alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
      "MaskedITN": "the speech sdk exposes many features from the speech service but not all of them the capabilities of the speech sdk are often associated with scenarios the speech sdk is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech sdk look for a rest api alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
      "Display": "The Speech SDK exposes many features from the speech service, but not all of them. The capabilities of the speech SDK are often associated with scenarios. The Speech SDK is ideal for both real time and non real time scenarios using local devices files, Azure blob storage and even input and output streams. When a scenario is not achievable with the speech SDK, look for a rest API. Alternative speech to text, also known as speech recognition, transcribes audio streams to text that your applications, tools or devices can consume more display use speech to text with language, understanding Louis to deride user intents from transcribed speech and act on voice commands. You speech translation to translate speech input to a different language with a single call. For more information, see speech to text basics."
    }
  ]
}
  • Above is the sample output
  • The Confidence score and the Display text are available
  • Next, run Text Analytics to pull key phrases, PII, sentiment and entities
  • Create 3 variables: id, text and language
  • Create id
  • Create language
  • Create text
  • Next, add a Compose action that builds the request body
{
  "documents": [
    {
      "id": @{variables('id')},
      "language": @{variables('language')},
      "text": @{variables('text')}
    }
  ]
}
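
The Compose step above substitutes the three variables into a `documents` array. Outside Logic Apps the same body could be built as in the sketch below. The function name `compose_documents` is hypothetical, and using `json.dumps` is a design choice of this sketch so that quotes or line breaks in the transcript are escaped correctly.

```python
import json


def compose_documents(doc_id, language: str, text: str) -> str:
    """Build the request body that the Compose action produces."""
    body = {"documents": [{"id": str(doc_id), "language": language, "text": text}]}
    # json.dumps escapes quotes and line breaks that may appear in the transcript.
    return json.dumps(body)
```

For example, `compose_documents(1, "en", display_text)` returns a JSON string with a single document whose `id` is the string `"1"`, where `display_text` stands for the Display value from the speech output.
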
  • Call the Text Analytics API for key phrases
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/keyPhrases
  • Header: Ocp-Apim-Subscription-Key
  • Header: Content-Type
  • Body: the output of the Compose action
  • Parse the JSON output
  • Schema
{
  "properties": {
    "documents": {
      "items": {
        "properties": {
          "id": {
            "type": "string"
          },
          "keyPhrases": {
            "items": {
              "type": "string"
            },
            "type": "array"
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "keyPhrases",
          "warnings"
        ],
        "type": "object"
      },
      "type": "array"
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  },
  "type": "object"
}
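
The key phrase call and its parsing can be sketched in Python as follows. This is an illustration under assumptions: `cogsvcname` is the placeholder host from the URL above, the `Content-Type` value `application/json` is assumed because the steps only name the header, and both function names are hypothetical.

```python
import json
import urllib.request


def text_analytics_request(endpoint: str, key: str, path: str,
                           body: dict) -> urllib.request.Request:
    """Build a POST request for a Text Analytics v3.1 operation such as keyPhrases."""
    url = f"{endpoint.rstrip('/')}/text/analytics/v3.1/{path}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",  # assumed value
        },
        method="POST",
    )


def key_phrases(response: dict) -> dict:
    """Map each document id to its list of key phrases."""
    return {doc["id"]: doc["keyPhrases"] for doc in response["documents"]}
```

Passing the request to `urllib.request.urlopen` and loading the body with `json.load` gives the dictionary that `key_phrases` expects. The same request builder works for the other Text Analytics operations by changing `path`.
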
  • Delete the existing output blob
  • Blob name: textanalytics.json
  • Now save the new output to the blob
  • Blob name: textanalytics.json
  • Now call Text Analytics for PII
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/pii
  • Header: Ocp-Apim-Subscription-Key
  • Header: Content-Type
  • Body: the output of the Compose action
  • Now add a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "redactedText": {
            "type": "string"
          },
          "id": {
            "type": "string"
          },
          "entities": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "string"
                },
                "category": {
                  "type": "string"
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "confidenceScore": {
                  "type": "number"
                }
              },
              "required": [
                "text",
                "category",
                "offset",
                "length",
                "confidenceScore"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "redactedText",
          "id",
          "entities",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
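
Once the PII response is parsed against this schema, a typical next step is to keep the redacted text and count which categories were found. The sketch below assumes a response shaped like the schema above; the function name `summarize_pii` is hypothetical.

```python
from collections import Counter


def summarize_pii(response: dict) -> list:
    """For each document, return the redacted text and a count of PII categories."""
    summary = []
    for doc in response["documents"]:
        categories = Counter(entity["category"] for entity in doc["entities"])
        summary.append({
            "id": doc["id"],
            "redactedText": doc["redactedText"],
            "categories": dict(categories),
        })
    return summary
```

Storing only `redactedText` downstream is one way to avoid carrying the original PII into later reports.
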
  • Now add a Delete blob action
  • Blob name: textpii.json
  • Now save the file to a blob
  • Blob name: textpii.json
  • Now call the Sentiment API
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/sentiment
  • Header: Ocp-Apim-Subscription-Key
  • Header: Content-Type
  • Body: the output of the Compose action
  • Add a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string"
          },
          "sentiment": {
            "type": "string"
          },
          "confidenceScores": {
            "type": "object",
            "properties": {
              "positive": {
                "type": "number"
              },
              "neutral": {
                "type": "number"
              },
              "negative": {
                "type": "number"
              }
            }
          },
          "sentences": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "sentiment": {
                  "type": "string"
                },
                "confidenceScores": {
                  "type": "object",
                  "properties": {
                    "positive": {
                      "type": "number"
                    },
                    "neutral": {
                      "type": "number"
                    },
                    "negative": {
                      "type": "number"
                    }
                  }
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "text": {
                  "type": "string"
                }
              },
              "required": [
                "sentiment",
                "confidenceScores",
                "offset",
                "length",
                "text"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "sentiment",
          "confidenceScores",
          "sentences",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
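
For call center analysis, the per-sentence scores in this schema are often more useful than the overall label. The sketch below pulls out sentences whose negative score passes a threshold. The function name `negative_sentences` and the default threshold of 0.5 are illustrative assumptions.

```python
def negative_sentences(response: dict, threshold: float = 0.5) -> list:
    """Return (document id, sentence text, negative score) for flagged sentences."""
    hits = []
    for doc in response["documents"]:
        for sentence in doc["sentences"]:
            score = sentence["confidenceScores"]["negative"]
            if score >= threshold:
                hits.append((doc["id"], sentence["text"], score))
    return hits
```

The threshold is a tuning knob: a lower value flags more sentences for review, a higher value keeps only the clearly negative ones.
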
  • Now add a Delete blob action
  • Blob name: textsentiment.json
  • Now add an action to save the blob
  • Blob name: textsentiment.json
  • Now get the entities
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/general
  • Header: Ocp-Apim-Subscription-Key
  • Header: Content-Type
  • Body: the output of the Compose action
  • Add a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string"
          },
          "entities": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "string"
                },
                "category": {
                  "type": "string"
                },
                "subcategory": {
                  "type": "string"
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "confidenceScore": {
                  "type": "number"
                }
              },
              "required": [
                "text",
                "category",
                "offset",
                "length",
                "confidenceScore"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "entities",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
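
A parsed entity response is easier to work with when grouped by category. The following sketch assumes a response shaped like the schema above; the function name `entities_by_category` and the optional `min_score` filter are hypothetical additions for illustration.

```python
from collections import defaultdict


def entities_by_category(response: dict, min_score: float = 0.0) -> dict:
    """Group entity texts by category, skipping entities below min_score."""
    grouped = defaultdict(list)
    for doc in response["documents"]:
        for entity in doc["entities"]:
            if entity["confidenceScore"] >= min_score:
                grouped[entity["category"]].append(entity["text"])
    return dict(grouped)
```

Filtering on `confidenceScore` can help drop entities that come from transcription mistakes, since misheard words tend to produce low-confidence matches.
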
  • Now add a Delete blob action
  • Blob name: textentities.json
  • Now save the final output to a blob

Original article: Samples2021/audiotext.md at main · balakreshnan/Samples2021 (github.com)

Medium Article - Call Center Analytics — No Code to Process Speech and Convert to Text and get insights like Key phra...

 
