☁ Automate Interactions with Contact Center AI: Challenge Lab | logbook

Loading ...
No ad for you

In this article, we will go through the lab GSP311 Automate Interactions with Contact Center AI: Challenge Lab, which is labeled as an expert-level exercise. You will practice the skills and knowledge to deploy Cloud Dataflow Pipeline to transcript audio files and store the data to BigQuery. You will also need to implement Data Loss Prevention API for redacting the sensitive data from the audio transcriptions (such as name, email, phone number, and SSN).

The challenge contains 8 required tasks:

  1. Create a Regional Cloud Storage bucket
  2. Create a Cloud Function
  3. Create a BigQuery dataset
  4. Create a Pub/Sub topic
  5. Create a Regional Cloud Storage bucket with DFaudio folder
  6. Deploy Dataflow pipeline
  7. Process the sample audio files
  8. Run a Data Loss Prevention Job

Setting up the environment

First of all, open the Cloud Shell to clone the Speech Analysis Framework source repository:

git clone https://github.com/GoogleCloudPlatform/dataflow-contact-center-speech-analysis.git

Task 1: Create a Cloud Storage Bucket

Make sure you:

  • create the bucket in the region us-central1

Task 2: Create a Cloud Function

Make sure you:

  • change the Trigger to Cloud Storage and select Finalize as the Event Type
  1. In the Cloud Console, navigate to Cloud Functions.
  2. Create a new function called saf-longrun-job-func.
  3. Select Cloud Storage from the dropdown for the Trigger setting.
  4. In the Event Type dropdown, select Finalize/Create.
  5. Click on the BROWSE button, and choose the bucket created in Task 1.

  1. Select the Runtime to be Node.js 8
  2. Open the source repository in a new window.
  3. Replace the INDEX.JS AND PACKAGE.JSON in the Cloud Function with the source codes from the repository.
  4. Type safLongRunJobFunc in the Function to execute.
  5. Click on ENVIRONMENT VARIABLES, NETWORKING, TIMEOUTS AND MORE, ensure the region is configured to us-central1 under the Advanced options.
  6. Click CREATE.

Task 3: Create a BigQuery Dataset

  1. Navigate to BigQuery, click on CREATE DATASET.
  2. Assign a Dataset ID, e.g. lab.
  3. Click Create dataset.

Task 4: Create Cloud Pub/Sub Topic

  1. Navigate to Pub/Sub > Topics, click on CREATE TOPIC.
  2. Assign a Topic ID, e.g. speech2text.
  3. Click Create Topic.

Task 5: Create a Cloud Storage Bucket for Staging Contents

  1. Navigate to Cloud Storage, click on the bucket created in Task 1.
  2. Create a folder called DFaudio in the bucket.

Task 6: Deploy a Cloud Dataflow Pipeline

In the Cloud Shell, run the following commands to deploy the Dataflow pipeline

cd dataflow-contact-center-speech-analysis/saf-longrun-job-dataflow

python -m virtualenv env -p python3
source env/bin/activate
pip install apache-beam[gcp]
pip install dateparser

export PROJECT_ID=[YOUR_PROJECT_ID]
export TOPIC_NAME=speech2text
export BUCKET_NAME=[YOUR_BUCKET_NAME]
export DATASET_NAME=lab
export TABLE_NAME=transcript

python3 saflongrunjobdataflow.py --project=$PROJECT_ID --input_topic=projects/$PROJECT_ID/topics/$TOPIC_NAME --runner=DataflowRunner --region=us-central1 --temp_location=gs://$BUCKET_NAME/tmp --output_bigquery=$DATASET_NAME.$TABLE_NAME --requirements_file="requirements.txt"

Task 7: Upload Sample Audio Files for Processing

In the Cloud Shell, run the following commands to upload the sample audio files into your Audio Uploads Bucket:

# mono flac audio sample
gsutil -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:false -h x-goog-meta-pubsubtopicname:$TOPIC_NAME -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp gs://qwiklabs-bucket-gsp311/speech_commercial_mono.flac gs://$BUCKET_NAME

# stereo wav audio sample
gsutil -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:true -h x-goog-meta-pubsubtopicname:$TOPIC_NAME -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp gs://qwiklabs-bucket-gsp311/speech_commercial_stereo.wav gs://$BUCKET_NAME

Q: What is the TOP named entity in the 5 audio files processed by the pipeline? A: pair

Task 8: Run a Data Loss Prevention Job

You must make a copy of your BigQuery table before running a Data Loss Prevention Job

  1. Navigate to BigQuery in the Cloud Console
  2. Select the table generated by the Dataflow pipeline.
  3. Click on More > Query settings.
  4. Assign a Table name, e.g. copied, then click Save.
  1. Run the following SQL query:

    SELECT * FROM `[YOUR_PROJECT_ID].[DATASET_NAME].[TABLE]`
    
  2. Select the copied table, then click on EXPORT > Scan with DLP.
  3. In the Create job or job trigger pane, assign a Job ID and then click CREATE.
  4. Click CONFIRM CREATE.


Congratulations! You completed this challenge lab.

Summary

Tasks 1 to 5 were pretty straightforward. If you prefer using the command line to create the resources, please refer to the README file of the Speech Analysis Framework in the GitHub repository. You can also find the commands to deploy the saflongrunjobdataflow.py Python script in Task 6 and the SQL query for getting the answer in Task 7.

Only Task 7 was a little tricky. You will get stuck if you try to make a copy of the table using the COPY TABLE button in the BigQuery console. It can copy the table structure but cannot copy the data in the table. Once you know how to correctly clone the table, the task is just a piece of cake.

Demonstration Video

This browser does not support the YouTube video player. Watch on YouTube

Timestamps:
00:00 Lab start
00:59 Task1: Create a Cloud Storage Bucket
01:58 Task2: Create a Cloud Function
04:07 Task3: Create a BigQuery Dataset
05:00 Task4: Create Cloud Pub/Sub Topic
05:52 Task5: Create a Cloud Storage Bucket for Staging Contents
06:21 Task6: Deploy a Cloud Dataflow Pipeline
13:36 Task7: Upload Sample Audio Files for Processing
17:05 Task8: Run a Data Loss Prevention Job

Keep on reading:

Chris F. Author of this blog, M.Phil.
Load more
Loading Disqus Comments...
Please enable JavaScript to view the comments powered by Disqus.