☁ Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab | logbook
In this article, we will go through the lab GSP323 Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab, which is an expert-level exercise on Qwiklabs. You will practice skills in running Dataflow, Dataproc, and Dataprep jobs, as well as calling the Google Cloud Speech, Natural Language, and Video Intelligence APIs.
The challenge contains 4 required tasks:
- Task 1: Run a simple Dataflow job
- Task 2: Run a simple Dataproc job
- Task 3: Run a simple Dataprep job
- Task 4: AI
Task 1: Run a simple Dataflow job
In this task, you have to transfer the data in a CSV file stored in Cloud Storage to BigQuery using a Dataflow batch job. First of all, you need to create a BigQuery dataset called lab and a Cloud Storage bucket named with your project ID.
1.1 Create a BigQuery dataset called lab
- In the Cloud Console, click on Navigation Menu > BigQuery.
- Select your project in the left pane.
- Click CREATE DATASET.
- Enter lab in the Dataset ID field, then click Create dataset.
Optional
- Run gsutil cp gs://cloud-training/gsp323/lab.schema . in the Cloud Shell to download the schema file.
- View the schema by running cat lab.schema.
- Go back to the Cloud Console, select the new dataset lab and click Create Table.
- In the Create Table dialog, select Google Cloud Storage from the dropdown in the Source section.
- Copy gs://cloud-training/gsp323/lab.csv to “Select file from GCS bucket”.
- Enter customers to “Table name” in the Destination section.
- Enable Edit as text and copy the JSON data from the lab.schema file to the textarea in the Schema section.
- Click Create table.
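If you prefer the command line, the dataset and table can also be created from Cloud Shell with the bq tool. This is a minimal sketch, assuming lab.schema has been downloaded into the current directory as in the optional step above:

```bash
# Create the "lab" dataset in the default project.
bq mk lab

# Load the CSV into a new "customers" table using the local schema file.
bq load --source_format=CSV lab.customers \
  gs://cloud-training/gsp323/lab.csv ./lab.schema
```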
1.2 Create a Cloud Storage bucket
- In the Cloud Console, click on Navigation Menu > Storage.
- Click CREATE BUCKET.
- Paste your GCP Project ID into the “Name your bucket” field.
- Click CREATE.
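Equivalently, the bucket can be created from Cloud Shell in one line (GOOGLE_CLOUD_PROJECT is pre-set there to your project ID):

```bash
# Create a bucket whose name is the project ID.
gsutil mb gs://${GOOGLE_CLOUD_PROJECT}
```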
1.3 Create a Dataflow job
- In the Cloud Console, click on Navigation Menu > Dataflow.
- Click CREATE JOB FROM TEMPLATE.
- In the Create job from template dialog, enter an arbitrary job name.
- From the dropdown under Dataflow template, select Text Files on Cloud Storage to BigQuery under “Process Data in Bulk (batch)”. (DO NOT select the item under “Process Data Continuously (stream)”.)
- Under the Required parameters, enter the following values:
| Field | Value |
| --- | --- |
| JavaScript UDF path in Cloud Storage | gs://cloud-training/gsp323/lab.js |
| JSON path | gs://cloud-training/gsp323/lab.schema |
| JavaScript UDF name | transform |
| BigQuery output table | YOUR_PROJECT:lab.customers |
| Cloud Storage input path | gs://cloud-training/gsp323/lab.csv |
| Temporary BigQuery directory | gs://YOUR_PROJECT/bigquery_temp |
| Temporary location | gs://YOUR_PROJECT/temp |
Replace YOUR_PROJECT with your project ID.
- Click RUN JOB.
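The same batch job can also be launched from Cloud Shell instead of the console form. A sketch, assuming the public GCS_Text_to_BigQuery template and its documented parameter names (replace YOUR_PROJECT as above):

```bash
# Run the "Text Files on Cloud Storage to BigQuery" batch template.
gcloud dataflow jobs run lab-dataflow-job \
  --gcs-location gs://dataflow-templates/latest/GCS_Text_to_BigQuery \
  --region us-central1 \
  --staging-location gs://YOUR_PROJECT/temp \
  --parameters \
javascriptTextTransformGcsPath=gs://cloud-training/gsp323/lab.js,\
JSONPath=gs://cloud-training/gsp323/lab.schema,\
javascriptTextTransformFunctionName=transform,\
inputFilePattern=gs://cloud-training/gsp323/lab.csv,\
outputTable=YOUR_PROJECT:lab.customers,\
bigQueryLoadingTemporaryDirectory=gs://YOUR_PROJECT/bigquery_temp
```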
Task 2: Run a simple Dataproc job
Create a Dataproc cluster
- In the Cloud Console, click on Navigation Menu > Dataproc > Clusters.
- Click CREATE CLUSTER.
- Make sure the cluster will be created in the us-central1 region.
- Click Create.
- After the cluster has been created, click the SSH button in the row of the master instance.
- In the SSH console, run the following command to copy the input file into HDFS:
hdfs dfs -cp gs://cloud-training/gsp323/data.txt /data.txt
- Close the SSH window and go back to the Cloud Console.
- Click SUBMIT JOB on the cluster details page.
- Select Spark from the dropdown of “Job type”.
- Copy org.apache.spark.examples.SparkPageRank to “Main class or jar”.
- Copy file:///usr/lib/spark/examples/jars/spark-examples.jar to “Jar files”.
- Enter /data.txt to “Arguments”.
- Click SUBMIT.
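For reference, the whole of Task 2 can also be done from Cloud Shell with gcloud. The following is a minimal sketch, not the graded console flow; the cluster name my-cluster and the master zone us-central1-a are assumptions (check which zone your master VM actually landed in):

```bash
# Create a Dataproc cluster with default sizing in us-central1.
gcloud dataproc clusters create my-cluster --region=us-central1

# Copy the input file into HDFS by running hdfs on the master node
# (Dataproc names master VMs <cluster-name>-m).
gcloud compute ssh my-cluster-m --zone=us-central1-a \
  --command='hdfs dfs -cp gs://cloud-training/gsp323/data.txt /data.txt'

# Submit the SparkPageRank example against the file in HDFS.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster --region=us-central1 \
  --class=org.apache.spark.examples.SparkPageRank \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  -- /data.txt
```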
Task 3: Run a simple Dataprep job
Import runs.csv to Dataprep
- In the Cloud Console, click on Navigation menu > Dataprep.
- After entering the home page of Cloud Dataprep, click the Import Data button.
- In the Import Data page, select GCS in the left pane.
- Click on the pencil icon under Choose a file or folder.
- Copy gs://cloud-training/gsp323/runs.csv to the textbox, and click the Go button next to it.
- After showing the preview of runs.csv in the right pane, click on the Import & Wrangle button.
Transform data in Dataprep
- After launching the Dataprep Transformer, scroll right to the end and select column10.
- In the Details pane, click FAILURE under Unique Values to show the context menu.
- Select Delete rows with selected values to remove all rows with the state of “FAILURE”.
- Click the downward arrow next to column9, choose Filter rows > On column value > Contains.
- In the Filter rows pane, enter the regex pattern /(^0$|^0\.0$)/ to “Pattern to match”.
- Select Delete matching rows under the Action section, then click the Add button.
- Rename the columns to be:
  - runid
  - userid
  - labid
  - lab_title
  - start
  - end
  - time
  - score
  - state
- Confirm the recipe. It should look like the screenshot below.
- Click Run Job.
Task 4: AI
Use Google Cloud Speech API to analyze the audio file
- In the Cloud Console, click on Navigation menu > APIs & Services > Credentials.
- In the Credentials page, click on + CREATE CREDENTIALS > API key.
- Copy the API key to the clipboard, then click RESTRICT KEY.
- Open the Cloud Shell and store the API key as an environment variable by running the following command:
export API_KEY=<YOUR-API-KEY>
Replace <YOUR-API-KEY> with the copied key value.
- In the Cloud Shell, create a JSON file called gsc-request.json.
- Save the following content to the file (a heredoc sketch for creating it appears after this list):
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-training/gsp323/task4.flac"
  }
}
- Use the following curl command to submit the request to the Google Cloud Speech API and store the response in a file called task4-gcs.result:
curl -s -X POST -H "Content-Type: application/json" --data-binary @gsc-request.json \
  "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}" > task4-gcs.result
- Upload the resulting file to Cloud Storage by running:
gsutil cp task4-gcs.result gs://<YOUR-PROJECT_ID>-marking/task4-gcs.result
Replace <YOUR-PROJECT_ID> with your project ID.
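As an aside, the gsc-request.json file above does not have to be created in an editor; a heredoc in Cloud Shell writes it in one step. A minimal sketch:

```bash
# Write the Speech-to-Text request body to gsc-request.json.
cat > gsc-request.json <<'EOF'
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-training/gsp323/task4.flac"
  }
}
EOF
```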
Use the Cloud Natural Language API to analyze the sentence
- In the Cloud Shell, run the following command to use the Cloud Natural Language API to analyze the given sentence:
gcloud ml language analyze-entities --content="Old Norse texts portray Odin as one-eyed and long-bearded, frequently wielding a spear named Gungnir and wearing a cloak and a broad hat." > task4-cnl.result
- Upload the resulting file to Cloud Storage by running:
gsutil cp task4-cnl.result gs://<YOUR-PROJECT_ID>-marking/task4-cnl.result
Replace <YOUR-PROJECT_ID> with your project ID.
Use Google Video Intelligence and detect all text on the video
- In the Cloud Shell, create a JSON file called gvi-request.json.
- Save the following content to the file:
{
  "inputUri": "gs://spls/gsp154/video/train.mp4",
  "features": [
    "LABEL_DETECTION"
  ]
}
- Go back to the Cloud Console, click on Navigation menu > APIs & Services > Credentials.
- Click the service account named “Qwiklabs User Service Account” to view the details.
- Click ADD KEY > Create new key.
- Choose JSON and click CREATE to download the Private key file to your computer.
- Upload the file to the Cloud Shell environment.
- Rename the uploaded file to key.json.
- Run the following commands to activate the service account and generate an access token:
gcloud auth activate-service-account --key-file key.json
export TOKEN=$(gcloud auth print-access-token)
- Run the following command to call the Google Video Intelligence API on the video and store the response in a file called task4-gvi.result:
curl -s -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  'https://videointelligence.googleapis.com/v1/videos:annotate' \
  -d @gvi-request.json > task4-gvi.result
- Upload the resulting file to Cloud Storage by running:
gsutil cp task4-gvi.result gs://<YOUR-PROJECT_ID>-marking/task4-gvi.result
Replace <YOUR-PROJECT_ID> with your project ID.
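Note that videos:annotate returns a long-running operation, so task4-gvi.result contains an operation name rather than the annotations themselves (the lab marks the operation response as-is). If you want to see the actual results, you can poll the operation; a hedged sketch, assuming the response’s "name" field holds the operation path:

```bash
# Pull the operation name out of the annotate response.
OPERATION=$(grep -o '"name": *"[^"]*"' task4-gvi.result | head -1 | cut -d '"' -f 4)

# Poll the long-running operation; repeat until the response shows "done": true.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://videointelligence.googleapis.com/v1/${OPERATION}"
```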
Congratulations! You completed this challenge lab.
Demonstration Video
Watch the demonstration video on YouTube.
⏱Timestamps:
00:00 Start Lab
00:38 Task 1: Run a simple Dataflow job
08:34 Task 2: Run a simple Dataproc job
09:24 Task 3: Run a simple Dataprep job
15:22 Task 4: AI