Running a Notebook using Cloud Pub/Sub & Cloud Functions

Mehul Jani
3 min read · Jun 28, 2021

In collaboration with Rishi Singhal, Amit Verma & Ashmita Kapoor

Brief Outline
Continuing from the previous blog, the idea is to run a Jupyter notebook via a trigger initiated by the user through a web interface or by a cron schedule (Cloud Scheduler).

Either of these steps places a message into Cloud Pub/Sub. This message holds the parameters for the Cloud Function execution.
Using these parameters, the Cloud Function triggers the creation of a Deep Learning VM with a metadata startup script which pulls:
* Jupyter Notebook
* Parameters File
* Google Cloud Storage and BigQuery Source Locations
* Google Cloud Storage Destination Locations
and then triggers a Papermill command to do a parameterised execution of the Jupyter Notebook, as sketched below.
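A minimal sketch of what such a startup script could look like is shown here. The file names, metadata keys (gcs-input, gcs-output) and paths are placeholders for illustration, not the exact ones used in our setup (see the GitHub repo referenced further below):

#!/bin/bash
# Illustrative startup script: names and metadata keys are placeholders.

# Read the input/output locations passed in as instance metadata.
INPUT_BUCKET=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/gcs-input")
OUTPUT_BUCKET=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/gcs-output")

# Pull the notebook and its parameters file from Cloud Storage.
gsutil cp "${INPUT_BUCKET}/notebook.ipynb" .
gsutil cp "${INPUT_BUCKET}/params.yaml" .

# Parameterised execution with Papermill; the executed copy goes back to GCS.
papermill notebook.ipynb output.ipynb -f params.yaml
gsutil cp output.ipynb "${OUTPUT_BUCKET}/output.ipynb"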

Complete Setup Flow
1. In Cloud Pub/Sub we create a topic. Below is a snapshot of the topic created:
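If you prefer the command line, the equivalent (assuming the topic name ainotebooktrigger used later in this post) is roughly:

gcloud pubsub topics create ainotebooktrigger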

2. We create a subscription for the Pub/Sub topic, refer to the below snapshot:
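Again, the gcloud equivalent would look something like this (the subscription name is illustrative):

gcloud pubsub subscriptions create ainotebooktrigger-sub --topic=ainotebooktrigger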

3. In Cloud Functions, we create a new function and choose the Pub/Sub trigger. In the drop-down we choose the earlier created topic, as seen below:

4. Click on Save and Next.
5. We choose Python 3.9 from the drop-down as the Cloud Function runtime and create the main.py and requirements.txt files.
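A minimal requirements.txt for this function would likely only need the Compute Engine client library used by create_instance. This is an assumption based on the sketch further below; check the GitHub repo for the actual dependencies and versions:

# requirements.txt (illustrative)
google-api-python-client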

Below is a snippet of the Cloud Function entry-point code, which executes on an event from Pub/Sub:

def execute(event, context):
    # print("This Function was triggered by messageId {} published at {}".format(context.event_id, context.timestamp))
    # Read the VM parameters passed as Pub/Sub message attributes.
    projectId = event['attributes']['projectId']
    bucketInput = event['attributes']['bucketInput']
    bucketOutput = event['attributes']['bucketOutput']
    zone = event['attributes']['zone']
    machineType = event['attributes']['machineType']
    machineName = event['attributes']['machineName']
    # create_instance (defined in the full code on GitHub) spins up the Deep Learning VM.
    resp = create_instance(
        GCP_PROJECT=projectId,
        GCS_BUCKET_PATH_INPUT=bucketInput,
        GCS_BUCKET_PATH_OUTPUT=bucketOutput,
        ZONE=zone,
        MACHINE_TYPE=machineType,
        MACHINE_NAME=machineName)
    print("Response", str(resp))
    return str(resp)

For the entire code, refer to GitHub.
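create_instance itself lives in the GitHub repo; below is a minimal sketch of what it might look like using the Compute Engine API. The image family, disk and network configuration, and metadata keys here are assumptions for illustration, not the repo's actual values:

import googleapiclient.discovery

def create_instance(GCP_PROJECT, GCS_BUCKET_PATH_INPUT, GCS_BUCKET_PATH_OUTPUT,
                    ZONE, MACHINE_TYPE, MACHINE_NAME):
    # Build a Compute Engine API client (uses the function's default credentials).
    compute = googleapiclient.discovery.build('compute', 'v1')

    # Illustrative Deep Learning VM image; the real setup may use a different family.
    source_image = ('projects/deeplearning-platform-release/'
                    'global/images/family/common-cpu')

    config = {
        'name': MACHINE_NAME,
        'machineType': f'zones/{ZONE}/machineTypes/{MACHINE_TYPE}',
        'disks': [{
            'boot': True,
            'autoDelete': True,
            'initializeParams': {'sourceImage': source_image},
        }],
        'networkInterfaces': [{
            'network': 'global/networks/default',
            'accessConfigs': [{'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}],
        }],
        # The bucket paths and startup script are passed to the VM as metadata,
        # which the startup script reads to pull the notebook and run Papermill.
        'metadata': {
            'items': [
                {'key': 'gcs-input', 'value': GCS_BUCKET_PATH_INPUT},
                {'key': 'gcs-output', 'value': GCS_BUCKET_PATH_OUTPUT},
                {'key': 'startup-script-url', 'value': f'{GCS_BUCKET_PATH_INPUT}/startup.sh'},
            ]
        },
    }

    # Create the VM; the response is the long-running operation resource.
    return compute.instances().insert(
        project=GCP_PROJECT, zone=ZONE, body=config).execute()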

6. We execute the below command via Cloud Shell to trigger the above code:

gcloud pubsub topics publish ainotebooktrigger \
--message=VitaminG \
--attribute="projectId=<PROJECTID>,bucketInput=<GCS_INPUTBUCKET>,bucketOutput=<GCP_OUTPUTBUCKET>/<OUTPUTFOLDER>,zone=us-west1-b,machineType=n1-standard-4,machineName=ainotebooktest-vitaming"
Output:
messageIds:
- '2554148569439898'

Alternatively, Cloud Scheduler can be configured to send the same message to Pub/Sub on a schedule and trigger the Cloud Function, for example:
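A sketch of such a scheduler job is shown below; the job name and cron schedule are illustrative:

gcloud scheduler jobs create pubsub ainotebook-daily \
--schedule="0 6 * * *" \
--topic=ainotebooktrigger \
--message-body=VitaminG \
--attributes="projectId=<PROJECTID>,bucketInput=<GCS_INPUTBUCKET>,bucketOutput=<GCP_OUTPUTBUCKET>/<OUTPUTFOLDER>,zone=us-west1-b,machineType=n1-standard-4,machineName=ainotebooktest-vitaming"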

7. Logs for the Cloud Function and the Deep Learning VMs will be available in Stackdriver (Cloud Logging) for review, analysis and troubleshooting.
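For example, the function's recent logs can also be pulled from the command line (the function name is whatever you chose in step 3):

gcloud functions logs read <FUNCTION_NAME> --limit=50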

The completion of the process is indicated by the rdata.csv file appearing in the Google Cloud Storage destination bucket.
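A quick way to check for it, assuming the same output path passed as bucketOutput above:

gsutil ls gs://<GCP_OUTPUTBUCKET>/<OUTPUTFOLDER>/rdata.csv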

Watch out for the third blog of the series…

In the next blog of this series we will explain how Cloud Composer (a managed Apache Airflow service) can be used to automate the notebook deployment pipeline in production environments. Here is a high-level snapshot of the architecture.
