Samuel Fajreldines

I am a specialist in the entire JavaScript and TypeScript ecosystem.

I am an expert in AI and in building AI-integrated solutions.

I am an expert in DevOps and serverless architecture.

I am an expert in PHP and its frameworks.

+55 (51) 99226-5039 samuelfajreldines@gmail.com

Serverless Data Processing Pipelines with Google Cloud Run

In today's data-driven world, processing vast amounts of data efficiently is crucial for businesses of all sizes. With the advent of serverless computing, developers can now build scalable and cost-effective data processing pipelines without the overhead of managing infrastructure. Google Cloud Run, a fully managed serverless platform, provides the perfect environment for deploying such pipelines.

This post explores how to leverage Google Cloud Run to build serverless data processing pipelines. We'll delve into the benefits of using Cloud Run, compare it with traditional data processing methods, and provide a step-by-step guide to building your own pipeline.

Why Choose Serverless for Data Processing?

Serverless computing offers numerous advantages for data processing:

  • Scalability: Automatically handle variable workloads without manual intervention.
  • Cost-Efficiency: Pay only for the compute time you consume, eliminating idle resource costs.
  • Reduced Operational Overhead: Focus on writing code instead of managing servers.
  • Event-Driven Architecture: Process data in response to events, enabling real-time data handling.

By adopting serverless architectures, organizations can accelerate development, reduce costs, and improve scalability.

Introduction to Google Cloud Run

Google Cloud Run is a serverless platform that allows you to run stateless containers that are invocable via HTTP requests. It combines the power of containers with the simplicity of serverless computing.

Key Features:

  • Fully Managed: No need to manage any infrastructure or servers.
  • Fast Scaling: Scale from zero to n instances seamlessly based on traffic.
  • Any Language, Any Library: Run code written in any language or framework.
  • Container-Based Deployment: Package your application as a container for consistent deployment.

Building a Data Processing Pipeline with Cloud Run

Overview

A data processing pipeline typically involves:

  1. Data Ingestion: Receiving data from various sources.
  2. Data Processing: Transforming, enriching, or analyzing the data.
  3. Data Storage: Saving the processed data for future use.

Using Cloud Run, we can build a pipeline that processes data in real-time, scales automatically, and integrates seamlessly with other Google Cloud services.
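Stripped of infrastructure, the three stages can be sketched as plain functions composed in order. This is only a conceptual sketch: the `runPipeline` name is ours, and the `sink` array stands in for a real storage backend such as BigQuery or Cloud Storage.

```javascript
// Minimal sketch of the three pipeline stages. `sink` is a stand-in for a
// real storage backend (BigQuery, Cloud Storage, ...).
function runPipeline(rawInput, sink) {
  const data = JSON.parse(rawInput);         // 1. Data Ingestion
  const processed = {                        // 2. Data Processing
    ...data,
    processedAt: new Date().toISOString(),
  };
  sink.push(processed);                      // 3. Data Storage
  return processed;
}

// Example: process one raw JSON record into an in-memory "sink".
const sink = [];
runPipeline('{"id": 1, "value": 42}', sink);
console.log(sink.length); // 1
```

The rest of this guide builds exactly this shape as an HTTP service, with Pub/Sub or Eventarc supplying the raw input.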

Step 1: Setting Up Your Environment

Before starting, ensure you have:

  • A Google Cloud account with billing enabled.
  • Google Cloud SDK installed for command-line access.
  • Cloud Run API enabled in your Google Cloud project.

Step 2: Writing Your Data Processing Application

For this example, we'll create a simple data processing service using Node.js and Express.

app.js:

const express = require('express');
const app = express();
app.use(express.json());

app.post('/', async (req, res) => {
  const data = req.body;

  // Perform data processing here
  const processedData = processData(data);

  // Optionally, store or forward the processed data
  // For example, publish to a Pub/Sub topic, write to BigQuery, etc.

  res.status(200).send('Data processed successfully');
});

function processData(data) {
  // Simulate data transformation
  data.processedAt = new Date().toISOString();
  return data;
}

const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Service listening on port ${PORT}`);
});

Step 3: Containerizing the Application

Create a Dockerfile to containerize your application.

Dockerfile:

# Use an official Node.js LTS runtime as a parent image (Node 14 is end-of-life)
FROM node:20-slim

# Create and set the working directory
WORKDIR /usr/src/app

# Copy package.json and package-lock.json
COPY package*.json ./

# Install production dependencies only (npm's --only=production flag is deprecated)
RUN npm ci --omit=dev

# Copy the rest of the application code
COPY . .

# Expose the port
ENV PORT=8080
EXPOSE 8080

# Start the application
CMD [ "node", "app.js" ]

Step 4: Building and Deploying to Cloud Run

Build the container image and push it to Google's container registry (Artifact Registry has superseded the older Container Registry, but `gcr.io` image paths continue to work).

# Build the container image
gcloud builds submit --tag gcr.io/PROJECT_ID/data-processor

# Replace PROJECT_ID with your Google Cloud project ID

# Deploy the image to Cloud Run
gcloud run deploy data-processor \
  --image gcr.io/PROJECT_ID/data-processor \
  --platform managed \
  --region REGION \
  --allow-unauthenticated

# Replace REGION with your preferred deployment region

After deployment, you'll receive a URL for your Cloud Run service.

Step 5: Integrating with Event Sources

Triggering via Pub/Sub

To process messages from Cloud Pub/Sub, set up a Pub/Sub topic and subscription.

Create a Pub/Sub topic:

gcloud pubsub topics create data-topic

Create a service account that Pub/Sub will use to invoke the service, and grant it the Cloud Run Invoker role (it is roles/run.invoker, not the Pub/Sub Subscriber role, that authorizes push delivery to the service):

gcloud iam service-accounts create pubsub-invoker \
  --display-name "Pub/Sub Invoker"

gcloud run services add-iam-policy-binding data-processor \
  --member="serviceAccount:pubsub-invoker@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --region=REGION

Create a subscription that pushes messages to Cloud Run:

gcloud pubsub subscriptions create data-subscription \
  --topic=data-topic \
  --push-endpoint=YOUR_CLOUD_RUN_URL \
  --push-auth-service-account=pubsub-invoker@PROJECT_ID.iam.gserviceaccount.com

Replace YOUR_CLOUD_RUN_URL with your Cloud Run service URL.
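One detail matters for the handler from Step 2: a push subscription does not deliver the raw payload. Pub/Sub wraps each message in a JSON envelope, with the original data base64-encoded under `message.data`. A small helper (a sketch; the function name is ours) unwraps it:

```javascript
// Extract the original payload from a Pub/Sub push request body.
// Pub/Sub delivers { message: { data: <base64>, messageId, ... }, subscription }.
function parsePubSubPush(body) {
  const message = body && body.message;
  if (!message || !message.data) {
    throw new Error('Not a valid Pub/Sub push payload');
  }
  const decoded = Buffer.from(message.data, 'base64').toString('utf8');
  return JSON.parse(decoded);
}

// In the handler from Step 2 you would call it before processing:
//   const data = parsePubSubPush(req.body);
//   const processedData = processData(data);
//   res.status(204).send(); // any 2xx response acknowledges the message
```

Any non-2xx response (or a timeout) causes Pub/Sub to redeliver the message according to the subscription's retry policy.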

Handling Cloud Storage Events

Use Eventarc to forward Cloud Storage events to Cloud Run.

Create an Eventarc trigger (Cloud Storage triggers also require a bucket filter; replace BUCKET_NAME with your bucket):

gcloud eventarc triggers create storage-trigger \
  --destination-run-service=data-processor \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=BUCKET_NAME" \
  --location=REGION \
  --service-account=pubsub-invoker@PROJECT_ID.iam.gserviceaccount.com

Now, when a file is uploaded to Cloud Storage, an event is sent to your Cloud Run service.
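Eventarc delivers these events as CloudEvents over HTTP: the event type arrives in the `ce-type` header and the object metadata (`bucket`, `name`, and so on) in the JSON body. A sketch of how the handler might read them (the helper name is ours):

```javascript
// Sketch: summarize a Cloud Storage CloudEvent delivered by Eventarc.
// `headers` are the incoming HTTP headers (Express lower-cases their names);
// `body` is the parsed JSON object metadata.
function describeStorageEvent(headers, body) {
  const type = headers['ce-type'] || 'unknown-event';
  return `${type}: gs://${body.bucket}/${body.name}`;
}

// Example usage inside the Express handler from Step 2:
//   app.post('/', (req, res) => {
//     console.log(describeStorageEvent(req.headers, req.body));
//     res.status(204).send();
//   });
```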

Step 6: Scaling and Configuration

Configure your Cloud Run service for optimal performance.

Set concurrency and memory limits:

gcloud run services update data-processor \
  --concurrency=80 \
  --memory=512Mi

Configure environment variables for flexibility:

gcloud run services update data-processor \
  --update-env-vars "ENV=production,DEBUG=false"
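Inside the service, those variables are available on `process.env`. A small loader (a sketch; the defaults are our choice) keeps local development working without any configuration:

```javascript
// Read the environment variables set via --update-env-vars, with defaults
// so the service also runs locally with no configuration at all.
function loadConfig(env = process.env) {
  return {
    env: env.ENV || 'development',
    debug: env.DEBUG === 'true',
    port: parseInt(env.PORT || '8080', 10),
  };
}

console.log(loadConfig({ ENV: 'production', DEBUG: 'false' }));
// { env: 'production', debug: false, port: 8080 }
```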

Step 7: Monitoring and Logging

Utilize Cloud Monitoring and Cloud Logging to gain insights into your pipeline's performance.

  • Cloud Logging: View logs generated by your application.
  • Cloud Monitoring: Set up dashboards and alerts for key metrics like CPU utilization, memory usage, and request latency.
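Cloud Logging parses single-line JSON written to stdout as structured log entries, recognizing special fields such as severity. A minimal structured-logging helper (the function name is ours):

```javascript
// Emit a structured log entry. Cloud Logging treats one-line JSON on stdout
// as a structured entry and maps fields like "severity" onto the log record.
function logEntry(severity, message, extra = {}) {
  const entry = Object.assign({ severity, message }, extra);
  console.log(JSON.stringify(entry));
  return entry; // returned for convenience
}

logEntry('INFO', 'Batch processed', { records: 3 });
```

Structured entries make it much easier to filter logs and build metrics, e.g. on all `severity=ERROR` entries for the service.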

Benefits of Using Cloud Run for Data Pipelines

  • Automatic Scaling: Handles unexpected traffic spikes smoothly.
  • Language and Framework Agnostic: Use any programming language or library.
  • Seamless Integration: Works well with other Google Cloud services.
  • Reduced Costs: No need to pay for idle compute instances.

Best Practices

Optimize Cold Starts

  • Minimize Image Size: Use slim base images to reduce startup time.
  • Set Minimum Instances: Keep instances warm to avoid cold starts.

gcloud run services update data-processor \
  --min-instances=1

Secure Your Service

  • Use Service Accounts: Grant least privilege necessary for service-to-service communication.
  • Network Security: Integrate with VPC Service Controls for advanced security.

Implement Error Handling and Retries

  • Idempotency: Ensure your processing logic can handle duplicate messages.
  • Retry Policies: Configure Pub/Sub and Eventarc to retry failed requests.
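Because Pub/Sub offers at-least-once delivery, the same message can arrive more than once. A minimal dedup sketch keyed on the Pub/Sub `messageId` (an in-memory Set only covers a single instance; a production pipeline would use a shared store such as Firestore or Redis):

```javascript
// Track already-processed Pub/Sub message IDs so redeliveries are no-ops.
// NOTE: per-instance only; use a shared store (Firestore, Redis) in production.
const seenMessageIds = new Set();

function processOnce(messageId, payload, handler) {
  if (seenMessageIds.has(messageId)) {
    return { skipped: true };                // duplicate: acknowledge, do nothing
  }
  seenMessageIds.add(messageId);
  return { skipped: false, result: handler(payload) };
}

console.log(processOnce('m-1', 2, (x) => x * 2)); // { skipped: false, result: 4 }
console.log(processOnce('m-1', 2, (x) => x * 2)); // { skipped: true }
```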

Real-World Use Cases

Streaming Data Analytics

Process live data streams from sources like IoT devices or user interactions for real-time analytics.

ETL (Extract, Transform, Load) Processes

Transform and load data into data warehouses like BigQuery for large-scale analytics.

Image and Video Processing

Handle media uploads by performing processing tasks like resizing images or transcoding videos.

Machine Learning Pipelines

Preprocess data for machine learning models or serve predictions in real-time.

Conclusion

Building serverless data processing pipelines with Google Cloud Run empowers developers to create scalable, efficient, and cost-effective solutions. By leveraging Cloud Run's serverless capabilities and seamless integration with other Google Cloud services, you can focus on delivering value through your data processing logic rather than managing infrastructure.

Embrace serverless architectures to modernize your data workflows and stay ahead in the rapidly evolving tech landscape.

Keywords: Serverless Data Processing, Google Cloud Run, Event-Driven Architecture, Scalable Pipelines, Cloud Functions, Google Cloud Platform


Resume

Experience

  • SecurityScorecard

    Nov. 2023 - Present

    New York, United States

    Senior Software Engineer

    I joined SecurityScorecard, a leading organization with over 400 employees, as a Senior Full Stack Software Engineer. My role spans developing new systems, maintaining and refactoring legacy solutions, and ensuring they meet the company's high standards of performance, scalability, and reliability.

    I work across the entire stack, contributing to both frontend and backend development while also collaborating directly on infrastructure-related tasks, leveraging cloud computing technologies to optimize and scale our systems. This broad scope of responsibilities allows me to ensure seamless integration between user-facing applications and underlying systems architecture.

    Additionally, I collaborate closely with diverse teams across the organization, aligning technical implementation with strategic business objectives. Through my work, I aim to deliver innovative and robust solutions that enhance SecurityScorecard's offerings and support its mission to provide world-class cybersecurity insights.

    Technologies Used:

    Node.js, Terraform, React, TypeScript, AWS, Playwright, and Cypress