What you may not know about Google Cloud Platform (I didn't either, until much later)

Updated: Jan 7

I am summarising the following as I slowly get more hands-on with GCP. I thought I should share it with you, especially if you are studying for a GCP exam. I will update this living draft as and when I have some spare cycles, and NEW entries will be shown at the top of the bullet points.


  • You can use the lifecycle command to get or set lifecycle management policies for a given bucket. This command is supported for buckets only, not objects.

Lifecycle configurations allow you to automatically delete objects or change their storage class when some criterion is met.


To enable lifecycle for a bucket with settings defined in the config_file.json file, run:

$ gsutil lifecycle set <config_file.json> gs://<bucket_name>


For instance, in order to delete the objects in the bucket once they are 30 days old, the config file [config_file.json] would look like:

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 30,
          "isLive": true
        }
      }
    ]
  }
}

Another example: a rule (one entry of the "rule" array) that changes the storage class of objects to Nearline after a year would be:

{
 "action": {
       "type": "SetStorageClass",
       "storageClass": "NEARLINE"
 },
 "condition": {
       "age": 365,
       "matchesStorageClass": ["MULTI_REGIONAL", "STANDARD", "DURABLE_REDUCED_AVAILABILITY"]
 }
}
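
The Nearline rule above is only a fragment, so it still has to be wrapped in a complete configuration before gsutil can use it. As a minimal sketch (the helper names `delete_rule`, `set_storage_class_rule` and `build_lifecycle_config` are my own, not part of any Google tool), the file content could be generated like this:

```python
import json


def delete_rule(age_days):
    """Rule that deletes live objects once they are age_days old."""
    return {
        "action": {"type": "Delete"},
        "condition": {"age": age_days, "isLive": True},
    }


def set_storage_class_rule(age_days, target_class, from_classes):
    """Rule that moves objects of the given classes to target_class."""
    return {
        "action": {"type": "SetStorageClass", "storageClass": target_class},
        "condition": {
            "age": age_days,
            "matchesStorageClass": list(from_classes),
        },
    }


def build_lifecycle_config(rules):
    """Wrap rules in the same top-level layout as the 30-day example."""
    return {"lifecycle": {"rule": list(rules)}}


config = build_lifecycle_config([
    set_storage_class_rule(
        365,
        "NEARLINE",
        ["MULTI_REGIONAL", "STANDARD", "DURABLE_REDUCED_AVAILABILITY"],
    ),
])
config_text = json.dumps(config, indent=2)
print(config_text)
```

Save the printed text as config_file.json and pass it to the gsutil lifecycle set command shown earlier.
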
  • At the time of writing, Google Cloud Storage versioning could NOT be enabled from the GCP console. The command to turn versioning on is:

gsutil versioning set on gs://BUCKET_NAME 

Once Object Versioning is enabled, each time a live object version is replaced or deleted, that version becomes a noncurrent version.

To list both live and noncurrent versions of an object and view their generation numbers:


gsutil ls -a gs://BUCKET_NAME
  • BigQuery can save you cost if you set the Default Table Expiration option on a dataset. Take note that you have to enter MANUALLY the number of days after which the table(s) in your BigQuery Dataset "expire". It is NOT automatic, as the term "Default Table Expiration" might imply.

  • Common gcloud commands, especially useful for the Associate Cloud Engineer certification:

https://cloud.google.com/sdk/docs/images/gcloud-cheat-sheet.pdf

  • There are 2 ways to set up SSH access to VM instances: 1) add SSH keys to the project or individual instance metadata; 2) use the OS Login feature. OS Login lets you manage SSH access to your instances using IAM without having to create and manage individual SSH keys. Imagine having more than 20 Compute Engine VM instances to add metadata to.


OS Login maintains a consistent Linux user identity across VM instances and is the recommended way to manage many users across multiple instances or projects.


OS Login provides the following benefits:

  1. Automatic Linux account lifecycle management - You can directly tie a Linux user account to a user's Google identity so that the same Linux account information is used across all instances in the same project or organization.

  2. Fine grained authorization using Google IAM - Project and instance-level administrators can use IAM to grant SSH access to a user's Google identity without granting a broader set of privileges. For example, you can grant a user permissions to log into the system, but not the ability to run commands such as sudo.

  3. Automatic permission updates - With OS Login, permissions are updated automatically when an administrator changes IAM permissions. For example, if you remove IAM permissions from a Google identity, then access to VM instances is revoked.

  4. Ability to import existing Linux accounts - Administrators can choose to optionally synchronize Linux account information from Active Directory (AD) and Lightweight Directory Access Protocol (LDAP) that are set up on-premises. For example, you can ensure that users have the same user ID (UID) in both your Cloud and on-premises environments.

  • A service account is a special Google account that belongs to your application or a virtual machine (VM) instead of an individual end user. Your application uses the service account to call the Google API of a service, so that the users aren't directly involved.

For example, a Compute Engine VM may run as a service account, and that account can be given permissions to access the resources it needs. This way the service account is the identity of the service, and the service account's permissions control which resources the service can access.


A service account is identified by its email address, which is unique to the account. There are two types: user-managed service accounts (service-account-name@project-id.iam.gserviceaccount.com) and GCP default service accounts (App Engine: project-id@appspot.gserviceaccount.com; Compute Engine: project-number-compute@developer.gserviceaccount.com).


A user-managed service account does not rely on Access Scopes; you have to use Identity and Access Management to grant it the required roles: Primitive (now called Basic), Predefined, or Custom.

Service accounts can be thought of as both a resource and as an identity.

When thinking of the service account as an identity, you can grant a role to a service account, allowing it to access a resource (such as a project resource like Google Cloud Storage bucket object).


When thinking of a service account as a resource, you can grant roles to other users to access or manage that service account.


Service accounts do not have a username and password like users; they authenticate with service account keys, which are either Google-managed or user-managed, before they can access resources.
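
To make the three email address formats mentioned above easier to tell apart, here is a small sketch that classifies an address by its shape. The regular expressions are my own simplification (they ignore domain-scoped project IDs, for example), and `classify_service_account` is a hypothetical helper, not a Google API:

```python
import re

# Simplified patterns for the three formats; assumptions, not official rules.
PATTERNS = [
    ("user-managed",
     re.compile(r"^[a-z][a-z0-9-]*@[a-z][a-z0-9-]*\.iam\.gserviceaccount\.com$")),
    ("App Engine default",
     re.compile(r"^[a-z][a-z0-9-]*@appspot\.gserviceaccount\.com$")),
    ("Compute Engine default",
     re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")),
]


def classify_service_account(email):
    """Return which kind of service account the email address looks like."""
    for kind, pattern in PATTERNS:
        if pattern.match(email):
            return kind
    return "unknown"


for address in [
    "my-sa@my-project.iam.gserviceaccount.com",
    "my-project@appspot.gserviceaccount.com",
    "123456789012-compute@developer.gserviceaccount.com",
    "alice@example.com",
]:
    print(address, "->", classify_service_account(address))
```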

  • The browser-based SSH feature of the Google Cloud console, which we use to connect to VM instances, only works because a firewall rule (typically predefined by your security admin) allows its source IP address. However, source IP addresses for browser-based SSH sessions are dynamically allocated by the GCP Console and can vary from session to session. For the feature to work, you must allow connections either from any IP address, or from Google's IP address range, which you can retrieve using public SPF records. Either of these options may pose unacceptable risks, depending on your requirements. In a “Production” environment, you would instead allow only the IP addresses of the SSH clients you actually connect from.

  • Google Cloud Platform has two primary ways to handle Google user account authentication: Google authentication and SAML 2.0 OpenID compliant Single Sign-On authentication. Using the latter method, Google operates as a Service Provider and your SSO system operates as an Identity Provider. You manage your own authentication mechanism and your own credentials, for example in Microsoft Active Directory or a Lightweight Directory Access Protocol (LDAP) directory.

Many new GCP customers get started by logging in to the GCP console with a Gmail account. Gmail accounts and Google Groups are often the easiest way to get started, but they offer no centralized way to manage these users. GCP customers who are also G Suite (since rebranded as Google Workspace) customers can define GCP policies in terms of G Suite users and groups. This way, when someone leaves your organization, an administrator can immediately disable their account and remove them from the groups using the Google Admin Console. GCP customers who are not G Suite customers can get the same capabilities through Cloud Identity.

Cloud Identity (IDaaS) lets you manage users and groups using the Google Admin Console, but you do not pay for or receive G Suite collaboration products such as Gmail, Docs, Drive, and Calendar. Cloud Identity is available in both a free and a premium edition. The premium edition also adds capabilities for mobile device management.

  • The default network is pre-populated with the following four firewall rules that allow incoming connections to instances: default-allow-internal, default-allow-ssh, default-allow-rdp and default-allow-icmp. They can be deleted or modified as necessary.

  • Remember, all networks have the following 2 implied firewall rules (which are not displayed in the Google Cloud Console): one blocks all incoming traffic and the other allows all outgoing traffic. They cannot be deleted.

Unlike the default network, user-created networks do not have any other rules by default, so initially no inbound traffic is allowed.

  • VPC Flow Logs record network flows sent from or received by VM instances. They only include traffic seen by a VM. For example, outbound traffic blocked by an egress rule is seen by the VM and logged; inbound traffic blocked by an ingress rule never reaches the VM, so it is not seen and not logged.

These logs can be used to monitor network traffic to and from your VMs for forensics, real-time security analysis, and expense optimization. You can view Flow Logs in Stackdriver Logging (since rebranded as Cloud Logging), and you can also export them to any destination that Cloud Logging export supports: for example, a Cloud Logging bucket, a Cloud Pub/Sub topic, a BigQuery dataset, Splunk, or another project. Flow Logs are aggregated by connection at five-second intervals from Compute Engine VMs and exported in real time.


VPC Flow Logs are not enabled by default; they can be enabled on a per-subnet basis.

  • The Identity and Access Management roles "browser" and "viewer" are very different though they sound alike. The "browser" role limits you to the following permissions:

% gcloud iam roles describe roles/browser

Output of the above command:

description: Access to browse GCP resources.
etag: AA==
includedPermissions:
- resourcemanager.folders.get
- resourcemanager.folders.list
- resourcemanager.organizations.get
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
- resourcemanager.projects.list
name: roles/browser
stage: GA
title: Browser

The "viewer" role, on the other hand, has a lot more permissions, such as storage bucket list, storage bucket get and others. At the time I ran this command, its description ran to 1,616 lines, nearly all of them permissions!

% gcloud iam roles describe roles/viewer|wc -l
 1616
  • A Google Cloud firewall rule, if no priority is specified, is assigned the default priority of 1,000 (allowable priorities run from 0 to 65535, with 0 being the highest). Firewall rules are evaluated in priority order, starting from the lowest value. The first rule that matches gets applied.
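
The priority logic can be pictured with a toy evaluator. This is only a sketch of the idea, not how Google implements it: the `Rule` class, the rule names and the single-port matching are my own simplifications. It also assumes that a deny rule wins over an allow rule of the same priority, and falls back to the implied deny-ingress rule when nothing matches.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

DEFAULT_PRIORITY = 1000  # used when a rule does not specify a priority


@dataclass
class Rule:
    name: str
    action: str          # "allow" or "deny"
    source_range: str    # CIDR block the rule applies to
    port: int
    priority: int = DEFAULT_PRIORITY


def evaluate_ingress(rules, source_ip, port):
    """Return (action, rule name) of the first matching rule."""
    # Lowest priority number first; at equal priority, deny before allow.
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    for rule in ordered:
        if port == rule.port and ip_address(source_ip) in ip_network(rule.source_range):
            return rule.action, rule.name
    # Nothing matched: the implied rule blocks incoming traffic.
    return "deny", "implied-deny-ingress"


rules = [
    Rule("allow-ssh-anywhere", "allow", "0.0.0.0/0", 22),           # priority 1000
    Rule("deny-ssh-from-lab", "deny", "203.0.113.0/24", 22, 900),   # evaluated first
]

print(evaluate_ingress(rules, "203.0.113.10", 22))  # ('deny', 'deny-ssh-from-lab')
print(evaluate_ingress(rules, "198.51.100.7", 22))  # ('allow', 'allow-ssh-anywhere')
print(evaluate_ingress(rules, "198.51.100.7", 80))  # ('deny', 'implied-deny-ingress')
```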

  • Google Cloud resources are organized hierarchically:

The organization is the root node in the hierarchy.

Folders are children of the organization.

Projects are children of the organization, or of a folder.

Resources for each service are descendants of projects.

  • You cannot assign permission(s) directly to any GCP user or Service Account. Instead, you assign a ROLE, which is itself a collection of permissions, to the user or Service Account; better still, as another best practice, put user accounts into a group and assign the ROLE to that group. An example of a ROLE is Compute Network Admin.

In fact, in Google Cloud Console, you will not see any "permission" option which you can assign to a Google Cloud user, group or service account.

  • A Cloud IAM policy is a JSON or YAML document made up of bindings. A binding binds a list of members to a role, where the members can be user accounts, Google groups, Google domains, and service accounts. A role is a named list of permissions defined by Cloud IAM.

The policy in effect on a resource is the result of merging a. the policy set on the resource itself and b. the policies inherited from the resource's parents.


Imagine we define a less restrictive policy on the resource's parent, such as the EDITOR role at PROJECT level, and a more restrictive VIEWER role at the RESOURCE level. The result is that the less restrictive EDITOR role takes hold.
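
Here is a rough sketch of that merging in Python. The policy dictionaries follow the bindings layout described above, but the member, the policies and the `effective_bindings` helper are made up for illustration; this is not a call into any Google library.

```python
def effective_bindings(*policies):
    """Union the role -> members bindings of policies, parent first."""
    merged = {}
    for policy in policies:
        for binding in policy.get("bindings", []):
            merged.setdefault(binding["role"], set()).update(binding["members"])
    return merged


# Hypothetical policies: EDITOR granted on the project, VIEWER on the resource.
project_policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:alice@example.com"]},
    ]
}
resource_policy = {
    "bindings": [
        {"role": "roles/viewer", "members": ["user:alice@example.com"]},
    ]
}

effective = effective_bindings(project_policy, resource_policy)
roles_for_alice = sorted(
    role for role, members in effective.items()
    if "user:alice@example.com" in members
)
print(roles_for_alice)  # ['roles/editor', 'roles/viewer']
```

Because the result is a union, the VIEWER binding on the resource cannot take away what EDITOR on the project already grants.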

  • Every product and/or service you deploy in Google Cloud Platform needs to be hosted in a PROJECT. FOLDER(s) are optional and help demarcate different divisions/departments, but if you use them you must have the ORGANIZATION node

  • Google's Stackdriver suite of products is NOW rebranded as Google Operations Suite: a. Cloud Logging b. Cloud Monitoring c. Cloud Trace d. Cloud Profiler e. Cloud Debugger

  • Data at rest is automatically encrypted

  • Unlike commands run one after another in Google Cloud Shell, Google Cloud Deployment Manager (Google's Infrastructure as Code offering) creates all resources IN PARALLEL. This matters for ordering: referencing the network by its self link in the firewall resource ensures that the VPC network is created before the firewall. Without such a self link reference, Deployment Manager does not know that one resource depends on the other.

  • Everything in GCP, or in any cloud for that matter, is driven by the APIs of the various web services behind the scenes. So before you jump into any new product or service, make sure its API is ENABLED

  • Firewall rule's Target can be a. All instances in the network b. Specified target tags c. Specified Service Account; Firewall rule's Source can be a. IP ranges b. Source tags c. Service Account

  • Cloud NAT and Cloud Router are typically created together and as expected, they operate at OSI Layer 3 or NETWORK layer

  • Firewall rules are set at NETWORK level, so if you have Compute Engine VM Instances in two ZONES behind a REGIONAL Load Balancer on the same VPC network, you achieve High Availability with the same firewall rules

  • In Google Cloud, you can use a combination of Load Balancers together. For example, we can have a GLOBAL HTTP(s) Load Balancer presenting a single anycast IP address to our global clients. Depending on where the client connects from, a client may be directed to a REGIONAL Load Balancer closer to them or their services. From the REGIONAL Load Balancer, client requests can be passed further downstream to an INTERNAL Load Balancer, behind which the BACKEND VM Instances or Services are hosted. Sometimes we may not want to expose our INTERNAL business-critical applications, so they are hosted with Private IP addresses

  • The fastest way to save money or control cost if a Google Cloud project's cost is going out of control is to SHUT DOWN the project using the Google Cloud Platform User Interface. https://cloud.google.com/resource-manager/docs/creating-managing-projects

  • TAGS vs LABELS - Tags (network tags) are attached to Compute Engine VM Instances to identify them as the source or target of traffic, instead of listing IP addresses or CIDR blocks. Tags enable you to make firewall rules and routes

  • Labels are key-value pairs we assign to GCP resources to identify, organize and manage them. You can attach a label to each resource, then filter the resources based on their labels. Information about labels is forwarded to the billing system, so you can break down your billing charges by label to identify your spend. As an example, you can label the location of your egress traffic

  • Dedicated Interconnect and Partner Interconnect are OSI Layer 2 services which means that you can attach your VLAN with RFC1918 IP addresses

  • Direct Peering and Carrier Peering are OSI Layer 3 services and you will have to use External IP addresses

  • To use Global Load Balancer like HTTP(s), TCP Proxy and SSL Proxy, you will need to subscribe to Google "Premium" Network Tier

  • Auto Healing & Auto Scaling features for Compute Engine VM Instances are only applicable when you are using Managed Instance Groups

  • A VPC network is Global in nature, whereas a VPC Subnet is Regional

  • Shared VPC shares a network across projects (you cannot use it within a single project) and it cannot cross an organization boundary; VPC Peering allows you to connect VPC networks across different organisations

  • Private Google Access only allows you to access Google API, products and services like YouTube from your RFC1918 or Internal IP addresses of your Compute Engine VM instances and you can not access other Internet resources like https://www.aws.com; Private Google Access is enabled on A SUBNET BY SUBNET basis and another thing to remember is Private Google Access has NO effect on VM Instances with EXTERNAL IP. To allow such Internet connectivity to other non-Google resources, you will need to configure Cloud NAT and Cloud Router.

  • Google Cloud Network Address Translation DOES NOT implement Inbound NAT from the greater Internet INTO Compute Engine VM Instances' Private Network

  • Premium Network Tier has SLA but not Standard Network Tier

  • Google managed scheduler is Cloud Scheduler which helps in automating any kind of jobs or services

  • Google Cloud Audit Logs for Compute Engine VM Instances come in three types: Admin Activity Log; System Event Log; Data Access Log. Only the first two are enabled by default. Data Access Log, once enabled, allows you to log your application activities. Examples of Admin Activity are updating and patching VM instances or creating a new subnet in your VPC network. Examples of a System Event are a reset or a migration of the VM Instance

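  • A VPC Subnet can be EXPANDED, for example from a /20 to a /16 CIDR range, to cater for more IP addresses. A /16 range holds 65,536 addresses in total, of which Google Cloud reserves four in each subnet's primary range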
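
The arithmetic behind such an expansion can be checked with Python's standard ipaddress module. The 10.128.0.0/20 range is just an example I picked, and the sketch assumes Google Cloud's documented behaviour of reserving four addresses in every subnet's primary IPv4 range, which is why the usable count comes out slightly below the raw size of the range.

```python
from ipaddress import ip_network

RESERVED_PER_SUBNET = 4  # assumption: GCP reserves 4 addresses per primary range


def usable_addresses(cidr):
    """Addresses left for VMs after the reserved ones are removed."""
    return ip_network(cidr).num_addresses - RESERVED_PER_SUBNET


original = ip_network("10.128.0.0/20")        # example range, not from a real project
expanded = original.supernet(new_prefix=16)   # widen the prefix from /20 to /16

# An expanded range must still contain the original one.
print(expanded, original.subnet_of(expanded))  # 10.128.0.0/16 True
print(usable_addresses(str(original)))         # 4092
print(usable_addresses(str(expanded)))         # 65532
```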

  • Compute Engine VM Instance is a ZONAL/ZONE resource

  • GCP Persistent Disks are Block Storage and are accessed over Google's internal network. There is also locally attached Solid State Disk (Local SSD), but it is typically used as swap or temporary disk because its content is ephemeral

  • For AI or Machine Learning, it is recommended to use Google Cloud VM Instances with TPUs (Tensor Processing Units); note that TPUs are not available in all regions

  • BigQuery has concept of Dataset followed by Table(s) embedded in Dataset placeholder. It also has the concept of Default Table Expiration Time which is configurable

  • GCP VPN comes as Classic VPN and HA VPN. Classic VPN only has one bidirectional VPN tunnel, so as expected there is no high availability; HA VPN allows you to have 2 or more VPN tunnels configured bidirectionally between your on-premises network and Google Cloud. A Classic VPN tunnel has three routing options: a. Dynamic routing based on BGP (Border Gateway Protocol) b. Route-based VPN c. Policy-based routing. Options b and c differ in that policy-based routing has you create and maintain manual routes between your VPC network and your peer router, whereas route-based VPN creates the VPN tunnel with LOCAL & REMOTE traffic selectors set to 0.0.0.0/0 (any IP address)

  • A Google Cloud Storage Signed URL allows you to set an expiration time on access to a Google Cloud Storage Bucket or the Object(s) in it, whereas a Public URL has no such feature. To generate a Signed URL, you first need to create a service account key in either JSON or P12 format, then download and save the private key

  • Google Cloud BigQuery is "SERVERLESS" and Google Cloud Bigtable is NOT

  • Google Cloud App Engine (PaaS) FLEXIBLE allows VPN access, but App Engine STANDARD (another PaaS variant), which runs in a Google Sandbox environment, does not

  • Google Cloud Dataproc differs from typical on-premises HADOOP clusters in that a Dataproc cluster can be torn down when it is not in use, and the HADOOP cluster disks (HDFS) are replaced by Google Cloud Storage, which provides much higher availability and is cheaper, especially if a Lifecycle Management policy is enabled on the Google Cloud Storage objects to move rarely used objects to a cheaper Storage Class, such as from NEARLINE to COLDLINE or ARCHIVE. With HADOOP HDFS, the clusters have to be up at all times if you want to keep the data on the HDFS disks: you pay a lot more in capital expense for traditional HADOOP infrastructure, yet use it only for the little while you are running HADOOP jobs!

  • Another ground-breaking feature of Google Cloud Dataproc clusters is that much of the heavy churning of HADOOP MapReduce jobs can be run on Google Cloud "PREEMPTIBLE" VM Instances, which can save up to 70-80% compared to typical VM Instances

  • A Google Cloud "Project ID", once created, CAN'T be changed anymore. It has to be globally unique, so bear that in mind if you choose the Project ID yourself instead of accepting the generated one







