My Platform Engineering Journey #1: The Beginning
It began with a Saturday morning coffee with former colleagues. One of them demonstrated what he had been working on over the past three months, and I was truly amazed. He had built not just a user-friendly all-in-one portal, but one with:
- Ticket system: Tracks progress stages and involved personnel, with notifications
- Service section: Integrates everything internally - deployments, metrics, logs - eliminating the need for external Grafana dashboards
- Deployment: Standard templates allowing users to get endpoints/deployments with minimal clicks
That is where my motivation came from, so let's begin. My first resources for this article:
- Introduction to Platform Engineering
- platformengineering.org
- InternalDeveloperPlatform.org
- Platform engineering vs DevOps
Let's talk about salary and career opportunities
- Most people will tell you that a Platform Engineer earns more than a DevOps/SRE/CloudOps/Infrastructure engineer. But you hear the same claim if you swap the word DevOps for SRE/CloudOps/etc... xD. So what is the point? In my personal opinion, it depends on what you have built or are able to build, not on fancy certifications or a beautiful resume/curriculum vitae (it's my personal opinion, I repeat xD)
- As I write this article, I don't have much of an idea about career opportunities. I just like the term Platform Engineering and what it can do, that's all, nothing more. And I guess it will take me about 6 months to get the basic stuff done xD
Problems to be resolved - repeated DevOps tasks
First example - repeated tasks
I'm currently working as a DevOps Engineer. I manage GitLab groups with hundreds of repositories, and 50 of them needed to be set up with CI/CD (even rewriting Dockerfiles to follow best practices!), monitoring, and so on.
When I started the setup, the first 10 repositories were completely fine; they took about 3 hours (because they are all Go and Python). But from the 11th repository on, I had the feeling that I was doing the same job over and over, and it wasn't the first time: before, it was the same with Jenkins. So I started to wonder: is there really no solution that would let me escape this boring task?
Second example - complicated for users (developers):
- They need to remember which URL hosts their application logs and how to query the logs for their service.
- Which Grafana dashboard and query is correct for their service's metrics?
- How do they check their service status? Did their pod get restarted because of OOM or a crash, and how do they find out why if they don't have access to the K8s cluster? (And even with alerts and notifications, sometimes you still need to check what is going on to make sure xD)
Third example
When I moved to a new company, each team had its own DevOps engineers, configured its own CI/CD, and self-managed its own infrastructure -> duplicated effort.
I have been reading a lot of stuff, but only one thing really caught my interest: Platform Engineering. After looking into its concepts, I think it is the key to solving the problems I have seen in the past (back then, I lacked not only experience but also knowledge!)
What do I know/understand about Platform Engineering?
- It is built on DevOps and allows DevOps to scale, so it is not going to replace DevOps. Platform Engineering evolves from DevOps!
- Centralized developer-facing services/tools (the portal is just the interface in Platform Engineering)
- Build an Internal Developer Platform (IDP) as an integrated product, not as another internal tool or project. And the users are developers. Making developers satisfied and happy is our job!
- An Internal Developer Platform must be built as a product following product management best practices.
- Cover every operational necessity of the entire lifecycle (services & applications).
- Automation & Standardization, Compliance, Governance & Security by design.
Golden path for developers
From what I've learned so far... a golden path gives developers project, CI/CD pipeline, and documentation templates.
- Less time spent asking where to deploy and who is involved in each step, so the goal is less time spent onboarding new developers?
- Everything in one portal: easy to remember, less effort to learn, well documented?
That is my expectation: built for developers, used by developers.
Standardization, Compliance, Governance & Security by design
Another important thing I want to talk about: not every developer in our company is at the same level. Some developers lack knowledge, and some are lazy and afraid of change. Take me as an example: I not only lack basic knowledge but am also lazy AF. I don't want new things because I would have to relearn, and I always think that a new tool will be a waste of time and will be sent to the graveyard soon xD. Hmm, seems like an unrelated story for this section =.=
Let's take some examples from what I know.
Standardization: a standard Dockerfile is what we want to achieve. For Golang, for example: a multi-stage build, Distroless as the runtime image, and a non-root user. Because the image already runs as a non-root user, we don't need to set the user and group IDs (runAsUser/runAsGroup) in the securityContext of the K8s pod/containers. The rest of the hardening looks like this:
```yaml
spec:
  template:
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          # hypothetical app image, built FROM gcr.io/distroless/static:nonroot
          image: registry.example.com/my-go-app:1.0.0
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
```
And the Dockerfile, for example, with optimized build flags:
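```dockerfile
# Build stage. Best practice: pin the Go version
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 \
    go build \
    -ldflags="-s -w" \
    -o main .

# Runtime stage: Distroless base. Best practice: non-root user (UID/GID 65532)
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /src/main /main
USER 65532:65532
ENTRYPOINT ["/main"]
```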
Let me explain these flags... okay, let AI explain them xD:
- CGO_ENABLED=0: Disable CGO to create a fully static binary (no C dependencies), perfect for minimal containers
- -ldflags="-s -w": Strip debug information and the symbol table to reduce binary size
  - -s: Omit the symbol table and debug information
  - -w: Omit the DWARF symbol table
- -o main: Output binary name
With these flags, we get a smaller, static binary that works perfectly with Distroless images!
Most of us know how it works and how to secure it, but we don't implement it anyway, with a ton of excuses! We cannot force this on every Golang application, but if teams use our IDP, it will be the default, right? xD
And after Standardization? I think I've already included some basic security stuff to reduce the attack surface. Compliance and Governance? Developers using our product use what we defined!
Wait, I mentioned the word "product", yes. We should treat the IDP (Internal Developer Platform, not Internal Developer Portal, haha) as a "product".
Treat IDP as a product
Why a product? Isn't it just another internal tool? Because developers are the main users here:
- Mindset shift: Developers are internal customers.
- We want to satisfy developers and make them happy by bringing them a good product, so they only need to use our product and can focus on coding/development instead of environment and deployment hell. Developers are users/customers, and you know what, the customer is king xD
- We need to understand developer pain points and then solve them.
- How would I approach it? Implement the basic stuff and keep a tight feedback loop until we get an MVP (Minimum Viable Platform).
What's Next?
In the next part, I'll write about Backstage (I'm learning it, btw). It is a framework for building the Internal Developer Portal, not the whole IDP.
Conclusion
To be honest, I'm still learning about all this. I estimate that this shit will take 6 months of my time to get a running demo, given my current ability.
I guess there is some wrong stuff/theory hidden in this article. If you spot it, please give me feedback so I can improve!
Platform Engineering will not magically give you a golden path for everything; it only works with the "right implementation"!
But what is the "right implementation"? There is no such thing as one right implementation. It depends on a lot of things like people, company/work culture, resources... You have to find your own right way, and so do I xD.