System Design Practice (1)

Today I went through the basics of system design. The material is organized as a well-structured skill tree. Most of the technical terms are already familiar to me: in my daily reading I have come across, or used, concepts such as replication, master-slave, load balancing and federation. But I hadn't had time to review them and connect my working experience with best practices.

I have been using MongoDB for a few months, and I need to quickly pick up SQL database concepts. I noticed some incorrect concepts regarding NoSQL design, but I still learned a lot from it.

Tomorrow I will go through all the topics and terms again, along with some methodologies for system design best practices. But first, let's start with a quick hands-on try, so I can figure out my weaknesses and what I need to learn tomorrow.

Let’s begin and have some fun!

Step 1: Outline use cases and constraints

First, we need a clear understanding of the situations and scenarios we have to handle. That's why we start by abstracting the use cases.

Previously, I worked on the Revel-Xero integration project, so I will use it as the example here.

Use cases

Here are some scoped use cases:

  • User registers and connects their Revel and Xero accounts
  • Service extracts sales records from the Revel account
    • Updates daily
    • Categorizes sales orders by Product, Product Class, and Establishment
    • Analyzes monthly sales by category
  • Service generates sales invoices and pushes them to the Xero account
    • Pushes daily
    • Allows users to manually set the account mapping and push schedule
    • Sends notifications when a push is approaching or has failed
  • Service has high availability
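Here is a minimal Python sketch that makes the "categorize" and "monthly analysis" use cases concrete. The SalesOrder fields, the monthly_totals helper and the sample orders are all hypothetical. They are made up for illustration and are not the real Revel API payload. The idea is just to group orders by (month, category) and sum the amounts.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class SalesOrder:
    # Hypothetical record shape, not the real Revel API payload.
    created: date
    product: str
    product_class: str
    establishment: str
    amount: Decimal  # Decimal avoids float rounding errors with money


def monthly_totals(orders, key):
    """Sum order amounts per ((year, month), category); `key` picks the category."""
    totals = defaultdict(Decimal)
    for order in orders:
        month = (order.created.year, order.created.month)
        totals[(month, key(order))] += order.amount
    return dict(totals)


# Made-up sample orders for illustration only.
orders = [
    SalesOrder(date(2017, 5, 1), "Flat White", "Coffee", "CBD Store", Decimal("4.50")),
    SalesOrder(date(2017, 5, 2), "Latte", "Coffee", "CBD Store", Decimal("4.80")),
    SalesOrder(date(2017, 5, 2), "Muffin", "Bakery", "CBD Store", Decimal("3.00")),
]

by_class = monthly_totals(orders, key=lambda o: o.product_class)
by_establishment = monthly_totals(orders, key=lambda o: o.establishment)
print(by_class[((2017, 5), "Coffee")])  # 9.30
```

The same helper covers Product, Product Class and Establishment by swapping the key function.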

Now we have three use cases, plus a high-availability requirement. The real scenario is much more complex. It also includes sales, purchase orders, payroll and item sync, invoices come in multiple formats, and the sales, payment and tax mappings must be flexible. But the workflow is similar, so let's focus on the current scope.

Constraints and assumptions

(Question: what's the best practice for estimating constraints and assumptions? Needs research.)

State assumptions

  • Usually, once the account is set up and running, the user only comes back on a monthly basis
  • There is no need for real-time updates. Revel is not a strongly consistent system, so we delay the sync by 1-2 days
  • Revel only has around 1,000 customers in AU, but our target is the entire Asian market, so let's assume 10,000 users
    • Usually one user has only 1 establishment, so 10k establishments
    • Each establishment usually has around 1,000 sales orders per day, so 10 million transactions per day
    • 300 million transactions per month
    • One user has one Revel account and one Xero account, so 10k Revel accounts and 10k Xero accounts
    • 20k read requests per month
    • ~15,000:1 write to read ratio (300 million writes vs. 20k reads per month)
    • Write-heavy: users make transactions daily, but few visit the site daily

Calculate Usage

In case we forget: 1 English letter = 1 byte (ASCII/UTF-8), 1 byte = 8 bits (2^8 = 256 possible values), 1 Chinese character = 2 bytes in GBK (3 bytes in UTF-8)

  • Size per transaction:
    • user_id: 8 bytes
    • created: 8 bytes
    • product: 32 bytes
    • product_class: 10 bytes
    • establishment: 12 bytes
    • amount: 5 bytes
    • Total: ~ 75 bytes
  • 75 bytes × 300 million ≈ 22.5 GB of new transaction content per month, ≈ 270 GB per year
    • ≈ 810 GB of new transactions in 3 years
  • ~116 transactions per second on average (10 million / 86,400 seconds)
  • ~0.008 read requests per second on average (20k / 2.5 million seconds)

Handy conversion guide:

  • 2.5 million seconds per month
  • 1 request per second = 2.5 million requests per month
  • 40 requests per second = 100 million requests per month
  • 400 requests per second = 1 billion requests per month
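Here is a quick Python sketch for double-checking the numbers above. It assumes 30-day months, the fixed-width field sizes listed earlier (packed with struct so that the 75-byte total can be verified), and the rounded 2.5 million seconds per month from the conversion guide. The helper names are hypothetical.

```python
import struct

SECONDS_PER_DAY = 86_400
SECONDS_PER_MONTH = 2.5e6  # rounded value from the conversion guide

# Fixed-width transaction layout; "<" disables alignment padding.
# user_id 8 | created 8 | product 32 | product_class 10 | establishment 12 | amount 5
TRANSACTION_FORMAT = "<qq32s10s12s5s"
BYTES_PER_TRANSACTION = struct.calcsize(TRANSACTION_FORMAT)  # 75


def estimate(transactions_per_day, reads_per_month, days_per_month=30):
    """Back-of-the-envelope storage and throughput numbers (GB = 10**9 bytes)."""
    tx_per_month = transactions_per_day * days_per_month
    gb_per_month = tx_per_month * BYTES_PER_TRANSACTION / 1e9
    return {
        "tx_per_month": tx_per_month,
        "gb_per_month": gb_per_month,
        "gb_per_year": gb_per_month * 12,
        "gb_in_3_years": gb_per_month * 36,
        "tx_per_second": transactions_per_day / SECONDS_PER_DAY,
        "reads_per_second": reads_per_month / SECONDS_PER_MONTH,
        "write_to_read_ratio": tx_per_month / reads_per_month,
    }


def per_second_to_per_month(requests_per_second):
    """Convert a steady request rate into a monthly total."""
    return requests_per_second * SECONDS_PER_MONTH


usage = estimate(transactions_per_day=10_000_000, reads_per_month=20_000)
for name, value in usage.items():
    print(f"{name}: {value:,.3f}")

# Character sizes depend on the encoding: ASCII letter in UTF-8, Chinese in UTF-8, Chinese in GBK
print(len("a".encode("utf-8")), len("中".encode("utf-8")), len("中".encode("gbk")))  # 1 3 2
```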

Step 2: Create a high-level design

Outline a high-level design with all important components.

(To be continued… It’s 12 am now so I will try to finish it tomorrow!)

Kubernetes (4)


Set up the controller and service

Now we need to create a ReplicationController for the application, because a standalone Pod that dies won't be restarted automatically. (Newer Kubernetes versions recommend a Deployment, which manages ReplicaSets, for this purpose.)

# web-controller.yml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    name: web
  name: web-controller
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: web
    spec:
      containers:
      - image: gcr.io/<YOUR-PROJECT-ID>/myapp
        name: web
        ports:
        - containerPort: 3000
          name: http-server

kubectl create -f web-controller.yml

Then we need to create a service as an interface for those pods.

This plays a similar role to the links option we used in Docker Compose.

# web-service.yml
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    name: web
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  selector:
    name: web

kubectl create -f web-service.yml
  • The type is LoadBalancer. This is a cool feature that will make Google Cloud Platform create an external network load balancer automatically for this service!
  • We map external port 80 to internal port 3000, so we can serve HTTP traffic without messing with firewalls.

We can check the pods' status with:

kubectl get pods

In order to find the IP address of our app, run this command:

$ gcloud compute forwarding-rules list
NAME     REGION        IP_ADDRESS       IP_PROTOCOL TARGET
abcdef   us-central1   104.197.XXX.XXX  TCP         us-xxxx

The ideal structure

So now we have two pods for the application and one web service, which holds the external IP.

Now we need to set up a DB service for our application.

MongoDB has the concept of a replica set:

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.

When a primary does not communicate with the other members of the set for more than 10 seconds, an eligible secondary will hold an election to elect itself the new primary. The first secondary to hold an election and receive a majority of the members’ votes becomes primary.
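The election rule boils down to majority arithmetic. The Python sketch below simplifies things: it ignores member priorities, how up to date each member's oplog is, and non-voting members. The function names are my own. It still shows why an odd number of voters matters. With 2 data-bearing nodes, losing either one blocks the election. Adding an arbiter, which votes but holds no data, lets the set survive one failure.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_voters: int) -> bool:
    """A candidate can only win if it reaches a majority of all voters (itself included)."""
    return reachable_voters >= majority(voting_members)


def failures_tolerated(voting_members: int) -> int:
    """How many voters can go down while an election can still succeed."""
    return voting_members - majority(voting_members)


# 2 data nodes, 2 data nodes + 1 arbiter, 4 members, 5 members
for members in (2, 3, 4, 5):
    print(members, "voters -> majority", majority(members),
          "| tolerates", failures_tolerated(members), "failure(s)")
```

The output also shows that going from 3 to 4 voters does not improve fault tolerance. That is why odd-sized replica sets are the usual choice.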

We can follow another blog post to set it up.

The author created a repository that automatically configures the MongoDB replica set, and I forked it.

To set it up:

git clone https://github.com/thesandlord/mongo-k8s-sidecar.git
cd mongo-k8s-sidecar/example/StatefulSet/
kubectl apply -f googlecloud_ssd.yaml
kubectl apply -f mongo-statefulset.yaml

Tips:

  • Be careful about the zone difference

Rolling update

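If we only need to update the container image of a ReplicationController (kubectl rolling-update was removed in kubectl 1.18; for a Deployment, use kubectl set image deployment/NAME CONTAINER=IMAGE:TAG instead):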

kubectl rolling-update NAME [NEW_NAME] --image=IMAGE:TAG

Web UI

It's a good idea to set up a UI dashboard for your cluster. Most relevant operations can be done from that dashboard.

Bind a static IP to the service's external IP

  • Create a service as usual.
  • Once your app is up, make note of the External IP using
kubectl get services
  • Now go to the Google Cloud Platform Console -> Networking -> External IP Addresses.
  • Find the IP you were assigned earlier. Switch it from “Ephemeral” to “Static.” You will have to give it a name and it would be good to give it a description so you know why it is static.
  • Then modify your service (or service yaml file) to point to this static address. I’m going to modify the yaml.
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    name: web
spec:
  type: LoadBalancer
  loadBalancerIP: 104.199.187.56
  ports:
    - port: 80
      targetPort: 3000  # must match the containerPort in web-controller.yml
      protocol: TCP
  selector:
    name: web
  • Once your YAML is modified, apply it with:
kubectl apply -f web-service.yml

Kubernetes (3)


Package Up The Image With Dockerfile

Following the tutorial, we first need to build a standalone image for production.

Later we can use Google's new Build Triggers feature to detect branch updates, automatically build a staging image and push it to the Container Registry.

Entire Process to Setup Application

Set up a cluster via gcloud

We can set up a new cluster with the gcloud command or through the Google Cloud Console GUI.

Log in to the cluster (fetch its credentials for kubectl)

gcloud container clusters get-credentials cluster-name --zone=asia-east1-a

Kubernetes (2)


Expose the Application Publicly

Structure

Master (control plane)
  |_ Deployment (Multi) = Application (Deployed), manages Pods across Nodes
  |_ Node (Multi)
       |_ Pod (Multi) (Internal IP)
            |_ Container (Multi)
Service (External IP for public)
   |_ Pod (From different nodes) (optional)

Tips:

  • Kubernetes needs at least 3 nodes to enable the auto-update function for the application. (Note: a Deployment's rolling update itself also works on a single node, e.g. in minikube.)

Scale Your Application

Scaling is accomplished by changing the number of replicas in a Deployment.

Kubernetes also supports autoscaling of Pods (the Horizontal Pod Autoscaler).
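For reference, the Horizontal Pod Autoscaler documentation gives the core formula as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The change is skipped when the ratio is within a tolerance, which is 0.1 by default. The Python sketch below implements only that formula. The real controller also handles missing metrics, not-ready Pods and stabilization windows. The function name is my own.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: don't scale
    return math.ceil(current_replicas * ratio)


# e.g. a target of 50% CPU per Pod
print(desired_replicas(2, current_metric=90, target_metric=50))  # 4 (scale up)
print(desired_replicas(4, current_metric=20, target_metric=50))  # 2 (scale down)
print(desired_replicas(3, current_metric=52, target_metric=50))  # 3 (within tolerance)
```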

Services have an integrated load balancer that distributes network traffic to all Pods of an exposed Deployment. Services continuously monitor the running Pods using endpoints, so that traffic is only sent to available Pods.
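Here is a toy Python sketch of the Service behaviour: it routes only to ready Pods. ToyService and the pod names are hypothetical. The round-robin choice is a simplification. The real kube-proxy picks backends at random in iptables mode, while IPVS mode supports round-robin among other algorithms.

```python
from itertools import cycle


class ToyService:
    """Hypothetical stand-in for a Service: spreads requests over ready Pods only."""

    def __init__(self, pods):
        self.pods = pods  # pod name -> is the Pod ready?

    def endpoints(self):
        # Like an Endpoints object: only ready Pods are listed.
        return [name for name, ready in self.pods.items() if ready]

    def route(self, n_requests):
        ready = self.endpoints()
        if not ready:
            raise RuntimeError("no ready endpoints")
        targets = cycle(ready)
        return [next(targets) for _ in range(n_requests)]


svc = ToyService({"web-a": True, "web-b": False, "web-c": True})
print(svc.route(4))  # ['web-a', 'web-c', 'web-a', 'web-c']
```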

Update Your Application

Rolling updates allow a Deployment to be updated with zero downtime by incrementally replacing Pod instances with new ones.
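Here is a rough Python simulation of how a rolling update keeps Pods available. It mirrors the Deployment strategy parameters maxSurge and maxUnavailable. It assumes new Pods become ready instantly and counts only whole Pods. The function is a hypothetical sketch, not the controller's real algorithm.

```python
def rolling_update(desired, max_surge=1, max_unavailable=0):
    """Toy rolling update; returns (old_pods, new_pods) after each step.

    Assumes new Pods become ready immediately.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = desired, 0
    history = [(old, new)]
    while old > 0 or new < desired:
        # Start new Pods without exceeding desired + max_surge Pods in total.
        new += max(0, min(desired + max_surge - (old + new), desired - new))
        # Remove old Pods while keeping at least desired - max_unavailable available.
        old -= max(0, min(old, old + new - (desired - max_unavailable)))
        history.append((old, new))
    return history


for step in rolling_update(3):
    print(step)  # (3, 0), (2, 1), (1, 2), (0, 3)
```

With the default settings shown, the number of available Pods never drops below 3 during the update.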

Object Management Using kubectl

There are 3 ways to manage objects:

  • Imperative commands (directly via commands, e.g. kubectl run)
  • Imperative Management of Kubernetes Objects Using Configuration Files (kubectl create, replace or delete with -f and YAML config files)
  • Declarative Management of Kubernetes Objects Using Configuration Files (kubectl apply -f on config files kept in the repository)

Usually we should use the third one. The basic idea is to keep config files for everything in the project repository and let kubectl apply work out what needs to change.
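To illustrate a config file with everything prepared, this Python sketch rebuilds the web ReplicationController from Kubernetes (4) as an apps/v1 Deployment and prints it as JSON. kubectl accepts JSON manifests as well as YAML. The helper name and the file names in the usage note are hypothetical. In practice you would usually just commit the YAML file.

```python
import json


def deployment_manifest(name, image, replicas=2, container_port=3000):
    """Build a Deployment equivalent to the web ReplicationController (hypothetical helper)."""
    labels = {"name": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-deployment", "labels": labels},
        "spec": {
            "replicas": replicas,
            # apps/v1 Deployments require an explicit selector matching the template labels
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": container_port, "name": "http-server"}],
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("web", "gcr.io/<YOUR-PROJECT-ID>/myapp")
print(json.dumps(manifest, indent=2))
```

Usage: python make_manifest.py > web-deployment.json, commit the file to the repository, then run kubectl apply -f web-deployment.json.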

Deploy Real MEAN Stack application with Kubernetes

Link

Tips

  • COPY in a Dockerfile only copies files from the build context into the image. To copy files that are already inside the image, we need:
RUN cp A B

Kubernetes (1) & flask signal

Kubernetes

Cluster up and running

Using minikube to create a cluster

> minikube version
minikube version: v0.15.0-katacoda
> minikube start
Starting local Kubernetes cluster...

To check the kubectl version:

> kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"linux/amd64"}

To check the cluster details:

> kubectl cluster-info
Kubernetes master is running at http://host01:8080
heapster is running at http://host01:8080/api/v1/proxy/namespaces/kube-system/services/heapster
kubernetes-dashboard is running at http://host01:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
monitoring-grafana is running at http://host01:8080/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
monitoring-influxdb is running at http://host01:8080/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb

We have a running master and a dashboard. The Kubernetes dashboard allows you to view your applications in a UI. During this tutorial, we’ll be focusing on the command line for deploying and exploring our application. To view the nodes in the cluster, run the kubectl get nodes command:

  > kubectl get nodes
NAME      STATUS    AGE
host01    Ready     5m

This command shows all nodes that can be used to host our applications. We have only one node for now, and its status is Ready, meaning it can accept applications for deployment.

Deploy an App

Once you have a running Kubernetes cluster, you can deploy your containerized applications on top of it. To do so, you create a Kubernetes Deployment. The Deployment is responsible for creating and updating instances of your application.

If the Node hosting an instance goes down or is deleted, the Deployment controller replaces it. This provides a self-healing mechanism to address machine failure or maintenance.
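The self-healing behaviour comes from a control loop that repeatedly compares the desired state with the actual state. Here is a toy, single-pass version in Python. The pod dicts, the node_ready flag and the naming scheme are hypothetical.

```python
from itertools import count


def reconcile(desired, pods, next_id):
    """One pass of a toy control loop: drop Pods on failed Nodes, then match `desired`."""
    alive = [pod for pod in pods if pod["node_ready"]]
    while len(alive) < desired:
        # Replace missing Pods (the scheduler would place them on healthy Nodes).
        alive.append({"name": f"web-{next(next_id)}", "node_ready": True})
    return alive[:desired]  # scale down if there are too many


ids = count(1)
pods = reconcile(3, [], ids)       # creates web-1, web-2, web-3
pods[1]["node_ready"] = False      # the Node hosting web-2 fails
pods = reconcile(3, pods, ids)
print([pod["name"] for pod in pods])  # ['web-1', 'web-3', 'web-4']
```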

When you create a Deployment, you'll need to specify the container image for your application and the number of replicas that you want to run. You can change that information later by updating your Deployment.

Let's run our first app on Kubernetes with the kubectl run command. The run command creates a new deployment (in kubectl 1.18 and later it creates a bare Pod instead; use kubectl create deployment there). We need to provide the deployment name and the app image location (include the full repository URL for images hosted outside Docker Hub). We want to run the app on a specific port, so we add the --port parameter:

  > kubectl run kubernetes-bootcamp --image=docker.io/jocatalin/kubernetes-bootcamp:v1 --port=8080
deployment "kubernetes-bootcamp" created

To list your deployments

 > kubectl get deployments
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-bootcamp   1         1         1            1           2m

To view the application output without exposing it externally, we’ll create a route between our terminal and the Kubernetes cluster using a proxy:

 > kubectl proxy
Starting to serve on 127.0.0.1:8001

We now have a connection between our host (the online terminal) and the Kubernetes cluster. The proxy enables direct access to the API. The app runs inside a Pod (we'll cover the Pod concept in the next module). Get the name of the Pod and store it in the POD_NAME environment variable:

export POD_NAME=$(kubectl get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')

> echo Name of the Pod: $POD_NAME
Name of the Pod: kubernetes-bootcamp-390780338-k4k25

To see the output of our application, run a curl request:

curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/$POD_NAME/

Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-390780338-k4k25 | v=1

Pods and Nodes

A Pod is a Kubernetes abstraction that represents a group of one or more application containers (such as Docker or rkt), and some shared resources for those containers. Those resources include:

  • Shared storage, as Volumes
  • Networking, as a unique cluster IP address
  • Information about how to run each container, such as the container image version or specific ports to use

A Pod models an application-specific “logical host” and can contain different application containers which are relatively tightly coupled. For example, a Pod might include both the container with your Node.js app as well as a different container that feeds the data to be published by the Node.js webserver. The containers in a Pod share an IP Address and port space, are always co-located and co-scheduled, and run in a shared context on the same Node.

Pods are the atomic unit on the Kubernetes platform. When we create a Deployment on Kubernetes, that Deployment creates Pods with containers inside them (as opposed to creating containers directly). Each Pod is tied to the Node where it is scheduled, and remains there until termination (according to restart policy) or deletion. In case of a Node failure, identical Pods are scheduled on other available Nodes in the cluster.

A Pod always runs on a Node. A Node is a worker machine in Kubernetes and may be either a virtual or a physical machine, depending on the cluster. Each Node is managed by the Master. A Node can have multiple pods, and the Kubernetes master automatically handles scheduling the pods across the Nodes in the cluster. The Master’s automatic scheduling takes into account the available resources on each Node.

Every Kubernetes Node runs at least:

  • Kubelet, a process responsible for communication between the Kubernetes Master and the Nodes; it manages the Pods and the containers running on a machine.
  • A container runtime (like Docker, rkt) responsible for pulling the container image from a registry, unpacking the container, and running the application. Containers should only be scheduled together in a single Pod if they are tightly coupled and need to share resources such as disk.

We’ll use the kubectl get command and look for existing Pods:

 > kubectl get pods
NAME                                  READY     STATUS    RESTARTS   AGE
kubernetes-bootcamp-390780338-m454n   1/1       Running   0          10s

Next, to view what containers are inside that Pod and what images are used to build those containers we run the describe pods command:

> kubectl describe pods

To get logs from the container, we’ll use the kubectl logs command:

 > kubectl logs $POD_NAME
Kubernetes Bootcamp App Started At: 2017-05-05T06:46:41.845Z | Running On:  kubernetes-bootcamp-390780338-m454n

Running On: kubernetes-bootcamp-390780338-m454n | Total Requests: 1 | App Uptime: 230.702 seconds | Log Time: 2017-05-05T06:50:32.547Z

We can execute commands directly on the container.

  > kubectl exec $POD_NAME env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=kubernetes-bootcamp-390780338-m454n
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1
KUBERNETES_SERVICE_HOST=10.0.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT=tcp://10.0.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443
KUBERNETES_PORT_443_TCP_PROTO=tcp
NPM_CONFIG_LOGLEVEL=info
NODE_VERSION=6.3.1
HOME=/root

Next let’s start a bash session in the Pod’s container:

 > kubectl exec -ti $POD_NAME bash
root@kubernetes-bootcamp-390780338-m454n:/#

We now have an open console on the container where our NodeJS application runs. The source code of the app is in the server.js file:

cat server.js

You can check that the application is up by running a curl command:

curl localhost:8080

Note: here we used localhost because we executed the command inside the NodeJS container

To close the container connection, type exit.

Although Pods each have a unique IP address, those IPs are not exposed outside the cluster without a Service. Services allow your applications to receive traffic. Services can be exposed in different ways by specifying a type in the ServiceSpec:

  • ClusterIP (default) – Exposes the Service on an internal IP in the cluster. This type makes the Service only reachable from within the cluster.
  • NodePort – Exposes the Service on the same port of each selected Node in the cluster using NAT. Makes a Service accessible from outside the cluster using <NodeIP>:<NodePort>. Superset of ClusterIP.
  • LoadBalancer – Creates an external load balancer in the current cloud (if supported) and assigns a fixed, external IP to the Service. Superset of NodePort.
  • ExternalName – Exposes the Service using an arbitrary name (specified by externalName in the spec) by returning a CNAME record with the name. No proxy is used. This type requires v1.7 or higher of kube-dns.

A few new concepts in Flask

Signal

Flask supports signals (built on the blinker library), and some extensions emit their own signals. We can use signals to decouple the parts of a large application.

For example, subscribing to login and logout signals (such as those emitted by Flask-Login) lets us record user activity.
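Here is a dependency-free Python sketch of the signal pattern. Flask's real signals are built on blinker, and Flask-Login emits user_logged_in and user_logged_out. The small Signal class below is my own stand-in that loosely mirrors blinker's connect/send, so the example runs without installing anything. In a real app you would connect to the extension's signals instead.

```python
class Signal:
    """Tiny stand-in for a blinker-style signal (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver  # returning the function lets connect work as a decorator

    def send(self, sender, **kwargs):
        # Call every receiver; collect (receiver, return value) pairs like blinker does.
        return [(receiver, receiver(sender, **kwargs)) for receiver in self._receivers]


user_logged_in = Signal("user-logged-in")
user_logged_out = Signal("user-logged-out")
activity_log = []


@user_logged_in.connect
def record_login(sender, user):
    activity_log.append(("login", user))


@user_logged_out.connect
def record_logout(sender, user):
    activity_log.append(("logout", user))


# The auth code only emits signals; it doesn't know who is listening.
def login(user):
    user_logged_in.send("auth", user=user)


def logout(user):
    user_logged_out.send("auth", user=user)


login("alice")
logout("alice")
print(activity_log)  # [('login', 'alice'), ('logout', 'alice')]
```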

API URL Design Rules

  • Use only nouns in URLs (e.g. /orders, not /getOrders)
  • Use the HTTP method (GET, POST, PUT, DELETE) to express the action
  • Return the link to the next page or related resource within the response
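Here is a small Python sketch of these rules for a hypothetical /api/orders resource. The route table uses nouns only and lets the HTTP method carry the action. list_orders returns a next link in the response body. The paths, page size and response shape are assumptions for illustration.

```python
from urllib.parse import urlencode

# Nouns in the path, action in the HTTP method (hypothetical /api/orders resource).
ROUTES = {
    ("GET", "/api/orders"): "list orders",
    ("POST", "/api/orders"): "create an order",
    ("GET", "/api/orders/<id>"): "read one order",
    ("PUT", "/api/orders/<id>"): "update one order",
    ("DELETE", "/api/orders/<id>"): "delete one order",
}


def list_orders(orders, page=1, per_page=3, base_url="/api/orders"):
    """Return one page of orders plus a link to the next page (None on the last page)."""
    start = (page - 1) * per_page
    items = orders[start:start + per_page]
    has_next = start + per_page < len(orders)
    query = urlencode({"page": page + 1, "per_page": per_page})
    next_link = f"{base_url}?{query}" if has_next else None
    return {"items": items, "links": {"next": next_link}}


orders = [{"id": i} for i in range(1, 8)]
print(list_orders(orders, page=1)["links"]["next"])  # /api/orders?page=2&per_page=3
```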

Reading Note 05/02/2017

Coding interview university

This is a well-known GitHub repository that covers the essential knowledge for passing interviews at big companies. It mainly focuses on computer science fundamentals.
It also contains a few ideas about how to become a better developer:

  • ABC: always be coding
  • Whiteboarding

Language selection

  • C
  • C++
  • Java
  • Python

I have some basic knowledge of these languages, but I haven't used them for ages (I mostly use Objective-C, JavaScript and PHP for development).

C is close to assembly language. C++ is widely used in big companies and the game industry, as is Java. Java is also used for mobile development. Python is a versatile tool used for web development, security (attack and protection) and data analysis.

Computer Architecture

A good book on this topic is Write Great Code, Volume 1: Understanding the Machine:

  • Chapter 2 – Numeric Representation
  • Chapter 3 – Binary Arithmetic and Bit Operations
  • Chapter 4 – Floating-Point Representation
  • Chapter 5 – Character Representation
  • Chapter 6 – Memory Organization and Access
  • Chapter 7 – Composite Data Types and Memory Objects
  • Chapter 9 – CPU Architecture
  • Chapter 10 – Instruction Set Architecture
  • Chapter 11 – Memory Architecture and Organization

Reading Note 05/01/2017

Kubernetes

A container orchestration framework, built on top of container runtimes such as Docker, that can be used in production.

Problems we may have when using Kubernetes

Storage

Docker provides data volumes. However, these local volumes cannot be used directly in a production environment, because they cause issues with data backup, data recovery and distributed data storage.
The blog recommended using NAS (Network Attached Storage). The advantages of NAS are:

  • Even if the application server goes down, we can still get the data
  • NAS only provides storage services and runs no application services, which reduces the risk of server crashes

There is another term called SAN (Storage Area Network). A SAN is presented to the server as directly attached block storage, while NAS (e.g. NFS) is a remote, file-level storage solution.

SAN is often used for disaster recovery backups.


System design

Standard Rules

Monitor

Monitoring must clearly show the status of the entire service system.

Interface standard

There should be a clear standard for the interface, such as naming, meaning, and functionality.

Error handler

Follow a standard convention when defining error handlers and error messages.
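Here is one way to apply a standard error format, sketched in Python. The ApiError class, the error codes and the {"error": {...}} envelope are hypothetical conventions, not a particular framework's API. The point is that every failure, expected or not, maps to the same response shape.

```python
class ApiError(Exception):
    """An expected failure that carries everything needed for a consistent response."""

    def __init__(self, status, code, message, details=None):
        super().__init__(message)
        self.status = status    # HTTP status code
        self.code = code        # stable, machine-readable error code
        self.message = message  # human-readable explanation
        self.details = details or {}


def to_response(exc):
    """Map any exception to the same (status, body) shape."""
    if isinstance(exc, ApiError):
        error = {"code": exc.code, "message": exc.message, "details": exc.details}
        return exc.status, {"error": error}
    # Unexpected errors: same shape, but don't leak internals to the client.
    return 500, {"error": {"code": "internal_error", "message": "Unexpected server error", "details": {}}}


def get_order(order_id):
    # Hypothetical lookup that always fails, to show the flow.
    raise ApiError(404, "order_not_found", f"Order {order_id} does not exist", {"order_id": order_id})


try:
    get_order(42)
except Exception as exc:
    status, body = to_response(exc)

print(status, body["error"]["code"])  # 404 order_not_found
```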

Create code example with real scenario

If developers have an example to follow, they are less likely to make mistakes.


PHP OpCode

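What's an opcode cache?

Opcodes are the compiled intermediate instructions that the PHP engine executes. An opcode cache stores this compiled result (Zend OPcache keeps it in shared memory), so PHP does not have to parse and compile each script again on every request during the PHP life cycle. In some cases it can improve performance by up to 3 times.

How does it work? Example: Zend OPcache

PHP life cycle with Zend OPcache (figure)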
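Python does something similar with its own bytecode (cached as .pyc files under __pycache__), so the idea can be sketched in Python: compile the source once, cache the resulting code object, and reuse it on later runs. This toy CompiledCodeCache is hypothetical and keys entries by a content hash. Zend OPcache instead revalidates cached scripts against file timestamps by default (opcache.validate_timestamps).

```python
import hashlib


class CompiledCodeCache:
    """Toy analogue of an opcode cache: compile once, reuse the code object."""

    def __init__(self):
        self._cache = {}
        self.compilations = 0  # how many times we actually paid the compile cost

    def get(self, filename, source):
        key = (filename, hashlib.sha256(source.encode()).hexdigest())
        code = self._cache.get(key)
        if code is None:
            code = compile(source, filename, "exec")  # the expensive step we want to skip
            self.compilations += 1
            self._cache[key] = code
        return code

    def run(self, filename, source):
        namespace = {}
        exec(self.get(filename, source), namespace)
        return namespace


cache = CompiledCodeCache()
script = "result = sum(range(10))"
for _ in range(3):  # three "requests" for the same script
    ns = cache.run("index.py", script)
print(ns["result"], cache.compilations)  # 45 1
```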


Data visualization Standard

a standard


Developer career

One common issue for developers is that they focus only on the latest technologies, such as new frameworks, Memcache, Nginx load balancing and distributed systems.

However, many of them ignore the fundamentals. Without a good understanding of the fundamentals, it is hard to join large companies that run huge distributed systems. If you don't improve yourself, you may only be able to work in small companies.

So relearn the fundamentals: computer systems, operating systems, C and C++, Unix, and object-oriented programming.

Move to New Server, Hello World Again!

Since the previous server had an old operating system (Ubuntu 13) and an old version of WordPress, I decided to move to the current new server (with 1 GB of memory, finally!).

It took some time to configure the domain last night, but now it works well.