HDFS Summary

HDFS architecture is actually similar to the GFS system. It’s a popular topic especially for big data engineer. It can also be used during the inverview as a part of the system design.

So, what is HDFS?

Hadoop distributed file system
First, it is a file system. But it is not just a file system. It’s a fault-tolernat file system and it’s designed to run on inexpensive hardware

Why we need it?

It’s faster(compared with reading data within one machine) and easy to use (with all the configuration)

How to use it?

You can use hdfs commands to use it e.g. setup input and output

What’s the architecture of the HDFS?

Similar to GFS, it has master node and slave node. Master node will store the meta data and control the file storage location. Slave node will save the blocks.
It also has the replication function and using hdfs client to send requests.

Summary of Array algorithms questions (medium)

Recently I tried to solve the interesting questions on leetcode. I think the thinking strategy should be recorded here.

1. max chunks to make sorted 

The question is about finding the max amount of divided arrays for an unsorted array.

The idea is:

select k elements (1, 2, … k-1) from left to right, if all elements are smaller than the rest of elements then it should be divided.

It can be converted to:

find the max amount of left array, if the amount is equal to the index (specific condition for that question), then the array can be divided

2.  when calculating the sum of an array with the combination, backtracking should be considered first

3. Dynamic programming

When to use: if the problem can be abstracted to a subtask which is depended on the previous result or results

e.g. get the best result of the problem.

the best result of n is based on the best result of n-1


Another good example about dynamic programming
The current result is based on previous result
e.g. length ++

Don’t limit the format of DP. Not all dp need something like the stock

class Solution:
    def minimumTotal(self, triangle):
        :type triangle: List[List[int]]
        :rtype: int
        row = len(triangle)
        if row == 0:
            return 0
        for r in range(1, row):
            for c in range(len(triangle[r])):
                if c == 0:
                    triangle[r][c] = triangle[r-1][c] + triangle[r][c]
                elif c == len(triangle[r]) -1:
                    triangle[r][c] = triangle[r-1][c-1] + triangle[r][c]
                    triangle[r][c] = min(triangle[r-1][c-1], triangle[r-1][c]) + triangle[r][c]
        minisum = min(triangle[row-1])
        return minisum

class Solution:
    def findLength(self, A, B):
        :type A: List[int]
        :type B: List[int]
        :rtype: int
        hashTable = [[0]* (len(B)+1) for _ in range(len(A)+1)]
        for x in range(len(A)-1, -1, -1):
            for y in range(len(B)-1 , -1, -1):
                if A[x] == B[y]:
                    hashTable[x][y] = hashTable[x+1][y+1] + 1
        maxcount = 0
        maxcount = max([max(hashTable[x]) for x in range(len(hashTable))])
        return maxcount
class Solution:
    def maxProduct(self, nums):
        :type nums: List[int]
        :rtype: int
        if len(nums) == 0:
            return 0
        maxpre = nums[0]
        minpre = nums[0]
        maxnow = maxpre
        minnow = minpre
        maxsofar = nums[0]
        nl = len(nums)
        for x in range(1,nl):
            maxnow = max(max(minpre*nums[x], maxpre* nums[x]), nums[x])
            minnow = min(min(minpre*nums[x], maxpre*nums[x]), nums[x])
            maxsofar = max(maxnow, maxsofar)
            maxpre = maxnow
            minpre = minnow
        return maxsofar

4. backtracing questions

Backtracking can be solved always as follows:

Pick a starting point.
while(Problem is not solved)
    For each path from the starting point.
        check if selected path is safe, if yes select it
        and make recursive call to rest of the problem
        before which undo the current move.
    End For
If none of the move works out, return false, NO SOLUTON.

a good example about back tracking is getting all substring of a string or all subset of a array


class Solution:
    def backTrace(self, candidates, index, curList, remain, res):
        if remain == 0:
        elif remain < 0:
            for x in range(index, len(candidates)):
                self.backTrace(candidates, x, curList + [candidates[x]], remain-candidates[x], res)

    def combinationSum(self, candidates, target):
        :type candidates: List[int]
        :type target: int
        :rtype: List[List[int]]
        self.backTrace(candidates, 0, [], target, res)
        return res

5. Cycle Detection

Cycle detection can be used within array or queue or link


Assume that the first time slow point meets the fast point at M

Assume the cycle length is C

The dis of slow point is v= a*C + m + p

The dis of fast point is 2v= b*C + m + p

2v-v = v = (b-a)C=n*C

So v is nC, v = aC + m + p, Therefore m+p is n*C

Therefore, we let fast point back to the start and both fast and slow points move at the same speed.  Assume they meet the second time:

Fast point moved p

The dis between cycle entry and slow point is m, and slow point moved p

m +p  is n*C . So the fast and slow points will meet and the entry of the cycle

Usually array value and index can be a potential cycle


index 0 1 2 3 4 5

value  1 2 3 4 2 5

if we keep getting nums[nums[i]],  it will become a cycle

6. Scope classification



     start1       end1         start2    end2

how to judge a point is within multiple scopes?

The best option is the binary tree

We can define a tree structure with python:

class Node:
   __slots__ = 'start','end','left','right'
   def __init__(self, start, end):
      self.start = start
      self.end = end
      self.left = self.right = None

    def insert(self, node):
        if node.start = self.end:
            if not self.right:
              self.right = node
              return True
            return self.right.insert(node)
        elif node.end <= self.start:
            if not self.left:
               self.left = node
               return True
            return self.left.insert(node)
            return False

7. binary search

If we want to find a point in a set of numbers and the set is sorted. Then it may be a classisic binary search problem
Be careful about the +1 / -1 and return l or r or mid

int binarySearch(int nums[], int l, int r, int x) {
        while (r >= l && r < nums.length) {
            int mid = (l + r) / 2;
            if (nums[mid] >= x)
                r = mid - 1;
                l = mid + 1;
        return l;

when using binary searching, we need to be careful about the mid selection. for example, left+right / 2 or left+right +1 /2 for odd or even.
Later I should go deep into the extrame situation of binary search

Now it’s time to go deep into the question!


Given an array of integers sorted in ascending order, find the starting and ending position of a given target value.

Your algorithm’s runtime complexity must be in the order ofO(logn).

If the target is not found in the array, return[-1, -1].

For example, Given[5, 7, 7, 8, 8, 10]and target value 8, return[3, 4].

The idea is first use binary search to find the left side of the target and then use the same method to find the right side.

To find the left side:

mid = int(l+r)/2 // this will always get the low integer e.g. int(1.6) == 1

Since it’s an ascending order, if nums[mid] < target, then target exist in (mid, r], we should have l=mid +1

If nums[mid] > target, the target exist in [l, mid) so we should have r = mid -1

if nums[mid] == target, we have 2 situation:

target start == mid
target start < mid
Therefore target start <= mid, so target start is within [l, mid], we should have r = mid

1 2 3 3 3 4 5

l r

1 2 3 3 3 4 5 mid = 3 nums[3] == 3

l r

1 2 3 3 3 4 5 mid = 1 nums[1] == 2 <3

l r

1 2 3 3 3 4 5 mid = 1 nums[1] == 2 <3

l r

1 2 3 3 3 4 5 mid = 2 nums[2] == 3

l=r and nums[l] is the target left side

Now we have target left side == l. The next step is to find the right side

The right side is within [l, len(nums)-1]

l = l

r = len(nums)-1

mid = int(l+r)/2

if nums[mid] > target, then the target end should within the [l, mid] therefore r = mid -1

if nums[mid] == target, then 2 situation

target end is on the right side of mid, then l = mid +1

target end is mid, l = mid

Therefore target end >= mid, target end is within [mid r], then l = mid

3 3 3 4 5 mid = 2 nums[2] = 3

l r

3 3 3 4 5 mid = 3 nums[3] > 3

l r

3 3 3 4 5 mid = 2 nums[2] = 3 l=

l r
Then we have the problem that at some stage we alway have l = min

So l pointer stops moving.

To avoid this we can let mid = l+r+1/2, So everytime we get the highest integer as mid

3 3 3 4 5 mid = 2 nums[2] = 3

l r

3 3 3 4 5 mid = 3 nums[3] > 3

l r
3 3 3 4 5 mid = 3 nums[3] =4 > 3

l r
3 3 3 4 5


8. Linear scan instead of brute force

Some time we will meet the brute force situation.
e.g. 3sum smaller  or the validntriangle number 

If we use brute force solution to solve the problem, we will meet the time exceed issue

Obviously, for those questions we can sort the array first and then try to get the extrame limit. We can use the binary search to get that limit or use the linear scan to get the result

9. 3D dynamic programming

Similar to dynamic programming
We go through the matrix once but for each element, we need the previous value for the calculation. Therefore, we need a 3D array to hold previous values

class Solution:
    def longestLine(self, M):
        if len(M) == 0:
            return 0
        rows, columns = len(M), len(M[0])
        maxcount = 0
        dp = [ [[0]*4 for c in range(columns)] for r in range(rows)]
        for r in range(rows):
            for c in range(columns):
                if M[r][c] == 1:
                    dp[r][c][0] = dp[r][c-1][0] + 1 if c > 0 else 1
                    dp[r][c][1] = dp[r-1][c][1] + 1 if r > 0 else 1
                    dp[r][c][2] = dp[r-1][c+1][2] +1 if r > 0 and c < columns -1 else 1
                    dp[r][c][3] = dp[r-1][c-1][3] +1 if r >0 and c>0 else 1
                    maxcount = max(maxcount,max(dp[r][c]))
        return maxcount

10. Matrix direction switch

Assume the point in a matrix move in clockwise, we can easily check if we need to change the direction by the limit of border and direction=direction+1%4

class Solution:
    def spiralOrder(self, matrix):
        :type matrix: List[List[int]]
        :rtype: List[int]
        if rnum == 0:
            return []
        if cnum == 0:
            return []

        seen= [[False] * cnum for x in range(rnum)]
        dr= [0, 1, 0, -1]
        dc= [1, 0, -1, 0]
        r=c=di = 0
        for _ in range(rnum * cnum):
            seen[r][c] = True
            rnext,cnext = r + dr[di],c+dc[di]
            if rnext >= 0 and rnext<rnum and cnext >= 0 and cnext < cnum and not seen[rnext][cnext]:
                r,c = rnext,cnext
                di = (di +1) %4
                r,c = r + dr[di],c+dc[di]
        return ans
another good example based on the privous solution:
class Solution:
    def generateMatrix(self, n):
        :type n: int
        :rtype: List[List[int]]
        res= [[0]*n for x in range(n)]
        count = 1
        dr= [0,1,0,-1]
        dc= [1,0,-1,0]
        for _ in range(n*n):
            if 0<=rn<n and 0<=cn<n and res[rn][cn] == 0:
                r,c = rn,cn
                di = (di+1)%4
        return res

11. culumative sum

A simple idea of culumative sum is:
Assume we have an array contains n integer, the sum of subarray between i and j is equal to the sum from 0 to i minus the sum from 0 to j , and: i<j

e.g. subarray sum equals k

class Solution:
        def subarraySum(self, nums, k):
            sumhash = {}
            sumhash[0] = 1 // need to add 1 to 0 because if sum == k (sum - k ==0), we need to count 1 for each sum == k
            sumN = 0
            count = 0
            for x in range(len(nums)):
                sumN=sumN + nums[x]
                if (sumN-k) in sumhash:
                    count += sumhash[(sumN-k)]
                if sumN in sumhash:
                    sumhash[sumN] += 1
                    sumhash[sumN] = 1
            return count

12. 2Sum, 3Sum, 4Sum, KSum

The main idea of those questions is to downgrade the Nsum to N-1 sum. We also try to reduce the calculation with different limitation

e.g. 3Sum

class Solution:
    def threeSum(self, nums):
        :type nums: List[int]
        :rtype: List[List[int]]
        ans = []
        for x in range(len(nums)-2):
            if nums[x] > 0:
            if x > 0 and nums[x] == nums[x-1]:
            y = x+1
            z = len(nums) -1
            while y < z:
                cursum = nums[x] + nums[y] + nums[z]
                if cursum == 0:
                    while y<z and nums[y] == nums[y+1]:
                        y+= 1
                    while y<z and nums[z] == nums[z-1]:
                elif cursum <0:
                    y += 1
                    z -= 1
        return ans


class Solution:
    def threeSum(self, nums, target, ans,index, previous, lnum):
        for i in range(index, lnum-2):
            if (3 * nums[i]) > target:
            if (3*nums[lnum-1])<target:
            if i > index and nums[i] == nums[i-1]:
            l = i+1
            r = lnum -1
            while l < r:
                cursum = nums[i] + nums[l] + nums[r]
                if cursum == target:
                    ans.append([previous, nums[i], nums[l], nums[r]])
                    while l<r and nums[l] == nums[l+1]:
                        l = l+1
                    while l < r and nums[r] == nums[r-1]:
                        r =r-1
                    l = l+1
                    r= r-1
                elif cursum < target:
                    l +=1

    def fourSum(self, nums, target):
        :type nums: List[int]
        :type target: int
        :rtype: List[List[int]]
        ans = []
        lnum = len(nums)
        if lnum <4:
            return ans
        if nums[0]*4 > target:
            return ans
        if nums[lnum-1] * 4 < target:
            return ans
        for i in range(lnum-3):
            if (nums[i] + 3*nums[lnum-1]) < target:
            if (nums[i]+ 3*nums[i+1]) > target:
            if i>0 and nums[i] == nums[i-1]:
            if nums[i] * 4 == target:
                if i+3 < lnum and nums[i] == nums[i+3]:
            self.threeSum(nums, target - nums[i], ans, i+1, nums[i], lnum)

        return ans

12. Merge interval

Given a collection of intervals, merge all overlapping intervals.

For example,
Given [1,3],[2,6],[8,10],[15,18],
return [1,6],[8,10],[15,18].

The idea is to first sort the list with start
and then check for each element if the next one’s start is in the previous range

* python sorted can use lambda function
* we don’t need to use binary tree for this one, a simple sort can solve the problem

class Solution:
def merge(self, intervals):
:type intervals: List[Interval]
:rtype: List[Interval]
if len(intervals) == 0:
return res
intervals = sorted(intervals, key=lambda interval: interval.start)
start =intervals[0].start
end =intervals[0].end
for i, interval in enumerate(intervals):
if interval.start <= end:
start = min(start, interval.start)
end = max(end, interval.end)
res.append(Interval(start, end))
start = interval.start
end = interval.end
res.append(Interval(start, end))
return res

### 13. Majority Element
For those who aren’t familiar with Boyer-Moore Majority Vote algorithm,
I found a great article (http://goo.gl/64Nams) that helps me to understand this fantastic algorithm!!
Please check it out!

The essential concepts is you keep a counter for the majority number X. If you find a number Y that is not X, the current counter should deduce 1. The reason is that if there is 5 X and 4 Y, there would be one (5-4) more X than Y. This could be explained as “4 X being paired out by 4 Y”.

And since the requirement is finding the majority for more than ceiling of [n/3], the answer would be less than or equal to two numbers.
So we can modify the algorithm to maintain two counters for two majorities.

class Solution:
    def majorityElement(self, nums):
        :type nums: List[int]
        :rtype: List[int]
        count1 =0
        count2 =0
        cand1 =0
        for x, n in enumerate(nums):
            if n == cand1:
                count1 +=1
            elif n == cand2:
                count2 +=2
            elif count1 == 0:
                cand1 = n
                count1 +=1
            elif count2 == 0:
                cand2 =n
                count2 +=1
                count1 -= 1
                count2 -= 1
        return [n for n in (cand1, cand2) if nums.count(n) > len(nums)/3]

14. Permutation

Usually the idea of getting permutation is to:
– limit the condition
– use backtrace to get all possible results

There is one question to get next permutation

The idea is to assume the next permutation should be follow the order. We can use single-pass method to get the result

class Solution:
    def swap(self, nums, i, j):
        tem = nums[i]
        nums[i] = nums[j]
        nums[j] = tem
    def reverse(self, nums, i, j):
        while i <= j:
            self.swap(nums, i, j)
    def nextPermutation(self, nums):
        :type nums: List[int]
        :rtype: void Do not return anything, modify nums in-place instead.
        i = len(nums) - 2
        while i >= 0 and nums[i+1] <= nums[i]:
            i -=1
        if i >= 0:
            j = len(nums) -1
            while j >=0 and nums[j] <= nums[i]:
                j -=1
            self.swap(nums,i, j)
        self.reverse(nums, i+1, len(nums) - 1)

15. Palindrome

we need to consider different situation
e.g. if the string is Palindrome
– all char is even
– one char is odd, all others are even

Then we can convert this question into the permutation issue and use back trace to get the result

class Solution:
    def backTracing(self, nums, ans, pos, res, oddc):
        # print(ans)
        if len(ans) == len(nums):
            ans += [oddc] + ans[::-1]
            res.append(''.join(str(ans[y]) for y in range(len(ans))))
            dup = []
            for x in range(len(nums)):
                # print("x:",x,"nums[x]:",nums[x],"dup:",dup, "pos:",pos, "ans:",ans)
                if not x in pos and not nums[x] in dup:
                    self.backTracing(nums, ans + [nums[x]], pos+[x], res, oddc)

    def generatePalindromes(self, s):
        :type s: str
        :rtype: List[str]
        if len(s) == 0:
            return []

        ul = dict()
        for x in s:
            if not x in ul:
                ul.setdefault(x, 1)
                ul[x] += 1

        oddc = ''
        oddCount = 0
        for x in ul.keys():
            if ul[x] %2 != 0:
                oddc = x
                oddCount += 1
                ul[x] = int((ul[x]-1)/2)
                ul[x] = int(ul[x]/2)
        if oddCount >1:
            return []
        nums = []
        for x in ul.keys():
            while ul[x] >0:
                nums += [x]
                ul[x] -=1
        res = []
        # print(nums)
        self.backTracing(nums, [], [], res, oddc)
        return res  

16. preorder and inorder of tree

The basic idea is here:
Say we have 2 arrays, PRE and IN.
Preorder traversing implies that PRE[0] is the root node.
Then we can find this PRE[0] in IN, say it’s IN[5].
Now we know that IN[5] is root, so we know that IN[0] – IN[4] is on the left side, IN[6] to the end is on the right side.
Recursively doing this on subarrays, we can build a tree out of it 🙂

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution:
    def build(self, preorder, postart, poend, inorder, iostart, ioend):
        if postart > poend or iostart > ioend:
            return None
        root = TreeNode(preorder[postart])
        inorderBreakMark = iostart
        for x in range(iostart, ioend+1):
            if inorder[x] == preorder[postart]:
                inorderBreakMark = x
        leftL = inorderBreakMark-iostart
        root.left = self.build(preorder, postart+1, postart+leftL, inorder, iostart, inorderBreakMark-1)
        root.right = self.build(preorder, postart+leftL + 1, poend, inorder, inorderBreakMark+1, ioend)
        return root

    def buildTree(self, preorder, inorder):
        :type preorder: List[int]
        :type inorder: List[int]
        :rtype: TreeNode
        root = self.build(preorder, 0, len(preorder)-1, inorder, 0 ,len(inorder)-1  )
        return root

17 Greedy

Greedy is a special case of DP
Greedy: for each step we only consider the current best solution, because we belive if we choose the current best solution for each step, the final result is the best solution for all

DP: We calculate all solution

Greedy Example linke

class Solution:
    def canJump(self, nums):
        :type nums: List[int]
        :rtype: bool
        lastpos = len(nums)-1
        for x in range(len(nums)-2, -1, -1):
            if x + nums[x] >= lastpos:
                lastpos = x
        return lastpos == 0

a good example to tell the difference between greedy and dp is link

18 quick sort

this is a good example about quick sort
We can use the simple solution: two points from left to right and swap if it is min

Another good solution is:
The idea is to sweep all 0s to the left and all 2s to the right, then all 1s are left in the middle.

It is hard to define what is a “one-pass” solution but this algorithm is bounded by O(2n), meaning that at most each element will be seen and operated twice (in the case of all 0s). You may be able to write an algorithm which goes through the list only once, but each step requires multiple operations, leading the total operations larger than O(2n).

class Solution:
    def swap(self, nums, start, end):
        temp = nums[start]
        nums[start] = nums[end]
        nums[end] = temp

    def sortColors(self, nums):
        :type nums: List[int]
        :rtype: void Do not return anything, modify nums in-place instead.
        low = 0
        high = len(nums) - 1
        i = low
        while i <= high:
            if nums[i] == 0:
                self.swap(nums, i, low)
                low += 1
                i += 1
            elif nums[i] == 2:
                self.swap(nums, i, high)
                high -= 1
                i += 1

19 Two points

Follow up for “Remove Duplicates”:
What if duplicates are allowed at most twice?

For example,
Given sorted array nums = [1,1,1,2,2,3],

Your function should return length = 5, with the first five elements of nums being 1, 1, 2, 2 and 3. It doesn’t matter what you leave beyond the new length.

class Solution:
    def removeDuplicates(self, nums):
        :type nums: List[int]
        :rtype: int
        count = 0
        for num in nums:
            if count < 2 or nums[count-2] < num:
                nums[count] = num
                count += 1
        return count

Summary of Array algorithms questions (easy)

Recently I spent few hours on going through algorithm questions in leetcode. I haven’t touched the algorithm question for 2 years but I think it’s a good time to review those questions in a better way.

From my experience,  the thinking strategy of solving similar problems is limited. Those strategies cannot help you solve all problems relevant to Array but it may be a good beginning of analyzing those algorithm tricks.

The core is the math

  1. Hash Table
  2. shifting the array from beginning to the end or with odd & even order
  3. use *-1 as a mark
  4. two indexes, two-way or one-way moving,  or 1 , n-1
  5. convert the index into other marks e.g. content vice versa
  6. if it’s matrix, sum the columns and rows
  7. switch i and i+1  bubble sort
  8. use mid number  quicksort

GC for HotSpot Java, V8 Nodejs, PHP and Python


  • Java is using Generation strategy for GC.
  • The memory will be divided into 3 sections: Young generation, old generation, and metaSpace
  • The young generation contains: Eden, from, to with space 8:1:1

V8 Nodejs

  • v8 is using generation GC as well
  • the difference is young generation is small
  • young generation only contains 2 spaces: from and to
  • there is no metaspace


  • session (live time) and the reference count


  • reference count
  • 3 generations

Java Container Class

Collection & Map

Collection – A collection represents a group of objects, known as its elements.Some collections allow duplicate elements and others do not. Some are ordered and others unordered.

Map – An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

3 ways to loop map:

  • Set keySet()
  • Collection values()
  • Set< Map.Entry< K, V>> entrySet()

List Set & Queue

List Set & Queue inherit Collection

  • List: an ordered collection, the element can be duplicated
  • Set: elements can not be duplicated


The list provides a special iterator called ListIterator.

ListIterator<E> listIterator();

ListIterator<E> listIterator(int index);

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations

    boolean hasNext();

    E next();

    boolean hasPrevious();

    E previous();

    int previousIndex();

    void remove();

    void set(E e);

    void add(E e);


  • ArrayList implements List with array.
  • It allows inserting null.
  • size, isEmpty, get, set, iterator, add are all O(1), if add N, it will be O(N)
  • ArrayList is not synchronized for threads
  • By default, the first time we insert the element the size of it is 10.
  • if exceed, the size will increase 50%

source code:

transient Object[] elementData;

private int size;

All elements are saved within the object array and size is used for length control.

source code of add:

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;

private void ensureCapacityInternal(int minCapacity) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);


private void ensureExplicitCapacity(int minCapacity) {

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);

The add operation will check the length. If not match then it will call grow (Arrays.copyOf)

source code of remove:

public E remove(int index) {

    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;

It will use System.arraycopy to move all elements behind the target index and remove the last element.

That’s why the cost of adding and removing is expensive 🙂

It alos has a function trimToSize() which can be used for compressing the size of array

public void trimToSize() { 
    if (size < elementData.length) { 
        elementData = Arrays.copyOf(elementData, size); 

Besides, it implements RandomAccess. RandomAccess also includes: ArrayList, AttributeList, CopyOnWriteArrayList,RoleList, RoleUnresolvedList, Stack, Vector

There is one comment within the RandomAccess:

for (int i=0, n=list.size(); i < n; i++) {     
runs faster than this loop:
for (Iterator i=list.iterator(); i.hasNext(); ) { 

Compared with Vector

  • almost the same. Only different is Vector is synchronized so it’s more expensive
  • Vector grows 2 times while ArrayList grows 1.5 times
  • Vector also contains Stack


LinkedList is also an ordered container class. LinkedList implements List with Link

ArrayList V.S LinkedList

  • Get: ArrayList can use index to get. LinkedList have to find from the beginning
  • Add & Remove: LinkedList can easily add and remove by breaking the link. ArrayList have to copy all data and move position
  • Grow: ArrayList has to apply for a larger array and move the data. LinkedList can dynamically create new link node

source code:

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;

A two way link

transient int size = 0;

transient Node<E> first;

transient Node<E> last;

Each linkedList will have first and last point

Add and Delete:

private void linkFirst(E e) {
    final Node<E> f = first;
    final Node<E> newNode = new Node<>(null, e, f);
    first = newNode;
    if (f == null)
        last = newNode;
        f.prev = newNode;

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
        l.next = newNode;

void linkBefore(E e, Node<E> succ) {
    // assert succ != null;
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
        pred.next = newNode;

private E unlinkFirst(Node<E> f) {
    // assert f == first && f != null;
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    first = next;
    if (next == null)
        last = null;
        next.prev = null;
    return element;

private E unlinkLast(Node<E> l) {
    // assert l == last && l != null;
    final E element = l.item;
    final Node<E> prev = l.prev;
    l.item = null;
    l.prev = null; // help GC
    last = prev;
    if (prev == null)
        first = null;
        prev.next = null;
    return element;

E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;

    x.item = null;
    return element;

LinkedList also implements the Deque interface which inheirts from Queue. So it also supports pop, push and peek


Set doesn’t implement any function like colleciton. Set is just a concept: elements cannot be duplicated

e.g. HashSet, LinkedHashSet, TreeSet


  • HashSet implements Set and it is based on HashMap.
  • Disorderd
  • Allow null element

source code:

private transient HashMap<E,Object> map;

private static final Object PRESENT = new Object();

So all add, reomve etc are the operaion of HashMap. The Iterator is just the keySet of HashMap

public Iterator<E> iterator() {
    return map.keySet().iterator();

public boolean contains(Object o) {
    return map.containsKey(o);

public boolean add(E e) {
    return map.put(e, PRESENT)==null;

public void clear() {


LinkedHashSet can use Link to keep the order of set elements.
LinkedHashSet is based on LinkedHashMap


The order of TreeSet is ( e1.compareTo(e2) == 0 ) TreeSet is based on TreeMap and the element must implement Comparable interface ( e1.compareTo(e2) == 0 )



It’s stored by hash table.

transient Node<K,V>[] table;

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

table is used for saving element. If any conflicts, save it to the next link table.

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
    return null;
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                p = e;
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
    if (++size > threshold)
    return null;


It’s similar to hashmap. The difference is it also have the following link tables:

transient LinkedHashMap.Entry<K,V> head;

transient LinkedHashMap.Entry<K,V> tail;


it’s based on red-black tree.


if only the weakHashMap has the reference of an element, it will automatically remove the element


Recently I began to use java again. It’s better to review the knowledge of java and make some notes for future quick review.

multiple inheritance

Java implements MI in a different. It allows you to implement multiple interfaces. But you can only inherit one implementation.

C++ supports multiple inheritances.

class oriented


class loader

java source (.java) will first be converted into Java bytecode (.class) by Java compiler (.javac). Then the .class file will be put into class loader

Classloader will load the class into JVM.

  1. Loading Strategy

JVM uses parent loading model.
If a class loader receives a loading request, it will first assign this request to parent class loader. All loaders follow the same rule. Only when the parent loader doesn’t contain any class, then the child loader will try to load it by itself.

Why we need it?
e.g. if someone changed the java.lang.String with some hacking code, without parent loader then JVN will use that hacking String class as the system String class.

BootStrapClassLoader is the top level classloader.

Tips for myself:

BootStrapClassLoader。它是最顶层的类加载器,是由C++编写而成, 已经内嵌到JVM中了。在JVM启动时会初始化该ClassLoader,它主要用来读取Java的核心类库JRE/lib/rt.jar中所有的class文件,这个jar文件中包含了java规范定义的所有接口及实现。
ExtensionClassLoader。它是用来读取Java的一些扩展类库,如读取JRE/lib/ext/*.jar中的包等(这里要注意,有些版本的是没有ext这个目录的)。 AppClassLoader。它是用来读取CLASSPATH下指定的所有jar包或目录的类文件,一般情况下这个就是程序中默认的类加载器。
CustomClassLoader。它是用户自定义编写的,它用来读取指定类文件 。基于自定义的ClassLoader可用于加载非Classpath中(如从网络上下载的jar或二进制)的jar及目录、还可以在加载前对class文件优一些动作,如解密、编码等。

很多资料和文章里说,ExtClassLoader的父类加载器是BootStrapClassLoader,其实这里省掉了一句话,容易造成很多新手(比如我)的迷惑。严格来说,ExtClassLoader的父类加载器是null,只不过在默认的ClassLoader 的 loadClass 方法中,当parent为null时,是交给BootStrapClassLoader来处理的,而且ExtClassLoader 没有重写默认的loadClass方法,所以,ExtClassLoader也会调用BootStrapLoader类加载器来加载,这就导致“BootStrapClassLoader具备了ExtClassLoader父类加载器的功能”。

execution engine

execute the java bytecode

Runtime Data Areas

It’s the memory section during runtime.
runtime data areas

  • Method Area

It stores structured info. E.g. constant pool, static variables constructors.

  • Heap

It stores all java instances and objects. It’s the main area for GC.

Tips: Method area and Heap are shared by all processes

  • Stack

Once a thread has been setup, JVM will build a stack for this thread.

Foreach Stack, it contains multiple stack frame. Each function will build one java stack frame. It will save temp variables, operation stack and return value. The start and end of one function are mapping to the stack entering and stack exit.

  • PC Register It’s used to saving the memory address of the current process. It can make sure that after the thread switching the process still can recover back to the previous status.
  • Native Method Stack It’s similar to stack but only used for JVM native methods

Memory assignment

JVM will first assign a huge space and all new operation will reassign and release within this space. It reduces the time of calling system and is similar to the memory pool. Secondly, it introduces the concept of GC.

  • Static memory

If the memory size can be defined during compiling, it’s a static memory. It means the memory is fixed. e.g. int

  • Dynamic memory Only knows the memory size when executing it. e.g. object memory space

Stack, PC register, and stack frame are the private process. Once the process died, stack frame will be closed and memory will be released.

But Heap and method areas are different. Only during execution can we know the size of objects. So the memory of this section should be dynamic – GC

In a nutshell, the memory size of the stack is fixed so there is no memory collection problem. But the memory size of the heap is uncertain so it will have a problem of GC.

GC strategy

  1. Mark-sweep

Mark all collected object and then collect. It’s the basic method.

Cons: inefficient, after cleaning there will be a lot of memory fragmentation

  1. Copying

Divide the space into two equal spaces. Only use one of those spaces. During GC, copying all active object to another space. Pros: no memory fragmentation Cons: double memory space

  1. Mark-Compact

stage1: mark all referenced objects
stage2: go through the entire heap, clean all marked objects and compress all alive objects into a block with orders

Pros: no memory fragmentation and double memory space issue

  1. Generational Collection

This is the currently used strategy for java GC.

It divides the objects into different generations by the life cycle: Young Generation, Old Generation, and Permanent Generation.

Permanent Generation will save the class info. Young Generation and Old Generation are closely related to GC.

年轻代:是所有新对象产生的地方。年轻代被分为3个部分——Enden区和两个Survivor区(From和to)当Eden区被对象填满时,就会执行Minor GC。并把所有存活下来的对象转移到其中一个survivor区(假设为from区)。Minor GC同样会检查存活下来的对象,并把它们转移到另一个survivor区(假设为to区)。这样在一段时间内,总会有一个空的survivor区。经过多次GC周期后,仍然存活下来的对象会被转移到年老代内存空间。通常这是在年轻代有资格提升到年老代前通过设定年龄阈值来完成的。需要注意,Survivor的两个区是对称的,没先后关系,from和to是相对的。

年老代:在年轻代中经历了N次回收后仍然没有被清除的对象,就会被放到年老代中,可以说他们都是久经沙场而不亡的一代,都是生命周期较长的对象。对于年老代和永久代,就不能再采用像年轻代中那样搬移腾挪的回收算法,因为那些对于这些回收战场上的老兵来说是小儿科。通常会在老年代内存被占满时将会触发Full GC,回收整个堆内存。


GC generation

Notes from AWS security group last night

There are some concepts I learned from it. I will list it here as a note.

ASD TOP4 rules

  1. Targeted cyber intrusions remain the biggest threat to government ICT systems. Since opening in early 2010, the Cyber Security Operations Centre (CSOC) has detected and responded to thousands of these intrusions.
  2. You should never assume that your information is of little or no value. Adversaries are not just looking for classified information. A lot of activity observed by the CSOC has an economic focus, looking for information about Australia’s business dealings, its intellectual property, its scientific data and the government’s intentions.
  3. The threat is real, but there are things every organisation can do to significantly reduce the risk of a cyber intrusion. In 2009, based on our analysis of these intrusions, the Australian Signals Directorate produced Strategies to Mitigate Targeted Cyber Intrusions – a document that lists a variety of ways to protect an organisation’s ICT systems. At least 85% of the intrusions that ASD responded to in 2011 involved adversaries using unsophisticated techniques that would have been mitigated by implementing the top four mitigation strategies as a package.
  4. The top four mitigations are: application whitelisting; patching applications and operating systems and using the latest version; and minimising administrative privileges. This document is designed to help senior managers in organisations understand the effectiveness of implementing these strategies.

Now I begin to worry about our system…haha let’s backup the data now!


IDS (Intrusion Detection Systems)/ IPS ( Intrusion Prevention System)

The concept of IDS is trying to detect all possible trace of attack while IPS means deny the attack request(one example: as a firewall )

Security Trigger

One thing they mentioned last night is the trigger of getting the attack. Usually, we know which processes will be executed on the server. If the server has some other processes been setup, then we need to trigger the system to execute protecting operations.


It contains many best practices for security

Reading Notes – TCP tuning

TCP tuning

What is delay ack

TCP is a reliable protocol because of sending ack after receiving the request.

TCP delayed acknowledgment is a technique used by some implementations of the Transmission Control Protocol in an effort to improve network performance. In essence, several ACK responses may be combined together into a single response, reducing protocol overhead.

In one word, if delay ack is on, the ack of one request can be sent within another request together. Or if there are two ack packages, it will also be sent. Or if it’s over 40ms, the single ack will be sent.


The logic of Nagle is like:

if there is new data to send
  if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    if there is unconfirmed data still in the pipe
          enqueue data in the buffer until an acknowledge is received
          send data immediately
    end if
  end if
end if

If the data size is larger than MSS, send it directly. Otherwise, see if there are any requests haven’t received ack. if not, we can wait until we receive the previous ack.

If the client is using Nagle and the Server side is using delay ack

Sometimes, we have to wait until the request reaches the delay ack timeout. example

Some HTTP requests will divide the header and content into two packages which can trigger this issue easier.


Found online

struct curl_slist *list = NULL;
list = curl_slist_append(list, "Expect:");  

if (CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_URL, oss.str().c_str())) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, timeout)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_callback)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_POST, 1L)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, pooh.sizeleft)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_callback)) &&
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_READDATA, &pooh)) &&                
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L)) && //1000 ms curl bug
        CURLE_OK == (code = curl_easy_setopt(curl, CURLOPT_HTTPHEADER, list))                
        ) {

        //这里如果是小包就不开delay ack,实际不科学
        if (request.size() < 1024) {
                code = curl_easy_setopt(curl, CURLOPT_TCP_NODELAY, 1L);
        } else {
                code = curl_easy_setopt(curl, CURLOPT_TCP_NODELAY, 0L);
        if(CURLE_OK == code) {
                code = curl_easy_perform(curl);

Few issues for Java LTI integration

Recently I began to work on an LTI integration with Java. I haven’t used Java for almost 1 year so it takes some time to pick it up. I’m happy that Java doesn’t change so quickly like javascript 😛

Welcome back to the world of Java with Hibernate and Spring MVC and Maven

Anyway, I will record all issues I met during the implementation. Later I will write down sth about LTI.

Fix for java.lang.IllegalArgumentException: No converter found for return value of type

I met this issue after adding a new maven dependency. I did some quick research. A lot of people saying that you can solve it by adding the JSON dependency. I tried and it’s not working.

Then I figured out I should add the getter for that class and problem solved.

Example of generating key and secret

I found this code online. It’s good to start with it. But this guy is using encodeBase64String within Base64 and I cannot find it within that class. So it should be replaced with sth like new String(the byte result from Base64)

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

import org.apache.commons.codec.binary.Base64;

public class Encryptor {
    public static String encrypt(String key, String initVector, String value) {
        try {
            IvParameterSpec iv = new IvParameterSpec(initVector.getBytes("UTF-8"));
            SecretKeySpec skeySpec = new SecretKeySpec(key.getBytes("UTF-8"), "AES");

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
            cipher.init(Cipher.ENCRYPT_MODE, skeySpec, iv);

            byte[] encrypted = cipher.doFinal(value.getBytes());
            System.out.println("encrypted string: "
                    + Base64.encodeBase64String(encrypted));

            return Base64.encodeBase64String(encrypted);
        } catch (Exception ex) {

        return null;

    public static String decrypt(String key, String initVector, String encrypted) {
        try {
            IvParameterSpec iv = new IvParameterSpec(initVector.getBytes("UTF-8"));
            SecretKeySpec skeySpec = new SecretKeySpec(key.getBytes("UTF-8"), "AES");

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
            cipher.init(Cipher.DECRYPT_MODE, skeySpec, iv);

            byte[] original = cipher.doFinal(Base64.decodeBase64(encrypted));

            return new String(original);
        } catch (Exception ex) {

        return null;

    public static void main(String[] args) {
        String key = "Bar12345Bar12345"; // 128 bit key
        String initVector = "RandomInitVector"; // 16 bytes IV

        System.out.println(decrypt(key, initVector,
                encrypt(key, initVector, "Hello World")));

Build an OAuth Provider

There are many OAuth client libs on the internet. But it’s difficult to find some libs as an OAuth Provider.

The first step of it is to generate a good customer key and secret. I found the following example last night.

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.security.AlgorithmParameters;
import java.security.GeneralSecurityException;
import java.security.NoSuchAlgorithmException;
import java.security.spec.InvalidKeySpecException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class ProtectedConfigFile {

    public static void main(String[] args) throws Exception {
        String password = System.getProperty("password");
        if (password == null) {
            throw new IllegalArgumentException("Run with -Dpassword=<password>");

        // The salt (probably) can be stored along with the encrypted data
        byte[] salt = new String("12345678").getBytes();

        // Decreasing this speeds down startup time and can be useful during testing, but it also makes it easier for brute force attackers
        int iterationCount = 40000;
        // Other values give me java.security.InvalidKeyException: Illegal key size or default parameters
        int keyLength = 128;
        SecretKeySpec key = createSecretKey(System.getProperty("password").toCharArray(),
                salt, iterationCount, keyLength);

        String originalPassword = "secret";
        System.out.println("Original password: " + originalPassword);
        String encryptedPassword = encrypt(originalPassword, key);
        System.out.println("Encrypted password: " + encryptedPassword);
        String decryptedPassword = decrypt(encryptedPassword, key);
        System.out.println("Decrypted password: " + decryptedPassword);

    private static SecretKeySpec createSecretKey(char[] password, byte[] salt, int iterationCount, int keyLength) throws NoSuchAlgorithmException, InvalidKeySpecException {
        SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512");
        PBEKeySpec keySpec = new PBEKeySpec(password, salt, iterationCount, keyLength);
        SecretKey keyTmp = keyFactory.generateSecret(keySpec);
        return new SecretKeySpec(keyTmp.getEncoded(), "AES");

    private static String encrypt(String property, SecretKeySpec key) throws GeneralSecurityException, UnsupportedEncodingException {
        Cipher pbeCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        pbeCipher.init(Cipher.ENCRYPT_MODE, key);
        AlgorithmParameters parameters = pbeCipher.getParameters();
        IvParameterSpec ivParameterSpec = parameters.getParameterSpec(IvParameterSpec.class);
        byte[] cryptoText = pbeCipher.doFinal(property.getBytes("UTF-8"));
        byte[] iv = ivParameterSpec.getIV();
        return base64Encode(iv) + ":" + base64Encode(cryptoText);

    private static String base64Encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);

    private static String decrypt(String string, SecretKeySpec key) throws GeneralSecurityException, IOException {
        String iv = string.split(":")[0];
        String property = string.split(":")[1];
        Cipher pbeCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        pbeCipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(base64Decode(iv)));
        return new String(pbeCipher.doFinal(base64Decode(property)), "UTF-8");

    private static byte[] base64Decode(String property) throws IOException {
        return Base64.getDecoder().decode(property);

I properly need to ask someone to see if there is a best practice for this.

For Java base64

It’s good to know that from java8 they include base64 into java.util

import java.util.Base64;
byte[] encodedBytes = Base64.getEncoder().encode("Test".getBytes());
System.out.println("encodedBytes " + new String(encodedBytes));
byte[] decodedBytes = Base64.getDecoder().decode(encodedBytes);
System.out.println("decodedBytes " + new String(decodedBytes));

Previously, we have to use sth like:

import org.apache.commons.codec.binary.Base64;
byte[] encodedBytes = Base64.encodeBase64("Test".getBytes());
System.out.println("encodedBytes " + new String(encodedBytes));
byte[] decodedBytes = Base64.decodeBase64(encodedBytes);
System.out.println("decodedBytes " + new String(decodedBytes));

Reading Notes – The super tiny compiler


I just finished the reading of this file. Pretty good explanation about the compiler. Before reading this blog, I thought the concept of the compiler is to find the term of a particular language and replace it with the term of the target language. I have written a small compiler (from Perl to Python) during my uni study. At that time I was a beginner and used a lot of if conditions to handle different situations…

The code example is using a similar solution. But the concept of it really makes sense. So basically we need to handle the input step by step and transform the input into a general format which can be reused to different other languages.

Basic Concept

Most compilers break down into three primary stages: parsing, transformations and code generation

  • Parsing is taking raw code and turning it into a more abstract representation of the code.
  • Transformation takes this abstract representation and manipulates to do whatever the compiler wants it to.
  • Code Generation takes the transformed representation of the code and turns it into new code.


PArsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis

  • Lexical Analysis takes the raw code and splits it apart into these things called tokens by a thing called a tokenizer (or lexer).

Tokens are an array of tiny little objects that describe an isolated piece of the syntax. They could be numbers, labels, punctuation, operators, whatever

  • Syntactic Analysis takes the tokens and reformats them into a representation that describes each part of the syntax and their relation to one another. This is known as an intermediate representation or Abstract Syntax Tree.

An Abstract Syntax Tree, or AST for short, is a deeply nested object that represents code in a way that is both easy to work with and tells us a lot of information.


* For the following syntax:
 *   (add 2 (subtract 4 2))
 * Tokens might look something like this:
 *   [
 *     { type: 'paren',  value: '('        },
 *     { type: 'name',   value: 'add'      },
 *     { type: 'number', value: '2'        },
 *     { type: 'paren',  value: '('        },
 *     { type: 'name',   value: 'subtract' },
 *     { type: 'number', value: '4'        },
 *     { type: 'number', value: '2'        },
 *     { type: 'paren',  value: ')'        },
 *     { type: 'paren',  value: ')'        },
 *   ]
 * And an Abstract Syntax Tree (AST) might look like this:
 *   {
 *     type: 'Program',
 *     body: [{
 *       type: 'CallExpression',
 *       name: 'add',
 *       params: [{
 *         type: 'NumberLiteral',
 *         value: '2',
 *       }, {
 *         type: 'CallExpression',
 *         name: 'subtract',
 *         params: [{
 *           type: 'NumberLiteral',
 *           value: '4',
 *         }, {
 *           type: 'NumberLiteral',
 *           value: '2',
 *         }]
 *       }]
 *     }]
 *   }


This just takes the AST from the last step and makes changes to it. It can manipulate the AST in the same language or it can translate it into an entirely new language.

There are these objects with a type property. Each of these is known as an AST Node. These nodes have defined properties on them that describe one isolated part of the tree.

* We can have a node for a "NumberLiteral":
 *   {
 *     type: 'NumberLiteral',
 *     value: '2',
 *   }
 * Or maybe a node for a "CallExpression":
 *   {
 *     type: 'CallExpression',
 *     name: 'subtract',
 *     params: [...nested nodes go here...],
 *   }
 * When transforming the AST we can manipulate nodes by
 * adding/removing/replacing properties, we can add new nodes, remove nodes, or
 * we could leave the existing AST alone and create an entirely new one based
 * on it.


In order to navigate through all of these nodes, we need to be able to traverse through them. This traversal process goes to each node in the AST depth-first.

*   {
 *     type: 'Program',
 *     body: [{
 *       type: 'CallExpression',
 *       name: 'add',
 *       params: [{
 *         type: 'NumberLiteral',
 *         value: '2'
 *       }, {
 *         type: 'CallExpression',
 *         name: 'subtract',
 *         params: [{
 *           type: 'NumberLiteral',
 *           value: '4'
 *         }, {
 *           type: 'NumberLiteral',
 *           value: '2'
 *         }]
 *       }]
 *     }]
 *   }
 * So for the above AST we would go:
 *   1. Program - Starting at the top level of the AST
 *   2. CallExpression (add) - Moving to the first element of the Program's body
 *   3. NumberLiteral (2) - Moving to the first element of CallExpression's params
 *   4. CallExpression (subtract) - Moving to the second element of CallExpression's params
 *   5. NumberLiteral (4) - Moving to the first element of CallExpression's params
 *   6. NumberLiteral (2) - Moving to the second element of CallExpression's params


* The basic idea here is that we are going to create a “visitor” object that
 * has methods that will accept different node types.
 *   var visitor = {
 *     NumberLiteral() {},
 *     CallExpression() {},
 *   };
 * When we traverse our AST, we will call the methods on this visitor whenever we
 * "enter" a node of a matching type.
 * In order to make this useful we will also pass the node and a reference to
 * the parent node.
 *   var visitor = {
 *     NumberLiteral(node, parent) {},
 *     CallExpression(node, parent) {},
 *   };
 * As we traverse down, we're going to reach branches with dead ends. As we
 * finish each branch of the tree we "exit" it. So going down the tree we
 * "enter" each node, and going back up we "exit".
 *   -> Program (enter)
 *     -> CallExpression (enter)
 *       -> Number Literal (enter)
 *       <- Number Literal (exit)
 *       -> Call Expression (enter)
 *          -> Number Literal (enter)
 *          <- Number Literal (exit)
 *          -> Number Literal (enter)
 *          <- Number Literal (exit)
 *       <- CallExpression (exit)
 *     <- CallExpression (exit)
 *   <- Program (exit)

Code Generation

 * The final phase of a compiler is code generation. Sometimes compilers will do
 * things that overlap with transformation, but for the most part code
 * generation just means take our AST and string-ify code back out.
 * Code generators work several different ways, some compilers will reuse the
 * tokens from earlier, others will have created a separate representation of
 * the code so that they can print node linearly, but from what I can tell most
 * will use the same AST we just created, which is what we’re going to focus on.
 * Effectively our code generator will know how to “print” all of the different
 * node types of the AST, and it will recursively call itself to print nested
 * nodes until everything is printed into one long string of code.

The code example can be found at the source link