Model Zoo Practice – AlexNet

I recently started doing paper reproductions with a group of people on WeChat. It's a good experience to revisit the classic models in detail by reading the papers.
For the first week, we decided to work on AlexNet. Here are my study notes on AlexNet. They skip the basics and focus on the points that are important to me. Some of the images could not be copied over from my OneNote page.

I actually planned to do a short showcase during the weekend paper seminar.


Trained a large deep convolutional neural network to classify the 1.2 million high-resolution images in ImageNet into 1000 different classes

The neural network:

  • Five convolution layers
  • Some of conv layers are followed by max-pooling layers
  • 3 fully-connected layers
  • Final 1000-way softmax

To make training faster:

  • Use non-saturating neurons (ReLU)
  • GPU implementation of the convolution operation

To reduce overfitting in the fully-connected layer:

  • Dropout
  • Other tricks


Large dataset:

  • LabelMe: fully-segmented images
  • ImageNet: over 15 million labeled high-resolution images in over 22,000 categories

The size of the network made overfitting a significant problem

The depth is important

Top1 & Top5

  • Top-1: the class with the highest predicted probability is the correct one
  • Top-5: the correct class is among the five highest-probability predictions

ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality

  • Down-sample the images to a fixed resolution of 256×256
  • Given a rectangular image, first rescale the image so that the shorter side has length 256, then crop out the central 256×256 patch from the resulting image
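The two preprocessing steps above can be sketched in a few lines. This is a rough NumPy version using nearest-neighbor resampling for illustration; the original pipeline would have used a proper image library with better interpolation:

```python
import numpy as np

def center_crop_256(img):
    """Rescale so the shorter side is 256, then cut the central 256x256 patch."""
    h, w = img.shape[:2]
    scale = 256 / min(h, w)
    nh, nw = round(h * scale), round(w * scale)
    # nearest-neighbor resampling: map each output row/col back to a source index
    rows = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    resized = img[rows][:, cols]
    top, left = (nh - 256) // 2, (nw - 256) // 2
    return resized[top:top + 256, left:left + 256]

img = np.zeros((480, 640, 3), dtype=np.uint8)  # a rectangular image
print(center_crop_256(img).shape)  # (256, 256, 3)
```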

**The Architecture**

Five convolutional and three fully connected layers

  • f is non-saturating iff (|lim_{z→−∞} f(z)| = +∞) ∨ (|lim_{z→+∞} f(z)| = +∞);
    f is saturating iff f is not non-saturating.
  • These definitions are not specific to convolutional neural networks.

ReLU nonlinearity

In terms of training time with gradient descent, saturating nonlinearities (tanh, sigmoid) are much slower than the non-saturating nonlinearity (ReLU)

Comparison between tanh/sigmoid and ReLU:

  • Sigmoid & tanh: when the input is close to the edge of the limited range, the gradient is close to 0. During backpropagation the update is therefore close to 0, so the weights effectively stop updating.
  • ReLU has no upper limit. But if the learning rate is too large, a large gradient update may "kill" a neuron during training (its gradient becomes 0 permanently), so the neuron can no longer affect the result.
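A tiny numeric illustration of why saturation hurts (the gradient formulas below are standard calculus, not from the paper):

```python
import numpy as np

def tanh_grad(z):
    # derivative of tanh: 1 - tanh(z)^2, which collapses toward 0 for large |z|
    return 1 - np.tanh(z) ** 2

def relu_grad(z):
    # derivative of ReLU: 1 for z > 0, else 0
    return float(z > 0)

print(tanh_grad(10.0))   # ~8e-9: saturated, almost no gradient flows back
print(relu_grad(10.0))   # 1.0: the gradient passes through unchanged
print(relu_grad(-5.0))   # 0.0: a "dead" ReLU passes no gradient at all
```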

Training on multiple GPUs

  • Memory limitation at that time
  • Put half of the kernels on each GPU
  • The GPUs communicate only in certain layers

Local response normalization:

  • ReLU has the desirable property that it does not require input normalization to prevent it from saturating
  • They still find that a specific local normalization scheme aids generalization
  • In 2015, Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG) noted that LRN is basically useless.

Overlapping pooling:

  • stride < kernel size
  • Reduce overfitting

Overall Arch:

  • The first 5 are conv layers and the remaining 3 are fully-connected layers
  • The output of the last layer is fed to a 1000-way softmax
  • The kernels are split across the two GPUs
  • Response-normalization layers follow the 1st and 2nd conv layers
  • Max-pooling layers follow both response-normalization layers as well as the 5th conv layer
  • ReLU is applied to the output of every conv layer and fully-connected layer

Other Tips:

  • Size of input: usually it should be a multiple of 2
  • The input size becomes 227 after data processing
  • The size of the feature map: floor((img_size − filter_size) / stride) + 1 = new_feature_size, or with padding: floor((img_size − filter_size + 2 × pad) / stride) + 1 = new_feature_size
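The feature-map formula can be checked in a couple of lines; the 227 → 55 case is AlexNet's first conv layer (11×11 filters, stride 4, no padding):

```python
def conv_out_size(img_size, filter_size, stride, pad=0):
    # floor((img_size - filter_size + 2*pad) / stride) + 1
    return (img_size - filter_size + 2 * pad) // stride + 1

print(conv_out_size(227, 11, 4))   # 55: AlexNet conv1 output
print(conv_out_size(55, 3, 2))     # 27: 3x3 overlapping max pooling, stride 2
print(conv_out_size(27, 5, 1, 2))  # 27: 5x5 conv with pad 2 preserves the size
```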

Reduce overfitting

Data augmentation:

  • Extract random 224×224 patches and their horizontal reflections from the 256×256 images and train the network on these extracted patches. At test time, the network makes a prediction by extracting five 224×224 patches (the four corner patches and the center patch) as well as their horizontal reflections
  • Alter the intensities of the RGB channels in training images: perform PCA on the set of RGB pixel values, then add a Gaussian perturbation drawn from N(0, 0.1) along the principal components. In effect this varies color and lighting, and it reduced the error rate by another 1%.
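The crop-and-flip part of the augmentation is easy to sketch (NumPy, with a hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_flip(img, crop=224):
    # pick a random 224x224 window out of the 256x256 training image
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

img = np.zeros((256, 256, 3), dtype=np.uint8)
print(random_crop_flip(img).shape)  # (224, 224, 3)
```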


Dropout: at test time we use all the neurons but multiply their outputs by 0.5
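That test-time scaling is the classic dropout scheme; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    if train:
        return x * (rng.random(x.shape) >= p)  # randomly zero units while training
    return x * (1 - p)  # test time: keep all neurons, scale outputs by 0.5

x = np.ones(4)
print(dropout(x, train=False))  # [0.5 0.5 0.5 0.5]
```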

**Details of Learning**

Train the model with SGD; momentum is an upgrade over plain SGD, and weight decay acts as regularization

Initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01

Initialized the biases (in some layers) with the constant 1


Use equal learning rate for all layers

Divide the learning rate by 10 when the validation error rate stopped improving with the current learning rate
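The paper's update rule (momentum 0.9, weight decay 0.0005) can be written out in one step; lr below stands for the ε in the paper:

```python
import numpy as np

def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    # v <- 0.9*v - 0.0005*lr*w - lr*grad ;  w <- w + v
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_step(w, v, grad=np.array([0.1]), lr=0.01)
print(w)  # slightly below 1.0: moved against the gradient plus a tiny decay pull
```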


Use validation and test error rates to compare with other models

Another way to probe the network's visual knowledge is to consider the feature activations induced by an image at the last hidden layer and compare them using L2 (Euclidean) distance

Install Caffe and Anaconda3 with Python 3 on Ubuntu 18

Recently, I finally began to practice building deep learning networks. It's easy to talk about the techniques, e.g. BP, dropout, BN and pooling. However, sometimes it can be really difficult to configure and execute a network on a real machine.

Previously I didn't have any desktop with a GPU, and using the Uni's lab was a bad experience. They assume you know exactly what you are working on. But in the real world, we have to learn from experiments. Even a bad experiment, e.g. breaking the system 😛

In order to have a good experience and full root permission, I recently got a desktop with an RTX 2080 Ti, one of the best gaming GPUs on the market. The first step was to install Win10 and Ubuntu 18 on it. Those are the easy parts and you can find many tutorials online.

The difficult part is installing Caffe and Anaconda3.

I will skip all the common steps and focus on the weird issues I found:

  1. Caffe should be compiled locally with make. For Ubuntu 18 we can install Caffe directly with apt install, but if you need to change any of its configuration, you have to compile it manually. That means downloading Caffe from GitHub and running make.

  2. If we want to use Caffe with Anaconda3, we need to modify Makefile.config. The good thing is that it has an existing template showing how to configure Anaconda3.

  3. Don't put the path of anaconda/lib into the system-wide library path. It may cause issues for system booting.

  4. If you need the path of anaconda/lib, put it into .bashrc.

  5. If you want to use import caffe, you need to manually run make pycaffe.

Linux Kernel – Process Scheduling

Process Scheduling

What's a process scheduler?

A subsystem of the kernel that puts processes to work. It decides which process runs, when, and for how long.
The scheduler divides the resource of processor time between the runnable processes on a system.

If there are more runnable processes than processors, some processes will not be running at a given moment. These processes are waiting to run.

What’s multitasking?

A multitasking OS is one that can simultaneously interleave the execution of more than one process.
Linux can have many processes in memory, but only one can actually execute on each processor at any instant.

What’s preemptive multitasking?

Linux implements preemptive multitasking.
In preemptive multitasking, the scheduler decides when a process is to cease running and a new process is to begin running.

What’s preemption?

The action of involuntarily suspending a running process is called preemption

What’s timeslice?

The time a process runs before it is preempted is usually predetermined, and it's called the timeslice of the process

What’s the benefits of preemptive multitasking?

Managing the timeslice enables the scheduler to make global scheduling decisions for the system. It also prevents any one process from monopolizing the processor.

What’s cooperative multitasking?

A process does not stop running until it voluntarily decides to do so.

What’s yielding?

The act of a process voluntarily suspending itself is called yielding

What's the bad thing about it?

The scheduler cannot make global decisions.
A process that never yields can bring down the system.

What’s the scheduler in kernel 2.6?

CFS, the Completely Fair Scheduler

What’s Policy?

Policy is the behavior of the scheduler that determines what runs when.
The scheduling policy must attempt to satisfy two goals:
– fast process response time(low latency)
– Maximal system utilization (high throughput)

What's an I/O-bound process?

Runnable for only short durations, because it eventually blocks waiting on more I/O (keyboard input, network I/O or disk I/O)

What's a processor-bound process?

Runs less frequently but for longer durations, e.g. MATLAB

What’s process priority?

Linux implements two priority ranges:
– Nice value: from -20 to +19. Larger nice values correspond to a lower priority: you are being "nice" to the other processes. Processes with a lower nice value receive a larger proportion of the system's processor time. In Linux it controls the proportion of the timeslice.
– Real-time priority: 0-99, where a larger value means higher priority. All real-time processes are at a higher priority than normal processes.

How does timeslice work?

The timeslice is the numeric value that represents how long a task can run until it is preempted.
The Linux CFS scheduler assigns each process a proportion of the processor's time; the proportion is affected by the nice value.

In Linux, the preemption decision is a function of how much of its proportion of the processor the newly runnable process has consumed. If it has consumed a smaller proportion than the currently executing process, it runs immediately, preempting the current process. If not, it is scheduled to run at a later time.

What’s Scheduler Classes?

The Linux scheduler uses scheduler classes to enable different algorithms to schedule different types of processes.

Scheduler classes have priorities. The highest-priority scheduler class that has a runnable process wins and selects the next process.

Linux CFS is the scheduler class for normal processes.

What’s the Cons of the traditional Unix scheduler?

  1. Mapping nice values to absolute timeslices leads to suboptimal context-switching behavior
  2. The nice value is a relative term: the effect of going from nice 0 to 1 is very different from going from 18 to 19
  3. The absolute mapping between timeslice and nice value can cause issues
  4. ….

What’s Perfect multitasking?

If we have 2 processes, we would run both simultaneously for the same duration, each receiving 50% of the processor's power.

How does fair scheduling work?

CFS calculates how long a process should run as a function of the total runnable processes.
The nice value works as a weight regarding the proportion of the processor.
Each process runs for a “timeslice” proportional to its weight divided by the total weight of all runnable threads.

CFS sets a target for the scheduling period, called the targeted latency, as a basis.
Each process runs for the targeted latency divided among the runnable processes (weighted by nice value). Therefore, it's a trade-off: shorter slices give better interactivity but more expensive context switches and worse throughput.

As the number of processes grows toward infinity, the per-process slice would approach 0.
So CFS sets a floor on the slice, called the minimum granularity: 1 millisecond.

CFS is not perfect when we have too many processes, but it works well in normal situations.
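The slice computation can be sketched as follows (the 20 ms targeted latency here is an illustrative number, not the kernel's actual default):

```python
def cfs_slice(nr_running, targeted_latency=20.0, min_granularity=1.0):
    # split the targeted latency across runnable tasks, but never below the floor
    return max(targeted_latency / nr_running, min_granularity)

print(cfs_slice(4))    # 5.0 ms per task
print(cfs_slice(100))  # 0.2 ms would be too small -> floored at 1.0 ms
```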

How does CFS account for time?

Linux uses struct sched_entity to save the related info
sched_entity is embedded in the process descriptor

What's the virtual runtime?

The actual runtime normalized (weighted) by the number of runnable processes.
It approximates the runtime the process would have received on an ideal multitasking processor.

It's updated by update_curr(), which is invoked periodically by the system timer and also whenever a process becomes runnable or blocks.

vruntime is an accurate measure of the runtime of a given process

How does process selection work?

CFS picks the process with the smallest vruntime to run next.

CFS uses a red-black tree to manage the list of runnable processes and to find the process with the smallest vruntime.
An rbtree (red-black tree) is a self-balancing binary search tree.

The key of the rbtree is the vruntime, so we just need to pick the leftmost node in the tree.
In Linux we don't need to traverse the tree (O(height of the tree), which is O(log n) for a balanced tree with n nodes) to find the leftmost node, because the value is cached in rb_leftmost.

If the leftmost node is null, there are no runnable processes, and CFS schedules the idle task
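A toy model of this selection loop, with a Python heap standing in for the kernel's red-black tree (equal weights for all tasks, so fairness degenerates to round-robin; task names are made up):

```python
import heapq

tasks = [(0.0, 'compiler'), (0.0, 'daemon'), (0.0, 'editor')]
heapq.heapify(tasks)

schedule_log = []
for _ in range(6):
    vruntime, name = heapq.heappop(tasks)          # "leftmost node": smallest vruntime
    schedule_log.append(name)
    heapq.heappush(tasks, (vruntime + 1.0, name))  # charge it one slice of runtime

print(schedule_log)  # ['compiler', 'daemon', 'editor', 'compiler', 'daemon', 'editor']
```

With equal weights, whichever task has run least so far is always picked next, which is exactly the fairness property CFS aims for.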

How do we add process to the tree?

A process is added when it becomes runnable or is first created by fork()

Implemented by enqueue_entity()

When do we remove process from the tree?
CFS removes processes from the tree when the process blocks or terminates

Now What’s the workflow of the scheduler?

The main entry point is the function schedule(), which calls pick_next_task()
This function goes through each scheduler class, starting with the highest priority, and selects the highest-priority process in the highest-priority class

CFS is a scheduler class for normal processes.

Where do we store unrunnable processes?

Wait queue: a simple list of processes waiting for an event to occur
Be careful with the implementation because it can lead to race conditions or deadlock

What's context switching?

The switch from one runnable task to another is handled by context_switch()
This function is called by schedule()
It does 2 things:
– Calls switch_mm() to switch the virtual memory mapping from the previous process to the new process
– Calls switch_to() to switch the processor state from the previous process to the current process

The kernel uses a flag (need_resched) to decide when a context switch is needed. It's usually set on preemption or wake-up.

The flag is in the thread_info

The Linux kernel is SMP-safe. What's SMP-safe?

SMP is an acronym for Symmetric Multiprocessing (meaning multiple CPUs of the same kind, more or less).
SMP-safe means thread safe: it's safe for multiple processors to run the code concurrently without race conditions.

How does kernel preemption happen?

thread_info holds a preempt_count
preempt_count increases each time the process acquires a lock; if it is > 0, the process holds locks and it is not safe to preempt it
If it's 0, it's safe to preempt this process and reschedule

What are the real-time scheduling policies?

Linux has 2 real-time policies: SCHED_FIFO and SCHED_RR

These policies are handled by a special real-time scheduler (a scheduler class, like CFS)

What’s the detail of SCHED_FIFO?

It's a simple first-in, first-out algorithm without timeslices
1. A runnable SCHED_FIFO task is always scheduled over any normal task.
2. Once runnable, it continues to run until it blocks or yields the processor
3. Tasks with the same priority run round-robin
4. There is no timeslice, so it can run indefinitely.
5. Only a higher-priority SCHED_FIFO or SCHED_RR task can preempt it

What’s the detail of SCHED_RR?

Similar to SCHED_FIFO except that it has a timeslice
A real-time round-robin algorithm

Linux Kernel – Process Management

Process Management

These are reading notes on Linux Kernel Development

What is a process and what is a thread?

Process: a program in the midst of execution, together with a set of resources

Thread: an object of activity within the process. It includes a unique program counter, a process stack, and a set of processor registers

The kernel schedules threads, not processes.

To Linux, a thread is implemented as a special kind of process

A process provides two virtualizations:
– Virtualized processor (scheduling): gives the process the illusion that it alone monopolizes the system.
– Virtual memory (memory): lets the process allocate and manage memory as if it alone owned all the memory in the system.

Threads share the virtual memory abstraction, whereas each receives its own virtualized processor

A process is an active program
Two or more processes can exist that are executing the same program, sharing various resources

How does a process begin?

An existing process creates another process by calling fork()
The existing process is the parent
The new process is the child
When fork() returns, the parent process resumes execution and the child process starts execution

The fork() system call returns twice from the kernel: once in the parent process and once in the child process
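This can be demonstrated directly on a Unix system; Python's os.fork wraps the same system call:

```python
import os

pid = os.fork()          # returns twice: 0 in the child, the child's PID in the parent
if pid == 0:
    print('child: fork() returned 0, pid =', os.getpid())
    os._exit(0)          # terminate the child; it stays a zombie until reaped
else:
    print('parent: fork() returned child pid', pid)
    os.waitpid(pid, 0)   # wait4()-style call that reaps the zombie
```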

What happens after creation?

The process then executes a new, different program. The exec() family of function calls creates a new address space and loads a new program into it.

How does the process end?

A program exits via the exit() system call. This call terminates the process and frees its resources.
The parent process can inquire about the status of a terminated child via the wait4() system call, which enables a process to wait for the termination of a specific process.
When a process exits, it is placed into a special zombie state that represents terminated processes until the parent calls wait() or waitpid()

Another name for a process is a task!

How does the kernel store all those processes?

The kernel stores the list of processes in a circular doubly linked list called the task list

What’s process descriptor?

Each element in the task list is a process descriptor. Each process descriptor is of the type: struct task_struct which is defined in <linux/sched.h>
The process descriptor contains all the info about a specific process.

The task_struct contains the data e.g. open files, the process’s address space, pending signals, the process’s state and etc

How to allocate the process descriptor

The task_struct is allocated via the slab allocator

With the process descriptor dynamically created by the slab allocator, a new structure thread_info is created and lives at the bottom of the stack (for stacks that grow down) or the top of the stack (for stacks that grow up)

thread_info has a pointer to the task_struct

Each task's thread_info is allocated at the end of its stack.

How to get the current working process descriptor?

Each process has a unique ID called the PID, saved in its process descriptor

pid_max defines the maximum number of PIDs; it effectively bounds the number of concurrent processes

In the kernel, we can use the current macro to directly access the current process
On x86, current is calculated by masking out the 13 least-significant bits of the stack pointer to obtain the thread_info

This calculation is done by current_thread_info()
To get the current task_struct: current_thread_info()->task
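The masking trick can be mimicked in a few lines (assuming 8 KB kernel stacks; the pointer value below is made up for illustration):

```python
STACK_SIZE = 8192  # 8 KB kernel stack = 2**13 bytes, hence "mask out 13 bits"

def thread_info_base(stack_pointer):
    # clearing the 13 least-significant bits lands on the stack's base,
    # which is where thread_info lives
    return stack_pointer & ~(STACK_SIZE - 1)

sp = 0x3001A7F4  # hypothetical kernel stack pointer
print(hex(thread_info_base(sp)))  # 0x3001a000
```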

What’s process state?

The state field in the process descriptor represents the current condition of the process
There are 5 different states:
– TASK_RUNNING: the process is runnable. It's either currently running or waiting on a run queue. It's the only state for a process executing in user-space. In kernel-space, it means the process is actively running
– TASK_INTERRUPTIBLE: the process is sleeping (blocked), waiting for some condition to exist. When the condition exists, the kernel sets the process to TASK_RUNNING. The process can also be awakened early by signals
– TASK_UNINTERRUPTIBLE: the same as TASK_INTERRUPTIBLE except that it will not be awakened by signals
– __TASK_TRACED: the process is being traced by another process, e.g. a debugger via ptrace
– __TASK_STOPPED: process execution has stopped; the task is neither running nor eligible to run. This occurs if the task receives SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU, or any signal while it's being debugged

How to change the process state?

Call set_task_state(task, state)
set_current_state(state) is equivalent, operating on the current task

What’s process context?

Each process reads its executable code from an executable file and executes the code in the process's address space. A normal program executes in user-space. When it issues a system call or triggers an exception, the process enters kernel-space.
At this point, the kernel is said to be "executing on behalf of the process" and is in process context

After exiting the kernel, the process resumes execution in user-space

What’s process family tree?

All processes are descendants of the init process, whose pid is one.
The kernel starts init in the last step of the boot process, and the init process reads the system initscripts and executes more programs

Each process has one parent and zero or more children. Processes with the same parent are siblings.
The relation is saved in the process descriptor, with a pointer called parent
and a list of pointers called children.
It's easy to iterate over all processes in the system because the task list is a circular doubly linked list,
but it can be expensive in time

How is a process created in detail?

Two steps in Unix: fork() and exec()
– fork() creates a child process that is a copy of the current task except for the PID and certain resources and statistics
(the PPID is the parent's PID)
– exec() loads a new executable into the address space and executes it

Linux implements fork() via the clone() system call, which calls do_fork(), which in turn calls copy_process():
1. It calls dup_task_struct() to create a new kernel stack, thread_info and task_struct. At this point the child and parent descriptors are identical
2. It checks that the PID is within the limit
3. The child process differentiates itself from the parent process
4. It calls copy_flags() to update the flags of the task_struct
5. The child's state is set to TASK_UNINTERRUPTIBLE to ensure that it doesn't run yet
6. It calls alloc_pid() to assign a new PID
7. It either duplicates or shares resources: open files, filesystem information, signal handlers, process address space and namespace. These are shared between threads in a process, or unique and copied for separate processes
8. copy_process() cleans up and returns to the caller a pointer to the new process

What’s Copy-on-Write? (Known as COW in other system)

Rather than duplicate the process address space, the parent and the child can share a single copy.
The data is marked that if it is written to , a duplicate is made and each process receives a unique copy. COW delays the copying of each page in the address space until it is actually written to.

What are kernel threads?

It's useful for the kernel to perform some operations in the background. This is done via kernel threads.
They are standard processes that exist solely in kernel-space.
Kernel threads don't have an address space and never context switch into user-space

How to terminate a process?

When a process terminates, the kernel releases the resources owned by the process and notifies the parent process
The process calls exit() to terminate itself.
The work is handled by do_exit():
1. Set the PF_EXITING flag of the task_struct
2. Call del_timer_sync() to remove any kernel timers
3. If BSD process accounting is enabled, write out the accounting information
4. Call exit_mm() to release the mm_struct held by this process, if no other process is sharing it
5. Call exit_files() and exit_fs() to decrement the usage counts of objects related to file descriptors and filesystem data. If a usage count reaches 0, the object is destroyed (like GC)
6. Set the exit_code in the task_struct
7. Call exit_notify() to send signals to the parent, reparent any children to another thread in their thread group or to the init process, and set the exit state in exit_state of the task_struct to EXIT_ZOMBIE
8. do_exit() calls schedule() to switch to a new process

All objects associated with the task are freed. The only memory it still holds is its kernel stack and the thread_info and task_struct. After the parent retrieves the information, or notifies the kernel that it is uninterested, the remaining memory is released

HDFS Summary

The HDFS architecture is actually similar to the GFS system. It's a popular topic, especially for big data engineers. It can also come up in interviews as part of system design.

So, what is HDFS?

The Hadoop Distributed File System
First, it is a file system. But it is not just a file system: it's a fault-tolerant file system designed to run on inexpensive hardware

Why do we need it?

It's faster (compared with reading all the data on one machine) and easy to use (the configuration is handled for you)

How to use it?

You can use hdfs commands to interact with it, e.g. to set up input and output paths

What’s the architecture of the HDFS?

Similar to GFS, it has a master node and slave nodes. The master node (the NameNode) stores the metadata and controls where files are stored. The slave nodes (DataNodes) store the blocks.
It also supports replication, and requests are sent through the HDFS client.

Summary of array algorithm questions (medium)

Recently I tried to solve some interesting questions on LeetCode. I think the thinking strategies should be recorded here.

1. max chunks to make sorted 

The question is about finding the maximum number of chunks an unsorted array can be divided into (so that sorting each chunk sorts the whole array).

The idea is:

scanning elements (1, 2, … k−1) from left to right, if all elements so far are smaller than all the remaining elements, then we can cut here.

It can be converted to:

find the running maximum of the left part; if that maximum equals the current index (a condition specific to this question, where the array is a permutation of 0..n−1), then the array can be divided at this point
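A sketch of that reformulation (assuming, as in the LeetCode version, that the array is a permutation of 0..n−1):

```python
def max_chunks(arr):
    chunks, cur_max = 0, 0
    for i, v in enumerate(arr):
        cur_max = max(cur_max, v)   # max of the left part so far
        if cur_max == i:            # everything to the left is <= i: safe to cut
            chunks += 1
    return chunks

print(max_chunks([1, 0, 2, 3, 4]))  # 4
print(max_chunks([4, 3, 2, 1, 0]))  # 1
```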

2. When calculating sums of array elements over combinations, backtracking should be considered first

3. Dynamic programming

When to use: the problem can be abstracted into a subtask that depends on the previous result or results

e.g. get the best result of the problem:

the best result for n is based on the best result for n−1


Another good example of dynamic programming:
the current result is based on the previous result,
e.g. length++

Don't limit the format of DP. Not all DP problems look like the stock questions

class Solution:
    def minimumTotal(self, triangle):
        """
        :type triangle: List[List[int]]
        :rtype: int
        """
        row = len(triangle)
        if row == 0:
            return 0
        for r in range(1, row):
            for c in range(len(triangle[r])):
                if c == 0:
                    triangle[r][c] = triangle[r-1][c] + triangle[r][c]
                elif c == len(triangle[r]) - 1:
                    triangle[r][c] = triangle[r-1][c-1] + triangle[r][c]
                else:
                    triangle[r][c] = min(triangle[r-1][c-1], triangle[r-1][c]) + triangle[r][c]
        minisum = min(triangle[row-1])
        return minisum

class Solution:
    def findLength(self, A, B):
        """
        :type A: List[int]
        :type B: List[int]
        :rtype: int
        """
        hashTable = [[0] * (len(B)+1) for _ in range(len(A)+1)]
        for x in range(len(A)-1, -1, -1):
            for y in range(len(B)-1, -1, -1):
                if A[x] == B[y]:
                    hashTable[x][y] = hashTable[x+1][y+1] + 1
        maxcount = max([max(hashTable[x]) for x in range(len(hashTable))])
        return maxcount

class Solution:
    def maxProduct(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if len(nums) == 0:
            return 0
        maxpre = nums[0]
        minpre = nums[0]
        maxsofar = nums[0]
        for x in range(1, len(nums)):
            maxnow = max(minpre * nums[x], maxpre * nums[x], nums[x])
            minnow = min(minpre * nums[x], maxpre * nums[x], nums[x])
            maxsofar = max(maxnow, maxsofar)
            maxpre = maxnow
            minpre = minnow
        return maxsofar

4. Backtracking questions

Backtracking can usually be structured as follows:

Pick a starting point.
while (Problem is not solved)
    For each path from the starting point:
        check if the selected path is safe; if yes, select it
        and make a recursive call on the rest of the problem;
        if the recursive call fails, undo the current move.
    End For
If none of the moves works out, return false: NO SOLUTION.

A good example of backtracking is getting all substrings of a string or all subsets of an array


class Solution:
    def backTrace(self, candidates, index, curList, remain, res):
        if remain == 0:
            res.append(curList)
        elif remain < 0:
            return
        else:
            for x in range(index, len(candidates)):
                self.backTrace(candidates, x, curList + [candidates[x]], remain - candidates[x], res)

    def combinationSum(self, candidates, target):
        """
        :type candidates: List[int]
        :type target: int
        :rtype: List[List[int]]
        """
        res = []
        self.backTrace(candidates, 0, [], target, res)
        return res

5. Cycle Detection

Cycle detection can be used on arrays, queues or linked lists


Assume the slow pointer first meets the fast pointer at M.

Let C be the cycle length, p the distance from the start to the cycle entry, and m the distance from the entry to M.

The distance the slow pointer has moved is v = a*C + m + p

The distance the fast pointer has moved is 2v = b*C + m + p

2v − v = v = (b − a)*C = n*C

So v is n*C, and v = a*C + m + p, therefore m + p is a multiple of C.

Now move the fast pointer back to the start and advance both pointers at the same speed. When they meet the second time:

the fast pointer has moved p, so it is at the cycle entry;

the slow pointer was m past the entry and has also moved p, so it is m + p (a multiple of C) past the entry, i.e. at the entry as well.

So the fast and slow pointers meet at the entry of the cycle.


index: 0 1 2 3 4 5

value: 1 2 3 4 2 5

if we keep following i → nums[i], it will enter a cycle
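The derivation above translates directly into Floyd's tortoise-and-hare; a sketch on the i → nums[i] sequence:

```python
def find_cycle_entry(nums):
    slow = fast = 0
    while True:                      # phase 1: advance at speeds 1x and 2x
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    fast = 0                         # phase 2: send one pointer back to the start
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow                      # they meet again at the cycle entry

# 0 -> 1 -> 2 -> 3 -> 4 -> 2 -> ...  the cycle entry is index 2
print(find_cycle_entry([1, 2, 3, 4, 2, 5]))  # 2
```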

6. Scope classification

     start1       end1         start2    end2

How do we judge whether a point is within multiple scopes (intervals)?

A good option is a binary search tree over the intervals

We can define a tree structure with Python:

class Node:
    __slots__ = 'start', 'end', 'left', 'right'
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.left = self.right = None

    def insert(self, node):
        if node.start >= self.end:
            if not self.right:
                self.right = node
                return True
            return self.right.insert(node)
        elif node.end <= self.start:
            if not self.left:
                self.left = node
                return True
            return self.left.insert(node)
        else:
            return False  # the new interval overlaps this one

7. Binary search

If we want to find a value in a sorted set of numbers, it may be a classic binary search problem.
Be careful about the +1 / −1 adjustments and whether to return l, r or mid.

int binarySearch(int nums[], int l, int r, int x) {
    while (r >= l) {
        int mid = (l + r) / 2;
        if (nums[mid] >= x)
            r = mid - 1;
        else
            l = mid + 1;
    }
    return l;  // index of the first element >= x
}

When using binary search, we need to be careful about the mid selection, for example (left + right) / 2 versus (left + right + 1) / 2, i.e. rounding down or up.
Later I should dig deeper into the extreme cases of binary search.

Now it’s time to go deep into the question!


Given an array of integers sorted in ascending order, find the starting and ending position of a given target value.

Your algorithm's runtime complexity must be in the order of O(log n).

If the target is not found in the array, return [-1, -1].

For example, given [5, 7, 7, 8, 8, 10] and target value 8, return [3, 4].

The idea is first use binary search to find the left side of the target and then use the same method to find the right side.

To find the left side:

mid = int((l + r) / 2) // this always takes the lower integer, e.g. int(1.6) == 1

Since it's in ascending order, if nums[mid] < target, then the target's start is in (mid, r], so we should set l = mid + 1

If nums[mid] > target, the target's start is in [l, mid), so we should set r = mid − 1

If nums[mid] == target, we have 2 situations:

target start == mid
target start < mid
Therefore target start <= mid, so the target's start is within [l, mid], and we should set r = mid

1 2 3 3 3 4 5   l=0, r=6: mid=3, nums[3] == 3 → r=3

1 2 3 3 3 4 5   l=0, r=3: mid=1, nums[1] == 2 < 3 → l=2

1 2 3 3 3 4 5   l=2, r=3: mid=2, nums[2] == 3 → r=2

l == r and nums[l] is the target's left side

Now we have target left side == l. The next step is to find the right side

The right side is within [l, len(nums)-1]

l = l

r = len(nums)-1

mid = int((l + r) / 2)

if nums[mid] > target, then the target's end is within [l, mid), therefore r = mid − 1

if nums[mid] == target, then there are 2 situations:

the target's end is on the right side of mid, then l = mid + 1

the target's end is mid, then l = mid

Therefore target end >= mid, so the target's end is within [mid, r], and we set l = mid

3 3 3 4 5   l=0, r=4: mid=2, nums[2] == 3 → l=2

3 3 3 4 5   l=2, r=4: mid=3, nums[3] == 4 > 3 → r=2

But whenever r == l + 1 and nums[l] == target, mid == l, so l = mid makes no progress.

So the l pointer stops moving.

To avoid this we can let mid = (l + r + 1) / 2, so we always take the higher integer as mid

3 3 3 4 5 mid = 2 nums[2] = 3

l r

3 3 3 4 5 mid = 3 nums[3] > 3

l r
3 3 3 4 5 mid = 3 nums[3] =4 > 3

l r
3 3 3 4 5
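Putting the two searches together, here is a minimal self-contained sketch (the function name search_range is mine, not Leetcode's class template):

```python
def search_range(nums, target):
    """Return [first, last] index of target in sorted nums, or [-1, -1]."""
    if not nums:
        return [-1, -1]
    # Left boundary: round mid down; on equality shrink r to mid.
    l, r = 0, len(nums) - 1
    while l < r:
        mid = (l + r) // 2
        if nums[mid] < target:
            l = mid + 1
        elif nums[mid] > target:
            r = mid - 1
        else:
            r = mid
    if nums[l] != target:
        return [-1, -1]
    left = l
    # Right boundary: round mid up so l keeps moving on equality.
    r = len(nums) - 1
    while l < r:
        mid = (l + r + 1) // 2
        if nums[mid] > target:
            r = mid - 1
        else:
            l = mid
    return [left, l]
```

Note the asymmetry: the left search rounds mid down and shrinks r, the right search rounds mid up and grows l, which is exactly what prevents the stuck-pointer case above.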


8. Linear scan instead of brute force

Sometimes we will meet a brute-force situation,
e.g. 3sum smaller or valid triangle number.

If we use the brute-force solution to solve the problem, we will hit the time limit exceeded issue.

For those questions we can sort the array first and then look for the extreme limit. We can use binary search to get that limit, or use a linear scan to get the result.
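For example, 3sum smaller (count triplets with sum below a target) becomes O(n^2) with a sort plus a linear two-pointer scan instead of an O(n^3) brute force. A sketch (the function name is mine):

```python
def three_sum_smaller(nums, target):
    """Count triplets i < j < k with nums[i] + nums[j] + nums[k] < target."""
    nums = sorted(nums)
    count = 0
    for i in range(len(nums) - 2):
        l, r = i + 1, len(nums) - 1
        while l < r:
            if nums[i] + nums[l] + nums[r] < target:
                # Since nums is sorted, every k in (l, r] also works.
                count += r - l
                l += 1
            else:
                r -= 1
    return count
```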

9. 3D dynamic programming

This is ordinary dynamic programming with one extra dimension.
We go through the matrix once, but for each element we need previously computed values for the calculation. Therefore we need a 3D array to hold the previous values: one plane per direction (horizontal, vertical, anti-diagonal, diagonal).

class Solution:
    def longestLine(self, M):
        # Longest line of consecutive ones in a binary matrix.
        # dp[r][c] holds 4 run lengths ending at (r, c):
        # [horizontal, vertical, anti-diagonal, diagonal]
        if len(M) == 0:
            return 0
        rows, columns = len(M), len(M[0])
        maxcount = 0
        dp = [[[0] * 4 for c in range(columns)] for r in range(rows)]
        for r in range(rows):
            for c in range(columns):
                if M[r][c] == 1:
                    dp[r][c][0] = dp[r][c-1][0] + 1 if c > 0 else 1
                    dp[r][c][1] = dp[r-1][c][1] + 1 if r > 0 else 1
                    dp[r][c][2] = dp[r-1][c+1][2] + 1 if r > 0 and c < columns - 1 else 1
                    dp[r][c][3] = dp[r-1][c-1][3] + 1 if r > 0 and c > 0 else 1
                    maxcount = max(maxcount, max(dp[r][c]))
        return maxcount

10. Matrix direction switch

Assume the point in a matrix moves clockwise; we can easily check whether we need to change the direction by testing the border limit, and switch with direction = (direction + 1) % 4.

class Solution:
    def spiralOrder(self, matrix):
        """
        :type matrix: List[List[int]]
        :rtype: List[int]
        """
        if len(matrix) == 0 or len(matrix[0]) == 0:
            return []
        rnum, cnum = len(matrix), len(matrix[0])

        seen = [[False] * cnum for x in range(rnum)]
        ans = []
        dr = [0, 1, 0, -1]
        dc = [1, 0, -1, 0]
        r = c = di = 0
        for _ in range(rnum * cnum):
            ans.append(matrix[r][c])
            seen[r][c] = True
            rnext, cnext = r + dr[di], c + dc[di]
            if rnext >= 0 and rnext < rnum and cnext >= 0 and cnext < cnum and not seen[rnext][cnext]:
                r, c = rnext, cnext
            else:
                # hit the border or a visited cell: turn clockwise
                di = (di + 1) % 4
                r, c = r + dr[di], c + dc[di]
        return ans
another good example based on the previous solution:

class Solution:
    def generateMatrix(self, n):
        """
        :type n: int
        :rtype: List[List[int]]
        """
        res = [[0] * n for x in range(n)]
        count = 1
        dr = [0, 1, 0, -1]
        dc = [1, 0, -1, 0]
        r = c = di = 0
        for _ in range(n * n):
            res[r][c] = count
            count += 1
            rn, cn = r + dr[di], c + dc[di]
            if 0 <= rn < n and 0 <= cn < n and res[rn][cn] == 0:
                r, c = rn, cn
            else:
                di = (di + 1) % 4
                r, c = r + dr[di], c + dc[di]
        return res

11. cumulative sum

A simple idea of cumulative sum is:
assume we have an array containing n integers; the sum of the subarray between i and j equals the prefix sum from 0 to j minus the prefix sum from 0 to i - 1, with i <= j.

e.g. subarray sum equals k

class Solution:
    def subarraySum(self, nums, k):
        # map prefix sum -> number of times it has been seen
        sumhash = {}
        sumhash[0] = 1  # prefix sum 0 counts once, so any prefix equal to k is counted
        sumN = 0
        count = 0
        for x in range(len(nums)):
            sumN = sumN + nums[x]
            if (sumN - k) in sumhash:
                count += sumhash[sumN - k]
            if sumN in sumhash:
                sumhash[sumN] += 1
            else:
                sumhash[sumN] = 1
        return count

12. 2Sum, 3Sum, 4Sum, KSum

The main idea of those questions is to downgrade the NSum to (N-1)Sum. We also try to reduce the calculation with different pruning conditions.

e.g. 3Sum

class Solution:
    def threeSum(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        nums.sort()
        ans = []
        for x in range(len(nums) - 2):
            if nums[x] > 0:
                break  # all remaining numbers are positive, no triple sums to 0
            if x > 0 and nums[x] == nums[x-1]:
                continue  # skip duplicate first elements
            y = x + 1
            z = len(nums) - 1
            while y < z:
                cursum = nums[x] + nums[y] + nums[z]
                if cursum == 0:
                    ans.append([nums[x], nums[y], nums[z]])
                    while y < z and nums[y] == nums[y+1]:
                        y += 1
                    while y < z and nums[z] == nums[z-1]:
                        z -= 1
                    y += 1
                    z -= 1
                elif cursum < 0:
                    y += 1
                else:
                    z -= 1
        return ans


class Solution:
    def threeSum(self, nums, target, ans, index, previous, lnum):
        for i in range(index, lnum - 2):
            if (3 * nums[i]) > target:
                break  # smallest possible triple already too large
            if (3 * nums[lnum-1]) < target:
                break  # largest possible triple still too small
            if i > index and nums[i] == nums[i-1]:
                continue  # skip duplicates
            l = i + 1
            r = lnum - 1
            while l < r:
                cursum = nums[i] + nums[l] + nums[r]
                if cursum == target:
                    ans.append([previous, nums[i], nums[l], nums[r]])
                    while l < r and nums[l] == nums[l+1]:
                        l = l + 1
                    while l < r and nums[r] == nums[r-1]:
                        r = r - 1
                    l = l + 1
                    r = r - 1
                elif cursum < target:
                    l += 1
                else:
                    r -= 1

    def fourSum(self, nums, target):
        """
        :type nums: List[int]
        :type target: int
        :rtype: List[List[int]]
        """
        nums.sort()
        ans = []
        lnum = len(nums)
        if lnum < 4:
            return ans
        if nums[0] * 4 > target:
            return ans
        if nums[lnum-1] * 4 < target:
            return ans
        for i in range(lnum - 3):
            if (nums[i] + 3 * nums[lnum-1]) < target:
                continue  # even with the three largest, too small
            if (nums[i] + 3 * nums[i+1]) > target:
                break  # even with the three smallest, too large
            if i > 0 and nums[i] == nums[i-1]:
                continue  # skip duplicates
            if nums[i] * 4 == target:
                if i + 3 < lnum and nums[i] == nums[i+3]:
                    ans.append([nums[i]] * 4)
                    continue
            self.threeSum(nums, target - nums[i], ans, i + 1, nums[i], lnum)

        return ans

13. Merge interval

Given a collection of intervals, merge all overlapping intervals.

For example,
Given [1,3],[2,6],[8,10],[15,18],
return [1,6],[8,10],[15,18].

The idea is to first sort the list by start,
and then check, for each element, whether its start falls inside the previous range.

* python sorted can take a lambda function as the key
* we don’t need a binary tree for this one, a simple sort can solve the problem

class Solution:
    def merge(self, intervals):
        """
        :type intervals: List[Interval]
        :rtype: List[Interval]
        """
        res = []
        if len(intervals) == 0:
            return res
        intervals = sorted(intervals, key=lambda interval: interval.start)
        start = intervals[0].start
        end = intervals[0].end
        for interval in intervals:
            if interval.start <= end:
                start = min(start, interval.start)
                end = max(end, interval.end)
            else:
                res.append(Interval(start, end))
                start = interval.start
                end = interval.end
        res.append(Interval(start, end))
        return res
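Outside the judge the Interval class isn’t available, so here is the same idea with plain [start, end] lists (a self-contained sketch):

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] pairs after sorting by start."""
    intervals = sorted(intervals, key=lambda iv: iv[0])
    res = []
    for start, end in intervals:
        if res and start <= res[-1][1]:
            # Overlaps the previous merged interval: extend it.
            res[-1][1] = max(res[-1][1], end)
        else:
            res.append([start, end])
    return res
```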

14. Majority Element

For those who aren’t familiar with the Boyer-Moore majority vote algorithm,
I found a great article that helped me understand this fantastic algorithm!!
Please check it out!

The essential concept is that you keep a counter for the majority number X. If you find a number Y that is not X, the counter decreases by 1. The reason is that if there are 5 Xs and 4 Ys, there would be one (5 - 4) more X than Y. This can be explained as “4 Xs being paired off by 4 Ys”.

And since the requirement is finding the elements appearing more than ⌊n/3⌋ times, there can be at most two such numbers.
So we can modify the algorithm to maintain two counters for two candidates.
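For comparison, the single-candidate (> n/2) version of the vote is only a few lines, a direct sketch of the pairing idea:

```python
def majority_element(nums):
    """Boyer-Moore vote; assumes some value appears more than n/2 times."""
    count = 0
    cand = None
    for n in nums:
        if count == 0:
            cand = n        # adopt a new candidate
        count += 1 if n == cand else -1  # pair off mismatches
    return cand
```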

class Solution:
    def majorityElement(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        count1 = 0
        count2 = 0
        cand1, cand2 = 0, 1  # any two distinct initial candidates
        for n in nums:
            if n == cand1:
                count1 += 1
            elif n == cand2:
                count2 += 1
            elif count1 == 0:
                cand1 = n
                count1 = 1
            elif count2 == 0:
                cand2 = n
                count2 = 1
            else:
                count1 -= 1
                count2 -= 1
        return [n for n in (cand1, cand2) if nums.count(n) > len(nums) // 3]

15. Permutation

Usually the idea of getting permutations is to:
– limit the conditions
– use backtracking to get all possible results

There is one question to get the next permutation.

The idea is that the next permutation follows lexicographic order. We can use a single-pass method to get the result.

class Solution:
    def swap(self, nums, i, j):
        nums[i], nums[j] = nums[j], nums[i]

    def reverse(self, nums, i, j):
        while i < j:
            self.swap(nums, i, j)
            i += 1
            j -= 1

    def nextPermutation(self, nums):
        """
        :type nums: List[int]
        :rtype: void Do not return anything, modify nums in-place instead.
        """
        # find the rightmost i with nums[i] < nums[i+1]
        i = len(nums) - 2
        while i >= 0 and nums[i+1] <= nums[i]:
            i -= 1
        if i >= 0:
            # swap with the rightmost element larger than nums[i]
            j = len(nums) - 1
            while j >= 0 and nums[j] <= nums[i]:
                j -= 1
            self.swap(nums, i, j)
        # the suffix is descending; reverse it to the smallest order
        self.reverse(nums, i + 1, len(nums) - 1)

16. Palindrome

We need to consider the different situations,
e.g. when can a string form a palindrome:
– every char count is even
– exactly one char count is odd, all others are even

Then we can convert this question into a permutation problem and use backtracking to get the result.

class Solution:
    def backTracing(self, nums, ans, pos, res, oddc):
        if len(ans) == len(nums):
            # ans is one half; mirror it around the odd char (if any)
            half = ans
            res.append(''.join(half) + oddc + ''.join(half[::-1]))
            return
        dup = []
        for x in range(len(nums)):
            # skip used positions and duplicate characters at this depth
            if x not in pos and nums[x] not in dup:
                dup.append(nums[x])
                self.backTracing(nums, ans + [nums[x]], pos + [x], res, oddc)

    def generatePalindromes(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        if len(s) == 0:
            return []

        ul = dict()
        for x in s:
            ul[x] = ul.get(x, 0) + 1

        oddc = ''
        oddCount = 0
        for x in ul.keys():
            if ul[x] % 2 != 0:
                oddc = x
                oddCount += 1
                ul[x] = (ul[x] - 1) // 2
            else:
                ul[x] = ul[x] // 2
        if oddCount > 1:
            return []
        nums = []
        for x in ul.keys():
            while ul[x] > 0:
                nums += [x]
                ul[x] -= 1
        res = []
        self.backTracing(nums, [], [], res, oddc)
        return res

17. preorder and inorder of tree

The basic idea is here:
Say we have 2 arrays, PRE and IN.
Preorder traversing implies that PRE[0] is the root node.
Then we can find this PRE[0] in IN, say it’s IN[5].
Now we know that IN[5] is root, so we know that IN[0] – IN[4] is on the left side, IN[6] to the end is on the right side.
Recursively doing this on subarrays, we can build a tree out of it 🙂

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution:
    def build(self, preorder, postart, poend, inorder, iostart, ioend):
        if postart > poend or iostart > ioend:
            return None
        root = TreeNode(preorder[postart])
        inorderBreakMark = iostart
        for x in range(iostart, ioend + 1):
            if inorder[x] == preorder[postart]:
                inorderBreakMark = x
                break
        leftL = inorderBreakMark - iostart
        root.left = self.build(preorder, postart + 1, postart + leftL, inorder, iostart, inorderBreakMark - 1)
        root.right = self.build(preorder, postart + leftL + 1, poend, inorder, inorderBreakMark + 1, ioend)
        return root

    def buildTree(self, preorder, inorder):
        """
        :type preorder: List[int]
        :type inorder: List[int]
        :rtype: TreeNode
        """
        return self.build(preorder, 0, len(preorder) - 1, inorder, 0, len(inorder) - 1)

18. Greedy

Greedy is a special case of DP.
Greedy: for each step we only consider the current best solution, because we believe that if we choose the current best solution at each step, the final result is the best solution overall.

DP: we calculate all solutions.

A greedy example:

class Solution:
    def canJump(self, nums):
        """
        :type nums: List[int]
        :rtype: bool
        """
        lastpos = len(nums) - 1
        for x in range(len(nums) - 2, -1, -1):
            if x + nums[x] >= lastpos:
                lastpos = x
        return lastpos == 0

A good example that tells the difference between greedy and DP is linked.

19. quick sort

This is a good example of quick-sort-style partitioning (sort colors).
We can use the simple solution: two pointers moving from left to right, swapping when we see a smaller value.

Another good solution is:
the idea is to sweep all 0s to the left and all 2s to the right, then all 1s are left in the middle.

It is hard to define what is a “one-pass” solution but this algorithm is bounded by O(2n), meaning that at most each element will be seen and operated twice (in the case of all 0s). You may be able to write an algorithm which goes through the list only once, but each step requires multiple operations, leading the total operations larger than O(2n).

class Solution:
    def swap(self, nums, start, end):
        nums[start], nums[end] = nums[end], nums[start]

    def sortColors(self, nums):
        """
        :type nums: List[int]
        :rtype: void Do not return anything, modify nums in-place instead.
        """
        low = 0
        high = len(nums) - 1
        i = low
        while i <= high:
            if nums[i] == 0:
                self.swap(nums, i, low)
                low += 1
                i += 1
            elif nums[i] == 2:
                self.swap(nums, i, high)
                high -= 1
                # do not advance i: the element swapped in from high is unexamined
            else:
                i += 1

20. Two pointers

Follow up for “Remove Duplicates”:
What if duplicates are allowed at most twice?

For example,
Given sorted array nums = [1,1,1,2,2,3],

Your function should return length = 5, with the first five elements of nums being 1, 1, 2, 2 and 3. It doesn’t matter what you leave beyond the new length.

class Solution:
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        count = 0
        for num in nums:
            if count < 2 or nums[count-2] < num:
                nums[count] = num
                count += 1
        return count

Summary of Array algorithms questions (easy)

Recently I spent a few hours going through the Array algorithm questions on Leetcode. I haven’t touched algorithm questions for 2 years, but I think it’s a good time to review them in a better way.

From my experience, the set of thinking strategies for solving similar problems is limited. These strategies cannot help you solve every problem relevant to Array, but they may be a good beginning for analyzing the algorithm tricks.

The core is the math

  1. hash table
  2. shifting the array from beginning to end, or with odd & even order
  3. use multiplication by -1 as a mark
  4. two indexes, moving two-way or one-way, or from 1 and n-1
  5. convert the index into other marks, e.g. the content, and vice versa
  6. if it’s a matrix, sum the columns and rows
  7. swap i and i+1 (bubble sort)
  8. use a mid/pivot number (quicksort)
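Trick 3 (using *-1 as a mark) in a minimal sketch: find the duplicates in an array whose values are in [1, n] by negating the value at index value-1 (the example problem choice is mine):

```python
def find_duplicates(nums):
    """Values are in [1, n]. Negate nums[v-1] when v is first seen;
    a second visit finds it already negative, so v is a duplicate."""
    res = []
    for n in nums:
        i = abs(n) - 1      # abs, because earlier passes may have negated n
        if nums[i] < 0:
            res.append(abs(n))
        else:
            nums[i] = -nums[i]
    return res
```

The sign bit acts as a free "seen" flag, so no extra hash table is needed.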

GC for HotSpot Java, V8 Nodejs, PHP and Python


Java (HotSpot)

  • Java uses a generational strategy for GC.
  • The memory is divided into 3 sections: young generation, old generation, and metaspace
  • The young generation contains Eden, from, and to spaces with the ratio 8:1:1

V8 Nodejs

  • V8 uses generational GC as well
  • the difference is that its young generation is small
  • the young generation only contains 2 spaces: from and to
  • there is no metaspace


PHP

  • variable scope (lifetime) and reference counting


Python

  • reference counting
  • 3 generations (for the cycle collector)

Java Container Class

Collection & Map

Collection – A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered.

Map – An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

3 ways to loop map:

  • Set<K> keySet()
  • Collection<V> values()
  • Set<Map.Entry<K, V>> entrySet()

List, Set & Queue

List, Set & Queue extend Collection.

  • List: an ordered collection; elements can be duplicated
  • Set: elements cannot be duplicated
  • Queue: holds elements for processing, typically in FIFO order


The list provides a special iterator called ListIterator.

ListIterator<E> listIterator();

ListIterator<E> listIterator(int index);

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations

    boolean hasNext();

    E next();

    boolean hasPrevious();

    E previous();

    int previousIndex();

    void remove();

    void set(E e);

    void add(E e);
}


ArrayList

  • ArrayList implements List with an array.
  • It allows inserting null.
  • size, isEmpty, get, set, iterator, and add are all O(1); adding n elements is O(n)
  • ArrayList is not synchronized for threads
  • By default, the capacity is 10 the first time we insert an element.
  • If exceeded, the capacity will increase by 50%

source code:

transient Object[] elementData;

private int size;

All elements are saved within the object array and size is used for length control.

source code of add:

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

The add operation checks the capacity first. If it is not enough, it calls grow (Arrays.copyOf).
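As a quick illustration (not JDK code), the 1.5x growth rule newCapacity = oldCapacity + (oldCapacity >> 1), starting from the default capacity 10, produces this capacity sequence:

```python
def arraylist_capacities(n_adds):
    """Capacities an ArrayList passes through while adding n_adds elements,
    using newCapacity = old + (old >> 1) from the default capacity of 10."""
    cap, caps = 10, [10]
    while cap < n_adds:
        cap = cap + (cap >> 1)
        caps.append(cap)
    return caps

print(arraylist_capacities(100))  # -> [10, 15, 22, 33, 49, 73, 109]
```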

source code of remove:

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

It will use System.arraycopy to move all elements behind the target index and remove the last element.

That’s why the cost of adding and removing is expensive 🙂

It also has a function trimToSize() which can be used for trimming the capacity of the array down to its size.

public void trimToSize() {
    if (size < elementData.length) {
        elementData = Arrays.copyOf(elementData, size);
    }
}
Besides, it implements RandomAccess. Other implementors of RandomAccess include: AttributeList, CopyOnWriteArrayList, RoleList, RoleUnresolvedList, Stack, and Vector.

There is one comment within the RandomAccess javadoc: this loop

for (int i=0, n=list.size(); i < n; i++)
    list.get(i);

runs faster than this loop:

for (Iterator i=list.iterator(); i.hasNext(); )
    i.next();

Compared with Vector

  • almost the same. The only difference is that Vector is synchronized, so it’s more expensive
  • Vector grows 2x while ArrayList grows 1.5x
  • Stack extends Vector


LinkedList

LinkedList is also an ordered container class. LinkedList implements List with a doubly linked list.

ArrayList V.S. LinkedList

  • Get: ArrayList can use the index to get an element; LinkedList has to traverse from the beginning
  • Add & Remove: LinkedList can easily add and remove by relinking nodes; ArrayList has to copy and shift the data
  • Grow: ArrayList has to allocate a larger array and move the data; LinkedList can dynamically create a new node

source code:

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
A doubly linked node.

transient int size = 0;

transient Node<E> first;

transient Node<E> last;

Each LinkedList keeps first and last pointers.

Add and Delete:

private void linkFirst(E e) {
    final Node<E> f = first;
    final Node<E> newNode = new Node<>(null, e, f);
    first = newNode;
    if (f == null)
        last = newNode;
    else
        f.prev = newNode;
}

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
}

void linkBefore(E e, Node<E> succ) {
    // assert succ != null;
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
}

private E unlinkFirst(Node<E> f) {
    // assert f == first && f != null;
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    first = next;
    if (next == null)
        last = null;
    else
        next.prev = null;
    return element;
}

private E unlinkLast(Node<E> l) {
    // assert l == last && l != null;
    final E element = l.item;
    final Node<E> prev = l.prev;
    l.item = null;
    l.prev = null; // help GC
    last = prev;
    if (prev == null)
        first = null;
    else
        prev.next = null;
    return element;
}

E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    return element;
}

LinkedList also implements the Deque interface, which inherits from Queue. So it also supports pop, push, and peek.


Set

Set doesn’t add any methods beyond Collection. Set is just a concept: elements cannot be duplicated.

e.g. HashSet, LinkedHashSet, TreeSet


HashSet

  • HashSet implements Set and is based on HashMap.
  • Unordered
  • Allows the null element

source code:

private transient HashMap<E,Object> map;

private static final Object PRESENT = new Object();

So add, remove etc. are all operations on the underlying HashMap. The iterator is just the keySet iterator of the HashMap.

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public void clear() {
    map.clear();
}


LinkedHashSet

LinkedHashSet uses a linked list to keep the insertion order of the set elements.
LinkedHashSet is based on LinkedHashMap.


TreeSet

TreeSet is based on TreeMap, and its elements must implement the Comparable interface. The ordering is defined by compareTo: elements with e1.compareTo(e2) == 0 are considered equal.



HashMap

It’s stored in a hash table.

transient Node<K,V>[] table;

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

table is used for saving the elements. If hashes conflict, the entry is chained onto the bin's linked list via next (and the bin is converted to a red-black tree when it grows long, see treeifyBin below).

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    return null;
}


LinkedHashMap

It’s similar to HashMap. The difference is that it also maintains a doubly linked list of its entries:

transient LinkedHashMap.Entry<K,V> head;

transient LinkedHashMap.Entry<K,V> tail;


TreeMap

It’s based on a red-black tree.


WeakHashMap

If the WeakHashMap holds the only reference to a key, it will automatically remove that entry.


Recently I began to use java again. It’s better to review the knowledge of java and make some notes for future quick review.

multiple inheritance

Java implements MI in a different way: it allows you to implement multiple interfaces, but you can only inherit one implementation.

C++ supports multiple inheritance.

class oriented


class loader

Java source (.java) is first compiled into Java bytecode (.class) by the Java compiler (javac). Then the .class file is handed to a class loader.

The classloader loads the class into the JVM.

  1. Loading Strategy

The JVM uses the parent delegation model.
If a class loader receives a loading request, it first delegates the request to its parent class loader. All loaders follow the same rule. Only when the parent loader cannot load the class will the child loader try to load it by itself.

Why do we need it?
E.g. if someone replaced java.lang.String with some hacking code, then without parent delegation the JVM would use that hacked String class as the system String class.

BootStrapClassLoader is the top level classloader.

Tips for myself:

BootStrapClassLoader: the top-level class loader, written in C++ and embedded in the JVM. It is initialized when the JVM starts, and it mainly reads all the class files in the core library JRE/lib/rt.jar, which contains all the interfaces and implementations defined by the Java specification.
ExtensionClassLoader: reads the Java extension libraries, e.g. the jars under JRE/lib/ext/ (note that some versions don’t have the ext directory). AppClassLoader: reads the class files from all jars and directories on the CLASSPATH; normally this is the default class loader for a program.
CustomClassLoader: written by the user to load specified class files. A custom ClassLoader can load jars and directories outside the classpath (e.g. jars or binaries downloaded from the network), and it can also process the class files before loading, e.g. decryption or decoding.

Many articles say that ExtClassLoader’s parent is BootStrapClassLoader, but strictly speaking ExtClassLoader’s parent is null. In the default ClassLoader’s loadClass method, when parent is null the request is handed to BootStrapClassLoader, and ExtClassLoader doesn’t override the default loadClass, so ExtClassLoader also ends up calling BootStrapClassLoader to load classes. This makes it look as if “BootStrapClassLoader acts as ExtClassLoader’s parent loader”, which confuses many beginners (like me).
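The parent delegation rule can be sketched as a toy model (illustration only, not JVM code; the loader and class names are made up):

```python
class Loader:
    def __init__(self, name, classes, parent=None):
        self.name = name
        self.classes = classes  # classes this loader can load itself
        self.parent = parent

    def load(self, cls):
        # Delegate to the parent first; only load locally on failure.
        if self.parent is not None:
            loaded = self.parent.load(cls)
            if loaded is not None:
                return loaded
        if cls in self.classes:
            return (self.name, cls)
        return None

bootstrap = Loader("bootstrap", {"java.lang.String"})
ext = Loader("ext", set(), parent=bootstrap)
app = Loader("app", {"com.example.Main", "java.lang.String"}, parent=ext)

# Even though the app loader has its own java.lang.String, the
# bootstrap loader wins, so the core class cannot be shadowed.
print(app.load("java.lang.String"))   # -> ('bootstrap', 'java.lang.String')
print(app.load("com.example.Main"))   # -> ('app', 'com.example.Main')
```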

execution engine

It executes the Java bytecode.

Runtime Data Areas

These are the memory sections used during runtime.

  • Method Area

It stores structured info, e.g. the constant pool, static variables, and constructors.

  • Heap

It stores all java instances and objects. It’s the main area for GC.

Tips: the method area and heap are shared by all threads

  • Stack

Once a thread has been set up, the JVM builds a stack for that thread.

Each stack contains multiple stack frames. Each function call builds one stack frame, which holds local variables, the operand stack, and the return value. The start and end of one function map to pushing and popping its stack frame.

  • PC Register It saves the address of the current instruction. It makes sure that after a thread switch, execution can still resume from the previous position.
  • Native Method Stack It’s similar to the stack but only used for JVM native methods

Memory assignment

The JVM first claims a huge space, and all new operations allocate and release within this space. This reduces the number of system calls, similar to a memory pool. Secondly, it introduces the concept of GC.

  • Static memory

If the memory size can be determined at compile time, it’s static memory, meaning the size is fixed, e.g. an int.

  • Dynamic memory Only knows the memory size when executing it. e.g. object memory space

The stack, PC register, and stack frames are private to each thread. Once the thread ends, its stack frames are destroyed and the memory is released.

But the heap and method area are different. Only during execution can we know the size of objects, so the memory of these sections must be managed dynamically – GC.

In a nutshell, the memory size of the stack is fixed so there is no memory collection problem. But the memory size of the heap is uncertain so it will have a problem of GC.

GC strategy

  1. Mark-sweep

Mark all the objects to be collected, then collect them. It’s the basic method.

Cons: inefficient; after cleaning there will be a lot of memory fragmentation
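The mark stage can be sketched as reachability over an object graph (a toy illustration, not a real collector):

```python
def mark(roots, refs):
    """refs maps an object id to the ids it references.
    Returns the set of reachable (live) ids; everything else is garbage."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(refs.get(obj, []))
    return live

refs = {"a": ["b"], "b": ["c"], "d": ["e"], "e": []}
live = mark(["a"], refs)
# "d" and "e" are unreachable from the roots, so sweep would free them.
print(sorted(live))  # -> ['a', 'b', 'c']
```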

  2. Copying

Divide the space into two equal halves and only use one at a time. During GC, copy all live objects to the other half. Pros: no memory fragmentation. Cons: only half the memory is usable

  3. Mark-Compact

stage 1: mark all referenced objects
stage 2: go through the entire heap, clean all unmarked objects and compact the surviving objects into a contiguous block in order

Pros: neither the memory fragmentation issue nor the doubled memory space issue

  4. Generational Collection

This is the currently used strategy for java GC.

It divides the objects into different generations by the life cycle: Young Generation, Old Generation, and Permanent Generation.

Permanent Generation will save the class info. Young Generation and Old Generation are closely related to GC.

Young generation: where all new objects are created. It is divided into 3 parts: Eden and two survivor spaces (from and to). When Eden fills up with objects, a Minor GC is executed and all surviving objects are moved into one of the survivor spaces (say from). Minor GC also checks the objects that survived in the survivor space and moves them into the other survivor space (say to), so at any point in time one survivor space is always empty. After many GC cycles, objects that are still alive are promoted into the old generation; this is usually controlled by an age threshold that a young-generation object must reach before promotion. Note that the two survivor spaces are symmetric with no fixed order: from and to are relative.

Old generation: objects that survived N collections in the young generation are placed into the old generation; they are the long-lived objects. For the old and permanent generations we can no longer use the copy-and-move style of collection used in the young generation. Usually a Full GC, which collects the entire heap, is triggered when the old generation becomes full.


GC generation