Computer Vision and Image Processing

Ch5. Processing and Convolution

1. Noise Types

2. Low and High Pass Filter

3. Low : Goal - Denoise

3.1 Low : average blur

Average all pixel values inside the window and use the result as the value of the center pixel.
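The averaging described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `average_blur`, the window size `k`, and the choice to replicate border pixels are all assumptions made for the example.

```python
import numpy as np

def average_blur(img, k=3):
    """Replace each pixel with the mean of its k x k window (borders replicated)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    # Sum the k*k shifted copies of the image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.zeros((5, 5))
img[2, 2] = 9.0                 # a single bright pixel
blurred = average_blur(img, k=3)
print(blurred)                  # the value 9 is spread evenly over a 3x3 block
```

A single bright pixel gets spread evenly over its neighborhood, which is why the average blur suppresses noise but also softens detail.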

3.2 Low : median Blur

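Sort the values covered by the mask in ascending order, then take the median of the sorted values and use it to replace the center value.
It is often used to remove noise from images or other signals. The median filter is especially useful against impulse noise, which looks like white and black dots superimposed on the image and is therefore also called salt-and-pepper noise.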
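As a small sketch of the sort-and-take-the-median step, the example below filters a flat image corrupted by one "salt" and one "pepper" pixel. The helper name `median_blur` and the border handling (edge replication) are assumptions for illustration only.

```python
import numpy as np

def median_blur(img, k=3):
    """Replace each pixel with the median of its k x k window (borders replicated)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            # np.median sorts the window values internally and picks the middle one
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[1, 1] = 255   # "salt" impulse
img[3, 3] = 0     # "pepper" impulse
clean = median_blur(img)
print(clean)
```

Because an isolated impulse never reaches the middle of the sorted window, it is removed entirely instead of being smeared as it would be by averaging.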

3.3 Low : Gaussian Blur

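Prerequisite: Normal distribution (Gaussian distribution)
- Background reading: 正態分佈的前世今生 ("The Past and Present of the Normal Distribution")
- The reason it shows up so often is the Central Limit Theorem: whatever distribution the population follows (e.g. uniform, exponential), as long as its variance is finite, the suitably standardized mean of random samples drawn from it approaches a normal distribution as the sample size goes to infinity
Gaussian Filter
- Definition (in 2D, it can be viewed as the product of two 1D Gaussian distributions):
- ${\displaystyle G(x,y)={\frac {1}{2\pi \sigma ^{2}}}e^{-(x^{2}+y^{2})/(2\sigma ^{2})}}$
- To compute the average, take the center pixel as the origin and give every other pixel a weight according to its position on the Gaussian curve; the result is a weighted average
- $\sigma$ cannot be derived from a formula; it has to be chosen according to the characteristics of the image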
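To make the definition concrete, the sketch below samples $G(x,y)$ on a small grid and checks that it equals the outer product of two 1D Gaussians. The kernel size of 5 and $\sigma = 1$ are arbitrary example values, and renormalizing the truncated kernel so that it sums to 1 is a common convention assumed here.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample G(x, y) on a size x size grid (size odd) centered at the origin."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()   # renormalize so the truncated weights sum to 1

kernel = gaussian_kernel(5, 1.0)

# The same kernel built as the product of two 1D Gaussians
g1d = np.exp(-np.arange(-2, 3) ** 2 / 2.0)
g1d /= g1d.sum()
separable = np.outer(g1d, g1d)
print(np.allclose(kernel, separable))
```

The separability is what lets practical implementations blur rows and columns in two cheap 1D passes.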
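😈 Gaussian filtering performs well among low-pass filters, but it has one problem: it considers only the spatial relationship between pixels, so the filtered result loses edge information. Edges here mainly mean the boundaries between the major color regions of an image (e.g. blue sky, black hair). The Bilateral filter addresses this by adding a second weight distribution to the Gaussian blur.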

3.4 Low : Bilateral Filter

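Definition
${\displaystyle I^{\text{filtered}}(x)=\frac {1}{W_{p}} \sum_{x_{i} \in \Omega } I(x_i)f_r(|I(x_{i})-I(x)|)g_{s}(|x_{i}-x|)}$
where the normalization term ${\displaystyle {W_{p}}}$ is defined as
${\displaystyle W_{p}=\sum_{x_{i} \in \Omega } {f_{r}(|I(x_{i})-I(x)|)g_{s}(|x_{i}-x|)}}$
- $I^\text{filtered}$: the filtered image
- ${\displaystyle I}$: the original image
- ${\displaystyle x}$: the coordinates of the current pixel in the original image
- $\Omega$: the window centered at ${\displaystyle x}$
- $f_r$: range kernel, a Gaussian weighted by the intensity (color) difference between pixels
- $g_s$: spatial kernel, a Gaussian weighted by the distance between pixels
To see the difference between Gaussian and bilateral filtering intuitively, consider a noisy, programmatically generated image with two clearly separated regions (left and right), where the center of the blue box marks the target pixel. The Gaussian weights and the bilateral weights at that pixel, visualized in 3D, look as follows:
- Left: the original noisy image; middle: the Gaussian sampling weights; right: the Bilateral sampling weights
- The figure shows that once the intensity-similarity distribution is added, Bilateral filters out the pixels on the left side of the source image whose values differ too much from the current pixel, which preserves the edge well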
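The comparison can be reproduced numerically. The sketch below computes both sets of weights for the center pixel of a small noisy step-edge patch, following the $f_r \cdot g_s$ product in the definition. The patch layout (dark left, bright right), the values of `sigma_s` and `sigma_r`, and the helper name `bilateral_weights` are assumptions chosen for the example.

```python
import numpy as np

def bilateral_weights(patch, sigma_s=2.0, sigma_r=0.1):
    """Return (gaussian, bilateral) weights for the center pixel of an odd square patch."""
    half = patch.shape[0] // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    g_s = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))   # spatial kernel g_s
    diff = patch - patch[half, half]
    f_r = np.exp(-(diff**2) / (2 * sigma_r**2))         # range kernel f_r
    w = f_r * g_s
    return g_s / g_s.sum(), w / w.sum()                 # both divided by W_p

rng = np.random.default_rng(0)
patch = np.zeros((7, 7))
patch[:, 4:] = 1.0                                      # bright region right of the center
patch += rng.normal(0.0, 0.02, patch.shape)             # mild noise

gauss_w, bilat_w = bilateral_weights(patch)
gauss_out = (gauss_w * patch).sum()    # pulled toward the bright side: edge is blurred
bilat_out = (bilat_w * patch).sum()    # stays near the dark side: edge is preserved
print(gauss_out, bilat_out)
```

The bilateral weights across the edge collapse to nearly zero, so the filtered center pixel is averaged only with pixels from its own region.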

4. High : Feature Extraction - Edge Detection

4.1 High : Sobel Filter

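It uses only the image gradient as the criterion, with a separate mask for each edge orientation; a pixel is classified as an edge when its gradient magnitude exceeds a threshold.
Definition (with $A$ the source image and $*$ the 2-D convolution):
$G_{x} = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A \quad \text{and} \quad G_{y} = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$
The horizontal and vertical gradient approximations of each pixel are combined with the following formula to obtain the gradient magnitude
${ {G} ={\sqrt {G_{x} ^{2}+G_{y} ^{2}}}}$
and the gradient direction is computed with
${ {\Theta } = \operatorname {arctan} \left({G_{y} \over {G_{x}} }\right)}$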
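The three formulas above can be checked on a toy image with one vertical edge. This is a sketch under a few assumptions: `convolve2d` is a hand-written helper (true convolution, so the kernel is flipped, evaluated on the "valid" region only), and `np.arctan2` is used in place of a plain arctangent so that $G_x = 0$ does not cause a division by zero.

```python
import numpy as np

KX = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
KY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def convolve2d(img, kernel):
    """True 2-D convolution (kernel flipped), 'valid' region only."""
    k = kernel[::-1, ::-1]
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * img[dy:dy + h, dx:dx + w]
    return out

img = np.zeros((5, 6))
img[:, 3:] = 10.0                  # vertical edge: dark left, bright right

gx = convolve2d(img, KX)           # horizontal gradient G_x
gy = convolve2d(img, KY)           # vertical gradient G_y
mag = np.sqrt(gx**2 + gy**2)       # gradient magnitude G
theta = np.arctan2(gy, gx)         # gradient direction Theta
print(gx[0], mag.max())
```

Thresholding `mag` then yields the edge map; here only the two columns next to the step respond.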
Illustration:

4.2 High : Scharr Filter

It simply changes the coefficients of the Sobel matrices
Definition: $G_{x} = \begin{bmatrix} +3 & 0 & -3 \\ +10 & 0 & -10 \\ +3 & 0 & -3 \end{bmatrix} * A \quad \text{and} \quad G_{y} = \begin{bmatrix} +3 & +10 & +3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{bmatrix} * A$
More accurate than the 3×3 Sobel filter

4.3 High : Canny Edge Detector

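Definition:
1. Preprocess the image: convert it to grayscale and remove noise with a Gaussian Blur
2. Use the Sobel filter to obtain the gradient magnitude and gradient direction of every pixel
3. Use Non-maximum suppression to find candidate edges
4. Use two thresholds to select strong edges (definite) and weak edges (to be examined further)
5. Keep the weak edges that are connected to a strong edge as definite edges
Step 3. Use Non-maximum suppression to find candidate edges
- Around an edge, every nearby pixel has a non-zero gradient; if all of them were treated as edge pixels, the resulting edge would be very thick
- The purpose of this step is therefore to find the maximum among the dense candidates and discard the rest
  - In practice, each pixel's gradient magnitude is compared with its neighbors along the gradient direction; if it is not the largest, it is removed.
Steps 4-5. Use two thresholds to select strong and weak edges, then keep the weak edges connected to a strong edge
- threshold1 and threshold2 separate strong edges from weak edges
  - A ratio threshold1 / threshold2 between 1/2 and 1/3 is commonly chosen, e.g. (70, 140), (70, 210)
- This step decides whether to keep a weak edge by checking whether it can be connected to a strong edge
  - So the algorithm works in reverse: starting from the strong edges, it follows the direction in which the edge extends (perpendicular to the gradient direction) and promotes every weak edge along the way to a strong edge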
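The double-threshold and edge-linking steps can be sketched as a breadth-first search that starts from the strong pixels. Note the simplification: this sketch promotes any weak pixel that is 8-connected to a strong one, instead of walking strictly along the edge direction. The function name `hysteresis` and the toy gradient magnitudes are assumptions made for illustration.

```python
import numpy as np
from collections import deque

def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to a strong one."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))   # start the search from every strong pixel
    h, w = mag.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True      # promote the weak pixel to an edge
                    queue.append((ny, nx))
    return edges

mag = np.array([
    [0,   0,   0,  0, 0,  0],
    [0, 150, 100, 80, 0, 90],
    [0,   0,   0,  0, 0,  0],
], dtype=float)
edges = hysteresis(mag, low=70, high=140)
print(edges[1])
```

The weak pixels with magnitude 100 and 80 survive because they chain back to the strong pixel, while the isolated weak pixel with magnitude 90 is discarded.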
Result:

4.4 High : Difference of Gaussian

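DoG is an enhancement algorithm that subtracts one blurred version of an original grayscale image from another, less blurred version of it; the subtraction reduces the blurriness of the blurred image
Definition: Gaussian Blurs of one image with different $\sigma$ are written as:
$g_1(x,y) = G_{\sigma_1}(x,y) * f(x,y)$, $g_2(x,y) = G_{\sigma_2}(x,y) * f(x,y)$. Subtracting the two filtered results $g_1$ and $g_2$ gives:
$g_1(x,y) - g_2(x,y) = G_{\sigma_1}(x,y) * f(x,y) - G_{\sigma_2}(x,y) * f(x,y) = (G_{\sigma_1} - G_{\sigma_2}) * f(x,y) = DoG * f(x,y)$
In 2D, $DoG = f(u,v,\sigma) = \frac{1}{2\pi \sigma^{2}} e^{-(u^{2}+v^{2})/(2\sigma^{2})} - \frac{1}{2\pi K^{2}\sigma^{2}} e^{-(u^{2}+v^{2})/(2K^{2}\sigma^{2})}$
- It subtracts a wide Gaussian from a narrow Gaussian and is an approximation of the Mexican hat wavelet
Result:
The DoG algorithm is believed to mimic how neurons in the retina extract information from an image to pass on to the brain
A major drawback of the algorithm is that information is lost as the image contrast is adjusted
♻️ Most edge-sharpening operators work by boosting high-frequency signal, but because random noise is also high-frequency, many of them amplify the noise as well. The high-frequency content that DoG removes includes random noise, so this method is well suited to images with high-frequency noise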
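As a sketch of the 2D DoG formula, the code below builds the kernel as a narrow Gaussian minus a wide one. The kernel size, $\sigma = 1$, and $K = 1.6$ are example values assumed here, and each Gaussian is renormalized on the truncated grid (a convention chosen for this example).

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Sampled 2-D Gaussian on a size x size grid (size odd), normalized to sum to 1."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def dog_kernel(size=9, sigma=1.0, K=1.6):
    """Narrow Gaussian (sigma) minus wide Gaussian (K * sigma)."""
    return gaussian_kernel(size, sigma) - gaussian_kernel(size, K * sigma)

dog = dog_kernel()
print(dog[4, 4], dog[0, 0], dog.sum())
```

The kernel is positive at the center, negative toward the rim, and sums to zero, which is the "Mexican hat" shape: flat regions give no response, so only band-limited detail remains.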

5. Correlation and Convolution

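Convolution flips the filter, whereas correlation does not; when the filter is symmetric about both the x-axis and the y-axis, the two operations give identical results in 2-D.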
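The relationship is easy to verify numerically. In this sketch, `correlate2d` and `convolve2d` are hand-written helpers (valid region only), and the two test kernels are assumptions picked to show one symmetric and one antisymmetric case.

```python
import numpy as np

def correlate2d(img, kernel):
    """Slide the kernel over the image without flipping it ('valid' region only)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * img[dy:dy + h, dx:dx + w]
    return out

def convolve2d(img, kernel):
    """Convolution = correlation with the kernel flipped along both axes."""
    return correlate2d(img, kernel[::-1, ::-1])

img = np.arange(25, dtype=float).reshape(5, 5) ** 2
symmetric = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

same = np.allclose(correlate2d(img, symmetric), convolve2d(img, symmetric))
differ = np.allclose(correlate2d(img, sobel_x), convolve2d(img, sobel_x))
print(same, differ)
```

For the symmetric smoothing kernel the two results coincide; for the Sobel kernel, flipping negates it, so the outputs differ by a sign.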

6. Multi-Resolution：Downsampling (Encoder)

This is where the DoG introduced above is used as a tool

7. AutoEncoder – Convolutional Encoder and Decoder

Ch6. Image Transforms

Preliminary: Canny Algorithm

🫡 Convert the image into edge lines, but do not set the thresholds too high, or some important information will be filtered out

1. Hough Transform

1.1 Circle

Represent every edge point as a circle (with the radius r still to be determined)
In this way, the baby's head is found

1.2 Line

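Polar coordinates: $r = x\cos\theta + y\sin\theta$
Representation: (upper-right figure) infinitely many lines pass through any point in image space. Dropping a perpendicular from the origin onto each of these lines gives a continually changing $(r, \theta)$. (Upper-left figure) Recording all of these $(r, \theta)$ pairs is a way of recording one point. The $(r, \theta)$ at which the curves of several points intersect therefore represents one line in image space.
💯 With the two methods above, even incomplete lines or circles can be located approximately with the Hough Transform (because it relies on voting). It can also be applied to irregular shapes.
An edge detector such as the sobel filter is usually applied first, and the Hough Transform is then run on the detected edges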
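The voting idea can be sketched with a small accumulator over $(r, \theta)$, using the standard parameterization $r = x\cos\theta + y\sin\theta$. The 1-degree and 1-pixel bin sizes, the `r_max` bound, and the synthetic points (a horizontal line with a gap) are assumptions for this example.

```python
import numpy as np

def hough_lines(points, n_theta=180, r_max=45):
    """Vote in (r, theta) space; rows are r = -r_max..r_max, columns are 0..179 degrees."""
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * r_max + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in points:
        # every point votes once per theta, along its sinusoid in (r, theta) space
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + r_max, cols] += 1
    return acc

# Points on the line y = 5, with a gap in the middle (an incomplete line)
points = [(x, 5) for x in list(range(0, 15)) + list(range(25, 41))]
acc = hough_lines(points)

r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
best_r, best_theta_deg = r_idx - 45, t_idx
print(best_r, best_theta_deg, acc.max())
```

All points agree on a single cell of the accumulator, so the line is recovered even though part of it is missing.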
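tracking vs detection
- Taking recognition as an example
- (first) detection covers image preprocessing, locating targets over the whole image, and so on
- (then) tracking makes predictions over a much smaller region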

2. World to Camera to Image Coordinates

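Rotation Transformation
Affine Transformation
- Six equations are needed to solve for the 6 unknowns, so at least 3 points (6 coordinates) are required
Projective Transformation
- A non-linear transformation; the matrix is fitted by linear approximation and optimization, and the closer it gets to the ground truth the better
- Here $h_{20}, h_{21}, h_{22}$ take general values instead of being fixed to $0, 0, 1$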
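For the affine case, the "3 points, 6 equations, 6 unknowns" statement can be sketched as a direct linear solve. The function name `affine_from_points`, the row layout of the system, and the sample point pairs are assumptions made for illustration.

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve the 6 affine unknowns from 3 point pairs (2 equations per pair)."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0]       # u = h00*x + h01*y + h02
        A[2 * i + 1] = [0, 0, 0, x, y, 1]   # v = h10*x + h11*y + h12
        b[2 * i], b[2 * i + 1] = u, v
    h = np.linalg.solve(A, b)
    # The last row stays (0, 0, 1) for an affine transformation
    return np.vstack([h.reshape(2, 3), [0, 0, 1]])

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 6)]   # scale x by 2 and y by 3, then translate by (2, 3)
H = affine_from_points(src, dst)
print(H)
```

A projective transformation would add unknowns in the last row, which is why it needs more correspondences (four points) and is usually fitted by optimization.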

3. Integral Image

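$ii(x, y)$: the sum of the pixels above and to the left of $(x, y)$.
Definition of the $ii$ image: $ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$, where $i$ is the original image
Recovering the original image from the $ii$ image: $i(x, y) = ii(x, y) - ii(x-1, y) - ii(x, y-1) + ii(x-1, y-1)$
To build the $ii$ image, first accumulate the sums along each row, then accumulate along each column
It speeds up the computation of $Var[x] = E[x^2] - (E[x])^2$ when fitting a Gaussian Model
It also speeds up filter-type feature extraction on images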
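A minimal sketch of these ideas: build the integral image with two cumulative sums, read off any box sum with four lookups, and reuse the same trick on the squared image to get a window variance via $E[x^2] - (E[x])^2$. The helper names and the 4x4 test image are assumptions for illustration.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum along each row, then along each column."""
    return img.cumsum(axis=1).cumsum(axis=0)

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] using at most four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(1, 17).reshape(4, 4)
ii = integral_image(img)
s = box_sum(ii, 1, 1, 2, 2)                      # pixels 6, 7, 10, 11

# Window variance from two integral images: E[x^2] - (E[x])^2
ii_sq = integral_image(img.astype(float) ** 2)
n = 4
mean = s / n
var = box_sum(ii_sq, 1, 1, 2, 2) / n - mean ** 2
print(s, mean, var)
```

The cost of a box sum no longer depends on the window size, which is what makes box-filter style features cheap to evaluate at many positions and scales.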

Ch7. Histograms and Matching

1. Histogram

1.1 Basic Information

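2D → 1D + 1D; study the probability
Appropriate bin intervals
- Each point represents two of a pixel's color coordinates (for example, two of the three HSV values)
Inappropriate cases
Normalization to probability, $p(\cdot)=1.0$
1. Normalize so that the probability density of the whole histogram sums to 1.0
2. Normalize so that the tallest bin of the histogram equals 1.0
RGB and HSV histograms under changing lighting
- The HSV histogram is comparatively stable
Four metrics for measuring how well two histograms match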
- Correlation
- Chi-Square
- Intersection
- Bhattacharyya
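The four metrics can be sketched directly in NumPy. The formulas below follow the commonly used definitions (Pearson correlation, chi-square relative to the first histogram, sum of bin-wise minima, and the Bhattacharyya/Hellinger distance on normalized histograms); treating these as the intended definitions is an assumption, as are the helper name `compare` and the sample histograms.

```python
import numpy as np

def compare(h1, h2):
    """Return (correlation, chi_square, intersection, bhattacharyya) for two histograms."""
    h1 = h1 / h1.sum()                      # normalize both to total probability 1.0
    h2 = h2 / h2.sum()
    d1, d2 = h1 - h1.mean(), h2 - h2.mean()
    correlation = (d1 * d2).sum() / np.sqrt((d1**2).sum() * (d2**2).sum())
    mask = h1 > 0                           # skip empty bins to avoid division by zero
    chi_square = (((h1 - h2) ** 2)[mask] / h1[mask]).sum()
    intersection = np.minimum(h1, h2).sum()
    bhattacharyya = np.sqrt(max(0.0, 1.0 - np.sqrt(h1 * h2).sum()))
    return correlation, chi_square, intersection, bhattacharyya

a = np.array([10, 20, 40, 20, 10], dtype=float)
b = np.array([40, 20, 10, 20, 10], dtype=float)
same = compare(a, a)
diff = compare(a, b)
print(same)
print(diff)
```

Note the different directions: for Correlation and Intersection a higher score means a better match, while for Chi-Square and Bhattacharyya a lower score does.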

1.2 Back Projection

A way of recording how well the pixels of a given image fit the distribution (probability) of pixels in a histogram model

Training database
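- Boxing the blue region and taking its HS values gives the two 1D probability distributions in the lower-right figure (horizontal axis H, vertical axis S)
Test image (using the distribution in the upper-right corner)
- Feeding in the left image produces the probability map shown on the right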

1.3 Histogram Equalization

A method that improves the contrast in an image

original (low contrast):
equalized result:
The method uses the cumulative distribution function (CDF) to equalize a Gaussian distribution
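A minimal sketch of CDF-based equalization for an 8-bit grayscale image follows. It uses the simple mapping `round(cdf * 255)` as a lookup table; the function name `equalize` and the tiny low-contrast test image are assumptions for illustration.

```python
import numpy as np

def equalize(img, levels=256):
    """Map each gray level through the image's cumulative distribution function."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum() / img.size                        # CDF in [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)   # lookup table: old level -> new level
    return lut[img]

# Low-contrast image: all values squeezed between 100 and 103
img = np.array([[100, 100, 101, 101],
                [101, 102, 102, 102],
                [102, 103, 103, 103],
                [103, 103, 103, 103]], dtype=np.uint8)
out = equalize(img)
print(out)
```

The four nearly identical gray levels are spread across most of the 0..255 range, which is the contrast improvement the method aims for.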

2. Matching

For example, to find the dog's head
Two approaches:
- SSD: Sum-of-Squared Differences
  - e.g. normalized by a Gaussian Model (Variance)
- MSE: Mean Squared Error
  - e.g. a loss function
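Template matching with SSD can be sketched as an exhaustive scan that scores every placement of the template. The random test image, the template location, and the helper name `ssd_match` are assumptions for this example; MSE is simply the SSD divided by the number of template pixels.

```python
import numpy as np

def ssd_match(image, template):
    """Return the SSD score of the template at every valid position."""
    th, tw = template.shape
    h, w = image.shape[0] - th + 1, image.shape[1] - tw + 1
    ssd = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            diff = image[y:y + th, x:x + tw] - template
            ssd[y, x] = (diff ** 2).sum()
    return ssd

rng = np.random.default_rng(0)
image = rng.random((8, 8))
template = image[3:5, 4:7].copy()     # the patch we want to find again

ssd = ssd_match(image, template)
y, x = np.unravel_index(ssd.argmin(), ssd.shape)
mse = ssd / template.size             # MSE differs from SSD only by a constant factor
print(y, x, ssd[y, x])
```

Since MSE is a rescaled SSD, both select the same best position; they differ only in how scores compare across templates of different sizes.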

3. AI Bayesian Decision Rule

Ch11. Camera Modeling and Camera Calibration

1. Image and Signal Processing (ISP)

Lens parameters
Screen resolution
Pinhole Camera Model
World to Camera to Image Coordinates
Projective Geometry

2. Homogeneous Coordinates

Homogeneous coordinates (in matrix form) are a mechanism that allows us to associate points and vectors in space with vectors in $\mathbb{R}^n$

$x'=z'\frac{x}{z},\ y'=z'\frac{y}{z},\ z'=z'$
Location/Point u = Unit vector * Amplitude
Advantages of Homogeneous Coordinates
- Rigid transformations can be expressed as matrix multiplications
- Optimization becomes straightforward
Solution/Optimization of Homogeneous Matrix
- 1.Closed-Form Solution
  - Used when the right-hand side equals 0
  - $Ax = 0$ =>
  - $A^TA$ = Covariance Matrix, decomposed by $SVD$ as $UWU^T$
  - The solution $x$ is the eigenvector of the smallest eigenvalue (which is > 0 for noisy data)
- 2.Pseudo Inverse
  - Used when the right-hand side is not 0
  - $Ax = b$ =>
  - $x=(A^TA)^{-1}A^Tb$
- 3.Sum of Squared Difference
  - Fit the parameters by maximum likelihood, i.e. by minimizing the exponential term
  - $\min E = \sum(Ax-b)^2$
    - $Ax = b'$: estimated value, $b$: ground truth, $\min E = \sum(b'-b)^2$
      - linear approach: Pseudo Inverse
      - non-linear approach: LM (Levenberg-Marquardt Algorithm)
    - $Ax = b'$: estimated value, $b''$: estimated value, $\min E = \sum(b'-b'')^2$
      - EM (Expectation-Maximization)
- 4.Lagrange Approach (outlier) with constraint
  - The extra $\lambda(x^2 +y^2)$ term reduces the high-variance cases shown in the two figures on the right
  - $\min E = \sum(Ax-b)^2 + \lambda(x^2 +y^2)$
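The closed-form, pseudo-inverse, and penalized solutions can be compared on a toy line-fitting problem. This is a sketch under stated assumptions: the helper names are made up for the example, the homogeneous case is solved with the SVD of $A$ (whose last right-singular vector is the eigenvector of $A^TA$ with the smallest eigenvalue), and the Lagrange-style term is interpreted as a ridge penalty $\lambda \|x\|^2$ on the parameters.

```python
import numpy as np

def solve_homogeneous(A):
    """Ax = 0: eigenvector of A^T A with the smallest eigenvalue (last row of V^T)."""
    _, _, vt = np.linalg.svd(A)
    return vt[-1]

def solve_pinv(A, b):
    """Ax = b: x = (A^T A)^-1 A^T b."""
    return np.linalg.inv(A.T @ A) @ A.T @ b

def solve_ridge(A, b, lam):
    """min sum (Ax - b)^2 + lam * |x|^2, which shrinks the parameters."""
    n = A.shape[1]
    return np.linalg.inv(A.T @ A + lam * np.eye(n)) @ A.T @ b

# Points on the line y = 2x + 1
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1

A = np.column_stack([xs, np.ones_like(xs)])
x_pinv = solve_pinv(A, ys)                 # slope and intercept
x_ridge = solve_ridge(A, ys, lam=1.0)      # same fit with a penalty on the parameters

# Homogeneous form a*x + b*y + c = 0 for the same points
Ah = np.column_stack([xs, ys, np.ones_like(xs)])
n = solve_homogeneous(Ah)
n = n / n[0] * 2                           # fix the arbitrary scale so that a = 2
print(x_pinv, x_ridge, n)
```

The homogeneous solution is only defined up to scale, which is why it is rescaled before comparison; the penalized solution trades a little bias for smaller parameter values.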

3. Camera Calibration

Conversion between the 2D Image Coordinate and the 3D World Coordinate

The goal is to estimate the projection matrix and the intrinsic and extrinsic parameters
They are solved with the homogeneous coordinates described above
Undistortion
- Causes of distortion
  - Radial Distortion
  - Tangential Distortion
Summary: Calibration Procedure
- 1.Print a pattern and attach it to a planar surface.
  - i.e. print out the calibration board
- 2.Take a few (15~20) images of the model plane under different orientations by moving either the plane or the camera.
  - Take many photos
- 3.Detect the feature points (corner points) in the images.
  - Extract the inner corner points
- The steps below are mathematical computations
- 4.Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form solution.
  - Find the intrinsic and extrinsic parameters
- 5.Estimate the coefficients of the radial distortion by solving the linear least-squares problem with the Pseudo Inverse $k = (D^TD)^{-1}D^Td$
  - Find the distortion parameters
- 6.Refine all parameters by minimizing Sum of Squared Difference SSD $\sum_{i=1}^{n} \sum_{j=1}^{m} \| m_{ij} - \hat m(A,k_1,k_2,R_i,t_i,M_j) \|^2$
  - The estimates from steps 4 and 5 are refined this way, by iterative optimization (as in deep learning)
- Result:
Rodrigues
- A rotation vector and a rotation matrix can be converted into each other with the Rodrigues transform
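The vector-to-matrix direction of the Rodrigues transform can be sketched with the standard formula $R = I + \sin\theta\,K + (1-\cos\theta)\,K^2$, where the rotation vector's direction is the axis, its length is the angle $\theta$, and $K$ is the cross-product matrix of the unit axis. The function name `rodrigues` is an assumption for this example.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)                     # no rotation
    k = rvec / theta                         # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])       # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = rodrigues(np.array([0.0, 0.0, np.pi / 2]))   # 90 degrees about the z-axis
print(np.round(R, 6))
```

The three-number rotation vector is the compact form typically used for the extrinsic rotation during optimization, while the matrix form is what gets multiplied with points.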

Ch12. 3D Sensor

1. Introduction

Motivation
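- The disparity between corresponding positions, obtained from the parallax between the left and right cameras, can be used to estimate the depth differences (3D) of objects in the scene
- If the object has no structural variation of its own (so a shift cannot be detected), Structured Light must first be projected onto it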

2. Stereo

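In a stereo system we have a left and a right image, and the goals are to recover:
- the 3D structure of the scene
- the relative position of the two cameras
Simple Stereo system
- $Z = \frac{f B}{d}$, where $Z$ is the depth, $f$ the focal length, $B$ the baseline between the cameras, and $d$ the disparity
- Derived from the proportion between similar triangles
- The cameras can also be repositioned; they do not have to be placed in parallel
  - The setup in the figure above is better for 3D reconstruction of nearby objects
- Ground Truth:
- Actual result:
- Reference videos: Simple Stereo, Camera Calibration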
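The depth formula $Z = fB/d$ translates directly into code. In this sketch the focal length (in pixels), the baseline (in meters), and the disparity values are made-up example numbers, and a disparity of zero is mapped to infinite depth as a convention assumed here.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d; pixels with zero disparity are treated as infinitely far away."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Hypothetical rig: focal length of 700 px, cameras 0.1 m apart
depth = depth_from_disparity([70.0, 35.0, 7.0, 0.0], focal_px=700.0, baseline_m=0.1)
print(depth)   # depth in meters; larger disparity means a closer object
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for near ones.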