2025 iThome Ironman Contest

DAY 17

Self-Challenge Group

A Journey Through Data Visualization: From ggplot2 Techniques to Visualization Design, Part 17

Facing the Challenge of Overlapping Data: Strategies for Handling Overplotting

insightdeep

2025-09-17 17:36:38

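Overplotting is a common challenge in data visualization. When there are very many data points, or when rounding or limited measurement precision gives many observations exactly the same values, a plain scatter plot often cannot convey the information clearly. Claus O. Wilke offers several useful directions for this problem in Fundamentals of Data Visualization, and today I would like to share my notes on them.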
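As a rough way to see how much exact overlap a dataset contains, the sketch below counts how many rows of ggplot2's built-in mpg data land on a (displ, hwy) position that an earlier row already occupies. The variable names are illustrative choices and do not come from Wilke's book.

```r
library(ggplot2)  # provides the mpg example dataset

# Keep only the two variables that will be mapped to x and y
xy <- as.data.frame(mpg[, c("displ", "hwy")])

n_total    <- nrow(xy)              # number of observations
n_distinct <- nrow(unique(xy))      # number of distinct plotting positions
n_hidden   <- n_total - n_distinct  # points that fall exactly on an earlier point

cat(n_total, "points occupy", n_distinct, "distinct positions;",
    n_hidden, "would be covered in a plain scatter plot\n")
```

If the last number is a noticeable share of the total, a plain scatter plot with opaque points is probably hiding information.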

/upload/images/20250917/20177964y01L8abB4u.png

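1. Transparency and Jittering

The most intuitive approach is to make the points partially transparent: where points overlap, the stacked layers appear darker, so we can at least see differences in density.

A complementary method is jittering: shifting each point by a small random amount along the axes so that points that were completely covered become visible.

Jittering needs care, though: too large an offset distorts the true position of the data and may mislead readers.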

library(ggplot2)
# Semi-transparent points plus a small random offset on both axes
ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(alpha = 0.4, position = position_jitter(width = 0.2, height = 0.2))

/upload/images/20250917/20177964x2aCZ8mUer.png

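2. 2D Histograms and Hexagonal Binning (2D Histogram / Hexbin)

When the dataset is very large, transparency and jittering are often still not enough. In that case, the coordinate plane can be divided into small bins:

2D histogram (geom_bin2d): uses rectangular bins; the color intensity represents the number of points that fall in each bin.
Hexbin (geom_hex): uses hexagonal bins. Compared with rectangles, hexagons usually keep points at a more uniform distance from the bin center, so the distribution looks more natural.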
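The two binning geoms can be compared side by side with a minimal sketch like the one below. It assumes ggplot2's built-in diamonds data as a stand-in for a "very large" dataset, and bins = 40 is an arbitrary choice. Rendering the geom_hex() version additionally requires the hexbin package to be installed.

```r
library(ggplot2)

# Rectangular bins: fill colour encodes the number of points in each cell
p_rect <- ggplot(diamonds, aes(carat, price)) +
  geom_bin2d(bins = 40) +
  scale_fill_viridis_c()

# Hexagonal bins: same idea, but the cells are hexagons
# (drawing this plot needs the 'hexbin' package)
p_hex <- ggplot(diamonds, aes(carat, price)) +
  geom_hex(bins = 40) +
  scale_fill_viridis_c()

# Print p_rect or p_hex at the console to draw the plot
```

A larger bins value gives finer cells but noisier counts, so it is worth trying a few values before settling on one.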

/upload/images/20250917/20177964A9VaxSwRKw.png

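3. Contour Lines (Contour)

Another strategy is to estimate the density distribution of the data and mark it with contour lines. This works especially well when the density varies smoothly, and it helps readers quickly identify dense and sparse regions.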
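One possible way to draw such contours in ggplot2 is geom_density_2d(), which computes a 2D kernel density estimate and draws its contour lines. The sketch below assumes R's built-in faithful dataset purely for illustration; the faint points underneath are optional context.

```r
library(ggplot2)

# Faint points for context, with density contours drawn on top
p_contour <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point(alpha = 0.3) +
  geom_density_2d(colour = "steelblue")

# Print p_contour at the console to draw the plot
```

Closely spaced contour lines mark regions where the estimated density changes quickly, while the innermost rings mark the peaks.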

/upload/images/20250917/20177964PTZdOslvcU.png

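To compare groups (for example, samples of different sexes), you can draw the contour lines in different colors. Be aware, though, that with too many groups the plot easily turns into a "ball of yarn" that is hard to read. In that case, faceting is the better choice.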
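The difference between overlaid and faceted contours can be sketched as follows. This assumes the mpg data with drv (drive type) as the grouping variable, a substitution for the sex-based example mentioned above, since that dataset is not part of this post.

```r
library(ggplot2)

# All groups in one panel: contours are coloured by group and may tangle
p_overlaid <- ggplot(mpg, aes(displ, hwy)) +
  geom_density_2d(aes(colour = drv))

# One panel per group: the same contours, separated by facet_wrap()
p_faceted <- p_overlaid +
  facet_wrap(vars(drv))

# Print p_overlaid or p_faceted at the console to draw the plot
```

With only three groups the overlaid version is still readable; faceting pays off as the number of groups grows.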

/upload/images/20250917/20177964UkI791lJzk.png

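Summary

What struck me most while working with these data is that there is no perfect solution for overplotting; the choice has to follow the size of the data and the purpose of the analysis.

Moderate data size: transparency plus jittering is enough to improve the plot.
Very large data size: 2D binning or hexbin is a better fit.
Smooth distribution with groups to compare: contour lines and faceting give a clear reading.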

English Abstract

Overplotting is a frequent challenge in data visualization, particularly when datasets are large or when values are recorded with limited precision. In such cases, multiple observations share identical positions on the plot, making important details invisible. Several strategies can be applied to address this issue. For small to moderate datasets, applying partial transparency or adding jitter can help reveal hidden points by varying intensity or slightly displacing positions. For larger datasets, binning approaches such as 2D histograms or hexagonal binning (hexbin) are more effective, as they summarize density with color gradients. Alternatively, contour plots highlight regions of varying point density, making them particularly useful for smooth distributions or group comparisons. However, this technique requires careful use to avoid overly complex visualizations when groups overlap heavily. Ultimately, no single method works universally; the choice depends on dataset size and analytical purpose. The recent ggplot2 4.0.0 release further enhances these techniques by improving position adjustments and aesthetic mappings, giving users greater flexibility in tackling overplotting.
