Unsupervised Learning and Deep Learning | Unsupervised learning
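This post explains the concept and types of unsupervised learning in AI.
It introduces the characteristics of traditional machine learning and deep learning and how they differ, and summarizes methods such as K-means, hierarchical clustering, and density estimation.
It also explains the difference between feature engineering and representation learning, along with the notion of high-dimensional data, and notes that representations learned by deep learning are often hard to interpret.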
In Traditional Machine Learning
- K-means clustering
- Hierarchical clustering
- Density estimation
- PCA
Characteristics
- Low dimensional data
- Simple concepts
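As a concrete instance of the traditional methods listed above, here is a minimal sketch of K-means (Lloyd's algorithm) on low-dimensional toy data. The function names `sq_dist` and `kmeans`, the made-up points, and the farthest-first initialization are assumptions chosen for illustration and determinism; they are not taken from the lecture.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Cluster points into k groups with Lloyd's algorithm."""
    # Farthest-first initialization (an assumption of this sketch):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Two well-separated toy blobs (made-up data).
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = kmeans(points, k=2)
print(labels)
```

On data this simple the clusters are easy to read off by eye, which is the sense in which traditional unsupervised learning deals with low-dimensional data and simple concepts.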
In Deep Learning
Feature Engineering vs. Representation Learning
- Feature engineering
- By human
- Domain knowledge & creativity
- Brainstorming
- Representation learning
- By machine
- Deep learning knowledge & coding skill
- Trial and error
Modern Unsupervised Learning
- High dimensional data
- Difficult concepts ➔ Not well understood, but surprisingly good performance
- Deep learning
- Unsupervised representation learning
Representation in Deep Learning
- Deep learning representation is underconstrained
- Simple SGD can find one of the useful networks
- Representation characteristics can be adjusted if needed
- Learned representation becomes difficult to understand
- Disentangled representation
- Aligned
- Independent
- Subspaces
- Possible because the representation is severely underconstrained
Angle Information
- 0 ~ 2π
- With the raw value, an algorithm treats 0 and 2π as different, and 0 and 1.9π as far apart
- (x1, x2) = (cos(θ), sin(θ))
- 0 and 2π are the same
- 0 and 1.9π are close
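The effect of the (cos(θ), sin(θ)) encoding above can be checked numerically. This is a small sketch using only the standard library; the helper names `encode_angle` and `euclidean` are made up for illustration.

```python
import math


def encode_angle(theta):
    """Map an angle in radians to the point (cos θ, sin θ) on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def euclidean(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Raw angle values: 0 and 2π look maximally different, 0 and 1.9π look far apart.
raw_gap_full = abs(2 * math.pi - 0.0)
raw_gap_near = abs(1.9 * math.pi - 0.0)

# Encoded values: 0 and 2π coincide (up to floating-point error),
# 0 and 1.9π are close, and only opposite angles are far apart.
enc_gap_full = euclidean(encode_angle(0.0), encode_angle(2 * math.pi))
enc_gap_near = euclidean(encode_angle(0.0), encode_angle(1.9 * math.pi))
enc_gap_opposite = euclidean(encode_angle(0.0), encode_angle(math.pi))

print(raw_gap_full, raw_gap_near)
print(enc_gap_full, enc_gap_near, enc_gap_opposite)
```

The encoded distance between two angles equals the chord length on the unit circle, so it respects the wrap-around that the raw scalar ignores.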
Spatial Information
- Goal: represent as a mathematical object
Human Representation Problems
- Human can understand
- Human can design with a goal
➔ Good representation in deep learning? : Useful and irrelevant
A Well Defined Task
- Typically, only one attribute of interest is considered as y
- ImageNet: class
- y is well defined because it is simply the human-selected label
- Good representation - a vague concept (Supervised)
- Even when y is well defined, what do we want for h1 and h2?
- Simply say “representation learning successful” if good performance?
- But then there is almost nothing we can say about h1 and h2
- Other than saying “useful information has been well curated”
- Is there anything we can say or pursue?
- For a general purpose, what is a good representation?
Information Bottleneck
- For a well-defined supervised task, what should h1 and h2 satisfy?
- Good representation - a vague concept (Unsupervised)
- For a general purpose, what is a good representation?
- General purpose often defined as a list of downstream tasks?
- So, we go back to good performance for the tasks of interest?
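For the supervised question in this section (what the hidden representations should satisfy), one widely used formalization is the information bottleneck objective. The form below is the usual Lagrangian statement; applying it to a single hidden representation H and the trade-off weight β are assumptions of this sketch, since the notes do not spell out the formula.

```latex
\min_{p(h \mid x)} \; I(X; H) \;-\; \beta \, I(H; Y), \qquad \beta > 0
```

Here I(·;·) denotes mutual information. The first term pushes H to discard information about the input X, while the second rewards keeping whatever is predictive of the label Y.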
Representation
- What we want: a formal definition and evaluation metrics for representation
- Reality: no definition, task-dependent evaluation methods
Unsupervised Representation Learning
- Unsupervised performance ≈ supervised performance
- For linear evaluation
- Thanks to instance discrimination, contrastive loss, and aggressive augmentation
- As in supervised learning
- Performance metric can be unclear
- Design of surrogate loss is an art (some principled; some heuristic-based)
- Training technique development is continuing (but augmentation methods are dominating)
- NLP
- Masked language modeling
- What next?
- Unsupervised representation learning
- Still a long way to go…
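To make the contrastive loss mentioned above more concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. The notes do not say which contrastive loss the lecture had in mind, so the choice of InfoNCE, the function names, the temperature value, and the toy embeddings are all assumptions for illustration. Each anchor should match the positive at the same index (two augmented views of the same instance), and the other positives in the batch act as negatives.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss over a batch of (anchor, positive) embedding pairs."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct "class".
        total += log_denom - logits[i]
    return total / len(a)


# Toy 2D embeddings (made-up numbers).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is near its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is near the other anchor

loss_aligned = info_nce(view_a, aligned)
loss_swapped = info_nce(view_a, swapped)
print(loss_aligned, loss_swapped)
```

The loss is low when each instance's two views are closer to each other than to the views of other instances, which is the instance discrimination idea in loss form.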
Reference
This post is based on what I learned in the LG Aimers program. (It does not cover the full content.)
- LG Aimers AI Essential Course Module 3. Unsupervised Learning, Prof. 이원종, Seoul National University
This post is licensed under CC BY 4.0 by the author.