An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data

이정혜, Inyoung Choi, Chi-Hyuck Jun (2021) · Expert Systems with Applications · DOI ↗

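Capstone of 이정혜's Thread 1 (Feature Selection / MB), cited 67 times. An MB-based multivariate feature ranking: it addresses the limitation of univariate ranking (relevance only, redundancy not considered) through the minimal sufficient set property of the Markov blanket (MB). On 6 cancer-classification microarray datasets it outperforms univariate ranking and is more data-efficient than other multivariate ranking methods.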

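RQ: What is an efficient multivariate feature ranking for microarray data, where dimensionality is high and samples are few? Can the MB resolve the redundancy problem of univariate ranking?
Method: Embed the MB's formal definition of relevance into a feature ranking method.
Data: 6 cancer-classification microarray datasets (tens of thousands of features, tens to hundreds of samples).
Key findings: (1) MB-ranking > univariate ranking, consistently across the 6 datasets. (2) MB-ranking is also more data-efficient: it yields a robust ranking from fewer samples than other multivariate rankings. (3) It also handles multiclass classification.
Implications: MB-ranking is practical for microarray and genomic analysis. It may also extend to high-dimensional ranking in non-cancer domains (images, text).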

Structure of MB-based multivariate ranking for gene selection in high-dimensional microarray data.

Summary

This paper is the capstone of 이정혜's Thread 1 (Feature Selection / MB), cited 67 times. Ranking (ordering all features) is a different task from selection (deciding on a subset). This paper generalizes the MB's minimal sufficient set concept into a ranking method.

Methodological core: MB-based multivariate ranking. Existing ranking methods:

Univariate (relevance only): t-test, Fisher score, mutual information. Fast, but ignores redundancy.
Multivariate (relevance + redundancy): mRMR, conditional entropy. Accurate, but computationally expensive.
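
The difference between the two families can be illustrated with a minimal sketch. It is not the paper's code: the toy data, the feature names (`gene_a`, `gene_a_copy`, `gene_b`), and both functions are hypothetical, and absolute Pearson correlation stands in for the t-test or mutual information scores, so the second function is only mRMR-style, not mRMR itself.

```python
from math import sqrt

# Hypothetical toy data: 8 samples with a binary class label.
# gene_a_copy is a noisy duplicate of gene_a; gene_b carries independent signal.
TARGET = [0, 0, 0, 1, 0, 1, 1, 1]
FEATURES = {
    "gene_a":      [-2.0, -2.0, -1.0, -1.0, 1.0, 1.0, 2.0, 2.0],
    "gene_a_copy": [-1.9, -2.1, -1.1, -0.9, 1.1, 0.9, 1.9, 2.1],
    "gene_b":      [-1.0, 1.0, -1.5, 1.5, -1.5, 1.5, -1.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def univariate_rank(features, target):
    """Rank by relevance only: |corr(feature, target)|, highest first."""
    return sorted(features,
                  key=lambda name: abs(pearson(features[name], target)),
                  reverse=True)


def greedy_relevance_redundancy_rank(features, target):
    """mRMR-style greedy ranking: relevance minus mean redundancy
    with the features that have already been ranked."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    remaining = list(features)
    ranked = []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in ranked) / len(ranked)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


UNIVARIATE_ORDER = univariate_rank(FEATURES, TARGET)
MULTIVARIATE_ORDER = greedy_relevance_redundancy_rank(FEATURES, TARGET)
print(UNIVARIATE_ORDER)
print(MULTIVARIATE_ORDER)
```

On this toy data the relevance-only ranking places the near-duplicate `gene_a_copy` second, while the redundancy-aware ranking moves the independent `gene_b` ahead of it. The greedy loop recomputes pairwise correlations at every step, which hints at why multivariate methods cost more on tens of thousands of genes.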

MB-ranking in this paper: each feature's MB membership likelihood serves as its ranking score. It handles relevance and redundancy jointly and is more data-efficient (a stable ranking from fewer samples than other multivariate methods).

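Key finding: on 6 cancer-classification microarray datasets (leukemia, breast, lung, etc.), MB-ranking outperforms both univariate (t-test, MI) and multivariate (mRMR) methods. The advantage is most pronounced with few samples (< 100), which reflects its data efficiency.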

Within 이정혜's research trajectory, this paper is the capstone of Thread 1: Classification of High Dimensionality Data through Feature Selection Using Markov Blanket (selection) + Markov blanket-based universal feature selection for classification and regression of mixed-type data (mixed-type) → An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data (ranking). The MB line later resurfaces for time series in kyu-jin-kim-2024-cafo (CAFO, KDD 2024).

Key results

Method	Accuracy	Data efficiency
Univariate (t-test, MI)	Average	Average
Multivariate (mRMR, CE)	Good	Average
MB-ranking	Best	Best

Consistently superior across the 6 cancer-classification microarray datasets
Citations: 67

Methodological notes

MB-ranking score for feature $X_i$:

$$\text{Rank}(X_i) = f\big(\text{MB membership likelihood}(X_i \mid T)\big)$$

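An existing MB algorithm (Inter-IAMB) first identifies the MB candidate features, which are then ranked by conditional dependence strength. The method returns an ordering of all features, whereas selection makes a binary decision per feature.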
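
The idea of ranking by conditional dependence strength can be sketched as follows. This is a rough illustration under strong assumptions, not the paper's algorithm: it runs only an IAMB-like forward pass (no interleaved shrinking as in Inter-IAMB), it uses partial correlation (a linear-Gaussian assumption) as the dependence measure, and the data and the function name `mb_style_rank` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: gene_a_copy is a noisy duplicate of gene_a,
# gene_b carries signal that is independent of gene_a.
TARGET = [0, 0, 0, 1, 0, 1, 1, 1]
FEATURES = {
    "gene_a":      [-2.0, -2.0, -1.0, -1.0, 1.0, 1.0, 2.0, 2.0],
    "gene_a_copy": [-1.9, -2.1, -1.1, -0.9, 1.1, 0.9, 1.9, 2.1],
    "gene_b":      [-1.0, 1.0, -1.5, 1.5, -1.5, 1.5, -1.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, conditioning):
    """Partial correlation of x and y given a list of conditioning columns,
    computed with the standard recursive formula."""
    if not conditioning:
        return pearson(x, y)
    z, rest = conditioning[0], conditioning[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    denom_sq = (1 - r_xz ** 2) * (1 - r_yz ** 2)
    if denom_sq <= 1e-12:
        # x or y is (numerically) collinear with z: report no extra dependence.
        return 0.0
    return (r_xy - r_xz * r_yz) / sqrt(denom_sq)


def mb_style_rank(features, target):
    """Forward pass: repeatedly admit the feature with the strongest
    dependence on the target given all features admitted so far.
    Returns (name, score) pairs in admission order."""
    remaining = list(features)
    ranked = []
    while remaining:
        conditioning = [features[name] for name, _ in ranked]
        scores = {name: abs(partial_corr(features[name], target, conditioning))
                  for name in remaining}
        best = max(remaining, key=lambda name: scores[name])
        ranked.append((best, scores[best]))
        remaining.remove(best)
    return ranked


RANKING = mb_style_rank(FEATURES, TARGET)
for name, score in RANKING:
    print(f"{name}: {score:.4f}")
```

Once `gene_a` is admitted, the conditional score of `gene_a_copy` collapses to roughly zero because it adds nothing beyond `gene_a`, while the score of `gene_b` stays high. That is the behaviour a blanket-based ranking is meant to capture. On real microarray data a conditional independence test suited to the variable types would replace the partial correlation used here.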

Identification assumptions: (i) the minimal sufficient property of the MB, (ii) the ranking score guarantees a consistent ordering, (iii) data efficiency (stability with few samples).
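
One common way to quantify the stability mentioned in (iii) is to compare the top-k sets that a ranking method returns on different subsamples. The sketch below uses mean pairwise Jaccard overlap; this metric, the function names, and the three example rankings are assumptions made for illustration, not the evaluation reported in the paper.

```python
from itertools import combinations


def top_k_jaccard(rank_a, rank_b, k):
    """Jaccard overlap between the top-k features of two rankings."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)


def mean_pairwise_stability(rankings, k):
    """Average top-k Jaccard overlap over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(top_k_jaccard(a, b, k) for a, b in pairs) / len(pairs)


# Hypothetical rankings obtained from three different subsamples of one dataset.
SUBSAMPLE_RANKINGS = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]

STABILITY = mean_pairwise_stability(SUBSAMPLE_RANKINGS, k=2)
print(f"top-2 stability: {STABILITY:.4f}")
```

A value near 1 means the same genes reach the top regardless of which samples were drawn. A data-efficient method should keep this value high even when each subsample is small.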

Research lineage

This paper combines (i) its direct predecessors, Markov blanket-based universal feature selection for classification and regression of mixed-type data and Classification of High Dimensionality Data through Feature Selection Using Markov Blanket, (ii) the mRMR tradition of Peng et al. (2005), and (iii) the microarray feature selection review of Bolón-Canedo et al. (2014). It is the capstone of Thread 1 in 이정혜's research trajectory.

