Word2Vec-based efficient privacy-preserving shared representation learning for federated recommendation system in a cross-device setting

Taek-Ho Lee, Suhyeon Kim, Junghye Lee (이정혜), Chi-Hyuck Jun (2023) · Information Sciences 651

EPP-FedW2V: a federated extension of Word2Vec with negative sample mixing, for privacy-preserving sequential recommendation. Cross-device setting (each user keeps their data locally). Instead of rigorous (and costly) techniques such as DP or HE, it shares embedding parameters and mixes in negative samples to prevent direct leakage. Sequential information yields context-aware item representations. Trade-off: a slight drop in recommendation performance in exchange for privacy protection.

RQ: Can federated recommendation exploit sequential information and preserve privacy at the same time? Is there a simple, efficient method that avoids rigorous techniques?
Method: Word2Vec (Skip-gram) + negative sample mixing + parameter sharing (embedding layers) + cross-device FL setup
Data: Recommendation benchmark datasets, with a simulated distributed-user scenario
Key findings: (1) EPP-FedW2V outperforms the baselines (PPFRS) on accuracy and privacy together. (2) Negative sample mixing prevents direct leakage: purchased items are not directly exposed. (3) Sufficient privacy is reached without DP/HE. (4) Sequential information is the key to context-aware recommendation.
Implications: A practical starting point for cross-device FRS. Suited to large-scale distributed environments such as mobile and wearable devices.

Figure: Architecture of EPP-FedW2V (Word2Vec-based cross-device federated recommendation with negative sample mixing).

Summary

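This paper is federated recommender system work from Junghye Lee's third period (SNU TEMEP). It draws on the combined strengths of Taek-Ho Lee (also first author of HarmoAE) and Suhyeon Kim (also an author of W2V-LSA and RSAE), bringing together Word2Vec, federated learning, and sequential modeling.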

Methodological core: EPP-FedW2V.

(i) Word2Vec Skip-gram: treats an item sequence (a user's purchase history) like a word sequence. Learns a context embedding for each item, so sequential information is captured automatically.

(ii) Negative sample mixing: Skip-gram uses negative sampling, with negatives drawn from not-purchased items. Sending the server only the embedding updates of purchased items would leak them directly. EPP-FedW2V therefore mixes in the negative sample updates as well, so the server cannot tell purchased items from negatives.

(iii) Approximate updates for sensitive features: guards against inference attacks.

(iv) Only embedding parameters are shared (item embeddings, model weights); raw user data is not shared.
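
As a rough illustration of the client-side steps in (i) and (ii), the sketch below turns one user's purchase history into Skip-gram training pairs and draws negatives from not-purchased items. The function names, the symmetric `window`, and uniform sampling are assumptions made for this example; this note does not specify the paper's window or sampling distribution.

```python
import random
from typing import List, Set, Tuple


def skipgram_pairs(sequence: List[int], window: int = 1) -> List[Tuple[int, int]]:
    """Return (center, context) item pairs within `window` positions (assumed symmetric)."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def sample_negatives(catalog_size: int, purchased: Set[int], k: int,
                     rng: random.Random) -> List[int]:
    """Draw k distinct negatives uniformly from items the user has NOT purchased.

    Uniform sampling is an assumption; Word2Vec implementations often use a
    frequency-based distribution instead.
    """
    candidates = [i for i in range(catalog_size) if i not in purchased]
    return rng.sample(candidates, k)


# Hypothetical local data: item ids in purchase order, kept on the device.
history = [4, 1, 7]
pairs = skipgram_pairs(history, window=1)
negatives = sample_negatives(catalog_size=10, purchased=set(history), k=3,
                             rng=random.Random(0))
```

Both the pairs and the choice of negatives stay on the device; only embedding updates derived from them would be sent to the server.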

Key finding: Recommendation accuracy is slightly lower than the non-private FedW2V baseline, a reasonable trade-off. Direct privacy leakage is prevented. Practical efficiency is achieved without costly rigorous methods such as DP, HE, or MPC.

Within Junghye Lee's research trajectory, this paper extends thread 2 (FL) and thread 4 (Representation Learning) to the recommendation domain. It follows up on the non-rigorous privacy approach of Bilingual autoencoder-based efficient harmonization of multi-source private data for accurate predictive modeling.

Key results

Method	Privacy	Accuracy
Centralized W2V	None	best
FedW2V (no privacy)	weak	~best
EPP-FedW2V	strong (no rigorous tech)	slight decrease
DP-FedW2V	strong (DP-based)	large decrease

Uses sequential information (context-aware)
Cross-device setting

Methodology notes

Word2Vec Skip-gram loss:

$$\mathcal{L} = -\log \sigma(v_o^\top v_c) - \sum_{n \in \text{neg}} \log \sigma(-v_n^\top v_c)$$

$v_c$ = center (current) item, $v_o$ = context (next) item, $v_n$ = negative sample.
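
To make the loss concrete, here is a minimal pure-Python sketch that evaluates it for one (center, context) pair and returns the gradient with respect to each embedding. The helper names are hypothetical, and the code is a direct transcription of the formula above rather than the authors' implementation; it is not numerically hardened for very large scores.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sgns_loss_and_grads(v_c: Vector, v_o: Vector, negatives: List[Vector]
                        ) -> Tuple[float, Vector, Vector, List[Vector]]:
    """Skip-gram negative-sampling loss for one pair, plus gradients.

    Returns (loss, dL/dv_c, dL/dv_o, [dL/dv_n for each negative]).
    """
    pos = sigmoid(dot(v_o, v_c))
    loss = -math.log(pos)
    # d/dv_o of -log sigma(v_o . v_c) = (sigma - 1) * v_c
    grad_o = [(pos - 1.0) * x for x in v_c]
    grad_c = [(pos - 1.0) * x for x in v_o]
    grad_negs = []
    for v_n in negatives:
        neg = sigmoid(dot(v_n, v_c))
        # log sigma(-x) = log(1 - sigma(x))
        loss -= math.log(1.0 - neg)
        # d/dv_n of -log sigma(-v_n . v_c) = sigma(v_n . v_c) * v_c
        grad_negs.append([neg * x for x in v_c])
        grad_c = [g + neg * x for g, x in zip(grad_c, v_n)]
    return loss, grad_c, grad_o, grad_negs


loss, grad_c, grad_o, grad_negs = sgns_loss_and_grads(
    v_c=[1.0, 0.0], v_o=[0.0, 0.0], negatives=[[0.0, 0.0]]
)
```

With zero context and negative embeddings, every score is 0, so both sigmoid terms equal 0.5: the loss is $2 \log 2$, the context item is pulled toward $v_c$, and the negative is pushed away from it.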

EPP-FedW2V negative mixing: the client sends both positive and negative embedding updates, so the server cannot tell the two apart.
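
One possible reading of this step, sketched under stated assumptions: the client merges the per-item gradients for purchased items and sampled negatives into a single unlabeled, shuffled payload, and the server averages whatever rows it receives. `mix_updates`, `aggregate`, the payload format, and the averaging rule are all hypothetical; this note does not describe the paper's exact mixing or aggregation scheme, and the sketch only removes explicit labels and ordering without claiming resistance to any particular attack.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Payload = List[Tuple[int, Vector]]


def mix_updates(positive: Dict[int, Vector], negative: Dict[int, Vector],
                rng: random.Random) -> Payload:
    """Client side: merge positive and negative item gradients, drop labels, shuffle."""
    merged: Dict[int, Vector] = {}
    for source in (positive, negative):
        for item_id, grad in source.items():
            if item_id in merged:
                merged[item_id] = [a + b for a, b in zip(merged[item_id], grad)]
            else:
                merged[item_id] = list(grad)
    payload = list(merged.items())
    rng.shuffle(payload)  # row order carries no positive/negative information
    return payload


def aggregate(global_emb: Dict[int, Vector], payloads: List[Payload],
              lr: float = 0.1) -> Dict[int, Vector]:
    """Server side (assumed rule): average gradients per item, take one SGD step."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for payload in payloads:
        for item_id, grad in payload:
            acc = sums.setdefault(item_id, [0.0] * len(grad))
            for d, g in enumerate(grad):
                acc[d] += g
            counts[item_id] = counts.get(item_id, 0) + 1
    new_emb = {i: list(v) for i, v in global_emb.items()}
    for item_id, total in sums.items():
        n = counts[item_id]
        new_emb[item_id] = [w - lr * t / n for w, t in zip(new_emb[item_id], total)]
    return new_emb


# Two hypothetical clients; item 3 is a purchase for one and a negative for the other.
global_emb = {3: [0.0, 0.0], 7: [0.0, 0.0], 9: [1.0, 1.0]}
payload_a = mix_updates({3: [1.0, 0.0]}, {7: [0.0, 2.0]}, random.Random(0))
payload_b = mix_updates({7: [0.0, 4.0]}, {3: [3.0, 0.0]}, random.Random(1))
new_emb = aggregate(global_emb, [payload_a, payload_b], lr=1.0)
```

The server only ever sees `(item_id, vector)` rows, so the same code path handles an item regardless of whether a given client purchased it or sampled it as a negative.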

Identifying assumptions: (i) negative sample mixing blocks inference attacks, (ii) Skip-gram preserves sequential information, (iii) the cross-device setup scales efficiently.

Research lineage

This paper combines (i) Mikolov et al. (2013), the original Word2Vec work; (ii) Bilingual autoencoder-based efficient harmonization of multi-source private data for accurate predictive modeling, the direct predecessor for the non-rigorous privacy approach; and (iii) Ammad-Ud-Din et al. (2019) on federated CF. It is the recommendation application of threads 2 and 4 in Junghye Lee's research trajectory.
