BERTopic


๐Ÿ“Š TEMEP wiki ์—์„œ 3 ํŽธ์˜ paper ์—์„œ ์ธ์šฉ

Grootendorst (2022) ๊ฐ€ ์ œ์•ˆํ•œ embedding-based topic modeling ๊ธฐ๋ฒ•. 4-step ํŒŒ์ดํ”„๋ผ์ธ: (1) Sentence-BERT (Devlin et al. 2019) embedding ์œผ๋กœ ๋ฌธ์„œ๋ฅผ dense vector ๋กœ ๋ณ€ํ™˜, (2) UMAP ์œผ๋กœ ์ฐจ์› ์ถ•์†Œ, (3) HDBSCAN ์œผ๋กœ density ๊ธฐ๋ฐ˜ cluster ์ƒ์„ฑ, (4) c-TF-IDF ๋กœ cluster representative term ์ถ”์ถœ. LDA ยท DTM ๋“ฑ frequency-based (Bag-of-Words) ๋ฐฉ๋ฒ•๊ณผ ๋‹ฌ๋ฆฌ contextual semantic ๋ณด์กด. Identifying Interdisciplinary Emergence in the Science of Science: Combination of Network Analysis and BERTopic ์—์„œ ํ•™์ œ๊ฐ„ ๊ณผํ•™ ๋ถ€์ƒ ๋ถ„์„์— ์‚ฌ์šฉ.

์ธ์ ‘ ๊ทธ๋ž˜ํ”„

1-hop ์ด์›ƒ 4๊ฐœ
  • ์ธ๋ฌผ 1
  • ๋…ผ๋ฌธ 3
Sira Maliphol BERTopic
ํœ  = ํ™•๋Œ€/์ถ•์†Œ ยท ๋“œ๋ž˜๊ทธ = ์ด๋™ ยท hover = ๋ผ๋ฒจ ยท ํด๋ฆญ = ํŽ˜์ด์ง€ ์ด๋™