Compare Topic Solutions Pairwise with Jaccard Similarity

Usage

compare_solutions(models, depth = 100)

Arguments

models

A list of STM model objects. Must contain at least two models.

depth

The number of top terms to use for the comparison. Defaults to 100.

Value

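A tibble of topic-to-topic Jaccard similarities between models. The columns model_id_A, topic_id_A, model_id_B, and topic_id_B identify the two topics being compared, and jaccard holds their similarity.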
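
For reference, the Jaccard similarity of two term sets is the size of their intersection divided by the size of their union. The following is a minimal base R sketch of that measure only; it is not taken from the package source, and `jaccard_sim`, `terms_a`, and `terms_b` are hypothetical names with made-up terms.

```r
# Jaccard similarity between two sets of terms: |A n B| / |A u B|.
# Illustrative helper, not the package's internal implementation.
jaccard_sim <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical top terms for two topics (made-up data for illustration).
terms_a <- c("tax", "budget", "senate", "vote", "bill")
terms_b <- c("vote", "bill", "campaign", "poll", "senate")

# Three shared terms out of seven distinct terms.
jaccard_sim(terms_a, terms_b)
#> [1] 0.4285714
```

A value of 1 means the two term sets are identical, and 0 means they share no terms.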

Examples

library(stm)
#> stm v1.3.7 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union

modA <- stm(poliblog5k.docs, 
            poliblog5k.voc, K=25,
            prevalence=~rating, 
            data=poliblog5k.meta,
            max.em.its=2, 
            init.type="Random",
            seed = 9934) 
#> Beginning Random Initialization 
#> ....................................................................................................
#> Completed E-Step (1 seconds). 
#> Completed M-Step. 
#> Completing Iteration 1 (approx. per word bound = -8.185) 
#> ....................................................................................................
#> Completed E-Step (1 seconds). 
#> Completed M-Step. 
#> Model Terminated Before Convergence Reached 
           
modB <- stm(poliblog5k.docs, 
            poliblog5k.voc, K=25,
            prevalence=~rating, 
            data=poliblog5k.meta,
            max.em.its=2, 
            init.type="Random",
            seed = 9576) 
#> Beginning Random Initialization 
#> ....................................................................................................
#> Completed E-Step (1 seconds). 
#> Completed M-Step. 
#> Completing Iteration 1 (approx. per word bound = -8.163) 
#> ....................................................................................................
#> Completed E-Step (1 seconds). 
#> Completed M-Step. 
#> Model Terminated Before Convergence Reached 
           
compare_solutions(list(modA, modB), depth = 100) |>
  arrange(desc(jaccard)) |>
  head()
#> # A tibble: 6 × 5
#>   model_id_A topic_id_A model_id_B topic_id_B jaccard
#>   <chr>      <chr>      <chr>      <chr>        <dbl>
#> 1 mod_1      topic_8    mod_2      topic_23    0.111 
#> 2 mod_1      topic_24   mod_2      topic_20    0.105 
#> 3 mod_1      topic_24   mod_2      topic_23    0.0989
#> 4 mod_1      topic_25   mod_2      topic_6     0.0989
#> 5 mod_1      topic_10   mod_2      topic_20    0.0929
#> 6 mod_1      topic_18   mod_2      topic_15    0.0929
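
To help read these scores, a Jaccard value can be converted back into an approximate count of shared terms. The sketch below assumes that both topics contribute exactly `depth` terms, so that the union has `2 * depth - s` elements when `s` terms are shared; `shared_terms` is a hypothetical helper and not part of the package.

```r
# Assumes each term set has exactly `depth` terms, so
# J = s / (2 * depth - s), which rearranges to s = 2 * depth * J / (1 + J).
shared_terms <- function(jaccard, depth = 100) {
  2 * depth * jaccard / (1 + jaccard)
}

# The distinct scores from the output above.
round(shared_terms(c(0.111, 0.105, 0.0989, 0.0929)))
#> [1] 20 19 18 17
```

Under that assumption, the best-matching pair above shares about 20 of its top 100 terms, which is consistent with two randomly initialized models that were stopped after two EM iterations.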