| Title: | Topic Modeling with 'BERTopic' |
|---|---|
| Description: | Provides topic modeling and visualization by interfacing with the 'BERTopic' library for 'Python' via 'reticulate' (Grootendorst 2022 <doi:10.48550/arXiv.2203.05794>). |
| Authors: | Teodor Petrič [aut, cre] (ORCID: <https://orcid.org/0000-0002-4397-9365>) |
| Maintainer: | Teodor Petrič <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.6 |
| Built: | 2026-05-23 07:11:58 UTC |
| Source: | https://github.com/tpetric7/bertopicr |
Sets DYLD_FALLBACK_LIBRARY_PATH to Homebrew's zlib lib directory. This can help reticulate find compatible libraries on macOS.
configure_macos_homebrew_zlib(quiet = FALSE)
quiet |
Logical. If TRUE, suppresses messages. |
Logical. TRUE if the environment was updated, FALSE otherwise.
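The effect described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `prepend_fallback_path()` is a hypothetical helper, and a temporary directory stands in for the Homebrew zlib lib directory, whose real location depends on the machine.

```r
# Hypothetical helper: prepend a directory to DYLD_FALLBACK_LIBRARY_PATH.
# Returns TRUE if the environment variable was changed, FALSE otherwise.
prepend_fallback_path <- function(lib_dir, quiet = FALSE) {
  if (!dir.exists(lib_dir)) {
    if (!quiet) message("Directory not found: ", lib_dir)
    return(FALSE)
  }
  sep <- .Platform$path.sep
  current <- Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH", unset = "")
  parts <- if (nzchar(current)) strsplit(current, sep, fixed = TRUE)[[1]] else character(0)
  if (lib_dir %in% parts) return(FALSE)  # already listed, nothing to update
  Sys.setenv(DYLD_FALLBACK_LIBRARY_PATH = paste(c(lib_dir, parts), collapse = sep))
  if (!quiet) message("DYLD_FALLBACK_LIBRARY_PATH now starts with ", lib_dir)
  TRUE
}

# Stand-in directory (assumption): a real call would point at Homebrew's zlib lib directory.
demo_dir <- tempdir()
updated <- prepend_fallback_path(demo_dir, quiet = TRUE)
```

Calling the helper a second time with the same directory returns FALSE, which mirrors the "TRUE if the environment was updated" contract.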
This function finds the most similar topics to given keywords using a BERTopic model and returns the results in a data frame or tibble format.
find_topics_df(model, queries, top_n = 10, return_tibble = TRUE)
model |
A BERTopic model object. Must be passed from the calling environment. |
queries |
A vector of keywords or phrases to query the topics for. |
top_n |
Number of top similar topics to retrieve for each query. Default is 10. |
return_tibble |
Logical. If TRUE, returns a tibble. If FALSE, returns a data.frame. Default is TRUE. |
A data.frame or tibble with columns for the keyword, topics, and similarity scores for each query.
# Example of finding similar topics using a BERTopic model
if (exists("topic_model")) {
  queries <- c("national minority", "minority issues", "nationality issues")
  find_topics_df(model = topic_model, queries = queries, top_n = 10)
} else {
  message("No topic_model found. Please load a BERTopic model and try again.")
}
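To make the shape of the result concrete, the following base R sketch ranks topics for each query by cosine similarity between embedding vectors, which is how BERTopic compares a search term with its topic embeddings. The three-dimensional vectors, the helper names, and the column names `keyword`, `topic`, and `similarity` are made up for illustration; a real model supplies its own embeddings.

```r
# Cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Toy topic and query embeddings (assumed values, one row or entry per item).
topic_vecs <- rbind(`0` = c(1, 0, 0), `1` = c(0.8, 0.6, 0), `2` = c(0, 0, 1))
query_vecs <- list("minority issues" = c(1, 0.1, 0), "markets" = c(0, 0.2, 1))

# Hypothetical helper: the top_n most similar topics for one query vector.
rank_topics <- function(query_vec, topic_vecs, top_n = 10) {
  sims <- apply(topic_vecs, 1, cosine_sim, b = query_vec)
  ord <- order(sims, decreasing = TRUE)[seq_len(min(top_n, length(sims)))]
  data.frame(topic = as.integer(rownames(topic_vecs)[ord]),
             similarity = unname(sims[ord]))
}

# One block of rows per query, stacked into a single data frame.
results <- do.call(rbind, lapply(names(query_vecs), function(q) {
  cbind(keyword = q, rank_topics(query_vecs[[q]], topic_vecs, top_n = 2))
}))
print(results)
```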
This function retrieves document information from a BERTopic model and processes it to unnest list columns, replace NA values, and consolidate columns with the same prefix.
get_document_info_df(model, texts, drop_expanded_columns = TRUE)
model |
A BERTopic model object. |
texts |
A character vector containing the preprocessed texts to be passed to the BERTopic model. |
drop_expanded_columns |
Logical. If TRUE, drops the expanded columns after consolidation. Default is TRUE. |
A data.frame or tibble with unnested and consolidated columns.
## Not run:
document_info_df <- get_document_info_df(model = topic_model,
                                         texts = texts_cleaned,
                                         drop_expanded_columns = TRUE)
print(document_info_df)
## End(Not run)
This function filters a given data frame to select the most representative documents for a specified topic based on their probability scores. The documents are sorted by relevance in descending order, and the top n documents are returned.
get_most_representative_docs(df, topic_nr, n_docs = 5)
df |
A data frame containing at least the columns 'Topic', 'Document', and 'probs'. |
topic_nr |
An integer specifying the topic number to filter the documents. |
n_docs |
An integer specifying the number of top representative documents to return. Defaults to 5. |
A vector of the most representative documents corresponding to the specified topic.
If the number of documents available is less than n_docs, all available documents are returned.
## Not run:
# Assuming `df_docs` is a data frame with columns `Topic`, `Document`, and `probs`
get_most_representative_docs(df_docs, topic_nr = 3, n_docs = 5)
## End(Not run)
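The filter, sort, and truncate steps described above can be sketched in base R as follows. The data and the helper name `top_docs_sketch()` are invented for illustration; the package's own implementation may differ in detail.

```r
# Toy document table with the three required columns (made-up values).
df_docs <- data.frame(
  Topic = c(3, 3, 3, 1, 3),
  Document = c("doc a", "doc b", "doc c", "doc d", "doc e"),
  probs = c(0.42, 0.91, 0.77, 0.99, 0.15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one topic, sort by probability, return the top documents.
top_docs_sketch <- function(df, topic_nr, n_docs = 5) {
  in_topic <- df[df$Topic == topic_nr, , drop = FALSE]
  in_topic <- in_topic[order(in_topic$probs, decreasing = TRUE), , drop = FALSE]
  head(in_topic$Document, n_docs)  # head() returns everything if fewer rows exist
}

top2 <- top_docs_sketch(df_docs, topic_nr = 3, n_docs = 2)
all3 <- top_docs_sketch(df_docs, topic_nr = 3, n_docs = 10)
```

Here `top2` holds the two highest-probability documents of topic 3, and `all3` holds all four topic-3 documents because fewer than ten are available.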
This function filters a given data frame to select a specified number of representative documents from a particular topic. It uses random sampling to select the documents.
get_representative_docs_custom(df, topic_nr, n_docs)
df |
A data frame containing at least the columns 'Topic' and 'Document'. |
topic_nr |
An integer specifying the topic number to filter the documents. |
n_docs |
An integer specifying the number of documents to sample for the specified topic. |
A vector of sampled documents corresponding to the specified topic.
## Not run:
# Assuming `df_docs` is a data frame with columns `Topic` and `Document`
get_representative_docs_custom(df_docs, topic_nr = 3, n_docs = 5)
## End(Not run)
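Random sampling within a topic can be sketched in base R as below. The helper `sample_docs_sketch()` and the toy data are hypothetical, and capping the sample size at the number of available documents is an assumption of this sketch rather than documented behavior of the package function.

```r
# Toy document table (made-up values).
df_docs <- data.frame(
  Topic = c(3, 3, 3, 1, 3),
  Document = c("doc a", "doc b", "doc c", "doc d", "doc e"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sample up to n_docs documents from one topic without replacement.
sample_docs_sketch <- function(df, topic_nr, n_docs) {
  docs <- df$Document[df$Topic == topic_nr]
  if (length(docs) == 0) return(character(0))
  n <- min(n_docs, length(docs))  # assumption: never request more than exist
  docs[sample.int(length(docs), n)]
}

set.seed(1)  # fix the seed so the sample is reproducible
picked <- sample_docs_sketch(df_docs, topic_nr = 3, n_docs = 2)
```

Setting a seed before the call is the usual way to make a sampled selection repeatable across sessions.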
This function retrieves a specified number of words with high probability for a given topic number from a BERTopic model and returns the results in a data frame or tibble format.
get_topic_df(model, topic_number = 0, top_n = 10, return_tibble = TRUE)
model |
A BERTopic model object. Must be passed from the calling environment. |
topic_number |
The topic number for which words and scores are retrieved. |
top_n |
Number of top words to retrieve for the specified topic. Default is 10. If greater than 10, it will be set to 10 as BERTopic returns a maximum of 10 words. |
return_tibble |
Logical. If TRUE, returns a tibble. If FALSE, returns a data.frame. Default is TRUE. |
A data.frame or tibble with columns for the word, score, and topic number.
## Not run:
# Example usage:
if (exists("topic_model")) {
  topic_df <- get_topic_df(model = topic_model, topic_number = 3, top_n = 5)
  print(topic_df)
} else {
  message("No topic_model found. Please load a BERTopic model and try again.")
}
## End(Not run)
This function retrieves topic information from a BERTopic model and processes it to unnest list columns, replace NA values, and consolidate columns with the same prefix.
get_topic_info_df(model, drop_expanded_columns = TRUE)
model |
A BERTopic model object. |
drop_expanded_columns |
Logical. If TRUE, drops the expanded columns after consolidation. Default is TRUE. |
A data.frame or tibble with unnested and consolidated columns.
## Not run:
topic_info_df <- get_topic_info_df(model = topic_model,
                                   drop_expanded_columns = TRUE)
print(topic_info_df)
## End(Not run)
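The idea of consolidating columns that share a prefix can be illustrated in base R. The column naming pattern (`Representation_1`, `Representation_2`, ...), the separator, and the helper `consolidate_prefix()` are assumptions made for this sketch; the package may name and join its expanded columns differently.

```r
# Toy wide table, as might result from unnesting a list column (made-up values).
wide <- data.frame(
  Topic = c(0, 1),
  Representation_1 = c("cat", "market"),
  Representation_2 = c("dog", "stock"),
  Representation_3 = c("pet", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge all "<prefix>_<n>" columns into a single "<prefix>" column.
consolidate_prefix <- function(df, prefix, drop_expanded_columns = TRUE, sep = ", ") {
  cols <- grep(paste0("^", prefix, "_[0-9]+$"), names(df), value = TRUE)
  parts <- df[cols]
  parts[is.na(parts)] <- ""  # replace NA values before pasting
  df[[prefix]] <- apply(parts, 1, function(x) paste(x[nzchar(x)], collapse = sep))
  if (drop_expanded_columns) df <- df[setdiff(names(df), cols)]
  df
}

tidy <- consolidate_prefix(wide, "Representation")
kept <- consolidate_prefix(wide, "Representation", drop_expanded_columns = FALSE)
```

With `drop_expanded_columns = FALSE` the numbered columns stay next to the consolidated one, which can help when checking the result.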
This function retrieves all topics from a BERTopic model and converts them into a data frame or tibble format.
get_topics_df(model, return_tibble = TRUE)
model |
A BERTopic model object. Must be passed from the calling environment. |
return_tibble |
Logical. If TRUE, returns a tibble. If FALSE, returns a data.frame. Default is TRUE. |
A data.frame or tibble with columns for the word, score, and topic number across all topics.
## Not run:
topics_df <- get_topics_df(model = topic_model)
print(topics_df)
## End(Not run)
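The conversion from a per-topic collection of (word, score) pairs to one long table can be sketched in base R. The nested list below only imitates what a topic dictionary might look like once it reaches R, and the column names `Word`, `Score`, and `Topic` are assumptions for illustration.

```r
# Toy stand-in for a topic dictionary: topic id -> list of (word, score) pairs.
topics <- list(
  `0` = list(list("cat", 0.31), list("dog", 0.27)),
  `1` = list(list("market", 0.40), list("stock", 0.22))
)

# Build one data frame per topic, then stack them into a long table.
topics_long <- do.call(rbind, lapply(names(topics), function(id) {
  pairs <- topics[[id]]
  data.frame(
    Word = vapply(pairs, function(p) p[[1]], character(1)),
    Score = vapply(pairs, function(p) p[[2]], numeric(1)),
    Topic = as.integer(id),
    stringsAsFactors = FALSE
  )
}))
print(topics_long)
```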
Load a BERTopic model saved with save_bertopic_model() along with its companion RDS file containing R-side extras.
load_bertopic_model(path, embedding_model = NULL)
path |
Directory path where the Python model was saved. |
embedding_model |
Optional embedding model to pass through to
|
A list with two elements: model (the BERTopic model) and extras (the R-side data saved in the companion RDS file).
## Not run:
loaded <- load_bertopic_model("topic_model")
doc_info <- get_document_info_df(model = loaded$model, texts = docs)
## End(Not run)
Persist a trained BERTopic model to disk and store R-side extras in a companion RDS file. This is the recommended way to reuse a model across sessions when working through reticulate.
save_bertopic_model(topic_model, path)
topic_model |
A list returned by |
path |
Directory path to write the Python model to. The RDS companion
file is saved as |
Invisibly returns TRUE after successful write.
## Not run:
save_bertopic_model(topic_model, "topic_model")
## End(Not run)
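The companion-file idea behind the save/load pair can be sketched with base R's saveRDS() and readRDS(). Everything specific here is hypothetical: the helper names, the file name `extras_sketch.rds`, and its placement inside the model directory are assumptions, and the Python-side model write is deliberately left out.

```r
# Hypothetical helper: store R-side extras next to where a Python model would be saved.
save_extras_sketch <- function(extras, path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  # A real save would also persist the Python model here (not shown in this sketch).
  saveRDS(extras, file.path(path, "extras_sketch.rds"))
  invisible(TRUE)
}

# Hypothetical helper: read the extras back and return a list shaped like the loader's result.
load_extras_sketch <- function(path) {
  list(model = NULL,  # placeholder for the Python model object
       extras = readRDS(file.path(path, "extras_sketch.rds")))
}

extras <- list(topics = c(0L, 1L, 0L), probabilities = matrix(c(0.9, 0.1), nrow = 1))
model_dir <- file.path(tempdir(), "topic_model_sketch")
ok <- save_extras_sketch(extras, model_dir)
loaded <- load_extras_sketch(model_dir)
```

Keeping R objects in an RDS file avoids converting them to Python and back, so they come back with the same types they had when saved.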
This function sets up a Python environment with all required packages for using the BERTopic model within the R package. It can create and activate a virtualenv or conda environment and then install the bundled requirements.
setup_python_environment(
  envname = "r-bertopic",
  python_path = NULL,
  method = c("virtualenv", "conda"),
  python_version = NULL,
  upgrade = TRUE,
  extra_packages = NULL
)
envname |
The name of the Python environment. Default is "r-bertopic". |
python_path |
Optional path to a specific Python executable (virtualenv only). |
method |
Environment type to create and use. One of "virtualenv" or "conda". |
python_version |
Optional Python version for conda (e.g. "3.10"). |
upgrade |
Logical. If TRUE, passes --upgrade to pip installs. Default is TRUE. |
extra_packages |
Optional character vector of additional Python packages to install. |
Invisibly returns the active Python configuration.
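A default such as `method = c("virtualenv", "conda")` is conventionally resolved in R with match.arg(), which picks the first choice when the argument is omitted and rejects values outside the list. The sketch below shows that idiom together with the `upgrade` flag; `resolve_setup_sketch()` is a hypothetical helper, and whether the package resolves its arguments exactly this way is an assumption.

```r
# Hypothetical helper: resolve the environment type and pip options without installing anything.
resolve_setup_sketch <- function(method = c("virtualenv", "conda"),
                                 upgrade = TRUE,
                                 extra_packages = NULL) {
  method <- match.arg(method)  # first choice is the default; invalid values raise an error
  pip_options <- if (isTRUE(upgrade)) "--upgrade" else character(0)
  list(method = method, pip_options = pip_options, extra_packages = extra_packages)
}

default_cfg <- resolve_setup_sketch()
conda_cfg <- resolve_setup_sketch(method = "conda", upgrade = FALSE,
                                  extra_packages = c("openai"))
```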
This function creates embeddings with sentence-transformers, configures UMAP, HDBSCAN, and CountVectorizer, optionally wires a representation model, and fits a BERTopic model from R. The returned model can be used with bertopicr helpers.
train_bertopic_model(
  docs,
  embedding_model = "Qwen/Qwen3-Embedding-0.6B",
  embeddings = NULL,
  embedding_batch_size = 32,
  embedding_show_progress = TRUE,
  umap_model = NULL,
  umap_n_neighbors = 15,
  umap_n_components = 5,
  umap_min_dist = 0,
  umap_metric = "cosine",
  umap_random_state = 42,
  hdbscan_model = NULL,
  hdbscan_min_cluster_size = 50,
  hdbscan_min_samples = 20,
  hdbscan_metric = "euclidean",
  hdbscan_cluster_selection_method = "eom",
  hdbscan_gen_min_span_tree = TRUE,
  hdbscan_prediction_data = TRUE,
  hdbscan_core_dist_n_jobs = 1,
  vectorizer_model = NULL,
  stop_words = "all_stopwords",
  ngram_range = c(1, 3),
  min_df = 2L,
  max_df = 50L,
  max_features = 10000,
  strip_accents = NULL,
  decode_error = "strict",
  encoding = "UTF-8",
  representation_model = c("none", "keybert", "mmr", "ollama"),
  representation_params = list(),
  ollama_model = NULL,
  ollama_base_url = "http://localhost:11434/v1",
  ollama_api_key = "ollama",
  ollama_client_params = list(),
  ollama_prompt = NULL,
  top_n_words = 200L,
  calculate_probabilities = TRUE,
  verbose = TRUE,
  seed = NULL,
  timestamps = NULL,
  topics_over_time_nr_bins = 20L,
  topics_over_time_global_tuning = TRUE,
  topics_over_time_evolution_tuning = TRUE,
  classes = NULL,
  compute_reduced_embeddings = TRUE,
  reduced_embedding_n_neighbors = 10L,
  reduced_embedding_min_dist = 0,
  reduced_embedding_metric = "cosine",
  compute_hierarchical_topics = TRUE,
  bertopic_args = list()
)
docs |
Character vector of documents to model. |
embedding_model |
Sentence-transformers model name or local path. |
embeddings |
Optional precomputed embeddings (matrix or array). |
embedding_batch_size |
Batch size for embedding encoding. |
embedding_show_progress |
Logical. Show embedding progress bar. |
umap_model |
Optional pre-built UMAP Python object. If NULL, one is created. |
umap_n_neighbors |
Number of neighbors for UMAP. |
umap_n_components |
Number of UMAP components. |
umap_min_dist |
UMAP min_dist parameter. |
umap_metric |
UMAP metric. |
umap_random_state |
Random state for UMAP. |
hdbscan_model |
Optional pre-built HDBSCAN Python object. If NULL, one is created. |
hdbscan_min_cluster_size |
HDBSCAN min_cluster_size. |
hdbscan_min_samples |
HDBSCAN min_samples. |
hdbscan_metric |
HDBSCAN metric. |
hdbscan_cluster_selection_method |
HDBSCAN cluster selection method. |
hdbscan_gen_min_span_tree |
HDBSCAN gen_min_span_tree. |
hdbscan_prediction_data |
Logical. Whether to generate prediction data. |
hdbscan_core_dist_n_jobs |
HDBSCAN core_dist_n_jobs. |
vectorizer_model |
Optional pre-built CountVectorizer Python object. |
stop_words |
Stop words for CountVectorizer. Use "all_stopwords" to load the bundled multilingual list, "english", or a character vector. |
ngram_range |
Length-2 integer vector for n-gram range. |
min_df |
Minimum document frequency for CountVectorizer. |
max_df |
Maximum document frequency for CountVectorizer. |
max_features |
Maximum features for CountVectorizer. |
strip_accents |
Passed to CountVectorizer. Use NULL to preserve umlauts. |
decode_error |
Passed to CountVectorizer when decoding input bytes. |
encoding |
Text encoding for CountVectorizer (defaults to "UTF-8"). |
representation_model |
Representation model to use: "none", "keybert", "mmr", or "ollama". |
representation_params |
Named list of parameters passed to the representation model. |
ollama_model |
Ollama model name when representation_model = "ollama". |
ollama_base_url |
Base URL for the Ollama OpenAI-compatible endpoint. |
ollama_api_key |
API key placeholder for the Ollama OpenAI-compatible endpoint. |
ollama_client_params |
Named list of extra parameters passed to openai$OpenAI(). |
ollama_prompt |
Optional prompt template for the Ollama OpenAI representation. |
top_n_words |
Number of top words per topic to keep in the model. |
calculate_probabilities |
Logical. Whether to calculate topic probabilities. |
verbose |
Logical. Verbosity for BERTopic. |
seed |
Optional random seed. |
timestamps |
Optional vector of timestamps (Date/POSIXt/ISO strings or integer) for topics over time. Defaults to NULL (topics over time disabled). |
topics_over_time_nr_bins |
Number of bins for topics_over_time. |
topics_over_time_global_tuning |
Logical. Whether to enable global tuning for topics_over_time. |
topics_over_time_evolution_tuning |
Logical. Whether to enable evolution tuning for topics_over_time. |
classes |
Optional vector of class labels (character or factor) for topics per class. Defaults to NULL (topics per class disabled). |
compute_reduced_embeddings |
Logical. If TRUE, computes 2D and 3D UMAP reductions. |
reduced_embedding_n_neighbors |
Number of neighbors for reduced embeddings. |
reduced_embedding_min_dist |
UMAP min_dist for reduced embeddings. |
reduced_embedding_metric |
UMAP metric for reduced embeddings. |
compute_hierarchical_topics |
Logical. If TRUE, computes hierarchical topics. |
bertopic_args |
Named list of extra arguments passed to BERTopic(). |
A list with elements model, topics, probabilities, embeddings, reduced_embeddings_2d, reduced_embeddings_3d, hierarchical_topics, topics_over_time, and topics_per_class.
## Not run:
setup_python_environment()
texts <- c("Cats are great pets", "Dogs are loyal companions", "Markets fluctuate")
fit <- train_bertopic_model(texts,
                            embedding_model = "sentence-transformers/all-MiniLM-L6-v2")
visualize_topics(fit$model, filename = "intertopic_distance_map", auto_open = FALSE)
## End(Not run)
This function visualizes the topics of a BERTopic model using Plotly and saves the output as an interactive HTML file. It checks for required Python modules and allows for custom file naming.
visualize_barchart(
  model,
  filename = "topics_topwords_interactive_barchart",
  open_file = FALSE
)
model |
A BERTopic model object. Must be passed from the calling environment. |
filename |
A character string specifying the name of the HTML file to save the bar chart.
Default is "topics_topwords_interactive_barchart".
The |
open_file |
Logical. If TRUE, opens the HTML file after saving. Default is FALSE. |
Displays the interactive bar chart within the R environment and saves it as an HTML file.
## Not run:
visualize_barchart(model = topic_model,
                   filename = "custom_barchart",
                   open_file = TRUE)
## End(Not run)
This function visualizes the topic distribution for a specific document from a BERTopic model using Python's Plotly library. The visualization is saved as an interactive HTML file, which can be opened and viewed in a web browser.
visualize_distribution(
  model,
  text_id = 1,
  probabilities,
  filename = "topic_dist_interactive",
  auto_open = FALSE
)
model |
A BERTopic model object. The model must have the method |
text_id |
An integer specifying the index of the document for which the topic distribution is visualized. Default is 1.
Must be a positive integer and a valid index within the |
probabilities |
A matrix or data frame of topic probabilities, with rows corresponding to documents and columns to topics. Each element represents the probability of a topic for a given document. |
filename |
A character string specifying the name of the HTML file to save the visualization. Default is "topic_dist_interactive". The .html extension will be added automatically. |
auto_open |
Logical. If TRUE, the HTML file will automatically open in the browser. Default is FALSE. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
# Assuming 'topic_model' is a BERTopic model object and
# 'probs' is a matrix of topic probabilities
visualize_distribution(
  model = topic_model,
  text_id = 1,
  probabilities = probs,
  filename = "custom_filename",
  auto_open = TRUE)
## End(Not run)
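The relationship between `text_id` and `probabilities` can be sketched in base R: `text_id` selects one row (one document), and that row holds the document's probability for every topic. The helper `doc_distribution_sketch()`, the toy matrix, and the column names are hypothetical; the validation shown is one plausible reading of "a positive integer and a valid index", not the package's own checks.

```r
# Toy probability matrix: rows are documents, columns are topics (made-up values).
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("topic_0", "topic_1", "topic_2")))

# Hypothetical helper: validate text_id and return that document's topic distribution.
doc_distribution_sketch <- function(probabilities, text_id = 1) {
  probabilities <- as.matrix(probabilities)
  stopifnot(length(text_id) == 1,
            text_id == round(text_id),
            text_id >= 1,
            text_id <= nrow(probabilities))
  probabilities[text_id, ]
}

dist_doc2 <- doc_distribution_sketch(probs, text_id = 2)
print(dist_doc2)
```

Note that the index is 1-based on the R side, as the default `text_id = 1` suggests.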
This function generates a visualization of documents using a pre-trained BERTopic model. It uses UMAP to reduce the dimensionality of embeddings and Plotly for interactive visualizations.
visualize_documents(
  model = topic_model,
  texts = texts_cleaned,
  reduced_embeddings = reduced_embeddings,
  custom_labels = FALSE,
  hide_annotation = TRUE,
  filename = "visualize_documents",
  auto_open = FALSE
)
model |
A BERTopic model object. Default is 'topic_model'. |
texts |
A list or vector of cleaned text documents to visualize. Default is 'texts_cleaned'. |
reduced_embeddings |
A matrix of reduced-dimensionality embeddings. Typically generated using UMAP. Default is 'reduced_embeddings'. |
custom_labels |
A logical value indicating whether to use custom labels for topics. Default is FALSE. |
hide_annotation |
A logical value indicating whether to hide annotations in the plot. Default is TRUE. |
filename |
A string specifying the name of the HTML file to save the visualization. Default is "visualize_documents". |
auto_open |
A logical value indicating whether to automatically open the HTML file after saving. Default is FALSE. |
A Plotly visualization of the documents, displayed as an HTML file within the R environment.
## Not run:
visualize_documents(model = topic_model,
                    texts = texts_cleaned,
                    reduced_embeddings = reduced_embeddings,
                    custom_labels = FALSE,
                    hide_annotation = TRUE,
                    filename = "visualize_documents",
                    auto_open = FALSE)
## End(Not run)
This function generates a 2D visualization of documents using a pre-trained BERTopic model and UMAP dimensionality reduction. It uses Plotly for interactive visualizations and saves the output as an HTML file.
visualize_documents_2d(
  model,
  texts,
  reduced_embeddings,
  custom_labels = FALSE,
  hide_annotation = TRUE,
  tooltips = c("Topic", "Name", "Probability", "Text"),
  filename = "visualize_documents_2d",
  auto_open = FALSE
)
model |
A BERTopic model object. |
texts |
A character vector or list of cleaned text documents to visualize. |
reduced_embeddings |
A matrix or data frame of reduced-dimensionality embeddings (2D). Typically generated using UMAP. |
custom_labels |
Logical. If TRUE, custom topic labels are used. Default is FALSE. |
hide_annotation |
Logical. If TRUE, hides annotations on the plot. Default is TRUE. |
tooltips |
A character vector of tooltips for hover information. Default is c("Topic", "Name", "Probability", "Text"). |
filename |
A character string specifying the name of the HTML file to save the visualization. Default is "visualize_documents_2d".
The |
auto_open |
Logical. If TRUE, opens the HTML file in the browser after saving. Default is FALSE. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
visualize_documents_2d(model = topic_model,
                       texts = texts_cleaned,
                       reduced_embeddings = embeddings,
                       custom_labels = FALSE,
                       hide_annotation = TRUE,
                       filename = "plot",
                       auto_open = TRUE)
## End(Not run)
This function generates a 3D visualization of documents using a pre-trained BERTopic model and UMAP dimensionality reduction. It uses Plotly for interactive visualizations and saves the output as an HTML file.
visualize_documents_3d(
  model,
  texts,
  reduced_embeddings,
  custom_labels = FALSE,
  hide_annotation = TRUE,
  tooltips = c("Topic", "Name", "Probability", "Text"),
  filename = "visualize_documents_3d",
  auto_open = FALSE
)
model |
A BERTopic model object. |
texts |
A character vector or list of cleaned text documents to visualize. |
reduced_embeddings |
A matrix or data frame of reduced-dimensionality embeddings (3D). Typically generated using UMAP. |
custom_labels |
Logical. If TRUE, custom topic labels are used. Default is FALSE. |
hide_annotation |
Logical. If TRUE, hides annotations on the plot. Default is TRUE. |
tooltips |
A character vector of tooltips for hover information. Default is c("Topic", "Name", "Probability", "Text"). |
filename |
A character string specifying the name of the HTML file to save the visualization. Default is "visualize_documents_3d".
The |
auto_open |
Logical. If TRUE, opens the HTML file in the browser after saving. Default is FALSE. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
visualize_documents_3d(model = topic_model,
                       texts = texts_cleaned,
                       reduced_embeddings = embeddings,
                       custom_labels = FALSE,
                       hide_annotation = TRUE,
                       filename = "plot",
                       auto_open = TRUE)
## End(Not run)
This function visualizes the topic similarity heatmap of topics from a BERTopic model using Python's Plotly library. The visualization is saved as an interactive HTML file, which can be opened and viewed in a web browser.
visualize_heatmap(
  model,
  filename = "topics_similarity_heatmap",
  auto_open = FALSE
)
model |
A BERTopic model object. The model must have the method |
filename |
A character string specifying the name of the HTML file to save the visualization.
The default value is "topics_similarity_heatmap". The filename should not contain illegal characters.
The |
auto_open |
Logical. If TRUE, opens the HTML file after saving. Default is FALSE. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
# Assuming 'topic_model' is a BERTopic model object
visualize_heatmap(model = topic_model,
                  filename = "topics_similarity_heatmap",
                  auto_open = FALSE)
## End(Not run)
This function visualizes the hierarchical clustering of topics from a BERTopic model. If a hierarchical topics DataFrame is provided, it uses this for visualization; otherwise, it visualizes directly from the model. The visualization is saved as an interactive HTML file, which can be opened and viewed in a web browser.
visualize_hierarchy(
  model,
  hierarchical_topics = NULL,
  filename = "topic_hierarchy",
  auto_open = TRUE
)
model |
A BERTopic model object. The model must have the method |
hierarchical_topics |
Optional. A hierarchical topics DataFrame created using the BERTopic model's |
filename |
A character string specifying the name of the HTML file to save the visualization. The default value is "topic_hierarchy". The filename should not contain illegal characters. |
auto_open |
Logical. If |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
# Assuming 'topic_model' is a BERTopic model object
visualize_hierarchy(model = topic_model,
                    filename = "topic_hierarchy",
                    auto_open = TRUE)

# Alternatively, provide a pre-calculated hierarchical_topics object
visualize_hierarchy(model = topic_model,
                    hierarchical_topics = hierarchical_topics,
                    filename = "topic_hierarchy",
                    auto_open = TRUE)
## End(Not run)
This function visualizes the intertopic distance map of topics from a BERTopic model using Python's Plotly library. The visualization is saved as an interactive HTML file, which can be opened and viewed in a web browser.
visualize_topics(
  model,
  filename = "intertopic_distance_map",
  auto_open = FALSE
)
model |
A BERTopic model object. The model must have the method |
filename |
A character string specifying the name of the HTML file to save the visualization.
The default value is "intertopic_distance_map". The filename should not contain illegal characters.
The |
auto_open |
Logical. If TRUE, opens the HTML file after saving. Default is FALSE. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
# Assuming 'topic_model' is a BERTopic model object
visualize_topics(model = topic_model, filename = "plot", auto_open = TRUE)
## End(Not run)
This function visualizes topics over time from a BERTopic model using Python's Plotly library. The visualization is saved as an interactive HTML file, which can be opened and viewed in a web browser.
visualize_topics_over_time(
  model,
  topics_over_time_model,
  top_n_topics = 20,
  filename = "topics_over_time"
)
model |
A BERTopic model object. The model must have the method |
topics_over_time_model |
A topics-over-time model object created using the BERTopic model. |
top_n_topics |
An integer specifying the number of top topics to display in the visualization. Default is 20. Must be a positive integer. |
filename |
A character string specifying the name of the HTML file to save the visualization. The default value is "topics_over_time". The filename should not contain illegal characters. |
The function does not return a value but saves an HTML file containing the visualization and displays it in the current R environment.
## Not run:
# Assuming 'topic_model' is a BERTopic model object and
# 'topics_over_time' is the topics-over-time object created from it
visualize_topics_over_time(model = topic_model,
                           topics_over_time_model = topics_over_time,
                           top_n_topics = 5,
                           filename = "plot")
## End(Not run)
This function visualizes the distribution of topics per class using a pre-trained BERTopic model. The visualization is generated using the Plotly Python package and displayed within an R environment.
visualize_topics_per_class(
  model = topic_model,
  topics_per_class = topics_per_class,
  start = 0,
  end = 10,
  filename = "topics_per_class",
  auto_open = TRUE
)
model |
A BERTopic model object. Default is 'topic_model'. |
topics_per_class |
A data frame or list containing the topics per class data. Default is 'topics_per_class'. |
start |
An integer specifying the starting index of the topics to visualize. Default is 0. |
end |
An integer specifying the ending index of the topics to visualize. Default is 10. |
filename |
A string specifying the name of the HTML file to save the visualization. Default is "topics_per_class". |
auto_open |
A logical value indicating whether to automatically open the HTML file after saving. Default is TRUE. |
A Plotly visualization of the topics per class, displayed as an HTML file within the R environment.
## Not run:
visualize_topics_per_class(model = topic_model,
                           topics_per_class = topics_per_class,
                           start = 0,
                           end = 7,
                           filename = "plot",
                           auto_open = TRUE)
## End(Not run)