--- title: "Troubleshooting & FAQ" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Troubleshooting & FAQ} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This vignette covers common issues you may encounter when using dbverse packages and how to resolve them. ```{r diagnostic-example} library(dbProject) project_path <- tempfile("dbproject-troubleshooting-") proj <- dbProject$new(path = project_path) proj$is_connected() proj$disconnect() proj$is_connected() unlink(project_path, recursive = TRUE) ``` ## Database Lock Errors ### Error: "Could not set lock on file" ``` Error: Could not set lock on file "path/to/database.duckdb": Conflicting lock is held in /path/to/R (PID 12345) by user... ``` **Cause**: DuckDB uses file-level locking to ensure data integrity. Only one process can write to a database file at a time. This error occurs when: - Another R session is connected to the same database - A previous R session didn't disconnect properly before closing - Multiple scripts are trying to access the same database simultaneously **Solutions**: 1. **Kill the conflicting process** (shown in the error message): ```bash # Replace 12345 with the actual PID from the error kill 12345 ``` 2. **Restart RStudio** to close all R processes 3. **Properly disconnect before closing sessions**: ```r # Always disconnect when done proj$disconnect() # Or for direct DBI connections: DBI::dbDisconnect(con, shutdown = TRUE) ``` 4. **Use read-only mode for concurrent access**: ```r # Multiple processes can read simultaneously con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "path/to/db", read_only = TRUE) ``` --- ## Connection Issues ### Error: "No active or cached connection available" **Cause**: The `dbProject` connection has been closed or was never established. **Solution**: Reconnect to the database: ```r proj$reconnect() ``` ### Objects showing "Error: Column references ..." after restart **Cause**: Lazy table references become stale after R restarts because the connection is lost. **Solution**: Use `dbProject` for automatic reconnection: ```r # Create project (saves connection info) proj <- dbProject$new(path = "my_project", dbdir = "data.duckdb") # After restart, just reload proj <- dbProject$new(path = "my_project", dbdir = "data.duckdb") proj$reconnect() # Pinned objects are automatically reconnected my_data <- proj$pin_read("my_table") ``` --- ## Memory Issues ### Large datasets causing R to crash **Cause**: Accidentally collecting large lazy tables into memory. **Prevention**: 1. **Avoid `collect()` on large tables**: ```r # BAD - loads entire table into memory big_df <- big_table |> collect() # GOOD - keep lazy, push computation to database and only collect what's needed result <- big_table |> filter(gene == "BRCA1") |> collect() ``` 2. **Use `pin_write()` to materialize intermediates to disk**: ```r # Materialize to database, not memory filtered_data <- proj$pin_write(x = lazy_filtered, name = "filtered") ``` --- ## Package-Specific Issues ### dbMatrix: Slow operations after many transformations **Cause**: Complex lazy query chains can cause performance issues. **Solution**: Use `dplyr::compute()` or `pin_write()` to materialize intermediate results: ```r # Materialize after expensive operations complex_result <- my_dbMatrix |> some_expensive_transform() |> # e.g. a function containing several SQL joins dplyr::compute(name = "materialized", temporary = FALSE) ``` --- ## Getting Help If you encounter issues not covered here: 1. Checkout the specific [dbverse package on our GitHub organization](https://github.com/dbverse-org). 2. Ensure you're using the latest dbverse package versions. 3. Include a minimal reproducible example when reporting issues.