Why We Wrote This Article
During software development, we run into many problems that need solving, and we don't always want to depend on some random library. Design patterns for these problems exist on the internet, but they often don't work that well out of the box; they need a lot of enhancement before they are usable.
Instead, we're sharing a solution that we have already used successfully, so rather than pulling in a library, you can simply copy and paste the code into your project.
The Problems
There are physical boundaries in a computer that we cannot break through, such as RAM, CPU speed, CPU word size (32 or 64 bits), disk space, battery, and so on. These limits don't stop there; they spread out and cause other limits. For example, 100 connections is the default maximum number of concurrent connections to Postgres, and 1024 is the default limit on concurrently open files per process in Ubuntu. Even though we can softly increase these limits, the problem is that we will always be bounded by some specific limit, and reaching it is not something we want to do: it will cause our system to panic. That is why I believe one of the keys to building durable runtime software lies in understanding and respecting all the types of resources we interact with.
Moreover, interacting with these resources - whether through networking or CPU-bound operations - produces overhead if done excessively. Imagine if we made a new connection to the database on every RESTful request. It would be a disaster for our system; we would waste most of our CPU time initiating the required resources instead of focusing on the main logic.
If I've touched on your pain point(s), welcome to my article. The design pattern we will discuss today is resource pooling, implemented in Rust with asynchronous syntax (Tokio).
Who Will Benefit Most from Reading This?
- Engineers who don't want to use a library for small problems; this code is ready to use unless you need to add some extra features.
- Developers looking for an implementation of resource pooling in Rust.
- Anyone looking for a resource pooling solution that works with asynchronous programming. We're using Tokio here; if you use a different async runtime (or no async at all), you should only need to update some import statements.
- High-expectation developers who not only want a solution but also desire the best experience while using it.
The Ideas
First, let’s briefly examine how the final implementation will be used:
The PoolRequest<T> is a lightweight object that acts as a request to the pool. It is Send + Sync + Clone, and also very cheap to clone across multiple threads. Its retrieve fn asks the pool for a PoolResponse<T>.
The PoolResponse<T> implements Deref<Target = T> + DerefMut so that it can be dereferenced into the resource.
The PoolResponse also makes sure that no resource can escape the pool: every resource that leaves the pool must return after use, so that it can be reused by others.
Take a look at this naive design of how a basic API application could use the pattern to solve resource initialization overhead.
Simple flow diagram
The pool already holds a number of resource items, so we are able to eliminate the resource-initiation overhead. Every handler holds a PoolRequest instance, which allows it to request resources whenever it wants to. But there are more problems than that.
More Problems and Solutions
By using the simple approach described above, four potential issues may arise:
1. Inability to provide resources when the number of concurrent requests exceeds the number of resources in the pool.
To address this, we'll implement two limits. The first is min_pool_size, which defines the minimum number of resources that are always ready for use. The second is max_pool_size, which can be much larger than min_pool_size. This allows additional resources to be initialized to handle extra requests, but the total number of resources will not exceed max_pool_size.
2. Applying max_pool_size could result in a rapid increase in the number of resources.
We'll introduce resource_idle_timeout, specifying the maximum time an unused resource can remain in memory while still maintaining the minimum number of resources in the pool.
For example, if your RESTful API only needs to handle 10 concurrent requests most of the time and you set min_pool_size = 10, but it suddenly receives 1,000 requests at once, your application will drop back to 10 concurrent requests after the surge, leaving 990 unused resources in the pool.
By setting resource_idle_timeout = 5 minutes, the pool can release those 990 unused resources if they remain idle for five minutes.
3. What if requests exceed max_pool_size? Should we fail those requests?
We can handle this by applying retrieving_timeout to PoolRequest<T>. This allows the request to wait (within a specified timeout) for a resource to be released back into the pool, which can then be used for the current request.
4. Setting a high min_pool_size might cause our service to take a long time to start because it needs to initialize many resources.
Instead of initializing resources one at a time, we can initialize them concurrently, reducing the startup time.
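The four knobs above could be collected into a configuration struct. This is a hypothetical sketch (the field names mirror the terms used in this section; the article's full code may shape this differently):

```rust
use std::time::Duration;

// Hypothetical configuration struct collecting the four limits discussed above.
struct PoolConfig {
    min_pool_size: usize,            // resources always kept ready
    max_pool_size: usize,            // hard ceiling on total resources
    resource_idle_timeout: Duration, // how long an extra resource may sit unused
    retrieving_timeout: Duration,    // how long a request waits before giving up
}

fn main() {
    let cfg = PoolConfig {
        min_pool_size: 10,
        max_pool_size: 1000,
        resource_idle_timeout: Duration::from_secs(300), // 5 minutes
        retrieving_timeout: Duration::from_secs(5),
    };
    // After a 1,000-request surge subsides, up to 990 idle resources become
    // eligible for cleanup once resource_idle_timeout elapses.
    assert_eq!(cfg.max_pool_size - cfg.min_pool_size, 990);
}
```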
Diagram illustrating the flow of retrieving, releasing resources, and removing idle resources
Final flow diagram
Implementation
Resource Abstraction
Before implementing the pool, we need a way to abstract the resources because the pool doesn't concern itself with what the resource is or how to initialize it. It only cares about the state of the resource—for example, whether it has timed out.
a. ResourceProvider
To abstract the process of creating new resources, we'll create a trait that encapsulates the initialization logic for different types of resources.
You may notice that the new function cannot accept custom parameters, yet initializing a resource often requires them.
We can create a struct to hold the parameters, and that struct will implement the PoolResourceProvider<T> trait.
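As a hedged sketch of what this abstraction might look like (the trait name PoolResourceProvider comes from the article; the method name new_resource and the DbConnection example are assumptions, and the trait is synchronous here so the example is self-contained):

```rust
// Hypothetical trait: the pool only needs "something that can make a T".
trait PoolResourceProvider<T> {
    fn new_resource(&self) -> T;
}

// A fake connection type standing in for a real resource.
struct DbConnection {
    url: String,
}

// The provider struct holds the custom parameters that a bare `new` could not take.
struct DbConnectionProvider {
    url: String,
}

impl PoolResourceProvider<DbConnection> for DbConnectionProvider {
    fn new_resource(&self) -> DbConnection {
        DbConnection { url: self.url.clone() }
    }
}

fn main() {
    let provider = DbConnectionProvider { url: "postgres://localhost".into() };
    let conn = provider.new_resource();
    assert_eq!(conn.url, "postgres://localhost");
}
```

The pool can then hold any `impl PoolResourceProvider<T>` without knowing what the parameters are.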
Now that we’ve established an effective abstraction for creating new resources, let’s proceed to the next step.
b. PoolItem<T>
The fundamental idea behind a PoolItem is that it manages the state of each resource - for example, how long the resource has been sitting in the pool.
The PoolItem also keeps our code easy to maintain, giving us a place to implement more detailed logic for managing each resource.
Let's take a look at the struct PoolItem<T>:
There are two functions that we need to talk about:
The ::refresh function is called when a PoolItem has finished its duty and returned to the pool. We reset its countdown to the beginning as a 'reward' for having done its duty instead of lying in the pool doing nothing.
The ::timeleft function counts down the maximum amount of time the resource may lie in the pool doing nothing. Once it exceeds the timeout - that is, once timeleft returns Duration::ZERO - it will be removed by the PoolCleaner, which we will discuss later in this post.
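A minimal self-contained sketch of PoolItem<T> with these two functions might look like this (field names and exact signatures are assumptions; the countdown is modeled with std::time::Instant):

```rust
use std::time::{Duration, Instant};

struct PoolItem<T> {
    resource: T,
    // When this item last entered (or re-entered) the pool.
    idle_since: Instant,
}

impl<T> PoolItem<T> {
    fn new(resource: T) -> Self {
        Self { resource, idle_since: Instant::now() }
    }

    /// Reset the idle countdown when the item returns from duty.
    fn refresh(&mut self) {
        self.idle_since = Instant::now();
    }

    /// Remaining time before this item has idled too long.
    /// Returns Duration::ZERO once `max_idling_timeout` has elapsed.
    fn timeleft(&self, max_idling_timeout: Duration) -> Duration {
        max_idling_timeout.saturating_sub(self.idle_since.elapsed())
    }
}

fn main() {
    let mut item = PoolItem::new(42u32);
    let left = item.timeleft(Duration::from_secs(300));
    assert!(left > Duration::ZERO && left <= Duration::from_secs(300));
    item.refresh(); // countdown starts over
}
```

`saturating_sub` means an expired item reports exactly `Duration::ZERO` rather than panicking on underflow.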
The Pool
Finally, let's discuss our main component: the pool itself.
The pool consists of three primary parts:
The Pool: Holds an array of resources, is responsible for managing that list, and makes sure every newly created resource is counted very carefully. This is also the core part, shared by the Allocator and the Cleaner.
The Allocator: This component initializes resources and provides them as needed. It handles the creation of new resources and supplies them to requesters.
The Cleaner: This part is responsible for maintaining the pool's efficiency by cleaning up unused resources, preventing unnecessary memory usage.
a. Pool<T>
Let's talk about the core Pool<T> first. Here is its structure:
The min_size: The minimum number of resources that will always be in the pool. It helps our system reduce the overhead of creating new resources too frequently.
The max_size: The maximum number of resources that can be created by the pool. Whenever the pool doesn't have enough resources to provide, it can create some extra resources to serve requests, and after these extra resources are used, they will be cleaned up if they stay unused for max_idling_timeout, as described above.
The items: This holds an array of PoolItem. It is wrapped in a Mutex because the entire Pool<T> struct will live on the heap instead of the stack and will be shared across different async tasks.
The counter: This counts every resource in the pool. You might ask why we don't just use items.len(). The reason is that when a resource is requested, it is taken out of items, so even though items shrinks, a resource has still been created and is merely borrowed. We cannot trust items.len(); we need a separate counter, and here it is.
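A sketch of that struct, using std::sync primitives instead of Tokio's so the example runs on its own (field names follow the description above; modeling the counter as an AtomicUsize is an assumption):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct PoolItem<T> {
    resource: T, // trimmed for this sketch
}

struct Pool<T> {
    min_size: usize,
    max_size: usize,
    // The article wraps this in tokio::sync::Mutex; std::sync::Mutex shown here.
    items: Mutex<Vec<PoolItem<T>>>,
    // Counts created resources, including ones currently borrowed out of `items`.
    counter: AtomicUsize,
}

fn main() {
    let pool = Pool {
        min_size: 1,
        max_size: 4,
        items: Mutex::new(Vec::new()),
        counter: AtomicUsize::new(0),
    };
    // Creating a resource: stored AND counted.
    pool.items.lock().unwrap().push(PoolItem { resource: "conn" });
    pool.counter.fetch_add(1, Ordering::SeqCst);
    // Borrowing: it leaves `items`, but the counter still remembers it exists.
    let _borrowed = pool.items.lock().unwrap().pop().unwrap();
    assert_eq!(pool.items.lock().unwrap().len(), 0); // items.len() lies...
    assert_eq!(pool.counter.load(Ordering::SeqCst), 1); // ...the counter doesn't
}
```

The two final assertions show exactly why `items.len()` cannot be trusted as the resource count.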
Here is its implementation; notice that all of the imports come from tokio.
Note: To prevent deadlock, we must ensure two conditions:
1. In any situation, no more than one lock is held at a time.
For example:
- Thread 1: holds lockA and waits for lockB to be released.
- Thread 2: holds lockB and waits for lockA to be released.
They will continue to wait for each other until the end of the world. To save the world, never allow any two locks to be held at the same time.
Never do this:
Instead, we do this.
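Since the deadlock itself cannot safely be demonstrated, here is a runnable sketch of the safe shape: touch one lock at a time and release it before taking the next (the transfer example is illustrative, not from the article):

```rust
use std::sync::Mutex;

fn transfer(a: &Mutex<i32>, b: &Mutex<i32>, amount: i32) {
    // BAD (not done here): holding a.lock() and b.lock() simultaneously,
    // which lets two threads wait on each other forever.
    // GOOD: take one lock at a time and drop it before taking the next.
    {
        let mut ga = a.lock().unwrap();
        *ga -= amount;
    } // lock A released here
    {
        let mut gb = b.lock().unwrap();
        *gb += amount;
    } // lock B released here
}

fn main() {
    let a = Mutex::new(100);
    let b = Mutex::new(0);
    transfer(&a, &b, 30);
    assert_eq!(*a.lock().unwrap(), 70);
    assert_eq!(*b.lock().unwrap(), 30);
}
```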
2. Establish a single source of truth to manage locks; this makes it much easier to review the logic when a deadlock occurs.
While deadlocks may not happen today, there’s always the possibility that a mistake by a colleague could cause one in the future. By centralizing lock management, we simplify the review process and reduce the risk of errors.
Now, let's get back to the core implementation of the Pool.
The ::add_new_item function accepts a new resource into the pool. It is mostly triggered during initialization of the PoolAllocator, which needs to initiate a number of new resources to fill min_size.
The ::return_borrowed_item function accepts a borrowed resource. What is a borrowed resource? It is a resource that was requested earlier and is now being returned to the pool. In this case there is no need to increase the counter.
The ::invalidate function invalidates the item at index if the pool size exceeds min_size and the item has exceeded max_idling_timeout.
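These three functions could be sketched as follows, again with std primitives instead of Tokio's so the example is self-contained (signatures are assumptions based on the descriptions above):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct PoolItem<T> {
    resource: T,
    idle_since: Instant,
}

struct Pool<T> {
    min_size: usize,
    items: Mutex<Vec<PoolItem<T>>>,
    counter: AtomicUsize,
    max_idling_timeout: Duration,
}

impl<T> Pool<T> {
    /// A brand-new resource enters the pool: store it AND count it.
    fn add_new_item(&self, resource: T) {
        self.counter.fetch_add(1, Ordering::SeqCst);
        self.items.lock().unwrap().push(PoolItem { resource, idle_since: Instant::now() });
    }

    /// A borrowed resource comes home: store it, but do NOT count it again.
    fn return_borrowed_item(&self, resource: T) {
        self.items.lock().unwrap().push(PoolItem { resource, idle_since: Instant::now() });
    }

    /// Drop the item at `index` only if the pool is above min_size
    /// AND the item has idled past max_idling_timeout.
    fn invalidate(&self, index: usize) -> bool {
        let mut items = self.items.lock().unwrap();
        let above_min = self.counter.load(Ordering::SeqCst) > self.min_size;
        let expired = items
            .get(index)
            .map_or(false, |i| i.idle_since.elapsed() >= self.max_idling_timeout);
        if above_min && expired {
            items.remove(index);
            self.counter.fetch_sub(1, Ordering::SeqCst);
            true
        } else {
            false
        }
    }
}

fn main() {
    let pool = Pool {
        min_size: 1,
        items: Mutex::new(Vec::new()),
        counter: AtomicUsize::new(0),
        max_idling_timeout: Duration::ZERO, // zero timeout: everything is instantly "idle"
    };
    pool.add_new_item("a");
    pool.add_new_item("b");
    assert_eq!(pool.counter.load(Ordering::SeqCst), 2);
    assert!(pool.invalidate(0));  // above min_size: removed
    assert!(!pool.invalidate(0)); // at min_size now: refused
}
```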
b. PoolAllocator<T>
First let's take a look on the struct.
We create a PoolAllocator struct that requires a generic type <T>, representing the type of resource it will provide.
The PoolAllocator<T> holds a Vector of PoolItem<T>, which is wrapped inside an Arc to allow sharing the reference across different threads. We then wrap this with an RwLock, enabling multiple readers but only one writer at a time. Together, these wrappers allow us to share the PoolAllocator across threads without violating Rust's memory safety rules.
Regarding why the allocator needs to keep a reference to the PoolCleaner, it's because it decides when to start the cleanup. Additionally, this ensures that the PoolCleaner<T> will be dropped when the allocator is dropped.
Now, let's examine the main logic of the allocator to see how it works.
The ::init function has two main responsibilities. First, it initializes the minimum number of resources to populate the pool. Second, to prevent a long startup time, it creates all of those resources concurrently instead of one by one.
The ::retrieve function extracts a resource from the pool. If the number of resources has not yet reached max_pool_size, we initiate a new resource.
Also, if there is no room for any extra resource, we add the request to the waiters list. The waiters list treats everyone equally: first come, first served. Whenever a resource becomes available, it serves the earliest waiting request first.
The ::put function returns the resource to the pool, but if there are waiters, we allow the resource to take another trip and serve a new request from the waiters list.
If the resource returns to the pool successfully, we know the pool is starting to become less busy than before, so we trigger the cleanup process.
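A deliberately simplified, synchronous sketch of the retrieve/put logic: the article's version is async and keeps an explicit waiters list, while here a Condvar queue plays that role (and, as a simplification, the timeout restarts on each wake-up):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

struct AllocState<T> {
    idle: VecDeque<T>,
    created: usize, // total resources in existence, borrowed or idle
}

struct Allocator<T> {
    state: Mutex<AllocState<T>>,
    available: Condvar, // stands in for the waiters list
    max_size: usize,
}

impl<T> Allocator<T> {
    fn retrieve(&self, make: impl Fn() -> T, timeout: Duration) -> Option<T> {
        let mut st = self.state.lock().unwrap();
        loop {
            // 1. Reuse an idle resource if there is one.
            if let Some(r) = st.idle.pop_front() {
                return Some(r);
            }
            // 2. Room below max_size: create a fresh resource.
            if st.created < self.max_size {
                st.created += 1;
                return Some(make());
            }
            // 3. Otherwise wait (bounded) for a put() to wake us.
            let (guard, res) = self.available.wait_timeout(st, timeout).unwrap();
            st = guard;
            if res.timed_out() && st.idle.is_empty() {
                return None;
            }
        }
    }

    fn put(&self, resource: T) {
        self.state.lock().unwrap().idle.push_back(resource);
        self.available.notify_one(); // one waiter gets served
    }
}

fn main() {
    let alloc = Allocator {
        state: Mutex::new(AllocState { idle: VecDeque::new(), created: 0 }),
        available: Condvar::new(),
        max_size: 1,
    };
    let a = alloc.retrieve(|| "conn", Duration::from_millis(10)).unwrap(); // created
    assert!(alloc.retrieve(|| "conn", Duration::from_millis(10)).is_none()); // at max: times out
    alloc.put(a);
    assert!(alloc.retrieve(|| "conn", Duration::from_millis(10)).is_some()); // reused
}
```

Note that `Condvar::notify_one` does not guarantee strict FIFO order the way an explicit waiters list can; it only approximates "first come, first served".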
c. PoolCleaner<T>
We have successfully implemented resource allocation; now it's time to free up some 'lazy' resources in our system. We love how efficiently the allocator works, but we (should) hate wasting resources.
A resource is considered lazy if the pool size exceeds min_pool_size and the resource has been lying in the pool for longer than resource_idle_timeout.
Let's take a quick look at how this process works:
The code above can be summarized as follows:
- Cleanup is a background job, and at most one cleanup background job can run at a time.
- Cleanup starts when the pool allocator 'thinks' it has found a lazy resource.
- Cleanup stops when there are no extra resources left (pool.size <= min_pool_size), or when there are no resources in the pool at all, which means the pool is very busy and there is no room for a background job to run.
- To make cleanup both optimized (less CPU blocking) and efficient (on time), we call yield_now() to avoid blocking the CPU, and we calculate the minimum time to delay() so that the cleanup loop, which requires a lock() on the pool, is not triggered too often.
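One pass of that logic could be sketched like this: remove idle-expired items without shrinking below min_size, then report the shortest remaining time-left so the caller knows how long to sleep before the next pass (a synchronous sketch; the real version runs as a background Tokio task and yields between iterations):

```rust
use std::time::{Duration, Instant};

struct Item {
    idle_since: Instant,
}

/// Returns None when the cleaner should stop (no extra resources left),
/// or Some(delay) telling the caller how long to sleep before re-checking.
fn cleanup_pass(items: &mut Vec<Item>, min_size: usize, idle_timeout: Duration) -> Option<Duration> {
    // Remove idle-expired items, but never shrink the pool below min_size.
    let mut i = 0;
    while items.len() > min_size && i < items.len() {
        if items[i].idle_since.elapsed() >= idle_timeout {
            items.remove(i); // lazy resource: drop it
        } else {
            i += 1;
        }
    }
    if items.len() <= min_size {
        None // back to sleep until the allocator wakes the cleaner again
    } else {
        // Sleep exactly until the next item could expire, so the loop
        // (and its lock on the pool) is not triggered more often than needed.
        items
            .iter()
            .map(|it| idle_timeout.saturating_sub(it.idle_since.elapsed()))
            .min()
    }
}

fn main() {
    let now = Instant::now();
    let mut items = vec![
        Item { idle_since: now },
        Item { idle_since: now },
        Item { idle_since: now },
    ];
    // With a zero timeout every item is instantly expired, but min_size = 1
    // protects the last one.
    let next = cleanup_pass(&mut items, 1, Duration::ZERO);
    assert_eq!(items.len(), 1);
    assert!(next.is_none()); // at min_size: cleaner stops
}
```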
The PoolRequest<T>
After a great time with the PoolAllocator, we have to think about a way to work with it, especially given Rust's strict memory-safety rules, which prevent us from simply holding a reference to just anything across async tasks.
To deal with this, we must create a struct that is Send + Sync + Clone.
Let’s take a look at how it works:
The PoolRequest can be Send + Sync + Clone, and is also cheap to clone, because it only keeps an Arc reference to the PoolAllocator.
Then, by calling the retrieve function, it requests a resource from the pool; if no resource is available, it waits for up to retrieving_timeout. If there is still no resource available, sadly we have to return a None result.
The PoolResponse<T>
To be honest, I would prefer the PoolRequest to return the resource directly. We all know how it feels to have ownership of an instance rather than just a reference, right?
But every resource has to go back to the pool, because it needs to be reused by others! What if I, you, or one of our colleagues takes the resource and just drops it? It would never return to the pool, and that is a waste; we want a cheap and efficient solution.
We need a better approach. By creating PoolResponse, we ensure the resource is always returned to the pool.
Now, let's take a look at how it works.
By implementing both Deref and DerefMut, we are able to freely interact with the resource without any limitations (I hope so).
And the secret of how the resource automatically returns to the pool after use lies in the Drop trait implementation. Whenever the PoolResponse goes out of scope, or drop(response) is called manually, we return the resource to the pool.
This also explains why the pool and resource fields need to be Options: we have to take the resource out and pass it to an async task, which runs independently after the PoolResponse is dropped.
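A self-contained sketch of the guard: Deref/DerefMut expose the resource, and Drop sends it home, with an mpsc Sender standing in for the real pool (the Option fields mirror the design described above; everything else is illustrative):

```rust
use std::ops::{Deref, DerefMut};
use std::sync::mpsc::{channel, Sender};

struct PoolResponse<T> {
    resource: Option<T>,     // Option so Drop can take() the value out
    pool: Option<Sender<T>>, // Option for the same reason
}

impl<T> Deref for PoolResponse<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.resource.as_ref().unwrap()
    }
}

impl<T> DerefMut for PoolResponse<T> {
    fn deref_mut(&mut self) -> &mut T {
        self.resource.as_mut().unwrap()
    }
}

impl<T> Drop for PoolResponse<T> {
    fn drop(&mut self) {
        // take() moves both out so the resource can outlive `self`.
        if let (Some(resource), Some(pool)) = (self.resource.take(), self.pool.take()) {
            let _ = pool.send(resource); // back to the pool
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    {
        let mut resp = PoolResponse { resource: Some(String::from("conn")), pool: Some(tx) };
        resp.push_str("-used"); // DerefMut: use it as if it were the resource
        assert_eq!(&*resp, "conn-used"); // Deref
    } // resp dropped here: the resource goes home
    assert_eq!(rx.recv().unwrap(), "conn-used");
}
```

The guard makes "forgetting" to return a resource impossible: even an early return or a panic unwinding through scope runs Drop.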
The Usage
First, assume that DbConnection is our resource.
Then we abstract the resource by implementing the PoolResourceProvider trait.
And we use it as below.
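A miniature end-to-end sketch tying the pieces together. Everything here is synchronous and heavily simplified; the names DbConnection, PoolResourceProvider, and PoolRequest follow the article, but the signatures and internals are assumptions:

```rust
use std::sync::{Arc, Mutex};

struct DbConnection {
    url: String,
}

trait PoolResourceProvider<T> {
    fn new_resource(&self) -> T;
}

struct DbProvider {
    url: String,
}

impl PoolResourceProvider<DbConnection> for DbProvider {
    fn new_resource(&self) -> DbConnection {
        DbConnection { url: self.url.clone() }
    }
}

// The request handle: cheap to clone, shares the pool through an Arc.
struct PoolRequest<T> {
    idle: Arc<Mutex<Vec<T>>>,
}

// Manual impl so T itself does not need to be Clone; only the Arc is cloned.
impl<T> Clone for PoolRequest<T> {
    fn clone(&self) -> Self {
        Self { idle: Arc::clone(&self.idle) }
    }
}

impl<T> PoolRequest<T> {
    fn retrieve(&self) -> Option<T> {
        self.idle.lock().unwrap().pop()
    }
    fn put(&self, resource: T) {
        self.idle.lock().unwrap().push(resource);
    }
}

fn main() {
    let provider = DbProvider { url: "postgres://localhost/app".into() };
    // Pre-fill the pool with min_pool_size = 2 connections.
    let idle: Vec<_> = (0..2).map(|_| provider.new_resource()).collect();
    let request = PoolRequest { idle: Arc::new(Mutex::new(idle)) };

    let handler = request.clone(); // each handler holds a cheap clone
    let conn = handler.retrieve().expect("pool pre-filled");
    assert_eq!(conn.url, "postgres://localhost/app");
    handler.put(conn); // return it for the next handler
    assert_eq!(request.idle.lock().unwrap().len(), 2);
}
```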
The playground also includes the full code; feel free to copy it into your project. We also implemented a PoolBuilder and PoolRequestBuilder as a bonus.