Requirements: Basic grasp of pointers and the borrow checker

Pin is a tricky concept for many Rust developers, and I think I can make it a bit clearer for those who are trying to understand it. Unlike other posts, let’s start with our beloved borrow checker.

fn main() {
    /* Borrow checker does not allow us to SWAP/MOVE the borrowed values */

    let mut a = "a";
    let mut ref_a = &a;
    let mut b = "b";
    let mut ref_b = &b;

    println!("a = value: {},        --- address:{:p}", a, &a);
    println!("ref_a = value: {:p},  --- address:{:p}, --- deref to: {}", ref_a, &ref_a, *ref_a);
    println!("b = value: {},        --- address:{:p}", b, &b);
    println!("ref_b = value: {:p},  --- address:{:p}, --- deref to: {}", ref_b, &ref_b, *ref_b);


    /* try uncommenting one of the lines below */

    // b = a; // MOVE - COMPILER ERROR
    // std::mem::swap(&mut a, &mut b); // SWAP - COMPILER ERROR

    println!("ref_a = value: {:p}, --- address:{:p}, --- deref to: {}", ref_a, &ref_a, *ref_a);
    println!("ref_b = value: {:p}, --- address:{:p}, --- deref to: {}", ref_b, &ref_b, *ref_b);
}

The code itself is quite readable, but still, let me give a small summary:

ref_a and ref_b borrow a and b, respectively;
we cannot change the values of a and b until we are done with ref_a and ref_b;
if we try to move or swap the values of a or b while they are borrowed, the compiler will get angry.

We already know this. It is the infamous borrow-checker.
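The flip side of the summary above is that the same swap is accepted once the references are no longer used. Here is a minimal sketch of that; the function name `swap_after_borrows_end` is made up for illustration and is not part of the original example.

```rust
/// Swaps `a` and `b` after the last use of the references that borrow them.
fn swap_after_borrows_end() -> (&'static str, &'static str) {
    let mut a = "a";
    let mut b = "b";

    let ref_a = &a;
    let ref_b = &b;

    // std::mem::swap(&mut a, &mut b); // would not compile here: the references are used below

    println!("borrowed: {ref_a} {ref_b}"); // last use of `ref_a` and `ref_b`

    // Nothing borrows `a` or `b` anymore, so the borrow checker is happy.
    std::mem::swap(&mut a, &mut b);
    (a, b)
}

fn main() {
    let (a, b) = swap_after_borrows_end();
    println!("a = {a}, b = {b}");
}
```

So the borrow checker is not against swapping as such; it only rejects it while a borrow is still in use.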

The Problem: Self-referential structs

There is a nuance. If both a and ref_a live in the same struct, the compiler may not get mad, because both of them belong to the same entity, and we will be moving that entity as a whole. As a result, we may get unwanted/buggy behavior. See the code below:

#[derive(Debug)]
struct Test {
    string: String,
    ref_string: *const String, // this is a pointer now
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
        }
    }

    fn init(&mut self) {
        self.ref_string = &self.string;
    }
}

fn print_test_struct(test: &Test, label: &str) {
    println!(
        "{label}.string = value: {},            --- address:{:p}",
        test.string, &test.string
    );
    unsafe {
        println!(
            "{label}.ref_string = value: {:p},  --- address:{:p}, --- deref to: {}",
            test.ref_string, &test.ref_string, *test.ref_string
        );
    }
}

fn main() {
    /* "Self-referential - SWAP PROBLEM - values will be inconsistent" */

    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    println!("# initial values:");
    println!();
    print_test_struct(&a, "a");
    println!();
    print_test_struct(&b, "b");

    println!("\n-----------------------\n");

    /* Let's swap */
    std::mem::swap(&mut a, &mut b);

    println!("# values after swap:\n");

    print_test_struct(&a, "a");
    println!();
    print_test_struct(&b, "b");
}

Here is the output from the playground (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=697589672992f95b8705247a926c7b81):

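Weird, huh? After the swap, a.string holds "b", but a.ref_string still holds the address of the slot that belongs to the variable b, and that slot now contains "a". Each struct's pointer refers to the other struct's string.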
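If you would rather check the inconsistency in code than compare printed addresses, the sketch below does it by comparing pointers and never dereferences the raw pointer. The helper `is_consistent` and the function `swap_demo` are hypothetical names added for this illustration.

```rust
struct Test {
    string: String,
    ref_string: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
        }
    }

    fn init(&mut self) {
        self.ref_string = &self.string;
    }

    /// Hypothetical helper: true if `ref_string` points at our own `string` field.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ref_string, &self.string)
    }
}

/// Returns (consistent before swap, `a` consistent after swap, `a` points into `b`).
fn swap_demo() -> (bool, bool, bool) {
    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    let before = a.is_consistent() && b.is_consistent();

    std::mem::swap(&mut a, &mut b);

    // `a` now holds the bytes that used to live in `b`,
    // including a pointer to the `string` slot of the variable `b`.
    let a_points_into_b = std::ptr::eq(a.ref_string, &b.string);
    (before, a.is_consistent(), a_points_into_b)
}

fn main() {
    let (before, after, crossed) = swap_demo();
    println!("consistent before swap: {before}");
    println!("a consistent after swap: {after}");
    println!("a.ref_string points into b: {crossed}");
}
```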

We can also replicate the same behavior with a move:

fn main() {
    /* "Self-referential - MOVE PROBLEM - values will be inconsistent" */

    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    println!("# initial values:");
    print_test_struct(&a, "a");
    print_test_struct(&b, "b");

    println!("\n-----------------------\n");

    /* Let's move */
    b = a;

    println!("# values after move:\n");
    // print_test_struct(&a, "a"); // cannot print `a`, since it's moved
    print_test_struct(&b, "b");
}

In short, it is not a very good idea to move self-referential structs.

But wasn’t Rust safe? How come this happened? Is it because we used unsafe? Yes! If we tried to replicate the same behavior with plain references instead of raw pointers, the borrow checker would immediately complain.

Let me demonstrate:

#[derive(Debug)]
struct Test<'a> {
    string: String,
    ref_string: Option<&'a String>, // this is a reference now
}

impl<'a> Test<'a> {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: None,
        }
    }

    fn init(&'a mut self) {
        self.ref_string = Some(&self.string);
    }
}

fn print_test_struct(test: &Test, label: &str) {
    println!(
        "{label}.string = value: {},            --- address:{:p}",
        test.string, &test.string
    );
    println!(
        "{label}.ref_string = value: {:?},  --- address:{:p}, --- deref to: {}",
        test.ref_string,
        &test.ref_string,
        *test.ref_string.unwrap()
    );
}

fn main() {
    /* "Self-referential - SWAP PROBLEM - values will be inconsistent" */

    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    println!("# initial values:");
    println!();
    print_test_struct(&a, "a"); // COMPILER ERROR
    println!();
    print_test_struct(&b, "b"); // COMPILER ERROR

    println!("\n-----------------------\n");

    /* Let's swap */
    std::mem::swap(&mut a, &mut b); // COMPILER ERROR

    println!("# values after swap:\n");

    print_test_struct(&a, "a"); // COMPILER ERROR
    println!();
    print_test_struct(&b, "b"); // COMPILER ERROR
}

For example, the first compiler error is:

cannot borrow `a` as immutable because it is also borrowed as mutable
immutable borrow occurs here
main.rs(37, 5): mutable borrow occurs here

Which translates into:

the init function mutably borrows self
this borrow needs a lifetime, which is 'a, the lifetime parameter of the struct itself
in other words, after init has finished executing, the mutable borrow is still active, because ref_string keeps borrowing from the string field (due to the lifetime)
while a mutable borrow is active, we cannot create another reference to the same value (borrow checker rule)

So, how exactly did our code work with unsafe?

The borrow started and ended within the scope of the init function, since a raw pointer carries no lifetime that would tie the field to the borrow. Thus, we were able to create other references afterwards.
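A stripped-down sketch of that difference, outside of any struct: converting a reference into a raw pointer ends the borrow on the spot, so the value can be mutated afterwards. The function name is made up for illustration, and the pointer is only compared, never dereferenced.

```rust
/// Shows that a raw pointer does not keep its target borrowed.
fn raw_pointer_does_not_hold_a_borrow() -> (String, bool) {
    let mut s = String::from("a");

    // The shared borrow `&s` ends as soon as it has been converted to a raw pointer.
    let p: *const String = &s;

    // Allowed: as far as the borrow checker is concerned, nothing borrows `s`.
    s.push('!');

    // With a reference instead, the same sequence is rejected:
    // let r: &String = &s;
    // s.push('!');        // error: cannot borrow `s` as mutable
    // println!("{r}");

    // `s` itself has not moved, so the address stored in `p` is still its address.
    let still_points_at_s = std::ptr::eq(p, &s);
    (s, still_points_at_s)
}

fn main() {
    let (s, same) = raw_pointer_does_not_hold_a_borrow();
    println!("s = {s}, pointer still matches: {same}");
}
```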

Self-referential relationship with async

You may be thinking: thank god I don’t have self-referential stuff in my code. But do you have async code in your project? Because async can be self-referential under the hood. In this case, we ask Pin/Unpin for help to bring our code back within the compiler’s safety guarantees. We, as Rust developers, appreciate it a lot when the compiler provides the safety instead of us. We are lazy and we like to delegate the responsibility 🙂

Async `can be` self-referential under the hood?

I will be quoting from the Rust async book.

Quote starts…

what happens if we have an async block that uses references? For example:

async {
    let mut x = [0; 128];
    let read_into_buf_fut = read_into_buf(&mut x);
    read_into_buf_fut.await;
    println!("{:?}", x);
}

What struct does this compile down to?

struct ReadIntoBuf<'a> {
    buf: &'a mut [u8], // points to `x` below
}

struct AsyncFuture {
    x: [u8; 128],
    read_into_buf_fut: ReadIntoBuf<'what_lifetime?>,
}

Here, the ReadIntoBuf future holds a reference into the other field of our structure, x (see my screenshot below). However, if AsyncFuture is moved, the location of x will move as well, invalidating the pointer stored in read_into_buf_fut.buf.

Quote ends…

In case the last paragraph was a bit cryptic (it was to me when I first read it), let me make it much easier for you.

Now, you should be able to see the hidden cycle.
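This is also why Future::poll takes Pin<&mut Self> rather than &mut self. Below is a minimal sketch that polls such an async block by hand, using only the standard library. The `read_into_buf` here is a hypothetical stand-in that just fills the buffer, and `NoopWaker` and `poll_once` are names invented for this example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future that never waits.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the book's `read_into_buf`: fills the buffer with 7s.
async fn read_into_buf(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        *byte = 7;
    }
}

fn poll_once() -> [u8; 4] {
    let fut = async {
        let mut x = [0u8; 4];
        // This inner future borrows `x`, and both are stored inside the outer future.
        let read_into_buf_fut = read_into_buf(&mut x);
        read_into_buf_fut.await;
        x
    };

    // `Future::poll` takes `Pin<&mut Self>`, so the future has to be pinned first.
    let mut fut = pin!(fut);

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(x) => x,
        Poll::Pending => unreachable!("this future never waits on anything"),
    }
}

fn main() {
    println!("{:?}", poll_once());
}
```

In real code an executor does the pinning and polling for you; the sketch only makes visible where Pin enters the picture.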

If you are confused about how async code is compiling down to something else, I strongly suggest my own resource on this: https://github.com/ozgunozerk/asynction

How do we solve this problem?

By pinning our self-referential structs, we will be telling the compiler that: “hey, look, I don’t want to move this, alright? Get angry if I try to move this.”

So, we actually didn’t make moving self-referential structs safe, we just forbade ourselves from moving them. Because moving them is unsafe 😀. That’s the point of Pin. We will be making some promises, and the compiler is going to check these promises for us.

How does pin work? In short, pinning something will make it impossible to move it.

In long: the Pin struct has a single field, which is a pointer, and that field is private, so you cannot reach the pointer directly. Pin implements several methods for reaching the pinned data via mutable or immutable references, but all of them have strict bounds to ensure that you won’t be able to move the data. To sum up, when you Pin some data, you lose access to the actual pointer (which points to the data hidden behind it), and you only have very limited access to the data via Pin’s methods.

What is Unpin?

You should interpret this name as “can be unpinned” instead. If the data we are pinning is Unpin, we can get a mutable reference to it, move it, and do anything we like with it, because the compiler does not care about Pin promises for Unpin data. The methods available on the Pin struct are actually very flexible if the target data is Unpin.

Basically, you can think of it like this: Pin and Unpin cancel each other out.
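A minimal sketch of that cancellation, using i32 as an example of Unpin data (the function name is made up for illustration): the safe constructor Pin::new and the safe accessor get_mut are both available, so we can swap the values right through the pins.

```rust
use std::pin::Pin;

/// Pins two `i32`s and still swaps them through the pins.
fn swap_through_pins() -> (i32, i32) {
    let mut x = 1;
    let mut y = 2;

    // `Pin::new` is safe, and it only exists for pointers to `Unpin` data.
    let pin_x = Pin::new(&mut x);
    let pin_y = Pin::new(&mut y);

    // `get_mut` also requires `Unpin`; it hands back a plain `&mut i32`.
    std::mem::swap(pin_x.get_mut(), pin_y.get_mut());

    (x, y)
}

fn main() {
    let (x, y) = swap_through_pins();
    println!("x = {x}, y = {y}");
}
```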

So, which data types are Unpin? I have good news: almost everything is Unpin. The exceptions are types that opt out, such as the futures the compiler generates for async code that may be self-referential. So, if you Pin some regular data, it will ultimately have no effect, since your data was (most probably) Unpin.

And you can guess by now, !Unpin means, not Unpin, or it means: this data WILL get affected by Pin.

Compiler is generally doing well about async stuff, and it marks the necessary things as !Unpin. However, the self-referential struct we wrote above in our examples, is actually marked as Unpin, whereas it should have been !Unpin.

The reason is that Unpin is an auto trait: it is implemented automatically for a type if all of that type’s fields are Unpin, and raw pointers are Unpin.

The compiler is not trying to deduce if there is any cycle for these structs. If it did, it could have marked them as !Unpin. Maybe there are other reasons as well, but for our custom self-referential structs, we have to mark them as !Unpin ourselves (see the code below):

use std::pin::Pin;
use std::marker::PhantomPinned;  // this helps us to make our data `!Unpin`

#[derive(Debug)]
struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned, // we need this to make our type `!Unpin`
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
            _marker: PhantomPinned, // This makes our type `!Unpin`
        }
    }

    fn init(&mut self) {
        self.ref_string = &self.string;
    }
}

We cannot use impl !Unpin for Test {} as of December 8, 2022 (negative impls are not available on stable Rust), so we have to use a phantom marker instead.
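One way to see the effect of the marker is a small compile-time check. The sketch below is only an illustration: `Plain`, `Marked` and `requires_unpin` are hypothetical names, and the two structs mirror the two versions of Test from this post.

```rust
use std::marker::PhantomPinned;

/// Same shape as the first `Test`: a raw pointer field, no marker.
#[allow(dead_code)]
struct Plain {
    string: String,
    ref_string: *const String,
}

/// Same shape as the second `Test`: the marker opts the type out of `Unpin`.
#[allow(dead_code)]
struct Marked {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

/// Only compiles when it is called with a type that implements `Unpin`.
fn requires_unpin<T: Unpin>() -> bool {
    true
}

fn main() {
    // Compiles: all fields of `Plain` are `Unpin`, so `Plain` is `Unpin` too.
    println!("Plain is Unpin: {}", requires_unpin::<Plain>());

    // Does not compile if uncommented, because `PhantomPinned` is not `Unpin`:
    // requires_unpin::<Marked>();
}
```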

Pinning `!Unpin` is unsafe code

What? Why? Wasn’t this the main purpose of Pin:

to pin !Unpin data.

Hahaha. This part is actually really fun. Take a look at the below main built on top of the above code:

fn main() {
    /* "Self-referential - SWAP PROBLEM - values will be inconsistent" */

    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    println!("# initial values:");
    println!();
    print_test_struct(&a, "a");
    println!();
    print_test_struct(&b, "b");

    println!("\n-----------------------\n");

    // CHECKPOINT START
    let mut a_pin = unsafe { Pin::new_unchecked(&mut a) };
    let mut b_pin = unsafe { Pin::new_unchecked(&mut b) };

    /* Let's swap */
    std::mem::swap(&mut a, &mut b);
    // CHECKPOINT END

    println!("# values after swap:\n");

    print_test_struct(&a, "a");
    println!();
    print_test_struct(&b, "b");
}

Pay attention to the code between CHECKPOINT START and CHECKPOINT END. We will modify that part later to experiment a bit with this code. I will refer to that part as the checkpoint code.

Although we did pin the values, we could still swap a with b. And the problem is still there: the a.ref_string and b.ref_string pointers are pointing to the wrong structs.

The compiler was not angry, because Pin only prevents moves that go through the Pin itself.

If you can access your variable directly, without using Pin, there is nothing the compiler can do for you. That’s why pinning !Unpin data is unsafe: you (as the developer) must make and keep a promise, which is: I will not move the data I’m pinning right now through other ways (like we did above).

The correct usage would be shadowing the variable name, in order to limit ourselves (only way to access the variable should be through Pin):

// WRONG
let mut a_pin = unsafe { Pin::new_unchecked(&mut a) };
let mut b_pin = unsafe { Pin::new_unchecked(&mut b) };

// CORRECT
let mut a = unsafe { Pin::new_unchecked(&mut a) }; // notice the shadowing
let mut b = unsafe { Pin::new_unchecked(&mut b) }; // notice the shadowing

So, if we change our checkpoint code, as:

// CHECKPOINT START
let mut a = unsafe { Pin::new_unchecked(&mut a) }; // `a` instead of `a_pin`
let mut b = unsafe { Pin::new_unchecked(&mut b) }; // `b` instead of `b_pin`

/* Let's swap */
std::mem::swap(&mut a, &mut b);
// CHECKPOINT END

You will see the code still compiles. LOL!

But the problem is no longer there. a.ref_string and b.ref_string are now pointing to their own string fields, it is correct!

But why? Has the swap operation not worked? It actually did. What we have swapped were not our structs, but the pointers. So now, a (the pin pointer) is pointing towards b (the struct), and b (the pin pointer) is pointing towards a (the struct). This is ok, since the structs themselves have not moved, and thus their string fields have not moved either.

Okay, how can we try to move the actual a and b structs through the pin pointer, so that we can see the compiler forbidding that operation? First, we should get a mutable reference to our structs. Change the checkpoint code as follows:

// CHECKPOINT START
let mut a = unsafe { Pin::new_unchecked(&mut a) };
let mut b = unsafe { Pin::new_unchecked(&mut b) };

/* Let's swap */
std::mem::swap(a.get_mut(), b.get_mut());
// CHECKPOINT END

Now, we get a proper compiler error:

Compiling playground v0.0.1 (/playground)
error[E0277]: `PhantomPinned` cannot be unpinned
  --> src/main.rs:47:20
   |
47 |     std::mem::swap(a.get_mut(), b.get_mut());
   |                    ^^^^^ ------- required by a bound introduced by this call
   |                    |
   |                    within `Test`, the trait `Unpin` is not implemented for `PhantomPinned`
   |
   = note: consider using `Box::pin`
note: required because it appears within the type `Test`
  --> src/main.rs:5:8
   |
5  | struct Test {
   |        ^^^^
note: required by a bound in `Pin::<&'a mut T>::get_mut`

error[E0277]: `PhantomPinned` cannot be unpinned
  --> src/main.rs:47:37
   |
47 |     std::mem::swap(a.get_mut(), b.get_mut());
   |                                     ^^^^^ ------- required by a bound introduced by this call
   |                                     |
   |                                     within `Test`, the trait `Unpin` is not implemented for `PhantomPinned`
   |
   = note: consider using `Box::pin`
note: required because it appears within the type `Test`
  --> src/main.rs:5:8
   |
5  | struct Test {
   |        ^^^^
note: required by a bound in `Pin::<&'a mut T>::get_mut`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to 2 previous errors

So, coming back to the question: why is pinning !Unpin unsafe?

Because, remember, we have to give the compiler a promise: we will not use the data we pinned without going through the Pin pointer. That’s why we shadowed the variable names to limit ourselves. In conclusion, the compiler cannot promise full safety just because we put a Pin on something, and we have to take some of the responsibility for pinning the data. That’s why it is unsafe.
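As an aside that is not part of the original walkthrough: the standard library also ships the std::pin::pin! macro (stable since Rust 1.68), which pins a value to the stack without an unsafe block at the call site. It can do so because it moves the value into a hidden local, which plays the same role as our shadowing trick. The sketch below assumes a slightly different init that takes Pin<&mut Self>, and `is_consistent` and `pinned_swap_demo` are hypothetical names.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
            _marker: PhantomPinned,
        }
    }

    fn init(self: Pin<&mut Self>) {
        let ptr: *const String = &self.string;
        // SAFETY: we only write to a field; the pinned value itself is never moved.
        unsafe {
            self.get_unchecked_mut().ref_string = ptr;
        }
    }

    /// Hypothetical helper: true if `ref_string` points at our own `string` field.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ref_string, &self.string)
    }
}

/// Returns (both structs still consistent, the string reachable through `a`).
fn pinned_swap_demo() -> (bool, String) {
    // The structs live in hidden locals; only the pins are reachable by name.
    let mut a = pin!(Test::new("a"));
    a.as_mut().init();
    let mut b = pin!(Test::new("b"));
    b.as_mut().init();

    // This swaps the two `Pin<&mut Test>` handles, not the structs behind them.
    std::mem::swap(&mut a, &mut b);

    (a.is_consistent() && b.is_consistent(), a.string.clone())
}

fn main() {
    let (consistent, through_a) = pinned_swap_demo();
    println!("still consistent: {consistent}, a now sees: {through_a}");
}
```

The unsafe block inside init is still our responsibility; the macro only takes over the promise about not touching the original variable.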

We made it! Now:

we can pin our self-referencing structs (including async stuff in our code)
we know what !Unpin means
we can make our custom self-referential code !Unpin
we know the reason behind unsafe for pinning a !Unpin data, so no need to be afraid of it anymore

Bonus: Making the API safer

So far, we were relying on the developer’s promise to not move the pinned data. However, we can make our API safer, and enforce these promises via our design.

Having the init method accept Pin<&mut Self> instead of &mut Self would make sense, as the caller has to pin the struct before passing it to init. Also, after init, we want to make it impossible to move the struct. So, it makes double sense!

I present you our new init method:

    fn init(self: Pin<&mut Self>) {
        unsafe {
            self.get_unchecked_mut().ref_string = &self.as_ref().string;
        }
    }

However, this API might be too restrictive. init() takes the Pin<&mut Self> by value (and get_unchecked_mut() consumes it inside), so the pin we pass in is moved, making it impossible to do anything with it after calling init(). We cannot even print it…

If we are okay with not having access to self again, this is fine. But a more flexible API would be:

    fn init(mut self: Pin<&mut Self>) -> Pin<&mut Self> {
        unsafe {
            self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
        }
        self
    }

This seems a bit ugly, let me explain everything step by step:

self is Pin<&mut Self>
Self is Test struct (recall, init is inside impl Test block)
to make get_unchecked_mut not consuming, we have to call as_mut() on the self
hence, we are now using mut self instead of plain self as the argument
we wanted to allow the developer to do stuff with self after calling init. But, we cannot take self by reference, because in that case, the original self will be alive in main, and the developer may still forget about the safety rules
we want to consume self. That way, after pinning it, there will be no way for the developer to accidentally move it
however, we want to return Pin<&mut Self>, so that the developer still has access to the pinned struct after calling init()

Below is the full snippet:

use std::marker::PhantomPinned; // this helps us to make our data `!Unpin`
use std::pin::Pin;

#[derive(Debug)]
struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned, // we need this to make our type `!Unpin`
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
            _marker: PhantomPinned, // This makes our type `!Unpin`
        }
    }

    fn init(mut self: Pin<&mut Self>) -> Pin<&mut Self> {
        unsafe {
            self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
        }
        self
    }
}


// this also had to change
fn print_pinned(test: &Pin<&mut Test>, label: &str) {
    println!(
        "{label}.string = value: {},            --- address:{:p}",
        test.as_ref().string,
        &test.as_ref().string
    );
    unsafe {
        println!(
            "{label}.ref_string = value: {:p},  --- address:{:p}, --- deref to: {}",
            test.as_ref().ref_string,
            &test.as_ref().ref_string,
            *test.as_ref().ref_string
        );
    }
}

fn main() {
    let mut a = Test::new("a");
    let a = unsafe { Pin::new_unchecked(&mut a) }; // Pin `a` before `init`
    let pin_a = a.init(); // we don't even have to shadow, since `a` is already moved
    print_pinned(&pin_a, "a");

    let mut b = Test::new("b");
    let b = unsafe { Pin::new_unchecked(&mut b) }; // Pin `b` before `init`
    let pin_b = b.init(); // we don't even have to shadow, since `b` is already moved
    print_pinned(&pin_b, "b");

    /* Below won't work anymore, which is good! */

    // std::mem::swap(&mut a, &mut b); // compiler error of `moved` values for both `a` and `b` :)))
}


Accessing fields of `Pin` struct

As you saw in the above example, we had to use an additional unsafe block to reach the inner parts of the Pin pointer:

unsafe {
    self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
}

If you prefer not to use unsafe (like me), there is a crate for this: pin-project. Its authors took on the responsibility for the unsafe actions needed to access pinned structs/enums, so we don’t have to.

I won’t further complicate this post by providing another example of how to use pin-project. I’m sure you will have no problem discovering it yourself after this post :)

Time to congratulate yourself 🎉 Pin is one of the most confusing topics in Rust.

Hope you liked this article!

References

I’ve tried to summarize what I’ve learned from mostly these 3 resources (not necessarily in order):

https://fasterthanli.me/articles/pin-and-suffering (faster than lime)
https://www.youtube.com/watch?v=DkMwYxfSYNQ (Jon Gjengset)
https://rust-lang.github.io/async-book/04_pinning/01_chapter.html (rust async book)
