Requirements: a basic grasp of pointers and the borrow checker
Pin is a tricky concept in Rust for many, and I think I can make it a bit clearer for those who are trying to understand it. Unlike other posts, let’s start with our beloved borrow checker.
fn main() {
/* Borrow checker does not allow us to SWAP/MOVE the borrowed values */
let mut a = "a";
let ref_a = &a;
let mut b = "b";
let ref_b = &b;
println!("a = value: {}, --- address:{:p}", a, &a);
println!("ref_a = value: {:p}, --- address:{:p}, --- deref to: {}", ref_a, &ref_a, *ref_a);
println!("b = value: {}, --- address:{:p}", b, &b);
println!("ref_b = value: {:p}, --- address:{:p}, --- deref to: {}", ref_b, &ref_b, *ref_b);
/* try uncommenting one of the lines below */
// b = a; // MOVE (assigning to the borrowed `b`) - COMPILER ERROR
// std::mem::swap(&mut a, &mut b); // SWAP - COMPILER ERROR
println!("ref_a = value: {:p}, --- address:{:p}, --- deref to: {}", ref_a, &ref_a, *ref_a);
println!("ref_b = value: {:p}, --- address:{:p}, --- deref to: {}", ref_b, &ref_b, *ref_b);
}
The code itself is quite readable, but still, let me give a short summary:
- `ref_a` and `ref_b` borrow `a` and `b`, respectively
- we cannot change the values of `a` and `b` before we are done with `ref_a` and `ref_b`
- if we try to move or swap the values of `a` or `b` while they are borrowed, the compiler will get angry
We already know this. It is the infamous borrow checker.
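The phrase "before we are done with `ref_a` and `ref_b`" matters: with non-lexical lifetimes, a borrow ends at its last use, not at the end of the scope. A small sketch (the function name `swap_after_borrows_end` is only illustrative) showing that the very same swap compiles once the references are no longer used:

```rust
/// Hypothetical helper: same variables as above, but the borrows
/// are finished before we swap.
fn swap_after_borrows_end() -> (&'static str, &'static str) {
    let mut a = "a";
    let mut b = "b";
    let ref_a = &a;
    let ref_b = &b;
    // Last use of `ref_a` and `ref_b`: the borrows end here.
    println!("ref_a -> {}, ref_b -> {}", ref_a, ref_b);
    // Allowed: no borrow of `a` or `b` is used after this point.
    std::mem::swap(&mut a, &mut b);
    // println!("{}", ref_a); // uncommenting this brings the compiler error back
    (a, b)
}

fn main() {
    let (a, b) = swap_after_borrows_end();
    println!("a = {a}, b = {b}"); // a = b, b = a
}
```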
The Problem: Self-referential structs
There is a nuance. If both `a` and `ref_a` are in the same struct (and `ref_a` is a raw pointer rather than a reference), the compiler may not get mad, because both `a` and `ref_a` belong to the same entity, and we move that entity as a whole. As a result, we may get unwanted/buggy behavior. See the code below:
#[derive(Debug)]
struct Test {
string: String,
ref_string: *const String, // this is a pointer now
}
impl Test {
fn new(txt: &str) -> Self {
Test {
string: String::from(txt),
ref_string: std::ptr::null(),
}
}
fn init(&mut self) {
self.ref_string = &self.string;
}
}
fn print_test_struct(test: &Test, label: &str) {
println!(
"{label}.string = value: {}, --- address:{:p}",
test.string, &test.string
);
unsafe {
println!(
"{label}.ref_string = value: {:p}, --- address:{:p}, --- deref to: {}",
test.ref_string, &test.ref_string, *test.ref_string
);
}
}
fn main() {
/* "Self-referential - SWAP PROBLEM - values will be inconsistent" */
let mut a = Test::new("a");
a.init();
let mut b = Test::new("b");
b.init();
println!("# initial values:");
println!();
print_test_struct(&a, "a");
println!();
print_test_struct(&b, "b");
println!("\n-----------------------\n");
/* Let's swap */
std::mem::swap(&mut a, &mut b);
println!("# values after swap:\n");
print_test_struct(&a, "a");
println!();
print_test_struct(&b, "b");
}
Here is the output from the playground (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=697589672992f95b8705247a926c7b81):
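Weird, huh? After the swap, `a.string` holds "b", but `a.ref_string` still derefs to "a": the raw pointers were swapped together with the data, so each one now points into the other struct.
We can replicate the same behavior with a move as well: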
fn main() {
/* "Self-referential - MOVE PROBLEM - values will be inconsistent" */
let mut a = Test::new("a");
a.init();
let mut b = Test::new("b");
b.init();
println!("# initial values:");
print_test_struct(&a, "a");
print_test_struct(&b, "b");
println!("\n-----------------------\n");
/* Let's move */
b = a;
println!("# values after move:\n");
// print_test_struct(&a, "a"); // cannot print `a`, since it's moved
print_test_struct(&b, "b");
}
In short, it is not a very good idea to move self-referential structs.
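If you’d rather not eyeball addresses in the playground output, here is a small sketch that checks the swap case directly. `is_consistent` is a hypothetical helper (not part of the code above) that compares the stored raw pointer with the current address of `string`; comparing addresses never dereferences the pointer, so it needs no `unsafe`.

```rust
struct Test {
    string: String,
    ref_string: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { string: String::from(txt), ref_string: std::ptr::null() }
    }

    fn init(&mut self) {
        self.ref_string = &self.string;
    }

    /// Hypothetical helper: does `ref_string` still point at our own `string` field?
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ref_string, &self.string)
    }
}

fn main() {
    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();
    assert!(a.is_consistent() && b.is_consistent());

    std::mem::swap(&mut a, &mut b);

    // The raw pointers were swapped together with the data,
    // so each one now points into the *other* struct.
    assert!(!a.is_consistent() && !b.is_consistent());
    assert!(std::ptr::eq(a.ref_string, &b.string));
    println!("a.string = {}, b.string = {}", a.string, b.string);
}
```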
But wasn’t Rust safe? How come this happened? Is it because we used `unsafe`? Yes!
If we tried to replicate the same behavior with regular references instead of raw pointers, the borrow checker would immediately complain.
Let me demonstrate:
#[derive(Debug)]
struct Test<'a> {
string: String,
ref_string: Option<&'a String>, // this is a reference now
}
impl<'a> Test<'a> {
fn new(txt: &str) -> Self {
Test {
string: String::from(txt),
ref_string: None,
}
}
fn init(&'a mut self) {
self.ref_string = Some(&self.string);
}
}
fn print_test_struct(test: &Test, label: &str) {
println!(
"{label}.string = value: {}, --- address:{:p}",
test.string, &test.string
);
println!(
"{label}.ref_string = value: {:?}, --- address:{:p}, --- deref to: {}",
test.ref_string,
&test.ref_string,
*test.ref_string.unwrap()
);
}
fn main() {
/* "Self-referential with references - this version does NOT compile" */
let mut a = Test::new("a");
a.init();
let mut b = Test::new("b");
b.init();
println!("# initial values:");
println!();
print_test_struct(&a, "a"); // COMPILER ERROR
println!();
print_test_struct(&b, "b"); // COMPILER ERROR
println!("\n-----------------------\n");
/* Let's swap */
std::mem::swap(&mut a, &mut b); // COMPILER ERROR
println!("# values after swap:\n");
print_test_struct(&a, "a"); // COMPILER ERROR
println!();
print_test_struct(&b, "b"); // COMPILER ERROR
}
For example, the first compiler error is:
cannot borrow `a` as immutable because it is also borrowed as mutable
immutable borrow occurs here
main.rs(37, 5): mutable borrow occurs here
Which translates into:
- the `init` function mutably borrows `self`, and this borrow needs a lifetime: `'a`, the same lifetime that `ref_string` carries, so the borrow has to last as long as the struct itself is in use
- in other words, after the `init` function finishes executing, this borrow is still active, because `ref_string` actively borrows from the `string` field (due to the lifetime)
- while an active mutable borrow is present, we can’t create another reference to the same data (borrow checker rule)
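So, how exactly did our code work with `unsafe`?
The borrow happened and finished within the scope of the `init` function, since a raw pointer has no lifetime associated with it: as soon as `&self.string` was converted into a `*const String`, the borrow checker stopped tracking it. Thus, we were able to create other references to the struct afterwards.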
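A minimal sketch of that difference outside of any struct (the function name is illustrative): a `&String` keeps `s` borrowed until its last use, while a `*const String` does not, so mutating `s` in between is accepted.

```rust
/// Hypothetical helper demonstrating that a raw pointer holds no borrow.
fn raw_pointer_holds_no_borrow() -> bool {
    let mut s = String::from("a");

    // Converting `&s` to a raw pointer ends the borrow immediately.
    let raw: *const String = &s;

    // So mutating `s` while `raw` still exists is accepted.
    s.push('!');

    // With a reference instead, this would be rejected:
    // let r: &String = &s;
    // s.push('!');          // error[E0502]: `s` is also borrowed as immutable
    // println!("{r}");

    // `s` itself has not moved, so `raw` still holds its address.
    std::ptr::eq(raw, &s)
}

fn main() {
    assert!(raw_pointer_holds_no_borrow());
    println!("raw pointer still points at `s`");
}
```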
Self-referential relationship with async
You may be thinking: thank god I don’t have self-referential stuff in my code. But do you have `async` code in your project? Because `async` code can be self-referential under the hood. In this case, we ask `Pin`/`Unpin` for help to make our code fit back into the compiler’s safety guarantees. We, as Rust developers, appreciate it a lot when the compiler provides the safety instead of us. We are lazy and we like to delegate the responsibility 🙂
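To see where `Pin` shows up in practice: `Future::poll` takes `self: Pin<&mut Self>`, so a future has to be pinned before anyone can poll it. The sketch below (assuming a toolchain with `std::task::Wake`, stable since Rust 1.51) polls an `async` block that keeps a borrow of its own local across an `.await`. `NoopWaker` and `poll_once` are hypothetical names for this example; `Box::pin` is the safe way to pin the future, because it is moved into the box and never handed out unpinned again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing; enough for a single poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: poll a pinned future exactly once.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn main() {
    let mut fut = Box::pin(async {
        let mut x = [0u8; 4];
        let buf = &mut x; // a reference into this future's own state...
        std::future::ready(()).await; // ...kept alive across an `.await`
        buf[0] = 42;
        x[0]
    });
    // `Pin::new(&mut unboxed_future)` would not compile: async blocks are `!Unpin`.
    match poll_once(fut.as_mut()) {
        Poll::Ready(v) => println!("ready: {v}"),
        Poll::Pending => println!("pending"),
    }
}
```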
Async can be self-referential under the hood?
I will be quoting from the Rust async book.
Quote starts…
What happens if we have an `async` block that uses references? For example:
async {
let mut x = [0; 128];
let read_into_buf_fut = read_into_buf(&mut x);
read_into_buf_fut.await;
println!("{:?}", x);
}
What struct does this compile down to?
struct ReadIntoBuf<'a> {
buf: &'a mut [u8], // points to `x` below
}
struct AsyncFuture {
x: [u8; 128],
read_into_buf_fut: ReadIntoBuf<'what_lifetime?>,
}
Here, the `ReadIntoBuf` future holds a reference into the other field of our structure, `x` (see my screenshot below). However, if `AsyncFuture` is moved, the location of `x` will move as well, invalidating the pointer stored in `read_into_buf_fut.buf`.
Quote ends…
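In case the last paragraph was a bit cryptic (it was to me when I first read it), let me make it much easier for you.
Now, you should be able to see the hidden cycle: `AsyncFuture` owns `x`, and its own field `read_into_buf_fut.buf` points back at that very `x`, so the struct points into itself.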
If you are confused about how `async` code compiles down to something else, I strongly suggest my own resource on this: https://github.com/ozgunozerk/asynction
How do we solve this problem?
By pinning our self-referential structs, we tell the compiler: “hey, look, I don’t want to move this, alright? Get angry if I try to move this.”
So we didn’t actually make moving self-referential structs safe; we just forbid ourselves from moving them, because moving them is unsafe 😀. That’s the point of `Pin`. We make some promises, and the compiler checks (some of) these promises for us.
How does `Pin` work? In short, pinning something makes it impossible to move it through safe code.
In long, the `Pin` struct has a single field, which is a pointer, and that field is private, so you cannot reach the pointer. The `Pin` struct implements several methods for reaching the pinned data via mutable or immutable references, but all of them have strict bounds to ensure that you won’t be able to move the data: shared access (`Deref`) is always available, while mutable access (`DerefMut`, `get_mut`) requires the target to be `Unpin`. To sum up: once you `Pin` the data, you no longer have access to the actual pointer (which points to the data hidden behind it), and you have very limited access to the data via `Pin` methods.
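A small sketch of those bounds, using a hypothetical `!Unpin` type `Pinned` (the `PhantomPinned` marker that makes it `!Unpin` is covered further below): reading through the `Pin` works, but mutable access has to go through the `unsafe` `get_unchecked_mut`.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` type: `PhantomPinned` opts it out of `Unpin`.
struct Pinned {
    value: i32,
    _marker: PhantomPinned,
}

/// Hypothetical helper: mutate a field through a pinned pointer.
fn bump(p: Pin<&mut Pinned>) {
    // `p.value += 1;` or `p.get_mut()` would not compile here:
    // `DerefMut` and `get_mut` require `Pinned: Unpin`.
    // SAFETY: we only write to a field; the `Pinned` value is never moved.
    unsafe { p.get_unchecked_mut().value += 1 };
}

fn main() {
    let mut pinned = Box::pin(Pinned { value: 1, _marker: PhantomPinned });
    bump(pinned.as_mut());
    // Shared access through `Deref` is always allowed.
    println!("value = {}", pinned.value); // value = 2
}
```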
What is Unpin?
You should interpret this name as “can be unpinned” instead. If the data we are pinning is `Unpin`, we can get a mutable reference, move it, and do anything we like with it, because the compiler does not care about `Pin` promises for `Unpin` data. The methods available on the `Pin` struct are actually very flexible if the target data is `Unpin` (for example, both `Pin::new` and `get_mut` are safe then).
Basically, you can think of `Pin` and `Unpin` as canceling each other out.
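For contrast, a sketch with an `Unpin` type (`i32` here; the function name is illustrative): `Pin::new` needs no `unsafe`, and `get_mut` hands back a plain `&mut i32`, so moving the values through the `Pin` is allowed.

```rust
use std::pin::Pin;

/// Hypothetical helper: swap two `Unpin` values *through* their pins.
fn swap_through_pins(x: &mut i32, y: &mut i32) {
    // `i32: Unpin`, so creating the pins is safe...
    let px = Pin::new(x);
    let py = Pin::new(y);
    // ...and so is getting `&mut i32` back out and moving the data.
    std::mem::swap(px.get_mut(), py.get_mut());
}

fn main() {
    let (mut x, mut y) = (1, 2);
    swap_through_pins(&mut x, &mut y);
    println!("x = {x}, y = {y}"); // x = 2, y = 1
}
```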
So, which data types are `Unpin`? I have good news: almost everything is `Unpin`. The notable exceptions are the futures the compiler generates for `async` code and types that explicitly opt out. So, if you `Pin` some regular data, it will ultimately have no effect, since your data was (most probably) `Unpin`.
And as you can guess by now, `!Unpin` means not `Unpin`, or in other words: this data WILL be affected by `Pin`.
The compiler generally does well with async stuff, and it marks the necessary things as `!Unpin`. However, the self-referential struct we wrote in our examples above is actually marked as `Unpin`, whereas it should have been `!Unpin`.
The reason is that `Unpin` is an auto trait: it is implemented for a type automatically if all of that type’s fields are `Unpin` (and both `String` and raw pointers like `*const String` are).
The compiler does not try to deduce whether there is any cycle in these structs. If it did, it could have marked them as `!Unpin`. Maybe there are other reasons as well, but for our custom self-referential structs, we have to mark them as `!Unpin` ourselves (see the code below):
use std::pin::Pin;
use std::marker::PhantomPinned; // this helps us to make our data `!Unpin`
#[derive(Debug)]
struct Test {
string: String,
ref_string: *const String,
_marker: PhantomPinned, // we need this to make our type `!Unpin`
}
impl Test {
fn new(txt: &str) -> Self {
Test {
string: String::from(txt),
ref_string: std::ptr::null(),
_marker: PhantomPinned, // This makes our type `!Unpin`
}
}
fn init(&mut self) {
self.ref_string = &self.string;
}
}
We cannot write `impl !Unpin for Test {}` (negative impls are still unstable as of December 8, 2022), so we have to use a phantom marker instead.
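To convince yourself that the auto trait really behaves this way, here is a compile-time sketch. `assert_unpin`, `NoMarker`, and `WithMarker` are hypothetical names: the first struct has the layout of our earlier self-referential `Test` without the marker (and is still `Unpin`), the second adds `PhantomPinned`.

```rust
use std::marker::PhantomPinned;
use std::mem::size_of;

/// Compiles only if `T: Unpin`; it does nothing at runtime.
fn assert_unpin<T: Unpin>() {}

/// Same layout as our first self-referential `Test`: no marker.
#[allow(dead_code)]
struct NoMarker {
    string: String,
    ref_string: *const String,
}

/// Same fields plus the marker, which makes the type `!Unpin`.
#[allow(dead_code)]
struct WithMarker {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

fn main() {
    // `String` and `*const String` are both `Unpin`, so the auto trait applies.
    assert_unpin::<NoMarker>();
    // assert_unpin::<WithMarker>(); // would not compile: `WithMarker` is `!Unpin`

    // `PhantomPinned` is zero-sized, so the marker costs no memory.
    assert_eq!(size_of::<NoMarker>(), size_of::<WithMarker>());
    println!("the marker adds {} bytes", size_of::<PhantomPinned>());
}
```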
Pinning `!Unpin` data is unsafe code
What? Why? Wasn’t that the main purpose of `Pin`: to pin `!Unpin` data?
Hahaha. This part is actually really fun. Take a look at the `main` below, built on top of the code above:
fn main() {
/* "Self-referential - SWAP PROBLEM - values will be inconsistent" */
let mut a = Test::new("a");
a.init();
let mut b = Test::new("b");
b.init();
println!("# initial values:");
println!();
print_test_struct(&a, "a");
println!();
print_test_struct(&b, "b");
println!("\n-----------------------\n");
// CHECKPOINT START
let mut a_pin = unsafe { Pin::new_unchecked(&mut a) };
let mut b_pin = unsafe { Pin::new_unchecked(&mut b) };
/* Let's swap */
std::mem::swap(&mut a, &mut b);
// CHECKPOINT END
println!("# values after swap:\n");
print_test_struct(&a, "a");
println!();
print_test_struct(&b, "b");
}
Pay attention to the code between `CHECKPOINT START` and `CHECKPOINT END`. We will modify that part later to experiment a bit with this code. I will refer to that part as the checkpoint code.
Although we did pin the values, we could still swap `a` with `b`. And the problem is still there: the `a.ref_string` and `b.ref_string` pointers are pointing to the wrong strings.
The compiler was not angry, because `Pin` only prevents you from moving the values if you try to move them through the `Pin` itself.
If you can access your variable directly, without going through `Pin`, there is nothing the compiler can do for you. That’s why pinning `!Unpin` data is `unsafe`: you (as the developer) have to make and keep a promise, which is: I will not move the data I’m pinning right now through other means (like we did above).
The correct usage is to shadow the variable name, in order to limit ourselves (the only way to access the variable should be through `Pin`):
// WRONG
let mut a_pin = unsafe { Pin::new_unchecked(&mut a) };
let mut b_pin = unsafe { Pin::new_unchecked(&mut b) };
// CORRECT
let mut a = unsafe { Pin::new_unchecked(&mut a) }; // notice the shadowing
let mut b = unsafe { Pin::new_unchecked(&mut b) }; // notice the shadowing
So, if we change our checkpoint code, as:
// CHECKPOINT START
let mut a = unsafe { Pin::new_unchecked(&mut a) }; // `a` instead of `a_pin`
let mut b = unsafe { Pin::new_unchecked(&mut b) }; // `b` instead of `b_pin`
/* Let's swap */
std::mem::swap(&mut a, &mut b);
// CHECKPOINT END
You will see the code still compiles. LOL!
But the problem is no longer there. `a.ref_string` and `b.ref_string` now point to their own `string` fields, which is correct!
But why? Did the `swap` operation not work? It actually did. What we swapped were not our structs, but the pointers. So now, `a` (the pin pointer) points to the `b` struct, and `b` (the pin pointer) points to the `a` struct. This is ok, since the structs themselves have not moved, and thus their `string` fields (the targets of `ref_string`) have not moved either.
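If you want to check this claim without reading addresses, here is a sketch reusing a hypothetical `is_consistent` helper (a plain pointer comparison, not part of the code above): after swapping the two `Pin<&mut Test>` values, each pin points at the other struct, and both structs are still self-consistent.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { string: String::from(txt), ref_string: std::ptr::null(), _marker: PhantomPinned }
    }

    fn init(&mut self) {
        self.ref_string = &self.string;
    }

    /// Hypothetical helper: does `ref_string` point at our own `string`?
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ref_string, &self.string)
    }
}

fn main() {
    let mut a = Test::new("a");
    a.init();
    let mut b = Test::new("b");
    b.init();

    // SAFETY: the original `a`/`b` are shadowed and never moved again.
    let mut a = unsafe { Pin::new_unchecked(&mut a) };
    let mut b = unsafe { Pin::new_unchecked(&mut b) };
    std::mem::swap(&mut a, &mut b); // swaps the pointers, not the structs

    // `a` now points at the struct created as "b", and vice versa...
    assert_eq!(a.string, "b");
    assert_eq!(b.string, "a");
    // ...and neither struct moved, so both are still self-consistent.
    assert!(a.is_consistent() && b.is_consistent());
    println!("pins swapped, structs untouched");
}
```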
Okay, how can we try to move the actual `a` and `b` structs through the `Pin` pointer, so that we can see the compiler forbid that operation? First, we need a mutable reference to our structs. Change the checkpoint code as follows:
// CHECKPOINT START
let mut a = unsafe { Pin::new_unchecked(&mut a) };
let mut b = unsafe { Pin::new_unchecked(&mut b) };
/* Let's swap */
std::mem::swap(a.get_mut(), b.get_mut());
// CHECKPOINT END
Now, we get a proper compiler error:
Compiling playground v0.0.1 (/playground)
error[E0277]: `PhantomPinned` cannot be unpinned
--> src/main.rs:47:20
|
47 | std::mem::swap(a.get_mut(), b.get_mut());
| ^^^^^ ------- required by a bound introduced by this call
| |
| within `Test`, the trait `Unpin` is not implemented for `PhantomPinned`
|
= note: consider using `Box::pin`
note: required because it appears within the type `Test`
--> src/main.rs:5:8
|
5 | struct Test {
| ^^^^
note: required by a bound in `Pin::<&'a mut T>::get_mut`
error[E0277]: `PhantomPinned` cannot be unpinned
--> src/main.rs:47:37
|
47 | std::mem::swap(a.get_mut(), b.get_mut());
| ^^^^^ ------- required by a bound introduced by this call
| |
| within `Test`, the trait `Unpin` is not implemented for `PhantomPinned`
|
= note: consider using `Box::pin`
note: required because it appears within the type `Test`
--> src/main.rs:5:8
|
5 | struct Test {
| ^^^^
note: required by a bound in `Pin::<&'a mut T>::get_mut`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to 2 previous errors
So, coming back to the question: why is pinning `!Unpin` data `unsafe`?
Because, remember, we have to give the compiler a promise: we will not use the data we pinned without going through the `Pin` pointer. That’s why we shadowed the variable names to limit ourselves. In conclusion, the compiler cannot guarantee full safety just because we wrapped something in `Pin`, and we have to take on some of the responsibility for pinning the data. That’s why it is `unsafe`.
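There is also a way to pin `!Unpin` data without `Pin::new_unchecked`, and the compiler note in the error above already hints at it (“consider using `Box::pin`”). `Box::pin` moves the value onto the heap and gives back only a `Pin<Box<T>>`, so no unpinned access path is left to break the promise. A minimal sketch, assuming the same `Test` shape; `new_boxed` and `deref_ref_string` are hypothetical names, and writing the self-pointer still needs one small `unsafe` block.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

impl Test {
    /// Hypothetical constructor: builds the struct already pinned on the heap.
    fn new_boxed(txt: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Test {
            string: String::from(txt),
            ref_string: std::ptr::null(),
            _marker: PhantomPinned,
        });
        // The heap address of `string` stays fixed from now on.
        let ptr: *const String = &boxed.string;
        // SAFETY: we only write a field; the struct never leaves the box.
        unsafe { boxed.as_mut().get_unchecked_mut().ref_string = ptr };
        boxed
    }

    /// Hypothetical accessor that follows the self-pointer.
    fn deref_ref_string(&self) -> &str {
        // SAFETY: `ref_string` points at `self.string`, which cannot move while pinned.
        unsafe { (*self.ref_string).as_str() }
    }
}

fn main() {
    let mut a = Test::new_boxed("a");
    let mut b = Test::new_boxed("b");
    // Swapping the boxes swaps heap pointers; the structs themselves stay put.
    std::mem::swap(&mut a, &mut b);
    // std::mem::swap(&mut *a, &mut *b); // would not compile: needs `Test: Unpin`
    println!("a.string = {}, deref to: {}", a.string, a.deref_ref_string()); // a.string = b, deref to: b
}
```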
We made it! Now:
- we can pin our self-referencing structs (including the `async` stuff in our code)
- we know what `!Unpin` means
- we can make our custom self-referential code `!Unpin`
- we know the reason behind `unsafe` for pinning `!Unpin` data, so there is no need to be afraid of it anymore
Bonus: Making the API safer
So far, we have relied on the developer’s promise not to move the pinned data. However, we can make our API safer and enforce these promises through our design.
Having the `init` method accept `Pin<&mut Self>` instead of `&mut Self` makes sense, as the caller then has to pin the struct before passing it to `init`. Also, after `init`, we want to make it impossible to move the struct. So it makes double sense!
I present to you our new `init` method:
fn init(self: Pin<&mut Self>) {
unsafe {
self.get_unchecked_mut().ref_string = &self.as_ref().string;
}
}
However, this API might be too restrictive. `init` takes the `Pin<&mut Self>` by value, and `get_unchecked_mut()` consumes it, so the caller’s pinned pointer is moved into `init()` and cannot be used afterwards. We cannot even print it…
If we are okay with not having access to `self` again, this is fine. But a more flexible API would be:
fn init(mut self: Pin<&mut Self>) -> Pin<&mut Self> {
unsafe {
self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
}
self
}
This may look a bit ugly, so let me explain everything step by step:
- `self` is `Pin<&mut Self>`
- `Self` is the `Test` struct (recall, `init` is inside the `impl Test` block)
- to make `get_unchecked_mut` not consume `self`, we have to call `as_mut()` on `self` first: it reborrows `self` as a fresh `Pin<&mut Self>`, and that one gets consumed instead
- since `as_mut()` takes `&mut self`, we now use `mut self` instead of plain `self` as the argument
- we want to allow the developer to do stuff with `self` after calling `init`. But we cannot take `self` by reference, because in that case the original `self` would stay alive in `main`, and the developer might still forget about the safety rules
- so we consume `self`. That way, after pinning it, there is no way for the developer to accidentally move it
- however, we want to return `Pin<&mut Self>`, so that the developer still has access to the pinned struct after calling `init()`
Below is the full snippet:
use std::marker::PhantomPinned; // this helps us to make our data `!Unpin`
use std::pin::Pin;
#[derive(Debug)]
struct Test {
string: String,
ref_string: *const String,
_marker: PhantomPinned, // we need this to make our type `!Unpin`
}
impl Test {
fn new(txt: &str) -> Self {
Test {
string: String::from(txt),
ref_string: std::ptr::null(),
_marker: PhantomPinned, // This makes our type `!Unpin`
}
}
fn init(mut self: Pin<&mut Self>) -> Pin<&mut Self> {
unsafe {
self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
}
self
}
}
// this also had to change
fn print_pinned(test: &Pin<&mut Test>, label: &str) {
println!(
"{label}.string = value: {}, --- address:{:p}",
test.as_ref().string,
&test.as_ref().string
);
unsafe {
println!(
"{label}.ref_string = value: {:p}, --- address:{:p}, --- deref to: {}",
test.as_ref().ref_string,
&test.as_ref().ref_string,
*test.as_ref().ref_string
);
}
}
fn main() {
let mut a = Test::new("a");
let a = unsafe { Pin::new_unchecked(&mut a) }; // Pin `a` before `init`
let pin_a = a.init(); // we don't even have to shadow, since `a` is already moved
print_pinned(&pin_a, "a");
let mut b = Test::new("b");
let b = unsafe { Pin::new_unchecked(&mut b) }; // Pin `b` before `init`
let pin_b = b.init(); // we don't even have to shadow, since `b` is already moved
print_pinned(&pin_b, "b");
/* Below won't work anymore, which is good! */
// std::mem::swap(&mut a, &mut b); // compiler error of `moved` values for both `a` and `b` :)))
}
Accessing fields of a `Pin` struct
As you saw in the example above, we had to use an additional `unsafe` block to reach the inner parts of the `Pin` pointer:
unsafe {
self.as_mut().get_unchecked_mut().ref_string = &self.as_ref().string;
}
If you prefer not to use `unsafe` (like me), there is a crate for this: `pin-project`. It takes on the responsibility for these `unsafe` operations when accessing fields of pinned structs/enums, so we don’t have to.
I won’t complicate this post further by providing another example of how to use `pin-project`. I’m sure you will have no problem discovering it yourself after this post :)
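If you’d rather stay within `std`, a common middle ground is to write the `unsafe` exactly once, inside a small method, so callers never see it. The sketch below assumes the final `Test` from the bonus section; `string_mut` and `deref_ref_string` are hypothetical names. Handing out `&mut String` for the `string` field is fine here because replacing the `String`’s contents does not move the field itself, so `ref_string` keeps pointing at it. Deciding which fields may be exposed like this is roughly the bookkeeping `pin-project` automates.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    string: String,
    ref_string: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { string: String::from(txt), ref_string: std::ptr::null(), _marker: PhantomPinned }
    }

    fn init(mut self: Pin<&mut Self>) -> Pin<&mut Self> {
        let ptr: *const String = &self.string;
        // SAFETY: we only write a field; the struct is never moved.
        unsafe { self.as_mut().get_unchecked_mut().ref_string = ptr };
        self
    }

    /// Hypothetical safe accessor: the `unsafe` lives here, not at call sites.
    fn string_mut(self: Pin<&mut Self>) -> &mut String {
        // SAFETY: `Test` is never moved out; callers may change the `String`'s
        // contents, but the field's address (what `ref_string` stores) stays put.
        unsafe { &mut self.get_unchecked_mut().string }
    }

    fn deref_ref_string(&self) -> &str {
        // SAFETY: `ref_string` points at `self.string`, which cannot move while pinned.
        unsafe { (*self.ref_string).as_str() }
    }
}

fn main() {
    let mut a = Test::new("a");
    // SAFETY: `a` is shadowed and never accessed directly again.
    let a = unsafe { Pin::new_unchecked(&mut a) };
    let mut a = a.init();
    a.as_mut().string_mut().push_str(" changed"); // no `unsafe` at the call site
    println!("{}", a.deref_ref_string()); // a changed
}
```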
Time to congratulate yourself 🎉 `Pin` is one of the most confusing topics in Rust.
I hope you liked this article!
References
I’ve tried to summarize what I’ve learned mostly from these three resources (not necessarily in order):
- https://fasterthanli.me/articles/pin-and-suffering (fasterthanlime)
- https://www.youtube.com/watch?v=DkMwYxfSYNQ (Jon Gjengset)
- https://rust-lang.github.io/async-book/04_pinning/01_chapter.html (Rust async book)