The componenets/script
folder contains most of the Rust implementation of the DOM and JavaScript interface.
Servo does not has its own JavaScript engine in Rust. It use SpiderMonkey as it JavaScript engine (written in C++), the same as desktop Firefox.
Web developers use JavaScript to interact with the DOM, each DOM object and API must have a corresponding Rust implementation. The main interface between a Rust reflector and its JavaScript counterpart is defined using WebIDL, which is a standard interface definition language commonly used in HTML5 specs. A python code generator (in components/script/dom/bindings/codegen
) will take the WebIDL and generate a JavaScript wrapper/reflector (located in components/script/dom/bindings/codegen/Bindings
) for the Rust code. The servo developers then fill in the missing Rust code. You can find those implementations in components/script/dom
. There are common DOM elements like <body/>
(htmlbodyelement.rs
), <a/>
(htmlanchorelement.rs
), just to name a few.
In old browser engines like Gecko, the JavaScript objects are GC'd (garbage collected) by the JavaScript engine, while the corresponding C++ objects are managed by reference counting. However, a C++ may be pointed by a JavaScript object, and vice-versa, a pointer cycle is created, so the elements in the cycle can never be freed by reference counting. Gecko solve this problem by implementing a complex cycle collector. A cycle collector is not only hard to maintain, but also introduce performance penalties because the whole system has to be paused for the cycle collector to do a sweeping.
Servo address this issue by using the JavaScript engine's GC only. All the Rust objects' lifetime are controlled by the JavaScript engine. Once an DOM object is GC'd by the JavaScript engine, it's corresponding Rust object is freed.
SpiderMonkey used to use a conservative mark-and-sweep garbage collector, but it migrated to a exact rooting garbage collector.
The basic concept behind the mark-and-sweep GC (garbage collection) is easy: The variables or in-memory data structures that are no longer accessable are considered garbage, and needs to be reclaimed. But first, these garbages are allow to accumulate. Until a certain criteria is met, for example: a new object allocation is requested but there is no enough memory, or a certain timeout has reached, the main program is paused for the garbage collector to do its job.
In the first phase, the garbage collector walks through (or trace out, thus the name tracing GC) the memory to mark objects that are directly or indirectly reachable as live. Directly reachable objects (root in GC speaking) includes objects that are referenced by static variables, stack variab les or even CPU register. Indirectly reachable objects are those referenced by the directly reachable objects or each other. In the second phase, the objects on the heap that are not marked as live are freed, or sweeped.
However, depending on the timing of the GC, there might be unreachable objects that are actually needed afterwords. For example, consider a local variable that was created to be inserted into an array. But before the insertion is executed, the GC kicks in. At that point, the local variable is actually live but unreachable from any root.
There are at least two ways to solve this problem. The first is called conservative GC. What is does is that during the marking phase, it looks into the stack and CPU registers and find anything that looks like a pointer pointing into the heap. Although there may be false positives, we still treat the object at the pointed address as live objects. Another way is called exact rooting. Programmers has to manually mark newly created variables or objects as root, so they will not be recycled.
At first glance, exact rooting puts a lot of burden on programmers, and can be error-prone if the programmer forget to mark anything. Forgetting to mark an object as live may cause memory leak, create security loopholes or used-after-free problems. Although conservative GC has some false positives, but it looks easy and clean. However, the conservative approach has a fatal flaw: the GC managed objects, or GC thing, can't be moved. This prevents many advanced GC optimization like compacting and generational GC, which will be discussed later.
When we want to move a GC thing, we need to update all the pointers that points to it. However, because conservative GC are not exactly sure if a value in the stack or register is an actual pointer or not, it might accidentally change the wrong thing. For example, suppose we have a value X somewhere in the memory, but the garbage collector wants to move the GC thing at heap address X to Y. The value might be treated as a pointer to X and being changed to Y. This will corrupt the value and affect the behavior of the program. With exact rooting, since all live objects are manually marked, the garbage collector can confidently move a GC thing because it knows what is a pointer and what is not.
Then why would a garbage collector wants to move GC things around? One use case is an optimization method call compacting. Since after a few runs of GC, the live objects may be scattered sparsely in the memory. In order to eliminate the fragmentation of the free space for more efficient usage of memory, the garbage collector may want to shift the live GC things together in memory.
Another use case, which have been implemented in SpiderMonkey, is called Generational GC. In generational GC, the heap is split into two sections: the nursery area and the tenured area. Newly created objects are allocated in the nursery area. If the object lives long enough, it will be moved into the tenured area. Since many objects, like local variable, usually don't live very long, the move will not be very often. Since many objects die early in the nursery, usually a fast minor GC in the nursery only will reclaim enough memory for the program to continue running. We only need to do a full-scale major GC on the whole heap occasionally, reducing the overall GC pause time. Also the fragmentation will reduce because the long-lived objects will be pretty condensed in the tenured area.
In order to collaborate with the garbage collector, Servo introduced a few smart pointers, which are located in components/script/dom/bindings/js.rs
. We'll introduce why these smart pointers are necessary, and when to use them.
Atom
type used in the html element attribute?Atom
is from the string-cache crate. It uses a technique called string interning to save only one copy of duplicated strings, thus reduce the memory usage.
Interned strings are stored in a hashmap. If the a string already exists in the hashmap, the Atom
version of it will only store the location in the hashmap.
Au
?You might encounter the length unit Au
when doing layout queries. For example, when you use layout query to get the upper-left corner position of a <img/>
, the origin.x
and origin.y
are both in Au
. An Au
is an "App Unit" and represents 1/60th of a CSS pixel. It was originally proposed in 2002 [3] as a standard unit of measure in Gecko. The definition of Au
is in components/util/geometry.rs
.
14:24 slyu-home I have one more question, in the dom implementations, there are new_inherited(), new(), and Constructor(). What are the differences between them? Thanks
14:24 larsberg I'm not entirely sure. There are any number of things that can cause a failure to happen and merges to not happen. But I suspect that adding the linux2 builder had the good side effect of fixing that android bit.
14:24 Ms2ger slyu, Constructor is the thing called from JS
14:24 * larsberg does not like all the rube goldbergian semi-related moving parts in these systems
14:25 Ms2ger slyu-home, new() is the thing you call when you create the object
14:25 Ms2ger slyu-home, new_inherited() is the thing you call from Class::new() and Subclass::new()
14:26 slyu-home Ms2ger, I didn't get the new_inherited()
14:26 Ms2ger slyu-home, so we have to fake class inheritance because rust doesn't support that
14:27 slyu-home Oh I see
14:27 Ms2ger slyu-home, Foo::new returns an object with a reflector of type Foo, which you don't want for Bar that inherits from Foo
14:29 slyu-home A side question: why do we call it a "reflector"
14:29 slyu-home As far as I understand, it's something created in the JS VM so it can access the underlying Rust
14:29 Ms2ger It reflects DOM APIs across the JS<->Rust boundary
14:30 slyu-home I see
14:30 Ms2ger The alternative is "wrapper", which can mean any of a dozen things :)
14:30 slyu-home Because "reflect" makes me think of bouncing back
14:30 Ms2ger Heh
14:30 slyu-home Now I understand
---
16:07 shinglyu Hi
16:08 Manishearth o/
16:08 shinglyu I am reading this article: https://blog.mozilla.org/research/2014/08/26/javascript-servos-only-garbage-collector/
16:08 shinglyu And this confuses me: "As in C++, the fields of Node are included in-line with the fields of Document, without any pointer indirection"
16:08 Manishearth oh
16:08 Manishearth so
16:08 Manishearth struct Node {something: u8}
16:09 Manishearth struct Document {parent: Node, somethingelse: u8}
16:09 shinglyu So is this a automatic optimization done by the compiler?
16:09 Manishearth will be stored as struct Document {parent: {something: u8}, somethingelse: u8}
16:09 Manishearth which in memory is just two u8s
16:09 Manishearth shinglyu: not even an optimization
16:09 shinglyu Ah!
16:10 Manishearth shinglyu: nothing is a pointer by default
16:10 shinglyu I see
16:10 Manishearth unlike Java and all
16:10 Manishearth so nesting structs is a flat operation
16:10 shinglyu so they are placed in the memory next to each other
16:10 Manishearth note that when I say `parent` there, I mean the parent type (eg an HTMLElement is an Element is a Node, so each one is a child of the other in that sense)
16:11 Manishearth for storing a pointer to a parent node we will use a JS<node>
16:11 Manishearth shinglyu: yep
16:11 shinglyu So if we have a nested struct, it will be flattened
16:11 shinglyu kind of like serialization (?)
16:11 *** Temur joined #servo
16:14 Manishearth serialization?
16:15 Manishearth but yes, it's flattened
16:15 Manishearth how else would it be stored?
16:15 Manishearth if you're not allowed to insert random pointers, that's the only way to store it
16:16 shinglyu My first intuition is that we store a pointer to a Node struct created somewhere else
16:16 Manishearth nop
16:16 Manishearth *nope
16:16 Manishearth Rust and C++ do not put random pointers around
16:16 Manishearth Node is Node
16:17 Manishearth Box<Node>, JS<Node> JSRef<Node> are all pointers
16:17 shinglyu I see
16:18 *** Ms2ger joined #servo
16:19 shinglyu Does it make sense to use a pointer instead (perhaps in other kind of language?) in any case?, or is it simply a stupid idea?
16:21 Manishearth not a stupid idea
16:21 Ms2ger Well, this is simulating inheritance
16:21 Manishearth in a language like Java, you GC everything anyway, so a pointer is fine
16:21 Manishearth Ms2ger: I think he's asking about the language "feature" of flattening, not our usage of it
16:21 Ms2ger Oh
16:22 Manishearth shinglyu: but note that indirection can get costly
16:22 Manishearth in lower level languages like C++/Rust you want to use pointers when you have stuff that you can't, or don't want to, put on the stack
16:22 Ms2ger Right, then Java would store a pointer, but that's more because all object types are really Gc<Option<T>>
16:22 Manishearth yep
16:23 Ms2ger +RefCell
16:23 shinglyu I see
16:23 *** canhtak joined #servo
16:23 Manishearth Ms2ger: wait, why refcell?
16:23 Ms2ger Note that we can't use a pointer for the Node fields, because then we couldn't cast a pointer to a Node to a pointer to a Document
16:23 Manishearth oh right
16:24 Manishearth mutability
16:24 Ms2ger Manishearth, what else do you use RefCell for?
16:24 Manishearth no I was thinking in the Java frame of mind and was like "why do we need refcounted interior mutability here?"
16:24 Manishearth and then I realized that that's exactly what you were saying
16:24 Manishearth well, perhaps it would be Cell though
16:25 Ms2ger Mm, probably
16:25 Manishearth shinglyu: note that since Node is the first field to the Document, and since memory layout is flat, a variable which is a Document can be treated like a Node without any issues
16:26 Manishearth which is why all our casts are simple transmute (where transmute just is a way of telling the compiler to treat a variable of type A as type B)
16:26 Manishearth well, all of our up casts
16:26 Manishearth downcasts are checked
16:26 *** dawehner quit (Ping timeout: 121 seconds)
16:26 shinglyu Manishearth: but we have to keep Node as the first field of Document, right?
16:27 Manishearth yes
16:27 shinglyu And what happen to the rest of the Document's field when we cast it to Node?
16:27 Manishearth nothing
16:27 Manishearth they stay in memory
16:27 Manishearth it's just not accessible anymore
16:27 shinglyu ok
16:27 Manishearth until you cast it back, that is
16:27 *** canhtak quit (Ping timeout: 121 seconds)
16:28 shinglyu But they are in the stack, so it has nothing to do with memory leak, am I right?
16:28 *** Rym joined #servo
16:28 Manishearth shinglyu: well, in the case of Servo, all of these are on the heap
16:28 *** canhtak joined #servo
16:29 Manishearth it's JSRef<T>, remember
16:29 Manishearth so, when we cast it, the fields of Document which were on the heap aren't accessible through the new Node pointer
16:29 Manishearth but they are accessible through the old Document pointer
16:29 Manishearth (JSRef<Node> and JSRef<Document>, that is)
16:29 shinglyu Oh ok
16:29 Manishearth and they are still known to the SM GC
16:30 Manishearth it's just that locally we lose access
16:30 Manishearth leaks are handled by the GC
16:30 shinglyu So even if we are done with the JSRef<Node>, thie JSRef<Document> will still be live
16:31 shinglyu When JSRef<Document> is also dead, it will be GC'd (?)
16:31 *** psd joined #servo
16:32 Manishearth so even if the JSRef<Document> wasn't alive, it still wouldn't be leaked
16:32 *** Rym quit (Ping timeout: 121 seconds)
16:32 Manishearth JSRef<T> is a local rooted reference
16:32 Manishearth it
16:32 Manishearth it
16:32 Manishearth it's possible that there are other pointers to the document within the DOM
16:33 Manishearth eg a JS<T> in some dom struct elsewhere
16:33 Manishearth JSRef<T> is just a way for us to tell the GC "hold on, even if this loses all references don't delete this, I'm holding on to it for now"
16:34 Manishearth and when we relinquish a JSRef, we tell the GC that it's okay to start GCing this object now
16:36 *** dawehner joined #servo
16:39 shinglyu Thanks! that's very helpful explanation
16:39 shinglyu Manishearth: I also have a question about cycle collection
16:40 shinglyu From the first paragraph of "Auto-generating field traversals"
16:40 shinglyu it seems like we still have cycle collection?
16:41 shinglyu But I think the previous section "Memory management for the DOM" is saying that cycle collection is bad, and we want a better way to do this?
16:42 *** nattokirai quit (Client exited)
16:49 *** dinfuehr quit (Connection closed)
16:54 Manishearth shinglyu: I believe the SM GC is cycle collecting ( Ms2ger ?)
16:55 Manishearth I think the blog post is saying that we don't create an auxiliary GC to manage the DOM
16:55 Ms2ger "cycle collection" is a term that's typically used for a particular kind of system based on refcounting
16:55 Manishearth not sure
16:56 Ms2ger The GC doesn't really collect cycles; it just collects everything that isn't rooted, so it doesn't even notice there are cycles in the dead objects
16:58 SimonSapin SpiderMonkey’s garbage collector is a "tracing" garbage collector
16:58 SimonSapin while CPython for examples uses reference counting with a cycle collector.
16:58 SimonSapin these are two kinds of garbage collectors
16:58 Manishearth ah okay
16:59 Manishearth so SM just goes around marking rooted stuff
16:59 Manishearth and cleanung up non-rooted stuff
16:59 Ms2ger Well, two kinds of automatic memory management
16:59 Manishearth what does cycle collecting do?
16:59 Ms2ger "garbage collection" is often reserved for the tracing kind
17:00 Ms2ger So if you have a dead cycle with refcounting, none of the objects reach a refcount zero
17:00 Ms2ger CC detects that case, and force-deletes those objects
17:00 SimonSapin Manishearth: I don’t know how CC works in CPython
17:00 Ms2ger Manishearth, also, r? #5741
17:00 crowbot1 Issue #5741: Use get_attr_for_layout more. - https://github.com/servo/servo/pull/5741
17:03 Manishearth later, studying
17:05 shinglyu I read the whole blog post again, now I believe it's trying to say "Instead of ref counting + cycle collector, we use the hooks in Encodeable trait to let GC know all the pointers between DOM objects"
17:05 *** Nekit1234007 joined #servo
17:07 Manishearth note: we don't use Encodable anymore
17:07 shinglyu Oh, what's the new mechanism then?
17:10 SimonSapin a special-purpose Traceable trait with a compiler plugin to implement in automatically
17:11 shinglyu SimonSapin: components/script/dom/bindings/trace.rs?
17:12 Manishearth yep
17:13 shinglyu Hmm, so the mechanism looks similar
17:14 Manishearth it's the same, just not so hacky
17:14 *** acgtyrant quit (Ping timeout: 121 seconds)
17:14 shinglyu Manishearth: Great, thankss