Use ResourceId to refer to Resources by timon-schelling · Pull Request #4176 · GraphiteEditor/Graphite

timon-schelling · 2026-05-26T13:16:31Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a migration from referencing resources by hash to referencing them by a unique ID (ResourceId), backed by a new ResourceRegistry to manage resource sources and garbage collection. Key feedback focuses on making the JSON migration parser more robust against pretty-printed or multi-field objects, removing a strict check that could cause deserialization failures on documents with missing embedded resources, and optimizing the resource garbage collection pipeline by passing HashSet references directly to avoid redundant conversions and allocations.

gemini-code-assist · 2026-05-26T13:18:34Z

+	fn collect_resources_referenced_by_hash(s: &str) -> Vec<(ResourceHash, Range<usize>)> {
+		let mut out = Vec::new();
+		let mut offset = 0;
+
+		while let Some(i) = s[offset..].find("\"Resource\":") {
+			let abs = offset + i + 11; // pos after `"Resource":`
+			offset = abs;
+
+			if let Some(j) = s[offset..].find('}') {
+				let chunk_start = offset;
+				let chunk = &s[chunk_start..chunk_start + j];
+
+				let quotes: Vec<_> = chunk.match_indices('"').collect();
+				if quotes.len() == 2 {
+					let q0 = chunk_start + quotes[0].0;
+					let q1 = chunk_start + quotes[1].0;
+					let hash = &s[q0 + 1..q1];
+
+					if hash.len() == 64
+						&& hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
+						&& let Ok(hash) = hash.parse::<ResourceHash>()
+					{
+						out.push((hash, q0..q1 + 1));
+					}
+				}
+
+				offset = chunk_start + j + 1;
+			} else {
+				break;
+			}
+		}
+
+		out
+	}


The current implementation of collect_resources_referenced_by_hash relies on finding the next } character and then checking if there are exactly two quotes in that chunk. This is fragile and will fail to detect resource hashes in several common scenarios:

If the JSON is pretty-printed, the } might be on a different line or much further down, and the chunk will contain other fields/quotes.

If there are other fields in the same JSON object after the "Resource" field, the chunk will contain those fields and their quotes, making quotes.len() == 2 false.

We can make this parsing much more robust and simple by directly finding the first and second quotes after "Resource": and extracting the string between them.

fn collect_resources_referenced_by_hash(s: &str) -> Vec<(ResourceHash, Range<usize>)> { let mut out = Vec::new(); let mut offset = 0; while let Some(i) = s[offset..].find("\"Resource\":") { let abs = offset + i + 11; // pos after `"Resource":` offset = abs; if let Some(q0_rel) = s[offset..].find('"') { let q0 = offset + q0_rel; if let Some(q1_rel) = s[q0 + 1..].find('"') { let q1 = q0 + 1 + q1_rel; let hash_str = &s[q0 + 1..q1]; if hash_str.len() == 64 && hash_str.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f')) { if let Ok(hash) = hash_str.parse::<ResourceHash>() { out.push((hash, q0..q1 + 1)); } } offset = q1 + 1; } else { break; } } else { break; } } out }

gemini-code-assist · 2026-05-26T13:18:34Z

+	// Require each referenced hash to also appear as an embedded resource map key (at least 2 occurrences).
+	if resources_by_hash.is_empty() || !resources_by_hash.iter().all(|(hash, _)| document_serialized_content.matches(&hash.to_string()).count() >= 2) {
+		return (document_serialized_content, HashMap::new());
+	}


The .all(...) check requires every referenced hash to appear at least twice in the document (once in the node input and once in the embedded resources). If a single resource reference is dangling or missing from the embedded list, this check will cause the entire migration to be skipped.

Since TaggedValue::Resource now expects a ResourceId (bare integer) instead of a ResourceHash (string), skipping the migration means the document will fail to deserialize entirely and cannot be opened. To prevent data loss and allow the document to open successfully, we should remove this strict check and migrate whatever resources are found.

if resources_by_hash.is_empty() { return (document_serialized_content, HashMap::new()); }

gemini-code-assist · 2026-05-26T13:18:34Z

+	pub fn used_resources(&self, include_history: bool) -> Box<[ResourceId]> {
 		let mut resources = HashSet::new();
 		self.network_interface.collect_used_resources(&mut resources);
 		if include_history {
 			self.document_undo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));
 			self.document_redo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));
 		}
-		resources
+		resources.into_iter().collect::<Vec<_>>().into_boxed_slice()
 	}


used_resources collects the used resource IDs into a HashSet, but then converts it to a Vec and then a Box<[ResourceId]>. When this is passed to garbage_collect, the first thing garbage_collect does is collect it back into a HashSet.

This results in redundant allocations and conversions (HashSet -> Vec -> Box -> HashSet). Returning HashSet<ResourceId> directly from used_resources and updating garbage_collect to accept &HashSet<ResourceId> avoids these redundant operations.

Suggested change

pub fn used_resources(&self, include_history: bool) -> Box<[ResourceId]> {

let mut resources = HashSet::new();

self.network_interface.collect_used_resources(&mut resources);

if include_history {

self.document_undo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));

self.document_redo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));

}

resources

resources.into_iter().collect::<Vec<_>>().into_boxed_slice()

}

pub fn used_resources(&self, include_history: bool) -> HashSet<ResourceId> {

let mut resources = HashSet::new();

self.network_interface.collect_used_resources(&mut resources);

if include_history {

self.document_undo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));

self.document_redo_history.iter().for_each(|interface| interface.collect_used_resources(&mut resources));

}

resources

}

gemini-code-assist · 2026-05-26T13:18:35Z

+				let used_resources = self.used_resources(false);
+
+				let embedded_resource_hashes = Vec::from_iter(used_resources.iter().filter_map(|id| match self.resources.registry.info(id) {
+					Some(ResourceInfo { hash: Some(hash), sources, .. }) if sources.contains(&DataSource::Embedded) => Some(hash),
+					_ => None,
+				}))
+				.into_boxed_slice();
+
 				let mut document = self.clone();
+
+				document.resources.registry.garbage_collect(used_resources.as_ref());


If used_resources is updated to return HashSet<ResourceId>, we should update this call to pass &used_resources instead of used_resources.as_ref() to match the updated garbage_collect signature.

Suggested change

let used_resources = self.used_resources(false);

let embedded_resource_hashes = Vec::from_iter(used_resources.iter().filter_map(|id| match self.resources.registry.info(id) {

Some(ResourceInfo { hash: Some(hash), sources, .. }) if sources.contains(&DataSource::Embedded) => Some(hash),

_ => None,

}))

.into_boxed_slice();

let mut document = self.clone();

document.resources.registry.garbage_collect(used_resources.as_ref());

let used_resources = self.used_resources(false);

let embedded_resource_hashes = Vec::from_iter(used_resources.iter().filter_map(|id| match self.resources.registry.info(id) {

Some(ResourceInfo { hash: Some(hash), sources, .. }) if sources.contains(&DataSource::Embedded) => Some(hash),

_ => None,

}))

.into_boxed_slice();

let mut document = self.clone();

document.resources.registry.garbage_collect(&used_resources);

gemini-code-assist · 2026-05-26T13:18:35Z

+	pub fn garbage_collect(&mut self, used: &[ResourceId]) {
+		let used_set: std::collections::HashSet<ResourceId> = used.iter().copied().collect();
+		self.sources.retain(|id, _| used_set.contains(id));
+		self.hashes.retain(|id, _| used_set.contains(id));
+	}


Accepting &HashSet<ResourceId> directly in garbage_collect avoids the need to re-collect a slice into a HashSet on every call, eliminating redundant allocations and hashing operations.

pub fn garbage_collect(&mut self, used: &std::collections::HashSet<ResourceId>) { self.sources.retain(|id, _| used.contains(id)); self.hashes.retain(|id, _| used.contains(id)); }

timon-schelling added 4 commits May 25, 2026 21:39

Works TM

f79999f

Use Doc

6b78e32

use error for preprocessor ResourceId replacement failure

bc404df

Keep hashes in DocumentInfo

2706e4e

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

github-actions Bot deployed to graphite-dev (Preview) May 26, 2026 13:24 View deployment

Migrate with less mem copy

9f78c6f

github-actions Bot deployed to graphite-dev (Preview) May 26, 2026 15:01 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use ResourceId to refer to Resources#4176

Use ResourceId to refer to Resources#4176
timon-schelling wants to merge 5 commits into
masterfrom
resource-id-migration

timon-schelling commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

timon-schelling commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant