Diffstat (limited to 'docs/caja-io.txt')
-rw-r--r-- docs/caja-io.txt 255
1 files changed, 255 insertions, 0 deletions
diff --git a/docs/caja-io.txt b/docs/caja-io.txt
new file mode 100644
index 00000000..2219d308
--- /dev/null
+++ b/docs/caja-io.txt
@@ -0,0 +1,255 @@
+Caja I/O Primer
+draft ("Better Than Nothing")
+2001-08-23
+Darin Adler <[email protected]>
+
+The Caja shell, and the file manager inside it, do a lot of
+I/O. Because of this, there are some special disciplines required when
+writing Caja code.
+
+No I/O on the main thread
+
+To be able to respond to the user quickly, Caja needs to be
+designed so that the main user input thread does not block. The basic
+approach is to never do any disk I/O on the main thread.
+
+In practice, Caja code does assume that some disk I/O is fast, in
+some cases intentionally and in other cases due to programmer
+sloppiness. The typical assumption is that reads of files in the
+user's home directory and of the installed files in the Caja datadir
+are very fast, effectively instantaneous.
+
+So the general approach is to allow I/O for files that have file
+system paths, assuming that the access to these files is fast, and to
+prohibit I/O for files that have arbitrary URIs, assuming that access
+to these could be arbitrarily slow. Although this works pretty well,
+it is based on an incorrect assumption, because with NFS and other
+kinds of abstract file systems, there can be arbitrarily slow parts of
+the file system that have file system paths.
+
+For historical reasons, threading in Caja is done through the
+mate-vfs asynchronous I/O abstraction rather than using threads
+directly. This means that all the threads are created by mate-vfs,
+and Caja code runs on the main thread only. Thus, the rule of
+thumb is that synchronous mate-vfs operations like the ones in
+<libmatevfs/mate-vfs-ops.h> are illegal in most Caja
+code. Similarly, it's illegal to ask for a piece of information, say a
+file size, and then wait until it arrives. The program's main thread
+must be allowed to get back to the main loop and start asking for user
+input again.
+
+How CajaFile is used to do this
+
+The CajaFile class presents an API for scheduling this
+asynchronous I/O and dealing with the uncertainty of when the
+information will be available. (It also does a few other things, but
+that's the main service it provides.) When you want information about
+a particular file or directory, you get the CajaFile object for
+that item using caja_file_get. This operation, like most
+CajaFile operations, is not allowed to do any disk I/O. Once you
+have a CajaFile object, you can ask it questions like "What is
+your file type?" by calling functions like
+caja_file_get_file_type. However, for a newly created CajaFile
+object the answer is almost certainly "I don't know." Each function
+defines a default, which is the answer given for "I don't know." For
+example, caja_file_get_file_type will return
+MATE_VFS_FILE_TYPE_UNKNOWN if it doesn't yet know the type.
+
+It's worth taking a side trip to discuss the nature of the
+CajaFile API. Since these classes are a private part of the
+Caja implementation, we make no effort to have the API be
+"complete" in an abstract sense. Instead we add operations as
+necessary and give them the semantics that are most handy for our
+purposes. For example, caja_file_get_size could return a special
+distinguishable value or a separate boolean to mean "I don't know";
+instead it simply returns 0 for files where the size is unknown. This
+choice is entirely motivated by pragmatic concerns. The intent
+is that we tweak these calls as needed if the semantics aren't good
+enough.
+
+Back to the newly created CajaFile object. If you actually need to
+get the type, you need to arrange for that information to be fetched
+from the file system. There are two ways to make this request. If you
+are planning to display the type on an ongoing basis then you want to
+tell the CajaFile that you'll be monitoring the file's type and want to
+know about changes to it. If you just need one-time information about
+the type then you'll want to be informed when the type is
+discovered. The calls used for this are caja_file_monitor_add and
+caja_file_call_when_ready respectively. Both of these calls take a
+list of information needed about a file. If all you need is the file
+type, for example, you would pass a list containing just
+CAJA_FILE_ATTRIBUTE_FILE_TYPE (the attributes are defined in
+caja-file-attributes.h). Not every call has a corresponding file
+attribute type. We add new ones as needed.
+
+If you do a caja_file_monitor_add, you also typically connect to
+the CajaFile object's changed signal. Each time any monitored
+attribute changes, a changed signal is emitted. The caller typically
+caches the value of the attribute that was last seen (for example,
+what's displayed on screen) and does a quick check to see if the
+attribute it cares about has changed. If you do a
+caja_file_call_when_ready, you don't typically need to connect to
+the changed signal, because your callback function will be called when
+and if the requested information is ready.
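Sketch (not part of caja-io.txt): a toy Python model of the pattern described above, written only to make the control flow concrete. ToyMainLoop, ToyFile and their methods are hypothetical names, not the Caja API, and the "disk" is a value handed to the constructor. The getter never does I/O and answers with a default until a scheduled fetch has been delivered from the main loop.

```python
from collections import deque

UNKNOWN = "unknown"  # default answer, playing the role of MATE_VFS_FILE_TYPE_UNKNOWN


class ToyMainLoop:
    """Stand-in for the main loop: holds completions posted by background I/O."""

    def __init__(self):
        self.pending = deque()

    def post(self, fn, *args):
        self.pending.append((fn, args))

    def run_pending(self):
        while self.pending:
            fn, args = self.pending.popleft()
            fn(*args)


class ToyFile:
    """Hypothetical model of a CajaFile-like object; not the real class."""

    def __init__(self, loop, real_type):
        self._loop = loop
        self._real_type = real_type  # what the "disk" would report
        self._file_type = None       # nothing cached yet

    def get_file_type(self):
        # Never does I/O: returns the cached value or the default.
        return self._file_type if self._file_type is not None else UNKNOWN

    def call_when_ready(self, callback, data):
        # Schedule the fetch; the answer arrives on a later loop iteration.
        self._loop.post(self._finish, callback, data)

    def _finish(self, callback, data):
        self._file_type = self._real_type
        callback(self, data)


loop = ToyMainLoop()
f = ToyFile(loop, "directory")
seen = []
before = f.get_file_type()
f.call_when_ready(lambda file, data: seen.append((data, file.get_file_type())), "tag")
still_unknown = f.get_file_type()  # asking again right away changes nothing
loop.run_pending()                 # the main loop delivers the completion
```

In this model the caller gets "unknown" both before and immediately after the request, and only sees the real type from inside the callback or afterwards.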
+
+Both a monitor and a callback can be cancelled. For ease of use,
+neither requires that you store an ID for cancelling. Instead, the
+monitor function uses an arbitrary client pointer, which can be any
+kind of pointer that's known not to conflict with other monitoring
+clients. Usually, this is a pointer to the monitoring object, but it
+can also be, for example, a pointer to a global variable. The
+call_when_ready function uses the callback function and callback data
+to identify the particular callback to cancel. One advantage of the
+monitor API is that it also lets the CajaFile framework know that the
+file should be monitored for changes made outside Caja. This is how
+we know when to ask FAM to monitor a file or directory for us.
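Sketch (not part of caja-io.txt): one way the ID-free cancellation described above could be modelled, in Python. The class and method names are hypothetical and only the bookkeeping is shown. Monitors are keyed by the identity of the client object, and pending callbacks are matched by the (callback, data) pair.

```python
class ToyFile:
    """Hypothetical model; only the bookkeeping for cancellation is shown."""

    def __init__(self):
        self._monitors = {}   # id(client) -> (client, attributes)
        self._callbacks = []  # list of (callback, data) pairs

    def monitor_add(self, client, attributes):
        # Keep the client itself so its identity stays valid while registered.
        self._monitors[id(client)] = (client, list(attributes))

    def monitor_remove(self, client):
        self._monitors.pop(id(client), None)

    def call_when_ready(self, callback, data):
        self._callbacks.append((callback, data))

    def cancel_call_when_ready(self, callback, data):
        # The pair itself identifies the request; no separate ID is needed.
        self._callbacks = [
            (cb, d) for (cb, d) in self._callbacks
            if not (cb is callback and d is data)
        ]

    def is_monitored(self):
        return bool(self._monitors)

    def pending_data(self):
        return [d for (_cb, d) in self._callbacks]


def on_ready(file, data):
    pass


view_a, view_b = object(), object()
f = ToyFile()
f.monitor_add(view_a, ["file_type"])
f.monitor_add(view_b, ["file_type"])
f.monitor_remove(view_a)                    # view_b is still monitoring
f.call_when_ready(on_ready, view_a)
f.call_when_ready(on_ready, view_b)
f.cancel_call_when_ready(on_ready, view_a)  # only view_a's request is dropped
```

Because the file stays monitored as long as any client remains, a framework built this way can tell when outside change notification is still wanted.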
+
+Let's review a few of the concepts:
+
+1) Nearly all CajaFile operations, like caja_file_get_file_type,
+   are not allowed to do any disk I/O.
+2) To cause the actual I/O to be done, callers need to set up
+   either a monitor or a callback.
+3) The actual I/O is done by asynchronous mate-vfs calls, so the work
+   is done on another thread.
+
+To work with an entire directory of files at once, you use
+a CajaDirectory object. With the CajaDirectory object you can
+monitor a whole set of CajaFile objects at once, and you can
+connect to a single "files_changed" signal that gets emitted whenever
+files within the directory are modified. That way you don't have to
+connect separately to each file you want to monitor. These calls are
+also the mechanism for finding out which files are in a directory. In
+most other respects, they are like the CajaFile calls.
+
+Caching, the good and the bad
+
+Another feature of the CajaFile class is the caching. If you keep
+around a CajaFile object, it keeps around information about the
+last known state of that file. Thus, if you call
+caja_file_get_file_type, you might well get the type of the file found
+at this location the last time you looked, rather than the information
+about what the file type is now, or "unknown". There are some problems
+with this, though.
+
+The first problem is that if wrong information is cached, you need
+some way to "goose" the CajaFile object and get it to grab new
+information. This is trickier than it might sound, because we don't
+want to constantly distrust information we received just moments
+before. To handle this, we have the
+caja_file_invalidate_attributes and
+caja_file_invalidate_all_attributes calls, as well as the
+caja_directory_force_reload call. If some code in Caja makes a
+change to a file that's known to affect the cached information, it can
+call one of these to inform the CajaFile framework. Changes that
+are made through the framework itself are automatically understood, so
+usually these calls aren't necessary.
+
+The second problem is that it's hard to predict when information will
+and won't be cached. The current rule that's implemented is that no
+information is cached if no one retains a reference to the
+CajaFile object. This means that someone else holding a
+CajaFile object can subtly affect the semantics of whether you
+have new data or not. Calling caja_file_call_when_ready or
+caja_file_monitor_add will not invalidate the cache, but rather
+will return you the already cached information.
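Sketch (not part of caja-io.txt): a toy Python model of the caching rule described above, assuming that every holder of the same location shares one object. toy_file_get and ToyFile are hypothetical names, and the weak dictionary is just a convenient way to express "cached only while someone holds a reference"; it is not a claim about how Caja implements it.

```python
import gc
import weakref


class ToyFile:
    """Hypothetical model: cached attributes live on the object itself."""

    def __init__(self, uri):
        self.uri = uri
        self.cached_type = None


_live_files = weakref.WeakValueDictionary()  # uri -> ToyFile, held weakly


def toy_file_get(uri):
    """Return the shared object for uri, creating it if nobody holds one."""
    f = _live_files.get(uri)
    if f is None:
        f = ToyFile(uri)
        _live_files[uri] = f
    return f


a = toy_file_get("file:///tmp/x")
a.cached_type = "regular"
b = toy_file_get("file:///tmp/x")  # another holder: same object, same cache
shared = b is a and b.cached_type == "regular"

del a, b                           # the last references are dropped
gc.collect()
c = toy_file_get("file:///tmp/x")  # fresh object, nothing cached
```

The second holder sees whatever the first one caused to be cached, which is the subtle coupling the text warns about.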
+
+These problems are less pronounced when FAM is in use. With FAM, any
+monitored file is highly likely to have accurate information, because
+changes to the file will be noticed by FAM, and that in turn will
+trigger new I/O to determine what the new status of the file is.
+
+Operations that change the file
+
+You'll note that up until this point, I've only discussed getting
+information about the file, not making changes to it. CajaFile
+also contains some APIs for making changes. There are two kinds of
+these.
+
+The calls that change metadata are examples of the first kind. These
+calls make changes to the internal state right away and schedule I/O
+to write the changes out to the file system. There's no way to detect
+if the I/O succeeds or fails, and as far as the client code is
+concerned the change takes place right away.
+
+The calls that make other kinds of file system change are examples of
+the second kind. These calls take a
+CajaFileOperationCallback. They are all cancellable, and they give
+a callback when the operation completes, whether it succeeds or fails.
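Sketch (not part of caja-io.txt): the two kinds of change calls, modelled in Python. All names are hypothetical, the queue stands in for asynchronous I/O, and the failure condition for the rename is invented for the example. It also assumes, for illustration, that the second kind leaves the object unchanged until the operation finishes.

```python
from collections import deque

pending_io = deque()  # stand-in for work handed to asynchronous I/O


class ToyFile:
    """Hypothetical model of the two kinds of change calls."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}

    def set_metadata(self, key, value):
        # First kind: visible at once, written out later, no result reported.
        self.metadata[key] = value
        pending_io.append(lambda: None)  # the write itself is not modelled

    def rename(self, new_name, callback):
        # Second kind: the callback reports success or failure on completion.
        def finish():
            ok = "/" not in new_name     # toy failure condition
            if ok:
                self.name = new_name
            callback(self, ok)

        pending_io.append(finish)


def run_pending_io():
    while pending_io:
        pending_io.popleft()()


results = []
f = ToyFile("a.txt")
f.set_metadata("icon", "star")
f.rename("b.txt", lambda file, ok: results.append(ok))
f.rename("bad/name", lambda file, ok: results.append(ok))
name_before = f.name  # nothing has completed yet
run_pending_io()
```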
+
+Files that move
+
+When a file is moved, and the CajaFile framework knows it, then
+the CajaFile and CajaDirectory objects follow the file rather
+than staying stuck to the path. This has a direct influence on the
+user interface of Caja -- if you move a directory, already-open
+windows and property windows will follow the directory around.
+
+This means that keeping around a CajaFile object and keeping
+around a URI for a file have different semantics, and there are
+cases where one is the better choice and cases where the other is.
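Sketch (not part of caja-io.txt): the difference between keeping an object and keeping a URI, modelled in Python with hypothetical names. The toy framework re-keys the same object when it performs a move, so holders of the object follow the file while a saved URI string keeps pointing at the old location.

```python
class ToyFile:
    """Hypothetical model of an object that tracks one file across moves."""

    def __init__(self, uri):
        self.uri = uri


class ToyFramework:
    def __init__(self):
        self._by_uri = {}

    def get(self, uri):
        if uri not in self._by_uri:
            self._by_uri[uri] = ToyFile(uri)
        return self._by_uri[uri]

    def move(self, old_uri, new_uri):
        # A move the framework knows about: re-key the same object.
        f = self._by_uri.pop(old_uri)
        f.uri = new_uri
        self._by_uri[new_uri] = f


fw = ToyFramework()
window_file = fw.get("file:///home/u/docs")  # a window keeps the object
saved_uri = window_file.uri                  # a bookmark keeps the string
fw.move("file:///home/u/docs", "file:///home/u/papers")
```

Which behaviour is wanted depends on whether the caller cares about "this file, wherever it goes" or "whatever is at this location".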
+
+Icons
+
+The current implementation of the Caja icon factory uses
+synchronous I/O to get the icons and ignores these guidelines. The
+only reason this doesn't ruin the Caja user experience is that it
+also refuses to even try to fetch icons from URIs that don't
+correspond to file system paths, which for most cases means it limits
+itself to reading from the high-speed local disk. Don't ask me what
+the repercussions of this are for NFS; do the research and tell me
+instead!
+
+Slowness caused by asynchronous operations
+
+One danger in all this asynchronous I/O is that you might end up doing
+repeated drawing and updating. If you go to display a file right
+after asking for information about it, you might immediately show an
+"unknown file type" icon. Then, milliseconds later, you may complete
+the I/O and discover more information about the file, including the
+appropriate icon. So you end up drawing the icon twice. There are a
+number of strategies for preventing this problem. One of them is to
+allow a bit of hysteresis and wait some fixed amount of time after
+requesting the I/O before displaying the "unknown" state. One
+strategy that's used in Caja is to wait until some basic
+information is available before displaying anything. This might make
+the program faster overall, but it might make it seem slower,
+because you don't see things right away. [What other strategies
+are used in Caja now for this?]
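Sketch (not part of caja-io.txt): the hysteresis strategy reduced to its decision rule, in Python. The function name and the 100 ms grace period are assumptions chosen for the example; time is passed in as a number rather than measured.

```python
GRACE_MS = 100  # assumed delay before showing the "unknown" state


def frames_drawn(io_done_ms, grace_ms=GRACE_MS):
    """Return the icons drawn for one file, in order.

    io_done_ms is when the asynchronous I/O completes, measured from the
    request. The "unknown" icon is drawn only if the grace period expires
    first; the real icon is always drawn once the I/O completes.
    """
    drawn = []
    if io_done_ms > grace_ms:
        drawn.append("unknown")
    drawn.append("real")
    return drawn
```

With fast I/O the file is drawn once; with slow I/O, or with no grace period at all, it is drawn twice. The cost is that nothing appears for up to the grace period.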
+
+How to make Caja slow
+
+If you add I/O to the functions in CajaFile that are used simply
+to fetch cached file information, you can make Caja incredibly I/O
+intensive. On the other hand, the CajaFile API does not provide a
+way to do arbitrary file reads, for example. So it can be tricky to
+add features to Caja, since you first have to educate CajaFile
+about how to do the I/O asynchronously and cache it, then request the
+information and have some way to deal with the time when it's not yet
+known.
+
+Adding new kinds of I/O usually involves working on the Caja I/O
+state machine in caja-directory-async.c. If we changed Caja to
+use threading instead of using mate-vfs asynchronous operations, I'm
+pretty sure that most of the changes would be here in this
+file. That's because the external API used for CajaFile wouldn't
+really have a reason to change. In either case, you'd want to schedule
+work to be done, and get called back when the work is complete.
+
+[We probably need more about caja-directory-async.c here.]
+
+Future direction
+
+Some have suggested that by using threading directly in Caja
+rather than using it indirectly through the mate-vfs async. calls,
+we could simplify the I/O code in Caja. It's possible this would
+make a big improvement, but it's also possible that this would primarily
+affect the internals and implementation details of CajaFile and
+still leave the rest of the Caja code the same.
+
+That's all for now
+
+This is a very rough early draft of this document. Let me know about
+other topics that would be useful to cover here.
+
+ -- Darin