In comp.sys.mac.programmer.misc glenn andreas <gandreas@[EMAIL PROTECTED]> wrote:
> In article <1360pjueho17k0d@[EMAIL PROTECTED]>,
> SM Ryan <wyrmwif@[EMAIL PROTECTED]> wrote:
>
>> Don Bruder <dakidd@[EMAIL PROTECTED]> wrote:
>> # (Call me a throwback, or whatever, but I *HATE* the hassle of trying to
>> # deal with CFString stuff - particularly since I have almost zero
>> # expectation that this code will ever want to run on a machine that
>> # speaks anything other than plain old ASCII English)
>>
>> If it's any comfort, the kernel only sees a byte string with a zero
>> byte terminator and '/' as a directory separator.
>
> That's not strictly true. Take a look at vfs_lookup.c:
>
> <http://www.opensource.apple.com/darwinsource/10.4.9.ppc/xnu-792.17.14/bsd/vfs/vfs_lookup.c>
>
> It does all sorts of things, such as handling ".." (and other odd
> treatment for specially placed '.'), as well as letting the specific
> file system determine what in a path name is part of a component. For a
> mac file system, this includes case insensitivity, which means that the
> file system needs to understand how the file name is encoded (in order to
> compare characters) - and this would be UTF8.

It goes well beyond that, actually.

In Unicode, and therefore in UTF-8, there are certain classes of characters
which have multiple
equivalent representations. These are precomposed characters, which tend
to be letters with accent marks, which can also be formed by using two
separate characters with the plain letter as the first one and a combining
accent mark as the second one. The Mac OS X kernel treats all such
equivalent sequences as being truly equivalent, so that using a decomposed
path will reach a file that was saved with a precomposed name, and vice
versa. This requires a fairly deep understanding of Unicode in the kernel
and, indeed, this is complex enough that there were still some bugs in
this handling up through, I believe, 10.2.
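To make the precomposed/decomposed point concrete, here is a minimal sketch in Python. It uses the standard library's NFD normalization as a stand-in; I am not claiming it matches the kernel's exact rules, and the file name is made up for illustration.

```python
import unicodedata

# Hypothetical file name spelled two ways.
precomposed = "caf\u00e9.txt"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301.txt"   # 'e' followed by combining acute accent, U+0301


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to decomposed form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(precomposed == decomposed)            # False: different code points
print(precomposed.encode("utf-8"))          # b'caf\xc3\xa9.txt'
print(decomposed.encode("utf-8"))           # b'cafe\xcc\x81.txt'
print(same_name(precomposed, decomposed))   # True once both are normalized
```

A plain byte comparison sees two different strings here, which is why a "bag of bytes" view of paths cannot give the behavior described above.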
It should also be noted that case insensitivity is ridiculously hard
to do. For example, it's possible for a case transformation to produce
more characters than you started with. The German lowercase letter ß,
which looks like a "beta", turns into "SS" when uppercased, and there
may be other examples as well.
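A quick illustration of a case mapping changing the length of a string, again in Python. The built-in string methods are only a stand-in for whatever tables the file system really uses, and the word is just an example.

```python
lower = "stra\u00dfe"   # "straße", 6 characters
upper = lower.upper()   # "STRASSE", 7 characters: ß became "SS"

print(len(lower), len(upper))   # 6 7


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using full case folding rather than lower()."""
    return a.casefold() == b.casefold()


# Naive lowercasing does not make the two spellings match...
print(lower.lower() == upper.lower())     # False: "straße" vs "strasse"
# ...but case folding does.
print(same_ignoring_case(lower, upper))   # True
```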
The kernel definitely doesn't look at your path as a bag of bytes. It is
fully Unicode aware, and your app should be too.

I also want to note that it is most definitely not a case where paths are
the OS's native language and FSRefs are some alternate interface
implemented in libraries. FSRefs actually enjoy support deep in the
kernel; look up volfs if you want details. In many cases, accessing a
filesystem object through FSRefs will be significantly faster, because the
kernel is able to go straight from an FSRef to a file's directory entry
without passing through any of the parent directories first.
(Incidentally, some consider this to be a security hole, because the
kernel will let you access a file you have appropriate permissions for in
this way even if its only link on disk is in a directory which you do not
have permissions to access.) Whereas if you use a path to access the file
then the kernel has to break it into pieces, run all these fun Unicode
transformations on it, then do a relatively expensive directory lookup for
every single path component.
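The difference in work can be sketched with a toy model in Python. Everything here (the dictionaries, the IDs, the function names) is hypothetical and is not how Darwin actually stores anything; it only shows one normalized lookup per path component versus a single lookup by ID.

```python
import unicodedata


def key(name: str) -> str:
    """Normalize and case-fold a name so equivalent spellings compare equal."""
    return unicodedata.normalize("NFD", name).casefold()


def make_dir(entries: dict) -> dict:
    """Build a toy directory whose keys are already normalized."""
    return {key(name): file_id for name, file_id in entries.items()}


# Toy volume: directory ID -> {normalized name -> child ID}; root has ID 1.
DIRECTORIES = {
    1: make_dir({"Users": 2}),
    2: make_dir({"don": 3}),
    3: make_dir({"R\u00e9sum\u00e9.txt": 4}),
}
CATALOG = {4: "file contents"}   # file ID -> object


def open_by_path(path: str):
    """Walk from the root, doing one normalized lookup per component."""
    current, lookups = 1, 0
    for component in path.strip("/").split("/"):
        current = DIRECTORIES[current][key(component)]
        lookups += 1
    return CATALOG[current], lookups


def open_by_id(file_id: int):
    """Go straight to the object; no parent directory is consulted."""
    return CATALOG[file_id], 1


# Different case and a decomposed spelling still reach the same file.
print(open_by_path("/users/DON/RE\u0301SUME\u0301.TXT"))   # ('file contents', 3)
print(open_by_id(4))                                       # ('file contents', 1)
```

Note that open_by_id never touches the parent directories, which is also why the permissions quirk mentioned above can arise.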
The main downside to FSRefs is that the majority of OS X APIs do not
accept them, and transforming them back into a path kills all of that
speed advantage and then some.

It would be great if there were some API that would give you the short
volfs path for an FSRef if it were available, but alas, there isn't one.
--
Michael Ash
Rogue Amoeba Software