[PATCH 0/6] Extended file stat system call

classic Classic list List threaded Threaded
85 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/6] Extended file stat system call

David Howells

Implement a pair of new system calls to provide extended and further extensible
stat functions.

The second of the associated patches is the main patch that provides these new
system calls:

        ssize_t ret = xstat(int dfd,
                            const char *filename,
                            unsigned atflag,
                            unsigned mask,
                            struct xstat *buffer);

        ssize_t ret = fxstat(int fd,
                             unsigned atflag,
                             unsigned mask,
                             struct xstat *buffer);

which are more fully documented in the first patch's description.

These new stat functions provide a number of useful features, in summary:

 (1) More information: creation time, inode generation number, data version
     number, flags/attributes.  A subset of these is available through a number
     of filesystems (such as CIFS, NFS, AFS, Ext4 and BTRFS).

 (2) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server.

 (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
     cached attributes are up to date.

 (4) Allow the filesystem to indicate what it can/cannot provide: A filesystem
     can now say it doesn't support a standard stat feature if that isn't
     available.

 (5) Make the fields a consistent size on all arches, and make them large.

 (6) Can be extended by using more request flags and appending further data
     after the end of the standard return data.

Note that no lstat() equivalent is required as that can be implemented through
xstat() with atflag == 0.


=======
PATCHES
=======

Patch 1 defines the xstat() and fxstat() system calls.

Patches 2-6 implement extended stat facilities for Ext4, AFS, NFS and CIFS, and
make eCryptFS go to the lower filesystem for such details.


==============
CONSIDERATIONS
==============

Should fxstat() be implemented as xstat() with a NULL filename, using dfd as
fd?

Should the default for a network fs be to do an unconditional (heavyweight)
stat with a flag to suppress going to the server to update the locally held
attributes and flushing pending writebacks?

Should things like the Windows Archive, Hidden and System bits be handled
through IOC flags, perhaps expanded to 64-bits?

Are these things useful to userspace other than Samba and userspace NFS
servers?

Is it useful to pass the volume ID out?  Or is statfs() sufficient for this?

Should I add a sixth argument to xstat(), mark it reserved and require that
must be supplied as 0 to hedge against future use?

Is there anything else I can usefully add at the moment?


==========
TO BE DONE
==========

Autofs, ntfs, btrfs, ...

I should perhaps use u8/u32/u64 rather than uint8/32/64_t.

Handle remote filesystems being offline and indicate this with
XSTAT_INFO_OFFLINE.


=======
TESTING
=======

There's a test program attached to the description for the main patch.  It can
be run as follows:
[root@andromeda tmp]# ./xstat -R /mnt/foo

        xstat(/mnt/foo) = 0
        0000: 000081a40000ffef 0000000000000001 0000020000000000 0000100000080000
        0020: 0000000000000000 0000000600000008 000000004f88499a 0000000136fd9208
        0040: 000000004f88499a 0000000136fd9208 000000004f8849b9 0000000106daf187
        0060: 000000004f8849b9 0000000106daf187 000000000000000c 000000000000000f
        0080: 0000000000000008 00000000484ebbef 0000000000000025 5949ebd4711efd82
        00a0: d3250b5c15d5e380 0000000000000000 0000000000000000 0000000000000000
        00c0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        00e0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        results=ffef
          Size: 15              Blocks: 8          IO Block: 4096    regular file
        Device: 08:06           Inode: 12          Links: 1    
        Access: (0644/-rw-r--r--)  Uid: 0  
        Gid: 0
        Access: 2012-04-13 16:43:22.922587656+0100
        Modify: 2012-04-13 16:43:53.115011975+0100
        Change: 2012-04-13 16:43:53.115011975+0100
        Create: 2012-04-13 16:43:22.922587656+0100
        Inode version: 484ebbefh
        Data version: 25h
        Inode flags: 00080000 (-------- ----e--- -------- --------)
        Information: 00000200 (-------- -------- ------a- --------)
        Volume ID: 82fd1e71d4eb4959-80e3d5155c0b25d3

David
---
David Howells (6):
      xstat: eCryptFS: Return extended attributes
      xstat: CIFS: Return extended attributes
      xstat: NFS: Return extended attributes
      xstat: AFS: Return extended attributes
      xstat: Ext4: Return extended attributes
      xstat: Add a pair of system calls to make extended file stats available


 arch/x86/syscalls/syscall_32.tbl |    2
 arch/x86/syscalls/syscall_64.tbl |    2
 fs/afs/inode.c                   |   29 ++-
 fs/afs/super.c                   |    7 +
 fs/cifs/cifsfs.h                 |    4
 fs/cifs/cifsglob.h               |   16 +-
 fs/cifs/dir.c                    |    2
 fs/cifs/inode.c                  |  120 +++++++++++--
 fs/ecryptfs/inode.c              |   14 +-
 fs/ext4/ext4.h                   |    2
 fs/ext4/file.c                   |    2
 fs/ext4/inode.c                  |   32 +++
 fs/ext4/namei.c                  |    2
 fs/ext4/super.c                  |    1
 fs/ext4/symlink.c                |    2
 fs/nfs/inode.c                   |   49 ++++-
 fs/nfs/super.c                   |    1
 fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
 include/linux/fcntl.h            |    1
 include/linux/fs.h               |    4
 include/linux/stat.h             |  126 +++++++++++++-
 include/linux/syscalls.h         |    7 +
 22 files changed, 694 insertions(+), 81 deletions(-)

Reply | Threaded
Open this post in threaded view
|

[PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

David Howells
Add a pair of system calls to make extended file stats available, including
file creation time, inode version and data version where available through the
underlying filesystem.

The idea was initially proposed as a set of xattrs that could be retrieved with
getxattr(), but the general preferance proved to be for new syscalls with an
extended stat structure.

This has a number of uses:

 (1) Creation time: The SMB protocol carries the creation time, which could be
     exported by Samba, which will in turn help CIFS make use of FS-Cache as
     that can be used for coherency data.

     This is also specified in NFSv4 as a recommended attribute and could be
     exported by NFSD [Steve French].

 (2) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper].

 (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
     cached attributes are up to date [Trond Myklebust].

 (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
     Schubert].

 (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get it
     from the kstat struct if it used vfs_xgetattr() instead.

 (6) BSD stat compatibility: Including more fields from the BSD stat such as
     creation time (st_btime) and inode generation number (st_gen) [Jeremy
     Allison, Bernd Schubert].

 (7) Extra coherency data may be useful in making backups [Andreas Dilger].

 (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
     can now say it doesn't support a standard stat feature if that isn't
     available, so if, for instance, inode numbers or UIDs don't exist...

 (9) Make the fields a consistent size on all arches and make them large.

(10) Store a 16-byte volume ID in the superblock that can be returned in struct
     xstat [Steve French].

(11) Include granularity fields in the time data to indicate the granularity of
     each of the times (NFSv4 time_delta) [Steve French].

(12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.

(13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

(14) Spare space, request flags and information flags are provided for future
     expansion.


The following structures are defined for the use of these new system calls:

        struct xstat_dev {
                uint32_t major, minor;
        };

        struct xstat_time {
                uint64_t tv_sec;
                uint32_t tv_nsec;
                uint32_t tv_granularity;
        };

        struct xstat {
                uint32_t st_mask;
                uint32_t st_mode;
                uint32_t st_nlink;
                uint32_t st_uid;
                uint32_t st_gid;
                uint32_t st_information;
                uint32_t st_ioc_flags;
                uint32_t st_blksize;
                struct xstat_dev st_rdev;
                struct xstat_dev st_dev;
                struct xstat_time st_atime;
                struct xstat_time st_btime;
                struct xstat_time st_ctime;
                struct xstat_time st_mtime;
                uint64_t st_ino;
                uint64_t st_size;
                uint64_t st_blocks;
                uint64_t st_gen;
                uint64_t st_version;
                uint8_t st_volume_id[16];
                uint64_t __spares[11];
        };

where st_information is local system information about the file, st_btime is
the file creation time, st_gen is the inode generation (i_generation),
st_data_version is the data version number (i_version), st_ioc_flags is the
flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identified is
stored, st_result_mask is a bitmask indicating the data provided and __spares[]
are where as-yet undefined fields can be placed.

The defined bits in request_mask and st_mask are:

        XSTAT_MODE Want/got st_mode
        XSTAT_NLINK Want/got st_nlink
        XSTAT_UID Want/got st_uid
        XSTAT_GID Want/got st_gid
        XSTAT_RDEV Want/got st_rdev
        XSTAT_ATIME Want/got st_atime
        XSTAT_MTIME Want/got st_mtime
        XSTAT_CTIME Want/got st_ctime
        XSTAT_INO Want/got st_ino
        XSTAT_SIZE Want/got st_size
        XSTAT_BLOCKS Want/got st_blocks
        XSTAT_BASIC_STATS [The stuff in the normal stat struct]
        XSTAT_IOC_FLAGS Want/got FS_IOC_GETFLAGS
        XSTAT_BTIME Want/got st_btime
        XSTAT_GEN Want/got st_gen
        XSTAT_VERSION Want/got st_data_version
        XSTAT_VOLUME_ID Want/got st_volume_id
        XSTAT_ALL_STATS [All currently available stuff]

The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
that might be supplied by the filesystem.  Note that Ext4 returns flags outside
of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
{EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
be suppressed?

The defined bits in the st_information field give local system data on a file,
how it is accessed, where it is and what it does:

        XSTAT_INFO_ENCRYPTED File is encrypted
        XSTAT_INFO_TEMPORARY File is temporary (NTFS/CIFS/deleted)
        XSTAT_INFO_FABRICATED File was made up by filesystem
        XSTAT_INFO_KERNEL_API File is kernel API (eg: procfs/sysfs)
        XSTAT_INFO_REMOTE File is remote
        XSTAT_INFO_OFFLINE File is offline (CIFS)
        XSTAT_INFO_AUTOMOUNT Dir is automount trigger
        XSTAT_INFO_AUTODIR Dir provides unlisted automounts
        XSTAT_INFO_NONSYSTEM_OWNERSHIP File has non-system ownership details
        XSTAT_INFO_HAS_ACL File has an ACL of some sort
        XSTAT_INFO_REPARSE_POINT File is reparse point (NTFS/CIFS)
        XSTAT_INFO_HIDDEN File is marked hidden (DOS+)
        XSTAT_INFO_SYSTEM File is marked system (DOS+)
        XSTAT_INFO_ARCHIVE File is marked archive (DOS+)

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.  I've tried not to provide overlap with
st_ioc_flags where something usable exists there.  Should Hidden, System and
Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
64-bits?


The system calls are:

        ssize_t ret = xstat(int dfd,
                            const char *filename,
                            unsigned int flags,
                            unsigned int mask,
                            struct xstat *buffer);

        ssize_t ret = fxstat(unsigned fd,
                             unsigned int flags,
                             unsigned int mask,
                             struct xstat *buffer);


The dfd, filename, flags and fd parameters indicate the file to query.  There
is no equivalent of lstat() as that can be emulated with xstat() by passing
AT_SYMLINK_NOFOLLOW in flags.

AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
filesystem to synchronise its attributes with the server.

mask is a bitmask indicating the fields in struct xstat that are of interest to
the caller.  The user should set this to XSTAT__BASIC_STATS to get the
basic set returned by stat().

Should there just be one xstat() syscall that does fxstat() if filename is NULL?

The fields in struct xstat come in a number of classes:

 (0) st_dev, st_blksize, st_information.

     These are local data and are always available.

 (1) st_mode, st_nlinks, st_uid, st_gid, st_[amc]time, st_ino, st_size,
     st_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in result_mask will be set to indicate their presence.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server, unless as
     a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as UID
     or GID on a DOS file), then the bit won't be set in the result_mask, even
     if the caller asked for the value and the returned value will be a
     fabrication.

 (2) st_rdev.

     As for class (1), but this won't be returned if the file is not a blockdev
     or chardev.  The bit will be cleared if the value is not returned.

 (3) File creation time (st_btime), inode generation (st_gen), data version
     (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).

     These will be returned if available whether the caller asked for them or
     not.  The corresponding bits in result_mask will be set or cleared as
     appropriate to indicate their presence.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server, unless
     as a byproduct of updating something requested.

At the moment, this will only work on x86_64 and i386 as it requires system
calls to be wired up.


=======
TESTING
=======

The following test program can be used to test the xstat system call:

        /* Test the xstat() system call
         *
         * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
         * Written by David Howells ([hidden email])
         *
         * This program is free software; you can redistribute it and/or
         * modify it under the terms of the GNU General Public Licence
         * as published by the Free Software Foundation; either version
         * 2 of the Licence, or (at your option) any later version.
         */

        #define _GNU_SOURCE
        #define _ATFILE_SOURCE
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>
        #include <fcntl.h>
        #include <time.h>
        #include <sys/syscall.h>
        #include <sys/stat.h>
        #include <sys/types.h>

        #define AT_NO_AUTOMOUNT 0x800
        #define AT_FORCE_ATTR_SYNC 0x2000

        #define XSTAT_MODE 0x00000001U
        #define XSTAT_NLINK 0x00000002U
        #define XSTAT_UID 0x00000004U
        #define XSTAT_GID 0x00000008U
        #define XSTAT_RDEV 0x00000010U
        #define XSTAT_ATIME 0x00000020U
        #define XSTAT_MTIME 0x00000040U
        #define XSTAT_CTIME 0x00000080U
        #define XSTAT_INO 0x00000100U
        #define XSTAT_SIZE 0x00000200U
        #define XSTAT_BLOCKS 0x00000400U
        #define XSTAT_BASIC_STATS 0x000007ffU
        #define XSTAT_BTIME 0x00000800U
        #define XSTAT_GEN 0x00001000U
        #define XSTAT_VERSION 0x00002000U
        #define XSTAT_IOC_FLAGS 0x00004000U
        #define XSTAT_VOLUME_ID 0x00008000U
        #define XSTAT_ALL_STATS 0x0000ffffU

        struct xstat_dev {
                uint32_t major;
                uint32_t minor;
        };

        struct xstat_time {
                uint64_t tv_sec;
                uint32_t tv_nsec;
                uint32_t tv_granularity;
        };

        struct xstat {
                uint32_t st_mask;
                uint32_t st_mode;
                uint32_t st_nlink;
                uint32_t st_uid;
                uint32_t st_gid;
                uint32_t st_information;
                uint32_t st_ioc_flags;
                uint32_t st_blksize;
                struct xstat_dev st_rdev;
                struct xstat_dev st_dev;
                struct xstat_time st_atim;
                struct xstat_time st_btim;
                struct xstat_time st_ctim;
                struct xstat_time st_mtim;
                uint64_t st_ino;
                uint64_t st_size;
                uint64_t st_blksize;
                uint64_t st_blocks;
                uint64_t st_gen;
                uint64_t st_version;
                uint64_t st_volume_id[16];
                uint64_t st_spares[11];
        };

        #define XSTAT_INFO_ENCRYPTED 0x00000001U
        #define XSTAT_INFO_TEMPORARY 0x00000002U
        #define XSTAT_INFO_FABRICATED 0x00000004U
        #define XSTAT_INFO_KERNEL_API 0x00000008U
        #define XSTAT_INFO_REMOTE 0x00000010U
        #define XSTAT_INFO_OFFLINE 0x00000020U
        #define XSTAT_INFO_AUTOMOUNT 0x00000040U
        #define XSTAT_INFO_AUTODIR 0x00000080U
        #define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U
        #define XSTAT_INFO_HAS_ACL 0x00000200U
        #define XSTAT_INFO_REPARSE_POINT 0x00000400U
        #define XSTAT_INFO_HIDDEN 0x00000800U
        #define XSTAT_INFO_SYSTEM 0x00001000U
        #define XSTAT_INFO_ARCHIVE 0x00002000U

        #define __NR_xstat 312
        #define __NR_fxstat 313

        static __attribute__((unused))
        ssize_t xstat(int dfd, const char *filename, unsigned flags,
                      unsigned int mask, struct xstat *buffer)
        {
                return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
        }

        static __attribute__((unused))
        ssize_t fxstat(int fd, unsigned flags,
                       unsigned int mask, struct xstat *buffer)
        {
                return syscall(__NR_fxstat, fd, flags, mask, buffer);
        }

        static void print_time(const char *field, const struct xstat_time *xstm)
        {
                struct tm tm;
                time_t tim;
                char buffer[100];
                int len;

                tim = xstm->tv_sec;
                if (!localtime_r(&tim, &tm)) {
                        perror("localtime_r");
                        exit(1);
                }
                len = strftime(buffer, 100, "%F %T", &tm);
                if (len == 0) {
                        perror("strftime");
                        exit(1);
                }
                printf("%s", field);
                fwrite(buffer, 1, len, stdout);
                printf(".%09u", xstm->tv_nsec);
                len = strftime(buffer, 100, "%z", &tm);
                if (len == 0) {
                        perror("strftime2");
                        exit(1);
                }
                fwrite(buffer, 1, len, stdout);
                printf("\n");
        }

        static void dump_xstat(struct xstat *xst)
        {
                char buffer[256], ft;

                printf("results=%x\n", xst->st_mask);

                printf(" ");
                if (xst->st_mask & XSTAT_SIZE)
                        printf(" Size: %-15llu", (unsigned long long) xst->st_size);
                if (xst->st_mask & XSTAT_BLOCKS)
                        printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
                printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
                if (xst->st_mask & XSTAT_MODE) {
                        switch (xst->st_mode & S_IFMT) {
                        case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break;
                        case S_IFCHR: printf(" character special file\n"); ft = 'c'; break;
                        case S_IFDIR: printf(" directory\n"); ft = 'd'; break;
                        case S_IFBLK: printf(" block special file\n"); ft = 'b'; break;
                        case S_IFREG: printf(" regular file\n"); ft = '-'; break;
                        case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break;
                        case S_IFSOCK: printf(" socket\n"); ft = 's'; break;
                        default:
                                printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
                                ft = '?';
                                break;
                        }
                }

                sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
                printf("Device: %-15s", buffer);
                if (xst->st_mask & XSTAT_INO)
                        printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
                if (xst->st_mask & XSTAT_SIZE)
                        printf(" Links: %-5u", xst->st_nlink);
                if (xst->st_mask & XSTAT_RDEV)
                        printf(" Device type: %u,%u",
                               xst->st_rdev.major, xst->st_rdev.minor);
                printf("\n");

                if (xst->st_mask & XSTAT_MODE)
                        printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
                               xst->st_mode & 07777,
                               ft,
                               xst->st_mode & S_IRUSR ? 'r' : '-',
                               xst->st_mode & S_IWUSR ? 'w' : '-',
                               xst->st_mode & S_IXUSR ? 'x' : '-',
                               xst->st_mode & S_IRGRP ? 'r' : '-',
                               xst->st_mode & S_IWGRP ? 'w' : '-',
                               xst->st_mode & S_IXGRP ? 'x' : '-',
                               xst->st_mode & S_IROTH ? 'r' : '-',
                               xst->st_mode & S_IWOTH ? 'w' : '-',
                               xst->st_mode & S_IXOTH ? 'x' : '-');
                if (xst->st_mask & XSTAT_UID)
                        printf("Uid: %d   \n", xst->st_uid);
                if (xst->st_mask & XSTAT_GID)
                        printf("Gid: %u\n", xst->st_gid);

                if (xst->st_mask & XSTAT_ATIME)
                        print_time("Access: ", &xst->st_atim);
                if (xst->st_mask & XSTAT_MTIME)
                        print_time("Modify: ", &xst->st_mtim);
                if (xst->st_mask & XSTAT_CTIME)
                        print_time("Change: ", &xst->st_ctim);
                if (xst->st_mask & XSTAT_BTIME)
                        print_time("Create: ", &xst->st_btim);

                if (xst->st_mask & XSTAT_GEN)
                        printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen);
                if (xst->st_mask & XSTAT_VERSION)
                        printf("Data version: %llxh\n", (unsigned long long) xst->st_version);

                if (xst->st_mask & XSTAT_IOC_FLAGS) {
                        unsigned char bits;
                        int loop, byte;

                        static char flag_representation[32 + 1] =
                                /* FS_IOC_GETFLAGS flags: */
                                "????????" /* 31-24 0x00000000-ff000000  */
                                "????ehTD" /* 23-16 0x00000000-00ff0000  */
                                "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00  */
                                "AdaiScus" /*  7- 0 0x00000000-000000ff */
                                ;

                        printf("Inode flags: %08x (", xst->st_ioc_flags);
                        for (byte = 32 - 8; byte >= 0; byte -= 8) {
                                bits = xst->st_ioc_flags >> byte;
                                for (loop = 7; loop >= 0; loop--) {
                                        int bit = byte + loop;

                                        if (bits & 0x80)
                                                putchar(flag_representation[31 - bit]);
                                        else
                                                putchar('-');
                                        bits <<= 1;
                                }
                                if (byte)
                                        putchar(' ');
                        }
                        printf(")\n");
                }

                if (xst->st_information) {
                        unsigned char bits;
                        int loop, byte;

                        static char info_representation[32 + 1] =
                                /* XSTAT_INFO_ flags: */
                                "????????" /* 31-24 0x00000000-ff000000  */
                                "????????" /* 23-16 0x00000000-00ff0000  */
                                "??ASHRan" /* 15- 8 0x00000000-0000ff00  */
                                "dmorkfte" /*  7- 0 0x00000000-000000ff */
                                ;

                        printf("Information: %08x (", xst->st_information);
                        for (byte = 32 - 8; byte >= 0; byte -= 8) {
                                bits = xst->st_information >> byte;
                                for (loop = 7; loop >= 0; loop--) {
                                        int bit = byte + loop;

                                        if (bits & 0x80)
                                                putchar(info_representation[31 - bit]);
                                        else
                                                putchar('-');
                                        bits <<= 1;
                                }
                                if (byte)
                                        putchar(' ');
                        }
                        printf(")\n");
                }

                if (xst->st_mask & XSTAT_VOLUME_ID) {
                        int loop;
                        printf("Volume ID: ");
                        for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) {
                                printf("%02x", xst->st_volume_id[loop]);
                                if (loop == 7)
                                        printf("-");
                        }
                        printf("\n");
                }
        }

        void dump_hex(unsigned long long *data, int from, int to)
        {
                unsigned offset, print_offset = 1, col = 0;

                from /= 8;
                to = (to + 7) / 8;

                for (offset = from; offset < to; offset++) {
                        if (print_offset) {
                                printf("%04x: ", offset * 8);
                                print_offset = 0;
                        }
                        printf("%016llx", data[offset]);
                        col++;
                        if ((col & 3) == 0) {
                                printf("\n");
                                print_offset = 1;
                        } else {
                                printf(" ");
                        }
                }

                if (!print_offset)
                        printf("\n");
        }

        int main(int argc, char **argv)
        {
                struct xstat xst;
                int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;

                unsigned int mask = XSTAT_ALL_STATS;

                for (argv++; *argv; argv++) {
                        if (strcmp(*argv, "-F") == 0) {
                                atflag |= AT_FORCE_ATTR_SYNC;
                                continue;
                        }
                        if (strcmp(*argv, "-L") == 0) {
                                atflag &= ~AT_SYMLINK_NOFOLLOW;
                                continue;
                        }
                        if (strcmp(*argv, "-O") == 0) {
                                mask &= ~XSTAT_BASIC_STATS;
                                continue;
                        }
                        if (strcmp(*argv, "-A") == 0) {
                                atflag |= AT_NO_AUTOMOUNT;
                                continue;
                        }
                        if (strcmp(*argv, "-R") == 0) {
                                raw = 1;
                                continue;
                        }

                        memset(&xst, 0xbf, sizeof(xst));
                        ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst);
                        printf("xstat(%s) = %d\n", *argv, ret);
                        if (ret < 0) {
                                perror(*argv);
                                exit(1);
                        }

                        if (raw)
                                dump_hex((unsigned long long *)&xst, 0, sizeof(xst));

                        dump_xstat(&xst);
                }
                return 0;
        }

Just compile and run, passing it paths to the files you want to examine:

        [root@andromeda ~]# /tmp/xstat /proc/$$
        xstat(/proc/2074) = 160
        results=47ef
          Size: 0               Blocks: 0          IO Block: 1024    directory
        Device: 00:03           Inode: 9072        Links: 7
        Access: (0555/dr-xr-xr-x)  Uid: 0
        Gid: 0
        Access: 2010-07-14 16:50:46.609336272+0100
        Modify: 2010-07-14 16:50:46.609336272+0100
        Change: 2010-07-14 16:50:46.609336272+0100
        Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------)
        [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm
        xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160
        results=77ef
          Size: 5413882         Blocks: 0          IO Block: 4096    regular file
        Device: 00:15           Inode: 2288        Links: 1
        Access: (0644/-rw-r--r--)  Uid: 75338
        Gid: 0
        Access: 2008-11-05 19:47:22.000000000+0000
        Modify: 2008-11-05 19:47:22.000000000+0000
        Change: 2008-11-05 19:47:22.000000000+0000
        Inode version: 795h
        Data version: 2h
        Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------)

Signed-off-by: David Howells <[hidden email]>
---

 arch/x86/syscalls/syscall_32.tbl |    2
 arch/x86/syscalls/syscall_64.tbl |    2
 fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
 include/linux/fcntl.h            |    1
 include/linux/fs.h               |    4
 include/linux/stat.h             |  126 +++++++++++++-
 include/linux/syscalls.h         |    7 +
 7 files changed, 461 insertions(+), 31 deletions(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 29f9f05..980eb5a 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -355,3 +355,5 @@
 346 i386 setns sys_setns
 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv
 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev
+349 i386 xstat sys_xstat
+350 i386 fxstat sys_fxstat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index dd29a9e..7ae24bb 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -318,6 +318,8 @@
 309 common getcpu sys_getcpu
 310 64 process_vm_readv sys_process_vm_readv
 311 64 process_vm_writev sys_process_vm_writev
+312 common xstat sys_xstat
+313 common fxstat sys_fxstat
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
diff --git a/fs/stat.c b/fs/stat.c
index c733dc5..af3ef33 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -18,8 +18,20 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
+/**
+ * generic_fillattr - Fill in the basic attributes from the inode struct
+ * @inode: Inode to use as the source
+ * @stat: Where to fill in the attributes
+ *
+ * Fill in the basic attributes in the kstat structure from data that's to be
+ * found on the VFS inode structure.  This is the default if no getattr inode
+ * operation is supplied.
+ */
 void generic_fillattr(struct inode *inode, struct kstat *stat)
 {
+ struct super_block *sb = inode->i_sb;
+ u32 x;
+
  stat->dev = inode->i_sb->s_dev;
  stat->ino = inode->i_ino;
  stat->mode = inode->i_mode;
@@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
  stat->uid = inode->i_uid;
  stat->gid = inode->i_gid;
  stat->rdev = inode->i_rdev;
- stat->size = i_size_read(inode);
- stat->atime = inode->i_atime;
  stat->mtime = inode->i_mtime;
  stat->ctime = inode->i_ctime;
- stat->blksize = (1 << inode->i_blkbits);
+ stat->size = i_size_read(inode);
  stat->blocks = inode->i_blocks;
-}
+ stat->blksize = (1 << inode->i_blkbits);
 
+ stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV;
+ if (IS_NOATIME(inode))
+ stat->result_mask &= ~XSTAT_ATIME;
+ else
+ stat->atime = inode->i_atime;
+
+ if (S_ISREG(stat->mode) && stat->nlink == 0)
+ stat->information |= XSTAT_INFO_TEMPORARY;
+ if (IS_AUTOMOUNT(inode))
+ stat->information |= XSTAT_INFO_AUTOMOUNT;
+ if (IS_POSIXACL(inode))
+ stat->information |= XSTAT_INFO_HAS_ACL;
+
+ /* if unset, assume 1s granularity */
+ stat->tv_granularity = sb->s_time_gran ?: 1000000000U;
+
+ if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode)))
+ stat->result_mask |= XSTAT_RDEV;
+
+ x  = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0];
+ x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1];
+ x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2];
+ x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3];
+ if (x)
+ stat->result_mask |= XSTAT_VOLUME_ID;
+}
 EXPORT_SYMBOL(generic_fillattr);
 
-int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+/**
+ * vfs_xgetattr - Get the basic and extra attributes of a file
+ * @mnt: The mountpoint to which the dentry belongs
+ * @dentry: The file of interest
+ * @stat: Where to return the statistics
+ *
+ * Ask the filesystem for a file's attributes.  The caller must have preset
+ * stat->request_mask and stat->query_flags to indicate what they want.
+ *
+ * If the file is remote, the filesystem can be forced to update the attributes
+ * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags.
+ *
+ * Bits must have been set in stat->request_mask to indicate which attributes
+ * the caller wants retrieving.  Any such attribute not requested may be
+ * returned anyway, but the value may be approximate, and, if remote, may not
+ * have been synchronised with the server.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
 {
  struct inode *inode = dentry->d_inode;
  int retval;
@@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
  if (retval)
  return retval;
 
+ stat->result_mask = 0;
+ stat->information = 0;
+ stat->ioc_flags = 0;
  if (inode->i_op->getattr)
  return inode->i_op->getattr(mnt, dentry, stat);
 
  generic_fillattr(inode, stat);
  return 0;
 }
+EXPORT_SYMBOL(vfs_xgetattr);
 
+/**
+ * vfs_getattr - Get the basic attributes of a file
+ * @mnt: The mountpoint to which the dentry belongs
+ * @dentry: The file of interest
+ * @stat: Where to return the statistics
+ *
+ * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
+ * forced to update its files from the backing store.  Only the basic set of
+ * attributes will be retrieved; anyone wanting more must use vfs_getxattr(),
+ * as must anyone who wants to force attributes to be sync'd with the server.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+ stat->query_flags = 0;
+ stat->request_mask = XSTAT_BASIC_STATS;
+ return vfs_xgetattr(mnt, dentry, stat);
+}
 EXPORT_SYMBOL(vfs_getattr);
 
-int vfs_fstat(unsigned int fd, struct kstat *stat)
+/**
+ * vfs_fxstat - Get basic and extra attributes by file descriptor
+ * @fd: The file descriptor refering to the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xgetattr().  The main difference is
+ * that it uses a file descriptor to determine the file location.
+ *
+ * The caller must have preset stat->query_flags and stat->request_mask as for
+ * vfs_xgetattr().
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fxstat(unsigned int fd, struct kstat *stat)
 {
  struct file *f = fget(fd);
  int error = -EBADF;
 
+ if (stat->query_flags & ~KSTAT_QUERY_FLAGS)
+ return -EINVAL;
  if (f) {
- error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+ error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat);
  fput(f);
  }
  return error;
 }
+EXPORT_SYMBOL(vfs_fxstat);
+
+/**
+ * vfs_fstat - Get basic attributes by file descriptor
+ * @fd: The file descriptor refering to the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_getattr().  The main difference is
+ * that it uses a file descriptor to determine the file location.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fstat(unsigned int fd, struct kstat *stat)
+{
+ stat->query_flags = 0;
+ stat->request_mask = XSTAT_BASIC_STATS;
+ return vfs_fxstat(fd, stat);
+}
 EXPORT_SYMBOL(vfs_fstat);
 
-int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
- int flag)
+/**
+ * vfs_xstat - Get basic and extra attributes by filename
+ * @dfd: A file descriptor representing the base dir for a relative filename
+ * @filename: The name of the file of interest
+ * @flags: Flags to control the query
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xgetattr().  The main difference is
+ * that it uses a filename and base directory to determine the file location.
+ * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a
+ * symlink at the given name from being referenced.
+ *
+ * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
+ * flags are also used to load up stat->query_flags.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_xstat(int dfd, const char __user *filename, int flags,
+      struct kstat *stat)
 {
  struct path path;
- int error = -EINVAL;
- int lookup_flags = 0;
+ int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
 
- if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
-      AT_EMPTY_PATH)) != 0)
- goto out;
+ if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
+      AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
+ return -EINVAL;
 
- if (!(flag & AT_SYMLINK_NOFOLLOW))
- lookup_flags |= LOOKUP_FOLLOW;
- if (flag & AT_EMPTY_PATH)
+ if (flags & AT_SYMLINK_NOFOLLOW)
+ lookup_flags &= ~LOOKUP_FOLLOW;
+ if (flags & AT_NO_AUTOMOUNT)
+ lookup_flags &= ~LOOKUP_AUTOMOUNT;
+ if (flags & AT_EMPTY_PATH)
  lookup_flags |= LOOKUP_EMPTY;
 
+ stat->query_flags = flags & KSTAT_QUERY_FLAGS;
  error = user_path_at(dfd, filename, lookup_flags, &path);
- if (error)
- goto out;
-
- error = vfs_getattr(path.mnt, path.dentry, stat);
- path_put(&path);
-out:
+ if (!error) {
+ error = vfs_xgetattr(path.mnt, path.dentry, stat);
+ path_put(&path);
+ }
  return error;
 }
+EXPORT_SYMBOL(vfs_xstat);
+
+/**
+ * vfs_fstatat - Get basic attributes by filename
+ * @dfd: A file descriptor representing the base dir for a relative filename
+ * @filename: The name of the file of interest
+ * @flags: Flags to control the query
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only.  The flags are used to load up
+ * stat->query_flags in addition to indicating symlink handling during path
+ * resolution.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
+ int flags)
+{
+ stat->request_mask = XSTAT_BASIC_STATS;
+ return vfs_xstat(dfd, filename, flags, stat);
+}
 EXPORT_SYMBOL(vfs_fstatat);
 
-int vfs_stat(const char __user *name, struct kstat *stat)
+/**
+ * vfs_stat - Get basic attributes by filename
+ * @filename: The name of the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only, terminal symlinks are followed regardless and a
+ * remote filesystem can't be forced to query the server.  If such is desired,
+ * vfs_xstat() should be used instead.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
+int vfs_stat(const char __user *filename, struct kstat *stat)
 {
- return vfs_fstatat(AT_FDCWD, name, stat, 0);
+ stat->request_mask = XSTAT_BASIC_STATS;
+ return vfs_xstat(AT_FDCWD, filename, 0, stat);
 }
 EXPORT_SYMBOL(vfs_stat);
 
+/**
+ * vfs_stat - Get basic attributes by filename, without following terminal symlink
+ * @filename: The name of the file of interest
+ * @stat: The result structure to fill in.
+ *
+ * This function is a wrapper around vfs_xstat().  The difference is that it
+ * preselects basic stats only, terminal symlinks are note followed regardless
+ * and a remote filesystem can't be forced to query the server.  If such is
+ * desired, vfs_xstat() should be used instead.
+ *
+ * 0 will be returned on success, and a -ve error code if unsuccessful.
+ */
 int vfs_lstat(const char __user *name, struct kstat *stat)
 {
- return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
+ return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
 }
 EXPORT_SYMBOL(vfs_lstat);
 
@@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
 {
  static int warncount = 5;
  struct __old_kernel_stat tmp;
-
+
  if (warncount > 0) {
  warncount--;
  printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
@@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
 #if BITS_PER_LONG == 32
  if (stat->size > MAX_NON_LFS)
  return -EOVERFLOW;
-#endif
+#endif
  tmp.st_size = stat->size;
  tmp.st_atime = stat->atime.tv_sec;
  tmp.st_mtime = stat->mtime.tv_sec;
@@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
 #if BITS_PER_LONG == 32
  if (stat->size > MAX_NON_LFS)
  return -EOVERFLOW;
-#endif
+#endif
  tmp.st_size = stat->size;
  tmp.st_atime = stat->atime.tv_sec;
  tmp.st_mtime = stat->mtime.tv_sec;
@@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
 }
 #endif /* __ARCH_WANT_STAT64 */
 
+/*
+ * Get the xstat parameters if supplied
+ */
+static int xstat_get_params(unsigned int mask, struct xstat __user *buffer,
+    struct kstat *stat)
+{
+ memset(stat, 0xde, sizeof(*stat)); // DEBUGGING
+
+ if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
+ return -EFAULT;
+
+ stat->request_mask = mask & XSTAT_ALL_STATS;
+ stat->result_mask = 0;
+ return 0;
+}
+
+/*
+ * Set the xstat results.
+ *
+ * If the buffer size was 0, we just return the size of the buffer needed to
+ * return the full result.
+ *
+ * If bufsize indicates a buffer of insufficient size to hold the full result,
+ * we return -E2BIG.
+ *
+ * Otherwise we copy the extended stats to userspace and return the amount of
+ * data written into the buffer (or -EFAULT).
+ */
+static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer)
+{
+ u32 mask = stat->result_mask, gran = stat->tv_granularity;
+
+#define __put_timestamp(kts, uts) ( \
+ __put_user(kts.tv_sec, uts.tv_sec ) || \
+ __put_user(kts.tv_nsec, uts.tv_nsec ) || \
+ __put_user(gran, uts.tv_granularity ))
+
+ /* clear out anything we're not returning */
+ if (!(mask & XSTAT_IOC_FLAGS))
+ stat->ioc_flags = 0;
+ if (!(mask & XSTAT_BTIME))
+ memset(&stat->btime, 0, sizeof(stat->btime));
+ if (!(mask & XSTAT_GEN))
+ stat->gen = 0;
+ if (!(mask & XSTAT_VERSION))
+ stat->version = 0;
+ if (!(mask & XSTAT_VOLUME_ID))
+ memset(&stat->volume_id, 0, sizeof(stat->volume_id));
+
+ /* transfer the results */
+ if (__put_user(mask, &buffer->st_mask ) ||
+    __put_user(stat->mode, &buffer->st_mode ) ||
+    __put_user(stat->nlink, &buffer->st_nlink ) ||
+    __put_user(stat->uid, &buffer->st_uid ) ||
+    __put_user(stat->gid, &buffer->st_gid ) ||
+    __put_user(stat->information, &buffer->st_information ) ||
+    __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) ||
+    __put_user(stat->blksize, &buffer->st_blksize ) ||
+    __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) ||
+    __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) ||
+    __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) ||
+    __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) ||
+    __put_timestamp(stat->atime, &buffer->st_atime ) ||
+    __put_timestamp(stat->btime, &buffer->st_btime ) ||
+    __put_timestamp(stat->ctime, &buffer->st_ctime ) ||
+    __put_timestamp(stat->mtime, &buffer->st_mtime ) ||
+    __put_user(stat->ino, &buffer->st_ino ) ||
+    __put_user(stat->size, &buffer->st_size ) ||
+    __put_user(stat->blocks, &buffer->st_blocks ) ||
+    __put_user(stat->gen, &buffer->st_gen ) ||
+    __put_user(stat->version, &buffer->st_version ) ||
+    __copy_to_user(&buffer->st_volume_id, &stat->volume_id,
+   sizeof(buffer->st_volume_id) ) ||
+    __clear_user(&buffer->__spares, sizeof(buffer->__spares)))
+ return -EFAULT;
+ return 0;
+}
+
+/*
+ * System call to get extended stats by path
+ */
+SYSCALL_DEFINE5(xstat,
+ int, dfd, const char __user *, filename, unsigned, flags,
+ unsigned int, mask, struct xstat __user *, buffer)
+{
+ struct kstat stat;
+ int error;
+
+ error = xstat_get_params(mask, buffer, &stat);
+ if (error != 0)
+ return error;
+ error = vfs_xstat(dfd, filename, flags, &stat);
+ if (error)
+ return error;
+ return xstat_set_result(&stat, buffer);
+}
+
+/*
+ * System call to get extended stats by file descriptor
+ */
+SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags,
+ unsigned int, mask, struct xstat __user *, buffer)
+{
+ struct kstat stat;
+ int error;
+
+ error = xstat_get_params(mask, buffer, &stat);
+ if (error < 0)
+ return error;
+ stat.query_flags = flags;
+ error = vfs_fxstat(fd, &stat);
+ if (error)
+ return error;
+ return xstat_set_result(&stat, buffer);
+}
+
 /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
 void __inode_add_bytes(struct inode *inode, loff_t bytes)
 {
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index f550f89..faa9e5d 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -47,6 +47,7 @@
 #define AT_SYMLINK_FOLLOW 0x400   /* Follow symbolic links.  */
 #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
 #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
+#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */
 
 #ifdef __KERNEL__
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8de6755..ec6c62e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1467,6 +1467,7 @@ struct super_block {
 
  char s_id[32]; /* Informational name */
  u8 s_uuid[16]; /* UUID */
+ unsigned char s_volume_id[16]; /* Volume identifier */
 
  void *s_fs_info; /* Filesystem private info */
  unsigned int s_max_links;
@@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations;
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *);
 void __inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_sub_bytes(struct inode *inode, loff_t bytes);
@@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *);
 extern int vfs_lstat(const char __user *, struct kstat *);
 extern int vfs_fstat(unsigned int, struct kstat *);
 extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
+extern int vfs_xstat(int, const char __user *, int, struct kstat *);
+extern int vfs_xfstat(unsigned int, struct kstat *);
 
 extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
     unsigned long arg);
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 611c398..0ff561a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -3,6 +3,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/types.h>
 #include <asm/stat.h>
 
 #endif
@@ -46,6 +47,117 @@
 
 #endif
 
+/*
+ * Query request/result mask
+ *
+ * Bits should be set in request_mask to request particular items when calling
+ * xstat() or fxstat().
+ *
+ * The bits in st_mask may or may not be set upon return, in part depending on
+ * what was set in the mask argument:
+ *
+ * - if not available at all, the bit will be cleared before returning and the
+ *   field will be cleared; otherwise,
+ *
+ * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the
+ *   server and the field and bit will be set on return; otherwise,
+ *
+ * - if explicitly requested, the datum will be synchronised to a server or
+ *   other medium if out of date before being returned, and the bit will be set
+ *   on return; otherwise,
+ *
+ * - if not requested, but available in approximate form without any effort, it
+ *   will be filled in anyway, and the bit will be set upon return (it might
+ *   not be up to date, however, and no attempt will be made to synchronise the
+ *   internal state first); otherwise,
+ *
+ * - the field and the bit will be cleared before returning.
+ *
+ * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they
+ * will have a value installed for compatibility purposes so that stat() and
+ * co. can be emulated in userspace.
+ */
+#define XSTAT_MODE 0x00000001U /* want/got st_mode */
+#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */
+#define XSTAT_UID 0x00000004U /* want/got st_uid */
+#define XSTAT_GID 0x00000008U /* want/got st_gid */
+#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */
+#define XSTAT_ATIME 0x00000020U /* want/got st_atime */
+#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */
+#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */
+#define XSTAT_INO 0x00000100U /* want/got st_ino */
+#define XSTAT_SIZE 0x00000200U /* want/got st_size */
+#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */
+#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */
+#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */
+#define XSTAT_BTIME 0x00001000U /* want/got st_btime */
+#define XSTAT_GEN 0x00002000U /* want/got st_gen */
+#define XSTAT_VERSION 0x00004000U /* want/got st_version */
+#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */
+#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */
+
+/*
+ * Extended stat structures
+ */
+struct xstat_dev {
+ uint32_t major, minor;
+};
+
+struct xstat_time {
+ int64_t tv_sec;
+ uint32_t tv_nsec;
+ uint32_t tv_granularity; /* time granularity (in nS) */
+};
+
+struct xstat {
+ uint32_t st_mask; /* what results were written */
+ uint32_t st_mode; /* file mode */
+ uint32_t st_nlink; /* number of hard links */
+ uint32_t st_uid; /* user ID of owner */
+ uint32_t st_gid; /* group ID of owner */
+ uint32_t st_information; /* information about the file */
+ uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */
+ uint32_t st_blksize; /* optimal size for filesystem I/O */
+ struct xstat_dev st_rdev; /* device ID of special file */
+ struct xstat_dev st_dev; /* ID of device containing file */
+ struct xstat_time st_atime; /* last access time */
+ struct xstat_time st_btime; /* file creation time */
+ struct xstat_time st_ctime; /* last attribute change time */
+ struct xstat_time st_mtime; /* last data modification time */
+ uint64_t st_ino; /* inode number */
+ uint64_t st_size; /* file size */
+ uint64_t st_blocks; /* number of 512-byte blocks allocated */
+ uint64_t st_gen; /* inode generation number */
+ uint64_t st_version; /* data version number */
+ uint8_t st_volume_id[16]; /* volume identifier */
+ uint64_t __spares[11]; /* spare space for future expansion */
+};
+
+/*
+ * Flags to be found in st_information
+ *
+ * These give information about the features or the state of a file that might
+ * be of use to ordinary userspace programs such as GUIs or ls rather than
+ * specialised tools.
+ *
+ * Additional information may be found in st_ioc_flags and we try not to
+ * overlap with it.
+ */
+#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */
+#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */
+#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */
+#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */
+#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */
+#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */
+#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */
+#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */
+#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */
+#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */
+#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */
+#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */
+#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */
+#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */
+
 #ifdef __KERNEL__
 #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
 #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
@@ -60,6 +172,12 @@
 #include <linux/time.h>
 
 struct kstat {
+ u32 query_flags; /* operational flags */
+#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC)
+ u32 request_mask; /* what fields the user asked for */
+ u32 result_mask; /* what fields the user got */
+ u32 information;
+ u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */
  u64 ino;
  dev_t dev;
  umode_t mode;
@@ -67,14 +185,18 @@ struct kstat {
  uid_t uid;
  gid_t gid;
  dev_t rdev;
+ unsigned int tv_granularity; /* granularity of times (in nS) */
  loff_t size;
- struct timespec  atime;
+ struct timespec atime;
  struct timespec mtime;
  struct timespec ctime;
+ struct timespec btime; /* file creation time */
  unsigned long blksize;
  unsigned long long blocks;
+ u64 gen; /* inode generation */
+ u64 version; /* data version */
+ unsigned char volume_id[16]; /* volume identifier */
 };
 
 #endif
-
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3de3acb..ff9f8d9 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -45,6 +45,8 @@ struct shmid_ds;
 struct sockaddr;
 struct stat;
 struct stat64;
+struct xstat_parameters;
+struct xstat;
 struct statfs;
 struct statfs64;
 struct __sysctl_args;
@@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
       unsigned long riovcnt,
       unsigned long flags);
 
+asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags,
+  unsigned mask, struct xstat __user *buffer);
+asmlinkage long sys_fxstat(unsigned fd, unsigned flags,
+   unsigned mask, struct xstat __user *buffer);
+
 #endif

Reply | Threaded
Open this post in threaded view
|

[PATCH 2/6] xstat: Ext4: Return extended attributes

David Howells
In reply to this post by David Howells
Return extended attributes from the Ext4 filesystem.  This includes the
following:

 (1) The inode creation time (i_crtime) as i_btime.

 (2) The inode i_generation as i_gen if not the root directory.

 (3) The inode i_version as st_data_version if a file with I_VERSION set or a
     directory.

 (4) FS_xxx_FL flags are returned as for ioctl(FS_IOC_GETFLAGS).

Signed-off-by: David Howells <[hidden email]>
---

 fs/ext4/ext4.h    |    2 ++
 fs/ext4/file.c    |    2 +-
 fs/ext4/inode.c   |   32 +++++++++++++++++++++++++++++---
 fs/ext4/namei.c   |    2 ++
 fs/ext4/super.c   |    1 +
 fs/ext4/symlink.c |    2 ++
 6 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ab2594a..81806da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1899,6 +1899,8 @@ extern int  ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
  struct kstat *stat);
 extern void ext4_evict_inode(struct inode *);
 extern void ext4_clear_inode(struct inode *);
+extern int  ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+      struct kstat *stat);
 extern int  ext4_sync_inode(handle_t *, struct inode *);
 extern void ext4_dirty_inode(struct inode *, int);
 extern int ext4_change_inode_journal_flag(struct inode *, int);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index cb70f18..ae8654c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -249,7 +249,7 @@ const struct file_operations ext4_file_operations = {
 
 const struct inode_operations ext4_file_inode_operations = {
  .setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_file_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
  .setxattr = generic_setxattr,
  .getxattr = generic_getxattr,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c77b0bd..eafc188 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4191,11 +4191,37 @@ err_out:
 int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
  struct kstat *stat)
 {
- struct inode *inode;
- unsigned long delalloc_blocks;
+ struct inode *inode = dentry->d_inode;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+
+ stat->result_mask |= XSTAT_BTIME;
+ stat->btime.tv_sec = ei->i_crtime.tv_sec;
+ stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
+
+ if (inode->i_ino != EXT4_ROOT_INO) {
+ stat->result_mask |= XSTAT_GEN;
+ stat->gen = inode->i_generation;
+ }
+ if (S_ISDIR(inode->i_mode) || IS_I_VERSION(inode)) {
+ stat->result_mask |= XSTAT_VERSION;
+ stat->version = inode->i_version;
+ }
+
+ ext4_get_inode_flags(ei);
+ stat->ioc_flags |= ei->i_flags & EXT4_FL_USER_VISIBLE;
+ stat->result_mask |= XSTAT_IOC_FLAGS;
 
- inode = dentry->d_inode;
  generic_fillattr(inode, stat);
+ return 0;
+}
+
+int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+      struct kstat *stat)
+{
+ struct inode *inode = dentry->d_inode;
+ u64 delalloc_blocks;
+
+ ext4_getattr(mnt, dentry, stat);
 
  /*
  * We can't update i_blocks if the block allocation is delayed
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 349d7b3..6162387 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2579,6 +2579,7 @@ const struct inode_operations ext4_dir_inode_operations = {
  .mknod = ext4_mknod,
  .rename = ext4_rename,
  .setattr = ext4_setattr,
+ .getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
  .setxattr = generic_setxattr,
  .getxattr = generic_getxattr,
@@ -2591,6 +2592,7 @@ const struct inode_operations ext4_dir_inode_operations = {
 
 const struct inode_operations ext4_special_inode_operations = {
  .setattr = ext4_setattr,
+ .getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
  .setxattr = generic_setxattr,
  .getxattr = generic_getxattr,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ceebaf8..2d395bf 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3040,6 +3040,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
  if (sb->s_magic != EXT4_SUPER_MAGIC)
  goto cantfind_ext4;
  sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
+ memcpy(sb->s_volume_id, es->s_uuid, sizeof(sb->s_volume_id));
 
  /* Set defaults before we parse the mount options */
  def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ed9354a..d8fe7fb 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -35,6 +35,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
  .follow_link = page_follow_link_light,
  .put_link = page_put_link,
  .setattr = ext4_setattr,
+ .getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
  .setxattr = generic_setxattr,
  .getxattr = generic_getxattr,
@@ -47,6 +48,7 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
  .readlink = generic_readlink,
  .follow_link = ext4_follow_link,
  .setattr = ext4_setattr,
+ .getattr = ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
  .setxattr = generic_setxattr,
  .getxattr = generic_getxattr,

Reply | Threaded
Open this post in threaded view
|

[PATCH 3/6] xstat: AFS: Return extended attributes

David Howells
In reply to this post by David Howells
Return extended attributes from the AFS filesystem.  This includes the
following:

 (1) The vnode uniquifier as st_gen.

 (2) The data version number as st_data_version.

 (3) XSTAT_INFO_AUTOMOUNT will be set on automount directories by virtue of
     S_AUTOMOUNT being set on the inode.  These are referrals to other volumes
     or other cells.

 (4) XSTAT_INFO_AUTODIR on a directory that does cell lookup for non-existent
     names and mounts them (typically mounted on /afs with -o autocell).  The
     resulting directories are marked XSTAT_INFO_FABRICATED as they do not
     actually exist in the mounted AFS directory.

 (6) Files, directories and symlinks accessed over AFS are marked
     XSTAT_INFO_REMOTE.

 (7) XSTAT_INFO_NONSYSTEM_OWNERSHIP is set as the UID and GID retrieved from an
     AFS share may not be applicable on the system.

 (8) XSTAT_INFO_HAS_ACL is set as AFS directories have ACLs (the UID and GID
     are only used through the ACLs) and these ACLs apply to the contents of
     the directories.

Signed-off-by: David Howells <[hidden email]>
---

 fs/afs/inode.c |   29 +++++++++++++++++++++--------
 fs/afs/super.c |    7 +++++++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index d890ae3..062def2 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -71,9 +71,9 @@ static int afs_inode_map_status(struct afs_vnode *vnode, struct key *key)
  inode->i_uid = vnode->status.owner;
  inode->i_gid = 0;
  inode->i_size = vnode->status.size;
- inode->i_ctime.tv_sec = vnode->status.mtime_server;
- inode->i_ctime.tv_nsec = 0;
- inode->i_atime = inode->i_mtime = inode->i_ctime;
+ inode->i_mtime.tv_sec = vnode->status.mtime_server;
+ inode->i_mtime.tv_nsec = 0;
+ inode->i_atime = inode->i_ctime = inode->i_mtime;
  inode->i_blocks = 0;
  inode->i_generation = vnode->fid.unique;
  inode->i_version = vnode->status.data_version;
@@ -374,16 +374,29 @@ error_unlock:
 /*
  * read the attributes of an inode
  */
-int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-      struct kstat *stat)
+int afs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
- struct inode *inode;
-
- inode = dentry->d_inode;
+ struct inode *inode = dentry->d_inode;
 
  _enter("{ ino=%lu v=%u }", inode->i_ino, inode->i_generation);
 
  generic_fillattr(inode, stat);
+
+ stat->result_mask &= ~(XSTAT_ATIME | XSTAT_CTIME | XSTAT_BLOCKS);
+ stat->result_mask |= XSTAT_GEN | XSTAT_VERSION;
+ stat->gen = inode->i_generation;
+ stat->version = inode->i_version;
+
+ if (test_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(inode)->flags))
+ stat->information |= XSTAT_INFO_AUTODIR;
+
+ if (test_bit(AFS_VNODE_PSEUDODIR, &AFS_FS_I(inode)->flags))
+ stat->information |= XSTAT_INFO_FABRICATED;
+ else
+ stat->information |= XSTAT_INFO_REMOTE;
+
+ stat->information |=
+ XSTAT_INFO_NONSYSTEM_OWNERSHIP | XSTAT_INFO_HAS_ACL;
  return 0;
 }
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index f02b31e..1f13b48 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -314,6 +314,13 @@ static int afs_fill_super(struct super_block *sb,
  sb->s_bdi = &as->volume->bdi;
  strlcpy(sb->s_id, as->volume->vlocation->vldb.name, sizeof(sb->s_id));
 
+ /* construct a volume ID from the AFS volume ID and type */
+ sb->s_volume_id[4] = as->volume->type;
+ sb->s_volume_id[3] = as->volume->vid >> 0;
+ sb->s_volume_id[2] = as->volume->vid >> 8;
+ sb->s_volume_id[1] = as->volume->vid >> 16;
+ sb->s_volume_id[0] = as->volume->vid >> 24;
+
  /* allocate the root inode and dentry */
  fid.vid = as->volume->vid;
  fid.vnode = 1;

Reply | Threaded
Open this post in threaded view
|

[PATCH 4/6] xstat: NFS: Return extended attributes

David Howells
In reply to this post by David Howells
Return extended attributes from the NFS filesystem.  This includes the
following:

 (1) The change attribute as st_data_version if NFSv4.

 (2) XSTAT_INFO_AUTOMOUNT and XSTAT_INFO_FABRICATED are set on referral or
     submount directories that are automounted upon.  NFS shows one directory
     with a different FSID, but the local filesystem has two: the mountpoint
     directory and the root of the filesystem mounted upon it.

 (3) XSTAT_INFO_REMOTE is set on files acquired over NFS.

Furthermore, what nfs_getattr() does can be controlled as follows:

 (1) If AT_FORCE_ATTR_SYNC is indicated, or mtime, ctime or data_version (NFSv4
     only) are requested then the outstanding writes will be written to the
     server first.

 (2) The inode's attributes may be synchronised with the server:

     (a) If AT_FORCE_ATTR_SYNC is indicated or if atime is requested (and atime
      updating is not suppressed by a mount flag) then the attributes will
      be reread unconditionally.

     (b) If the data version or any of basic stats are requested then the
      attributes will be reread if the cached attributes have expired.

     (c) Otherwise the cached attributes will be used - even if expired -
      without reference to the server.

Signed-off-by: David Howells <[hidden email]>
---

 fs/nfs/inode.c |   49 +++++++++++++++++++++++++++++++++++++------------
 fs/nfs/super.c |    1 +
 2 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e8bbfa5..460fcf3 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -509,11 +509,18 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
 int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 {
  struct inode *inode = dentry->d_inode;
+ unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
  int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
  int err;
 
- /* Flush out writes to the server in order to update c/mtime.  */
- if (S_ISREG(inode->i_mode)) {
+ if (NFS_SERVER(inode)->nfs_client->rpc_ops->version < 4)
+ stat->request_mask &= ~XSTAT_VERSION;
+
+ /* Flush out writes to the server in order to update c/mtime
+ * or data version if the user wants them */
+ if ((force || (stat->request_mask &
+       (XSTAT_MTIME | XSTAT_CTIME | XSTAT_VERSION))) &&
+    S_ISREG(inode->i_mode)) {
  err = filemap_write_and_wait(inode->i_mapping);
  if (err)
  goto out;
@@ -528,18 +535,36 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
  *  - NFS never sets MS_NOATIME or MS_NODIRATIME so there is
  *    no point in checking those.
  */
- if ((mnt->mnt_flags & MNT_NOATIME) ||
-    ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
+ if (mnt->mnt_flags & MNT_NOATIME ||
+    (mnt->mnt_flags & MNT_NODIRATIME && S_ISDIR(inode->i_mode))) {
+ stat->ioc_flags |= FS_NOATIME_FL;
+ need_atime = 0;
+ } else if (!(stat->request_mask & XSTAT_ATIME)) {
  need_atime = 0;
+ }
 
- if (need_atime)
- err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
- else
- err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
- if (!err) {
- generic_fillattr(inode, stat);
- stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+ if (force || stat->request_mask & (XSTAT_BASIC_STATS | XSTAT_VERSION)) {
+ if (force || need_atime)
+ err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ else
+ err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (err)
+ goto out;
  }
+
+ generic_fillattr(inode, stat);
+ stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+
+ if (stat->request_mask & XSTAT_VERSION) {
+ stat->version = inode->i_version;
+ stat->result_mask |= XSTAT_VERSION;
+ }
+
+ if (IS_AUTOMOUNT(inode))
+ stat->information |= XSTAT_INFO_FABRICATED;
+
+ stat->information |= XSTAT_INFO_REMOTE;
+
 out:
  return err;
 }
@@ -852,7 +877,7 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
 static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
 {
  struct nfs_inode *nfsi = NFS_I(inode);
-
+
  if (mapping->nrpages != 0) {
  int ret = invalidate_inode_pages2(mapping);
  if (ret < 0)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 37412f7..faa652c 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2222,6 +2222,7 @@ static int nfs_set_super(struct super_block *s, void *data)
  ret = set_anon_super(s, server);
  if (ret == 0)
  server->s_dev = s->s_dev;
+ memcpy(&s->s_volume_id, &server->fsid, sizeof(s->s_volume_id));
  return ret;
 }
 

Reply | Threaded
Open this post in threaded view
|

[PATCH 5/6] xstat: CIFS: Return extended attributes

David Howells
In reply to this post by David Howells
Return extended attributes from the CIFS filesystem.  This includes the
following:

 (1) Return the file creation time as btime.  We assume that the creation time
     won't change over the life of the inode.

 (2) Set XSTAT_INFO_AUTOMOUNT on referral/submount directories.

 (3) Unset XSTAT_INO if we made up the inode number and didn't get it from the
     server.

 (4) Unset XSTAT_[UG]ID if we are either returning values passed to mount
     and/or the server doesn't return them.

 (5) Map various Windows file attributes to FS_xxx_FL flags in st_ioc_flags
     and XSTAT_INFO_xxx flags in st_information, fetching them from the server
     if we don't have them yet or don't have a current copy.

     Possibly things like Hidden, System and Archive should be FS_xxx_FL flags
     rather than XSTAT_INFO_xxx flags and st_ioc_flags should be expanded to
     64 bits.

 (6) Set XSTAT_INFO_REMOTE on all files fetched by CIFS.

 (7) Set XSTAT_INFO_NONSYSTEM_OWNERSHIP on all files as they all have Windows
     ownership details too.

 (8) Set XSTAT_INFO_HAS_ACL if CONFIG_CIFS_ACL=y as Windows ACLs are available
     on the object.

Furthermore, what cifs_getattr() does can be controlled as follows:

 (1) If AT_FORCE_ATTR_SYNC is indicated, or if the inode flags or creation time
     are requested but not yet collected, then the attributes will be reread
     unconditionally.

 (2) If the basic stats are requested or if the inode flags are requested and
     have been collected previously, then the attributes will be reread if out
     of date.

 (3) Otherwise the cached attributes will be used - even if expired - without
     reference to the server.

Note that cifs_revalidate_dentry() will issue an extra operation to get the
FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to collect
creation time and attributes on behalf of cifs_getattr().

[NOTE: THIS PATCH IS UNTESTED!]

Signed-off-by: David Howells <[hidden email]>
---

 fs/cifs/cifsfs.h   |    4 +-
 fs/cifs/cifsglob.h |   16 +++++--
 fs/cifs/dir.c      |    2 -
 fs/cifs/inode.c    |  120 +++++++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index d1389bb..021e327 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -56,9 +56,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *);
 extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
        struct dentry *);
 extern int cifs_revalidate_file_attr(struct file *filp);
-extern int cifs_revalidate_dentry_attr(struct dentry *);
+extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool);
 extern int cifs_revalidate_file(struct file *filp);
-extern int cifs_revalidate_dentry(struct dentry *);
+extern int cifs_revalidate_dentry(struct dentry *, bool, bool);
 extern int cifs_invalidate_mapping(struct inode *inode);
 extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
 extern int cifs_setattr(struct dentry *, struct iattr *);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 4ff6313..d3567da 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -621,11 +621,15 @@ struct cifsInodeInfo {
  /* BB add in lists for dirty pages i.e. write caching info for oplock */
  struct list_head openFileList;
  __u32 cifsAttrs; /* e.g. DOS archive bit, sparse, compressed, system */
- bool clientCanCacheRead; /* read oplock */
- bool clientCanCacheAll; /* read and writebehind oplock */
- bool delete_pending; /* DELETE_ON_CLOSE is set */
- bool invalid_mapping; /* pagecache is invalid */
+ bool clientCanCacheRead:1; /* read oplock */
+ bool clientCanCacheAll:1; /* read and writebehind oplock */
+ bool delete_pending:1; /* DELETE_ON_CLOSE is set */
+ bool invalid_mapping:1; /* pagecache is invalid */
+ bool btime_valid:1; /* stored creation time is valid */
+ bool uid_faked:1; /* true if i_uid is faked */
+ bool gid_faked:1; /* true if i_gid is faked */
  unsigned long time; /* jiffies of last update of inode */
+ struct timespec btime; /* creation time */
  u64  server_eof; /* current file size on server -- protected by i_lock */
  u64  uniqueid; /* server inode number */
  u64  createtime; /* creation time on server */
@@ -833,6 +837,9 @@ struct dfs_info3_param {
 #define CIFS_FATTR_DELETE_PENDING 0x2
 #define CIFS_FATTR_NEED_REVAL 0x4
 #define CIFS_FATTR_INO_COLLISION 0x8
+#define CIFS_FATTR_WINATTRS_VALID 0x10 /* T if cf_btime and cf_cifsattrs valid */
+#define CIFS_FATTR_UID_FAKED 0x20 /* T if cf_uid is faked */
+#define CIFS_FATTR_GID_FAKED 0x40 /* T if cf_gid is faked */
 
 struct cifs_fattr {
  u32 cf_flags;
@@ -850,6 +857,7 @@ struct cifs_fattr {
  struct timespec cf_atime;
  struct timespec cf_mtime;
  struct timespec cf_ctime;
+ struct timespec cf_btime;
 };
 
 static inline void free_dfs_info_param(struct dfs_info3_param *param)
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index d172c8e..d9e03ae 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -664,7 +664,7 @@ cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd)
  return -ECHILD;
 
  if (direntry->d_inode) {
- if (cifs_revalidate_dentry(direntry))
+ if (cifs_revalidate_dentry(direntry, false, false))
  return 0;
  else {
  /*
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 745da3d..662d5ce 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -135,13 +135,21 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
  set_nlink(inode, fattr->cf_nlink);
  inode->i_uid = fattr->cf_uid;
  inode->i_gid = fattr->cf_gid;
+ if (fattr->cf_flags & CIFS_FATTR_UID_FAKED)
+ cifs_i->uid_faked = true;
+ if (fattr->cf_flags & CIFS_FATTR_GID_FAKED)
+ cifs_i->gid_faked = true;
 
  /* if dynperm is set, don't clobber existing mode */
  if (inode->i_state & I_NEW ||
     !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM))
  inode->i_mode = fattr->cf_mode;
 
- cifs_i->cifsAttrs = fattr->cf_cifsattrs;
+ if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) {
+ cifs_i->cifsAttrs = fattr->cf_cifsattrs;
+ cifs_i->btime = fattr->cf_btime;
+ cifs_i->btime_valid = true;
+ }
 
  if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL)
  cifs_i->time = 0;
@@ -248,15 +256,19 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info,
  break;
  }
 
- if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) {
  fattr->cf_uid = cifs_sb->mnt_uid;
- else
+ fattr->cf_flags |= CIFS_FATTR_UID_FAKED;
+ } else {
  fattr->cf_uid = le64_to_cpu(info->Uid);
+ }
 
- if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) {
  fattr->cf_gid = cifs_sb->mnt_gid;
- else
+ fattr->cf_flags |= CIFS_FATTR_GID_FAKED;
+ } else {
  fattr->cf_gid = le64_to_cpu(info->Gid);
+ }
 
  fattr->cf_nlink = le64_to_cpu(info->Nlinks);
 }
@@ -283,7 +295,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
  fattr->cf_ctime = CURRENT_TIME;
  fattr->cf_mtime = CURRENT_TIME;
  fattr->cf_nlink = 2;
- fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
+ fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL |
+ CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED;
 }
 
 int cifs_get_file_info_unix(struct file *filp)
@@ -510,6 +523,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
  struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
 
  memset(fattr, 0, sizeof(*fattr));
+ fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID;
  fattr->cf_cifsattrs = le32_to_cpu(info->Attributes);
  if (info->DeletePending)
  fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING;
@@ -521,6 +535,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
 
  fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
  fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
+ fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime);
 
  if (adjust_tz) {
  fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj;
@@ -1724,7 +1739,8 @@ int cifs_revalidate_file_attr(struct file *filp)
  return rc;
 }
 
-int cifs_revalidate_dentry_attr(struct dentry *dentry)
+int cifs_revalidate_dentry_attr(struct dentry *dentry,
+ bool want_extra_bits, bool force)
 {
  int xid;
  int rc = 0;
@@ -1735,7 +1751,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
  if (inode == NULL)
  return -ENOENT;
 
- if (!cifs_inode_needs_reval(inode))
+ if (!force && !cifs_inode_needs_reval(inode))
  return rc;
 
  xid = GetXid();
@@ -1752,9 +1768,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
  "%ld jiffies %ld", full_path, inode, inode->i_count.counter,
  dentry, dentry->d_time, jiffies);
 
- if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext)
+ if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) {
  rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid);
- else
+ if (rc != 0)
+ goto out;
+ }
+ if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits)
  rc = cifs_get_inode_info(&inode, full_path, NULL, sb,
  xid, NULL);
 
@@ -1779,12 +1798,13 @@ int cifs_revalidate_file(struct file *filp)
 }
 
 /* revalidate a dentry's inode attributes */
-int cifs_revalidate_dentry(struct dentry *dentry)
+int cifs_revalidate_dentry(struct dentry *dentry,
+   bool want_extra_bits, bool force)
 {
  int rc;
  struct inode *inode = dentry->d_inode;
 
- rc = cifs_revalidate_dentry_attr(dentry);
+ rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force);
  if (rc)
  return rc;
 
@@ -1796,11 +1816,30 @@ int cifs_revalidate_dentry(struct dentry *dentry)
 int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
  struct kstat *stat)
 {
+ struct cifsInodeInfo *cifs_i = CIFS_I(dentry->d_inode);
  struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
  struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
  struct inode *inode = dentry->d_inode;
+ unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
+ bool want_extra_bits = false;
+ u32 info, ioc = 0;
+ u32 attrs;
  int rc;
 
+ if (cifs_i->uid_faked)
+ stat->request_mask &= ~XSTAT_UID;
+ if (cifs_i->gid_faked)
+ stat->request_mask &= ~XSTAT_GID;
+
+ if (stat->request_mask & XSTAT_BTIME && !cifs_i->btime_valid) {
+ want_extra_bits = true;
+ force = true;
+ }
+ if (stat->request_mask & XSTAT_IOC_FLAGS) {
+ want_extra_bits = true;
+ force = true;
+ }
+
  /*
  * We need to be sure that all dirty pages are written and the server
  * has actual ctime, mtime and file length.
@@ -1814,13 +1853,14 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
  }
  }
 
- rc = cifs_revalidate_dentry_attr(dentry);
- if (rc)
- return rc;
+ if (force || stat->request_mask & XSTAT_BASIC_STATS) {
+ rc = cifs_revalidate_dentry(dentry, want_extra_bits, force);
+ if (rc)
+ return rc;
+ }
 
  generic_fillattr(inode, stat);
  stat->blksize = CIFS_MAX_MSGSIZE;
- stat->ino = CIFS_I(inode)->uniqueid;
 
  /*
  * If on a multiuser mount without unix extensions, and the admin hasn't
@@ -1834,7 +1874,53 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
  if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
  stat->gid = current_fsgid();
  }
- return rc;
+
+ info = XSTAT_INFO_REMOTE | XSTAT_INFO_NONSYSTEM_OWNERSHIP;
+#ifdef CONFIG_CIFS_ACL
+ info |= XSTAT_INFO_HAS_ACL;
+#endif
+
+ if (cifs_i->btime_valid) {
+ stat->btime = cifs_i->btime;
+ stat->result_mask |= XSTAT_BTIME;
+ }
+
+ /* We don't promise an inode number if we made one up */
+ stat->ino = cifs_i->uniqueid;
+ if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM))
+ stat->result_mask &= ~XSTAT_INO;
+
+ /*
+ * If on a multiuser mount without unix extensions, and the admin
+ * hasn't overridden them, set the ownership to the fsuid/fsgid of the
+ * current process.
+ */
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) &&
+    !tcon->unix_ext) {
+ if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID))
+ stat->uid = current_fsuid();
+ if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
+ stat->gid = current_fsgid();
+ }
+ if (cifs_i->uid_faked)
+ stat->result_mask &= ~XSTAT_UID;
+ if (cifs_i->gid_faked)
+ stat->result_mask &= ~XSTAT_GID;
+
+ attrs = cifs_i->cifsAttrs;
+ if (attrs & ATTR_HIDDEN) info |= XSTAT_INFO_HIDDEN;
+ if (attrs & ATTR_SYSTEM) info |= XSTAT_INFO_SYSTEM;
+ if (attrs & ATTR_ARCHIVE) info |= XSTAT_INFO_ARCHIVE;
+ if (attrs & ATTR_TEMPORARY) info |= XSTAT_INFO_TEMPORARY;
+ if (attrs & ATTR_REPARSE) info |= XSTAT_INFO_REPARSE_POINT;
+ if (attrs & ATTR_OFFLINE) info |= XSTAT_INFO_OFFLINE;
+ if (attrs & ATTR_ENCRYPTED) info |= XSTAT_INFO_ENCRYPTED;
+ stat->information |= info;
+
+ if (attrs & ATTR_READONLY) ioc |= FS_IMMUTABLE_FL;
+ if (attrs & ATTR_COMPRESSED) ioc |= FS_COMPR_FL;
+ stat->ioc_flags |= ioc;
+ return 0;
 }
 
 static int cifs_truncate_page(struct address_space *mapping, loff_t from)

Reply | Threaded
Open this post in threaded view
|

[PATCH 6/6] xstat: eCryptFS: Return extended attributes

David Howells
In reply to this post by David Howells
Return extended attributes from the eCryptFS filesystem, dredged up from the
lower filesystem.  XSTAT_INFO_ENCRYPTED is set on the files whose cryptography
is handled by eCryptFS.

Possibly eCryptFS should also set FS_COMPR_FL on its compressed files.

Signed-off-by: David Howells <[hidden email]>
---

 fs/ecryptfs/inode.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index ab35b11..62865e9 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1060,13 +1060,23 @@ int ecryptfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
  struct kstat lower_stat;
  int rc;
 
- rc = vfs_getattr(ecryptfs_dentry_to_lower_mnt(dentry),
- ecryptfs_dentry_to_lower(dentry), &lower_stat);
+ lower_stat.query_flags = stat->query_flags;
+ lower_stat.request_mask = stat->request_mask | XSTAT_BLOCKS;
+ rc = vfs_xgetattr(ecryptfs_dentry_to_lower_mnt(dentry),
+  ecryptfs_dentry_to_lower(dentry), &lower_stat);
  if (!rc) {
  fsstack_copy_attr_all(dentry->d_inode,
       ecryptfs_inode_to_lower(dentry->d_inode));
  generic_fillattr(dentry->d_inode, stat);
  stat->blocks = lower_stat.blocks;
+ stat->result_mask = lower_stat.result_mask;
+ stat->information = lower_stat.information;
+ stat->information |= XSTAT_INFO_ENCRYPTED;
+ stat->gen = lower_stat.gen;
+ stat->version = lower_stat.version;
+ stat->ioc_flags = lower_stat.ioc_flags;
+ memcpy(&stat->volume_id, lower_stat.volume_id,
+       sizeof(stat->volume_id));
  }
  return rc;
 }

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 4/6] xstat: NFS: Return extended attributes

Myklebust, Trond
In reply to this post by David Howells
On Thu, 2012-04-19 at 15:06 +0100, David Howells wrote:

> Return extended attributes from the NFS filesystem.  This includes the
> following:
>
>  (1) The change attribute as st_data_version if NFSv4.
>
>  (2) XSTAT_INFO_AUTOMOUNT and XSTAT_INFO_FABRICATED are set on referral or
>      submount directories that are automounted upon.  NFS shows one directory
>      with a different FSID, but the local filesystem has two: the mountpoint
>      directory and the root of the filesystem mounted upon it.
>
>  (3) XSTAT_INFO_REMOTE is set on files acquired over NFS.
>
> Furthermore, what nfs_getattr() does can be controlled as follows:
>
>  (1) If AT_FORCE_ATTR_SYNC is indicated, or mtime, ctime or data_version (NFSv4
>      only) are requested then the outstanding writes will be written to the
>      server first.
>
>  (2) The inode's attributes may be synchronised with the server:
>
>      (a) If AT_FORCE_ATTR_SYNC is indicated or if atime is requested (and atime
>       updating is not suppressed by a mount flag) then the attributes will
>       be reread unconditionally.
>
>      (b) If the data version or any of basic stats are requested then the
>       attributes will be reread if the cached attributes have expired.
>
>      (c) Otherwise the cached attributes will be used - even if expired -
>       without reference to the server.

Hmm... As far as I can see you are still doing an nfs_revalidate_inode()
in the non-forced case. That will cause expired attributes to be
retrieved from the server.

> Signed-off-by: David Howells <[hidden email]>
> ---
>
>  fs/nfs/inode.c |   49 +++++++++++++++++++++++++++++++++++++------------
>  fs/nfs/super.c |    1 +
>  2 files changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index e8bbfa5..460fcf3 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -509,11 +509,18 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
>  int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>  {
>   struct inode *inode = dentry->d_inode;
> + unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
>   int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
>   int err;
>  
> - /* Flush out writes to the server in order to update c/mtime.  */
> - if (S_ISREG(inode->i_mode)) {
> + if (NFS_SERVER(inode)->nfs_client->rpc_ops->version < 4)
> + stat->request_mask &= ~XSTAT_VERSION;
> +
> + /* Flush out writes to the server in order to update c/mtime
> + * or data version if the user wants them */
> + if ((force || (stat->request_mask &
> +       (XSTAT_MTIME | XSTAT_CTIME | XSTAT_VERSION))) &&
> +    S_ISREG(inode->i_mode)) {
>   err = filemap_write_and_wait(inode->i_mapping);

We can get rid of the filemap_write_and_wait() if the caller allows us
to approximate m/ctime values. That would give a major speed-up for most
stat() workloads.

>   if (err)
>   goto out;
> @@ -528,18 +535,36 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>   *  - NFS never sets MS_NOATIME or MS_NODIRATIME so there is
>   *    no point in checking those.
>   */
> - if ((mnt->mnt_flags & MNT_NOATIME) ||
> -    ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
> + if (mnt->mnt_flags & MNT_NOATIME ||
> +    (mnt->mnt_flags & MNT_NODIRATIME && S_ISDIR(inode->i_mode))) {
> + stat->ioc_flags |= FS_NOATIME_FL;
> + need_atime = 0;
> + } else if (!(stat->request_mask & XSTAT_ATIME)) {
>   need_atime = 0;
> + }
>  
> - if (need_atime)
> - err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
> - else
> - err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> - if (!err) {
> - generic_fillattr(inode, stat);
> - stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> + if (force || stat->request_mask & (XSTAT_BASIC_STATS | XSTAT_VERSION)) {
> + if (force || need_atime)
> + err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + else
> + err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (err)
> + goto out;
>   }
> +
> + generic_fillattr(inode, stat);
> + stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> +
> + if (stat->request_mask & XSTAT_VERSION) {
> + stat->version = inode->i_version;
> + stat->result_mask |= XSTAT_VERSION;
> + }
> +
> + if (IS_AUTOMOUNT(inode))
> + stat->information |= XSTAT_INFO_FABRICATED;
> +
> + stat->information |= XSTAT_INFO_REMOTE;
> +
>  out:
>   return err;
>  }
> @@ -852,7 +877,7 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
>  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
>  {
>   struct nfs_inode *nfsi = NFS_I(inode);
> -
> +
>   if (mapping->nrpages != 0) {
>   int ret = invalidate_inode_pages2(mapping);
>   if (ret < 0)
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 37412f7..faa652c 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2222,6 +2222,7 @@ static int nfs_set_super(struct super_block *s, void *data)
>   ret = set_anon_super(s, server);
>   if (ret == 0)
>   server->s_dev = s->s_dev;
> + memcpy(&s->s_volume_id, &server->fsid, sizeof(s->s_volume_id));
>   return ret;
>  }
>  
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[hidden email]
www.netapp.com

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/6] xstat: CIFS: Return extended attributes

Steve French-2
In reply to this post by David Howells
For some of our users this would help A LOT.

Interesting ... just had discussions yesterday with some guys trying
to migrate to Linux and another set trying to backup Windows/NetApp
from Linux.

Some things they brought up that they needed (beyond what we already
have wth the cifs acl and SID and "dos attributes" xattrs, which is
even more useful now with the backup intent cifs mount flag) included:
- how do they tell if the inode number for a file was manufactured on
the client, or whether we were able to use the server file's inode
number ("UniqueId")
- how to get birth time (creation time)
- how to tell if file is "offline" (HSM)
- And is there a way to return the other less common cifs attributes
(e.g. "reparse")

Dave's patch seems to address all of that.  Samba server stuffs most
of this in an ndr encoded xattr blob which isn't much good for kernel
code to use, and I really prefer Dave's approach.   Without this, I
would need to add another cifs specific ioctl, but since there is
significant overlap between some of these and ntfs, vfat, nfs etc. I
like the xstat idea better.



On Thu, Apr 19, 2012 at 9:07 AM, David Howells <[hidden email]> wrote:

> Return extended attributes from the CIFS filesystem.  This includes the
> following:
>
>  (1) Return the file creation time as btime.  We assume that the creation time
>     won't change over the life of the inode.
>
>  (2) Set XSTAT_INFO_AUTOMOUNT on referral/submount directories.
>
>  (3) Unset XSTAT_INO if we made up the inode number and didn't get it from the
>     server.
>
>  (4) Unset XSTAT_[UG]ID if we are either returning values passed to mount
>     and/or the server doesn't return them.
>
>  (5) Map various Windows file attributes to FS_xxx_FL flags in st_ioc_flags
>     and XSTAT_INFO_xxx flags in st_information, fetching them from the server
>     if we don't have them yet or don't have a current copy.
>
>     Possibly things like Hidden, System and Archive should be FS_xxx_FL flags
>     rather than XSTAT_INFO_xxx flags and st_ioc_flags should be expanded to
>     64 bits.
>
>  (6) Set XSTAT_INFO_REMOTE on all files fetched by CIFS.
>
>  (7) Set XSTAT_INFO_NONSYSTEM_OWNERSHIP on all files as they all have Windows
>     ownership details too.
>
>  (8) Set XSTAT_INFO_HAS_ACL if CONFIG_CIFS_ACL=y as Windows ACLs are available
>     on the object.
>
> Furthermore, what cifs_getattr() does can be controlled as follows:
>
>  (1) If AT_FORCE_ATTR_SYNC is indicated, or if the inode flags or creation time
>     are requested but not yet collected, then the attributes will be reread
>     unconditionally.
>
>  (2) If the basic stats are requested or if the inode flags are requested and
>     have been collected previously, then the attributes will be reread if out
>     of date.
>
>  (3) Otherwise the cached attributes will be used - even if expired - without
>     reference to the server.
>
> Note that cifs_revalidate_dentry() will issue an extra operation to get the
> FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to collect
> creation time and attributes on behalf of cifs_getattr().
>
> [NOTE: THIS PATCH IS UNTESTED!]
>
> Signed-off-by: David Howells <[hidden email]>
> ---
>
>  fs/cifs/cifsfs.h   |    4 +-
>  fs/cifs/cifsglob.h |   16 +++++--
>  fs/cifs/dir.c      |    2 -
>  fs/cifs/inode.c    |  120 +++++++++++++++++++++++++++++++++++++++++++++-------
>  4 files changed, 118 insertions(+), 24 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index d1389bb..021e327 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -56,9 +56,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *);
>  extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
>                       struct dentry *);
>  extern int cifs_revalidate_file_attr(struct file *filp);
> -extern int cifs_revalidate_dentry_attr(struct dentry *);
> +extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool);
>  extern int cifs_revalidate_file(struct file *filp);
> -extern int cifs_revalidate_dentry(struct dentry *);
> +extern int cifs_revalidate_dentry(struct dentry *, bool, bool);
>  extern int cifs_invalidate_mapping(struct inode *inode);
>  extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
>  extern int cifs_setattr(struct dentry *, struct iattr *);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 4ff6313..d3567da 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -621,11 +621,15 @@ struct cifsInodeInfo {
>        /* BB add in lists for dirty pages i.e. write caching info for oplock */
>        struct list_head openFileList;
>        __u32 cifsAttrs; /* e.g. DOS archive bit, sparse, compressed, system */
> -       bool clientCanCacheRead;        /* read oplock */
> -       bool clientCanCacheAll;         /* read and writebehind oplock */
> -       bool delete_pending;            /* DELETE_ON_CLOSE is set */
> -       bool invalid_mapping;           /* pagecache is invalid */
> +       bool clientCanCacheRead:1;      /* read oplock */
> +       bool clientCanCacheAll:1;       /* read and writebehind oplock */
> +       bool delete_pending:1;          /* DELETE_ON_CLOSE is set */
> +       bool invalid_mapping:1;         /* pagecache is invalid */
> +       bool btime_valid:1;             /* stored creation time is valid */
> +       bool uid_faked:1;               /* true if i_uid is faked */
> +       bool gid_faked:1;               /* true if i_gid is faked */
>        unsigned long time;             /* jiffies of last update of inode */
> +       struct timespec btime;          /* creation time */
>        u64  server_eof;                /* current file size on server -- protected by i_lock */
>        u64  uniqueid;                  /* server inode number */
>        u64  createtime;                /* creation time on server */
> @@ -833,6 +837,9 @@ struct dfs_info3_param {
>  #define CIFS_FATTR_DELETE_PENDING      0x2
>  #define CIFS_FATTR_NEED_REVAL          0x4
>  #define CIFS_FATTR_INO_COLLISION       0x8
> +#define CIFS_FATTR_WINATTRS_VALID      0x10    /* T if cf_btime and cf_cifsattrs valid */
> +#define CIFS_FATTR_UID_FAKED           0x20    /* T if cf_uid is faked */
> +#define CIFS_FATTR_GID_FAKED           0x40    /* T if cf_gid is faked */
>
>  struct cifs_fattr {
>        u32             cf_flags;
> @@ -850,6 +857,7 @@ struct cifs_fattr {
>        struct timespec cf_atime;
>        struct timespec cf_mtime;
>        struct timespec cf_ctime;
> +       struct timespec cf_btime;
>  };
>
>  static inline void free_dfs_info_param(struct dfs_info3_param *param)
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index d172c8e..d9e03ae 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -664,7 +664,7 @@ cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd)
>                return -ECHILD;
>
>        if (direntry->d_inode) {
> -               if (cifs_revalidate_dentry(direntry))
> +               if (cifs_revalidate_dentry(direntry, false, false))
>                        return 0;
>                else {
>                        /*
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 745da3d..662d5ce 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -135,13 +135,21 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
>        set_nlink(inode, fattr->cf_nlink);
>        inode->i_uid = fattr->cf_uid;
>        inode->i_gid = fattr->cf_gid;
> +       if (fattr->cf_flags & CIFS_FATTR_UID_FAKED)
> +               cifs_i->uid_faked = true;
> +       if (fattr->cf_flags & CIFS_FATTR_GID_FAKED)
> +               cifs_i->gid_faked = true;
>
>        /* if dynperm is set, don't clobber existing mode */
>        if (inode->i_state & I_NEW ||
>            !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM))
>                inode->i_mode = fattr->cf_mode;
>
> -       cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +       if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) {
> +               cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +               cifs_i->btime = fattr->cf_btime;
> +               cifs_i->btime_valid = true;
> +       }
>
>        if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL)
>                cifs_i->time = 0;
> @@ -248,15 +256,19 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info,
>                break;
>        }
>
> -       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)
> +       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) {
>                fattr->cf_uid = cifs_sb->mnt_uid;
> -       else
> +               fattr->cf_flags |= CIFS_FATTR_UID_FAKED;
> +       } else {
>                fattr->cf_uid = le64_to_cpu(info->Uid);
> +       }
>
> -       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)
> +       if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) {
>                fattr->cf_gid = cifs_sb->mnt_gid;
> -       else
> +               fattr->cf_flags |= CIFS_FATTR_GID_FAKED;
> +       } else {
>                fattr->cf_gid = le64_to_cpu(info->Gid);
> +       }
>
>        fattr->cf_nlink = le64_to_cpu(info->Nlinks);
>  }
> @@ -283,7 +295,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
>        fattr->cf_ctime = CURRENT_TIME;
>        fattr->cf_mtime = CURRENT_TIME;
>        fattr->cf_nlink = 2;
> -       fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
> +       fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL |
> +               CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED;
>  }
>
>  int cifs_get_file_info_unix(struct file *filp)
> @@ -510,6 +523,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>        struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>
>        memset(fattr, 0, sizeof(*fattr));
> +       fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID;
>        fattr->cf_cifsattrs = le32_to_cpu(info->Attributes);
>        if (info->DeletePending)
>                fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING;
> @@ -521,6 +535,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>
>        fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
>        fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
> +       fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime);
>
>        if (adjust_tz) {
>                fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj;
> @@ -1724,7 +1739,8 @@ int cifs_revalidate_file_attr(struct file *filp)
>        return rc;
>  }
>
> -int cifs_revalidate_dentry_attr(struct dentry *dentry)
> +int cifs_revalidate_dentry_attr(struct dentry *dentry,
> +                               bool want_extra_bits, bool force)
>  {
>        int xid;
>        int rc = 0;
> @@ -1735,7 +1751,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>        if (inode == NULL)
>                return -ENOENT;
>
> -       if (!cifs_inode_needs_reval(inode))
> +       if (!force && !cifs_inode_needs_reval(inode))
>                return rc;
>
>        xid = GetXid();
> @@ -1752,9 +1768,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>                 "%ld jiffies %ld", full_path, inode, inode->i_count.counter,
>                 dentry, dentry->d_time, jiffies);
>
> -       if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext)
> +       if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) {
>                rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid);
> -       else
> +               if (rc != 0)
> +                       goto out;
> +       }
> +       if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits)
>                rc = cifs_get_inode_info(&inode, full_path, NULL, sb,
>                                         xid, NULL);
>
> @@ -1779,12 +1798,13 @@ int cifs_revalidate_file(struct file *filp)
>  }
>
>  /* revalidate a dentry's inode attributes */
> -int cifs_revalidate_dentry(struct dentry *dentry)
> +int cifs_revalidate_dentry(struct dentry *dentry,
> +                          bool want_extra_bits, bool force)
>  {
>        int rc;
>        struct inode *inode = dentry->d_inode;
>
> -       rc = cifs_revalidate_dentry_attr(dentry);
> +       rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force);
>        if (rc)
>                return rc;
>
> @@ -1796,11 +1816,30 @@ int cifs_revalidate_dentry(struct dentry *dentry)
>  int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                 struct kstat *stat)
>  {
> +       struct cifsInodeInfo *cifs_i = CIFS_I(dentry->d_inode);
>        struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
>        struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>        struct inode *inode = dentry->d_inode;
> +       unsigned force = stat->query_flags & AT_FORCE_ATTR_SYNC;
> +       bool want_extra_bits = false;
> +       u32 info, ioc = 0;
> +       u32 attrs;
>        int rc;
>
> +       if (cifs_i->uid_faked)
> +               stat->request_mask &= ~XSTAT_UID;
> +       if (cifs_i->gid_faked)
> +               stat->request_mask &= ~XSTAT_GID;
> +
> +       if (stat->request_mask & XSTAT_BTIME && !cifs_i->btime_valid) {
> +               want_extra_bits = true;
> +               force = true;
> +       }
> +       if (stat->request_mask & XSTAT_IOC_FLAGS) {
> +               want_extra_bits = true;
> +               force = true;
> +       }
> +
>        /*
>         * We need to be sure that all dirty pages are written and the server
>         * has actual ctime, mtime and file length.
> @@ -1814,13 +1853,14 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                }
>        }
>
> -       rc = cifs_revalidate_dentry_attr(dentry);
> -       if (rc)
> -               return rc;
> +       if (force || stat->request_mask & XSTAT_BASIC_STATS) {
> +               rc = cifs_revalidate_dentry(dentry, want_extra_bits, force);
> +               if (rc)
> +                       return rc;
> +       }
>
>        generic_fillattr(inode, stat);
>        stat->blksize = CIFS_MAX_MSGSIZE;
> -       stat->ino = CIFS_I(inode)->uniqueid;
>
>        /*
>         * If on a multiuser mount without unix extensions, and the admin hasn't
> @@ -1834,7 +1874,53 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
>                        stat->gid = current_fsgid();
>        }
> -       return rc;
> +
> +       info = XSTAT_INFO_REMOTE | XSTAT_INFO_NONSYSTEM_OWNERSHIP;
> +#ifdef CONFIG_CIFS_ACL
> +       info |= XSTAT_INFO_HAS_ACL;
> +#endif
> +
> +       if (cifs_i->btime_valid) {
> +               stat->btime = cifs_i->btime;
> +               stat->result_mask |= XSTAT_BTIME;
> +       }
> +
> +       /* We don't promise an inode number if we made one up */
> +       stat->ino = cifs_i->uniqueid;
> +       if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM))
> +               stat->result_mask &= ~XSTAT_INO;
> +
> +       /*
> +        * If on a multiuser mount without unix extensions, and the admin
> +        * hasn't overridden them, set the ownership to the fsuid/fsgid of the
> +        * current process.
> +        */
> +       if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) &&
> +           !tcon->unix_ext) {
> +               if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID))
> +                       stat->uid = current_fsuid();
> +               if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
> +                       stat->gid = current_fsgid();
> +       }
> +       if (cifs_i->uid_faked)
> +               stat->result_mask &= ~XSTAT_UID;
> +       if (cifs_i->gid_faked)
> +               stat->result_mask &= ~XSTAT_GID;
> +
> +       attrs = cifs_i->cifsAttrs;
> +       if (attrs & ATTR_HIDDEN)        info |= XSTAT_INFO_HIDDEN;
> +       if (attrs & ATTR_SYSTEM)        info |= XSTAT_INFO_SYSTEM;
> +       if (attrs & ATTR_ARCHIVE)       info |= XSTAT_INFO_ARCHIVE;
> +       if (attrs & ATTR_TEMPORARY)     info |= XSTAT_INFO_TEMPORARY;
> +       if (attrs & ATTR_REPARSE)       info |= XSTAT_INFO_REPARSE_POINT;
> +       if (attrs & ATTR_OFFLINE)       info |= XSTAT_INFO_OFFLINE;
> +       if (attrs & ATTR_ENCRYPTED)     info |= XSTAT_INFO_ENCRYPTED;
> +       stat->information |= info;
> +
> +       if (attrs & ATTR_READONLY)      ioc |= FS_IMMUTABLE_FL;
> +       if (attrs & ATTR_COMPRESSED)    ioc |= FS_COMPR_FL;
> +       stat->ioc_flags |= ioc;
> +       return 0;
>  }
>
>  static int cifs_truncate_page(struct address_space *mapping, loff_t from)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Thanks,

Steve
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6] xstat: Ext4: Return extended attributes

Steve French-2
In reply to this post by David Howells
This patch reminds me of a question on time stamps - how can an
application query the time granularity ie sb_s_time_gran for a mount
(e.g. 1 second for some file systems, 100 nanoseconds for cifs/smb2, 1
nanosecond for others etc.)

On Thu, Apr 19, 2012 at 9:06 AM, David Howells <[hidden email]> wrote:

> Return extended attributes from the Ext4 filesystem.  This includes the
> following:
>
>  (1) The inode creation time (i_crtime) as i_btime.
>
>  (2) The inode i_generation as i_gen if not the root directory.
>
>  (3) The inode i_version as st_data_version if a file with I_VERSION set or a
>     directory.
>
>  (4) FS_xxx_FL flags are returned as for ioctl(FS_IOC_GETFLAGS).
>
> Signed-off-by: David Howells <[hidden email]>
> ---
>
>  fs/ext4/ext4.h    |    2 ++
>  fs/ext4/file.c    |    2 +-
>  fs/ext4/inode.c   |   32 +++++++++++++++++++++++++++++---
>  fs/ext4/namei.c   |    2 ++
>  fs/ext4/super.c   |    1 +
>  fs/ext4/symlink.c |    2 ++
>  6 files changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index ab2594a..81806da 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1899,6 +1899,8 @@ extern int  ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                                struct kstat *stat);
>  extern void ext4_evict_inode(struct inode *);
>  extern void ext4_clear_inode(struct inode *);
> +extern int  ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
> +                             struct kstat *stat);
>  extern int  ext4_sync_inode(handle_t *, struct inode *);
>  extern void ext4_dirty_inode(struct inode *, int);
>  extern int ext4_change_inode_journal_flag(struct inode *, int);
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index cb70f18..ae8654c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -249,7 +249,7 @@ const struct file_operations ext4_file_operations = {
>
>  const struct inode_operations ext4_file_inode_operations = {
>        .setattr        = ext4_setattr,
> -       .getattr        = ext4_getattr,
> +       .getattr        = ext4_file_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>        .setxattr       = generic_setxattr,
>        .getxattr       = generic_getxattr,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c77b0bd..eafc188 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4191,11 +4191,37 @@ err_out:
>  int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                 struct kstat *stat)
>  {
> -       struct inode *inode;
> -       unsigned long delalloc_blocks;
> +       struct inode *inode = dentry->d_inode;
> +       struct ext4_inode_info *ei = EXT4_I(inode);
> +
> +       stat->result_mask |= XSTAT_BTIME;
> +       stat->btime.tv_sec = ei->i_crtime.tv_sec;
> +       stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
> +
> +       if (inode->i_ino != EXT4_ROOT_INO) {
> +               stat->result_mask |= XSTAT_GEN;
> +               stat->gen = inode->i_generation;
> +       }
> +       if (S_ISDIR(inode->i_mode) || IS_I_VERSION(inode)) {
> +               stat->result_mask |= XSTAT_VERSION;
> +               stat->version = inode->i_version;
> +       }
> +
> +       ext4_get_inode_flags(ei);
> +       stat->ioc_flags |= ei->i_flags & EXT4_FL_USER_VISIBLE;
> +       stat->result_mask |= XSTAT_IOC_FLAGS;
>
> -       inode = dentry->d_inode;
>        generic_fillattr(inode, stat);
> +       return 0;
> +}
> +
> +int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
> +                     struct kstat *stat)
> +{
> +       struct inode *inode = dentry->d_inode;
> +       u64 delalloc_blocks;
> +
> +       ext4_getattr(mnt, dentry, stat);
>
>        /*
>         * We can't update i_blocks if the block allocation is delayed
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 349d7b3..6162387 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2579,6 +2579,7 @@ const struct inode_operations ext4_dir_inode_operations = {
>        .mknod          = ext4_mknod,
>        .rename         = ext4_rename,
>        .setattr        = ext4_setattr,
> +       .getattr        = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>        .setxattr       = generic_setxattr,
>        .getxattr       = generic_getxattr,
> @@ -2591,6 +2592,7 @@ const struct inode_operations ext4_dir_inode_operations = {
>
>  const struct inode_operations ext4_special_inode_operations = {
>        .setattr        = ext4_setattr,
> +       .getattr        = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>        .setxattr       = generic_setxattr,
>        .getxattr       = generic_getxattr,
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index ceebaf8..2d395bf 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3040,6 +3040,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>        if (sb->s_magic != EXT4_SUPER_MAGIC)
>                goto cantfind_ext4;
>        sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
> +       memcpy(sb->s_volume_id, es->s_uuid, sizeof(sb->s_volume_id));
>
>        /* Set defaults before we parse the mount options */
>        def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
> diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
> index ed9354a..d8fe7fb 100644
> --- a/fs/ext4/symlink.c
> +++ b/fs/ext4/symlink.c
> @@ -35,6 +35,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
>        .follow_link    = page_follow_link_light,
>        .put_link       = page_put_link,
>        .setattr        = ext4_setattr,
> +       .getattr        = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>        .setxattr       = generic_setxattr,
>        .getxattr       = generic_getxattr,
> @@ -47,6 +48,7 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
>        .readlink       = generic_readlink,
>        .follow_link    = ext4_follow_link,
>        .setattr        = ext4_setattr,
> +       .getattr        = ext4_getattr,
>  #ifdef CONFIG_EXT4_FS_XATTR
>        .setxattr       = generic_setxattr,
>        .getxattr       = generic_getxattr,
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Thanks,

Steve
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/6] Extended file stat system call

Roland McGrath-4
In reply to this post by David Howells
I have no comment on the functionality.  But "xstat" is probably a poor
choice of name.  There is precedent for that function name with different
meaning in the userland APIs.  (It's a moderately useless meaning inherited
from SVR4, but regardless overloading a name previously used is unwise.)


Thanks,
Roland
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/6] Extended file stat system call

Steve French-2
In reply to this post by David Howells
On Thu, Apr 19, 2012 at 9:05 AM, David Howells <[hidden email]> wrote:
>
> Implement a pair of new system calls to provide extended and further extensible
 stat functions.
<snip>
> Should the default for a network fs be to do an unconditional (heavyweight)
> stat with a flag to suppress going to the server to update the locally held
> attributes and flushing pending writebacks?

Even though we can use leases (oplocks) to avoid the roundrtrip, it is
probably too expensive to default to forcing a cache flush, especially
when a common case is to get the file creation time or inode number
information (stable vs volatile).

Would it be better to make the stable vs volatile inode number an attribute
of the volume  or something returned by the proposed xstat?

> Should things like the Windows Archive, Hidden and System bits be handled
> through IOC flags, perhaps expanded to 64-bits?

Today I export these through an psuedo-xattr in cifs.ko, I am curious how
NTFS and FAT export these on linux.

> ==========
> TO BE DONE
> ==========
>
> Autofs, ntfs, btrfs, ...

Given the overlap in optional attributes between the network
protocol and local NTFS (and ReFS and to a lesser extent FAT)
I would expect cifs.ko and the ntfs implementations
info to map pretty closely.

> I should perhaps use u8/u32/u64 rather than uint8/32/64_t.
>
> Handle remote filesystems being offline and indicate this with
> XSTAT_INFO_OFFLINE.

You already have support for an indicator for offline files (HSM),
would XSTAT_INFO_OFFLINE be intended for the case
where the network session to the server is disconnected
(and in which you case the application does not want to reconnect)?

--
Thanks,

Steve
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/6] Extended file stat system call

Paul Eggert
In reply to this post by Roland McGrath-4
On 04/19/2012 09:32 AM, Roland McGrath wrote:
> I have no comment on the functionality.  But "xstat" is probably a poor
> choice of name.

In AIX 7.1 the (similar) function is called statxat instead of xstat.
The API isn't exactly the same, but it's the same basic idea.
Might be worth looking at, not merely to see whether the API
should be the same, but also to borrow good ideas even if not.

http://pic.dhe.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.basetechref/doc/basetrf2/statx.htm
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/6] Extended file stat system call

Roland McGrath-4
statx seems like a better family of names.  I also think it's worthwhile to
see if the interface can be made to more closely match the AIX precedent.


Thanks,
Roland
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/6] Extended file stat system call

Andreas Dilger-7
In reply to this post by David Howells
On 2012-04-19, at 8:05 AM, David Howells wrote:
> Implement a pair of new system calls to provide extended and further
> extensible stat functions.

Hallelujah for this.  I've been waiting/wanting something like this
for ages already.  Now if only we can get this landed before it
degrades into the mess it did last time.

> Should fxstat() be implemented as xstat() with a NULL filename,
> using dfd as fd?

I'm personally inclined toward fewer syscalls, especially since
the fstatxat()->statxat() mapping (if I can be so bold as to
prefer the names used later in this thread) is IMHO clear and
unambiguous, and avoids several thin wrappers in the kernel.

> Should the default for a network fs be to do an unconditional
> (heavyweight) stat with a flag to suppress going to the server
> to update the locally held attributes and flushing pending writebacks?

NOOOooo!  If application writers are going to use this, they should
request the information needed, and no more.  Make no assumptions
about what information is easy or hard for a filesystem to return,
since the overhead can vary wildly depending on the implementation.

Something like "ls --color" (no -l or -s) always stats the file just
to get the mode bits to color executable files differently.  Having
to return other information that isn't totally free almost ruins the
benefit of adding a new syscall in the first place.

> Should things like the Windows Archive, Hidden and System bits be
> handled through IOC flags, perhaps expanded to 64-bits?

I'm definitely in favour of a 64-bit IOC flags value, since they are
getting close to running out already.  As to whether those other bits
should be merged into the IOC flags, I'm mostly indifferent, but I
lean toward including them since they are definitely related.

I wouldn't object to 64-bit UID/GID values or split 32-bit low/hi UID
and GID values, since NFSv4 and Kerberos realms will likely need this
at some point as well.  That said, if the API is extensible, it would
be just as easy to add the low/hi split values when they are needed
in the future.

> Are these things useful to userspace other than Samba and userspace
> NFS servers?

Definitely yes.  The GNU fileutils can use a lot of this, since they
are VERY stat() heavy for things like checking st_dev and st_ino
changes during directory traversal, but don't need any of the other
info.

> Is it useful to pass the volume ID out?  Or is statfs() sufficient
> for this?

Can't hurt, IMHO.  It is a better (more persistent) identifier than
st_dev, and if it is free, or explicitly requested by the application
(Samba, Ganesha, etc) it can be very useful.

Cheers, Andreas





Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

Andreas Dilger-7
In reply to this post by David Howells
On 2012-04-19, at 8:06 AM, David Howells wrote:
> Add a pair of system calls to make extended file stats available,
> including file creation time, inode version and data version where available through the underlying filesystem.
>
> The idea was initially proposed as a set of xattrs that could be
> retrieved with getxattr(), but the general preferance proved to be
> for new syscalls with an extended stat structure.

I would comment that it was the opposite.  It was originally a
stat()-like extension that degraded into a messy getxattr() mess.

> (2) Lightweight stat: Ask for just those details of interest, and
>     allow a netfs (such as NFS) to approximate anything not of
>     interest, possibly without going to the server [Trond Myklebust,
>     Ulrich Drepper].

This was my original motivation for this functionality, so you can
put my name here also.

> The fields in struct xstat come in a number of classes:
>
> (0) st_dev, st_blksize, st_information.
>
>     These are local data and are always available.

For the extra two bits it would cost us, I don't think st_blksize
and st_information should always be returned.  st_blksize may be
variable for a distributed filesystem, and some of the fields in
st_information (offline) may not be free to access either.

Cheers, Andreas





Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

J. Bruce Fields
In reply to this post by David Howells
On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote:

> Add a pair of system calls to make extended file stats available, including
> file creation time, inode version and data version where available through the
> underlying filesystem.
>
> The idea was initially proposed as a set of xattrs that could be retrieved with
> getxattr(), but the general preferance proved to be for new syscalls with an
> extended stat structure.
>
> This has a number of uses:
>
>  (1) Creation time: The SMB protocol carries the creation time, which could be
>      exported by Samba, which will in turn help CIFS make use of FS-Cache as
>      that can be used for coherency data.
>
>      This is also specified in NFSv4 as a recommended attribute and could be
>      exported by NFSD [Steve French].
>
>  (2) Lightweight stat: Ask for just those details of interest, and allow a
>      netfs (such as NFS) to approximate anything not of interest, possibly
>      without going to the server [Trond Myklebust, Ulrich Drepper].
>
>  (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
>      cached attributes are up to date [Trond Myklebust].
>
>  (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
>      Schubert].
>
>  (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar].
>
>      Can also be used to modify fill_post_wcc() in NFSD which retrieves
>      i_version directly, but has just called vfs_getattr().  It could get it
>      from the kstat struct if it used vfs_xgetattr() instead.
>
>  (6) BSD stat compatibility: Including more fields from the BSD stat such as
>      creation time (st_btime) and inode generation number (st_gen) [Jeremy
>      Allison, Bernd Schubert].
>
>  (7) Extra coherency data may be useful in making backups [Andreas Dilger].
>
>  (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
>      can now say it doesn't support a standard stat feature if that isn't
>      available, so if, for instance, inode numbers or UIDs don't exist...
>
>  (9) Make the fields a consistent size on all arches and make them large.
>
> (10) Store a 16-byte volume ID in the superblock that can be returned in struct
>      xstat [Steve French].
>
> (11) Include granularity fields in the time data to indicate the granularity of
>      each of the times (NFSv4 time_delta) [Steve French].

It looks like you're including this with *each* time?  But surely
there's no filesystem with different granularity (say) for ctime than
for mtime.  Also, nfsd will want only one time_delta, not one for each
time.

Note also we need to document carefully what this means: I think it
should be the granularity that the filesystem is capable of
representing, but people are sometimes surprised to find out that the
actual time source is usually more coarse-grained than that.

--b.

>
> (12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>
> (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>      Michael Kerrisk].
>
> (14) Spare space, request flags and information flags are provided for future
>      expansion.
>
>
> The following structures are defined for the use of these new system calls:
>
> struct xstat_dev {
> uint32_t major, minor;
> };
>
> struct xstat_time {
> uint64_t tv_sec;
> uint32_t tv_nsec;
> uint32_t tv_granularity;
> };
>
> struct xstat {
> uint32_t st_mask;
> uint32_t st_mode;
> uint32_t st_nlink;
> uint32_t st_uid;
> uint32_t st_gid;
> uint32_t st_information;
> uint32_t st_ioc_flags;
> uint32_t st_blksize;
> struct xstat_dev st_rdev;
> struct xstat_dev st_dev;
> struct xstat_time st_atime;
> struct xstat_time st_btime;
> struct xstat_time st_ctime;
> struct xstat_time st_mtime;
> uint64_t st_ino;
> uint64_t st_size;
> uint64_t st_blocks;
> uint64_t st_gen;
> uint64_t st_version;
> uint8_t st_volume_id[16];
> uint64_t __spares[11];
> };
>
> where st_information is local system information about the file, st_btime is
> the file creation time, st_gen is the inode generation (i_generation),
> st_data_version is the data version number (i_version), st_ioc_flags is the
> flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identified is
> stored, st_result_mask is a bitmask indicating the data provided and __spares[]
> are where as-yet undefined fields can be placed.
>
> The defined bits in request_mask and st_mask are:
>
> XSTAT_MODE Want/got st_mode
> XSTAT_NLINK Want/got st_nlink
> XSTAT_UID Want/got st_uid
> XSTAT_GID Want/got st_gid
> XSTAT_RDEV Want/got st_rdev
> XSTAT_ATIME Want/got st_atime
> XSTAT_MTIME Want/got st_mtime
> XSTAT_CTIME Want/got st_ctime
> XSTAT_INO Want/got st_ino
> XSTAT_SIZE Want/got st_size
> XSTAT_BLOCKS Want/got st_blocks
> XSTAT_BASIC_STATS [The stuff in the normal stat struct]
> XSTAT_IOC_FLAGS Want/got FS_IOC_GETFLAGS
> XSTAT_BTIME Want/got st_btime
> XSTAT_GEN Want/got st_gen
> XSTAT_VERSION Want/got st_data_version
> XSTAT_VOLUME_ID Want/got st_volume_id
> XSTAT_ALL_STATS [All currently available stuff]
>
> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
> that might be supplied by the filesystem.  Note that Ext4 returns flags outside
> of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
> {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
> be suppressed?
>
> The defined bits in the st_information field give local system data on a file,
> how it is accessed, where it is and what it does:
>
> XSTAT_INFO_ENCRYPTED File is encrypted
> XSTAT_INFO_TEMPORARY File is temporary (NTFS/CIFS/deleted)
> XSTAT_INFO_FABRICATED File was made up by filesystem
> XSTAT_INFO_KERNEL_API File is kernel API (eg: procfs/sysfs)
> XSTAT_INFO_REMOTE File is remote
> XSTAT_INFO_OFFLINE File is offline (CIFS)
> XSTAT_INFO_AUTOMOUNT Dir is automount trigger
> XSTAT_INFO_AUTODIR Dir provides unlisted automounts
> XSTAT_INFO_NONSYSTEM_OWNERSHIP File has non-system ownership details
> XSTAT_INFO_HAS_ACL File has an ACL of some sort
> XSTAT_INFO_REPARSE_POINT File is reparse point (NTFS/CIFS)
> XSTAT_INFO_HIDDEN File is marked hidden (DOS+)
> XSTAT_INFO_SYSTEM File is marked system (DOS+)
> XSTAT_INFO_ARCHIVE File is marked archive (DOS+)
>
> These are for the use of GUI tools that might want to mark files specially,
> depending on what they are.  I've tried not to provide overlap with
> st_ioc_flags where something usable exists there.  Should Hidden, System and
> Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
> 64-bits?
>
>
> The system calls are:
>
> ssize_t ret = xstat(int dfd,
>    const char *filename,
>    unsigned int flags,
>    unsigned int mask,
>    struct xstat *buffer);
>
> ssize_t ret = fxstat(unsigned fd,
>     unsigned int flags,
>     unsigned int mask,
>     struct xstat *buffer);
>
>
> The dfd, filename, flags and fd parameters indicate the file to query.  There
> is no equivalent of lstat() as that can be emulated with xstat() by passing
> AT_SYMLINK_NOFOLLOW in flags.
>
> AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
> filesystem to synchronise its attributes with the server.
>
> mask is a bitmask indicating the fields in struct xstat that are of interest to
> the caller.  The user should set this to XSTAT__BASIC_STATS to get the
> basic set returned by stat().
>
> Should there just be one xstat() syscall that does fxstat() if filename is NULL?
>
> The fields in struct xstat come in a number of classes:
>
>  (0) st_dev, st_blksize, st_information.
>
>      These are local data and are always available.
>
>  (1) st_mode, st_nlinks, st_uid, st_gid, st_[amc]time, st_ino, st_size,
>      st_blocks.
>
>      These will be returned whether the caller asks for them or not.  The
>      corresponding bits in result_mask will be set to indicate their presence.
>
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless as
>      a byproduct of updating something requested.
>
>      If the values don't actually exist for the underlying object (such as UID
>      or GID on a DOS file), then the bit won't be set in the result_mask, even
>      if the caller asked for the value and the returned value will be a
>      fabrication.
>
>  (2) st_rdev.
>
>      As for class (1), but this won't be returned if the file is not a blockdev
>      or chardev.  The bit will be cleared if the value is not returned.
>
>  (3) File creation time (st_btime), inode generation (st_gen), data version
>      (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).
>
>      These will be returned if available whether the caller asked for them or
>      not.  The corresponding bits in result_mask will be set or cleared as
>      appropriate to indicate their presence.
>
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless
>      as a byproduct of updating something requested.
>
> At the moment, this will only work on x86_64 and i386 as it requires system
> calls to be wired up.
>
>
> =======
> TESTING
> =======
>
> The following test program can be used to test the xstat system call:
>
> /* Test the xstat() system call
> *
> * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
> * Written by David Howells ([hidden email])
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public Licence
> * as published by the Free Software Foundation; either version
> * 2 of the Licence, or (at your option) any later version.
> */
>
> #define _GNU_SOURCE
> #define _ATFILE_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <time.h>
> #include <sys/syscall.h>
> #include <sys/stat.h>
> #include <sys/types.h>
>
> #define AT_NO_AUTOMOUNT 0x800
> #define AT_FORCE_ATTR_SYNC 0x2000
>
> #define XSTAT_MODE 0x00000001U
> #define XSTAT_NLINK 0x00000002U
> #define XSTAT_UID 0x00000004U
> #define XSTAT_GID 0x00000008U
> #define XSTAT_RDEV 0x00000010U
> #define XSTAT_ATIME 0x00000020U
> #define XSTAT_MTIME 0x00000040U
> #define XSTAT_CTIME 0x00000080U
> #define XSTAT_INO 0x00000100U
> #define XSTAT_SIZE 0x00000200U
> #define XSTAT_BLOCKS 0x00000400U
> #define XSTAT_BASIC_STATS 0x000007ffU
> #define XSTAT_BTIME 0x00000800U
> #define XSTAT_GEN 0x00001000U
> #define XSTAT_VERSION 0x00002000U
> #define XSTAT_IOC_FLAGS 0x00004000U
> #define XSTAT_VOLUME_ID 0x00008000U
> #define XSTAT_ALL_STATS 0x0000ffffU
>
> struct xstat_dev {
> uint32_t major;
> uint32_t minor;
> };
>
> struct xstat_time {
> uint64_t tv_sec;
> uint32_t tv_nsec;
> uint32_t tv_granularity;
> };
>
> struct xstat {
> uint32_t st_mask;
> uint32_t st_mode;
> uint32_t st_nlink;
> uint32_t st_uid;
> uint32_t st_gid;
> uint32_t st_information;
> uint32_t st_ioc_flags;
> uint32_t st_blksize;
> struct xstat_dev st_rdev;
> struct xstat_dev st_dev;
> struct xstat_time st_atim;
> struct xstat_time st_btim;
> struct xstat_time st_ctim;
> struct xstat_time st_mtim;
> uint64_t st_ino;
> uint64_t st_size;
> uint64_t st_blksize;
> uint64_t st_blocks;
> uint64_t st_gen;
> uint64_t st_version;
> uint64_t st_volume_id[16];
> uint64_t st_spares[11];
> };
>
> #define XSTAT_INFO_ENCRYPTED 0x00000001U
> #define XSTAT_INFO_TEMPORARY 0x00000002U
> #define XSTAT_INFO_FABRICATED 0x00000004U
> #define XSTAT_INFO_KERNEL_API 0x00000008U
> #define XSTAT_INFO_REMOTE 0x00000010U
> #define XSTAT_INFO_OFFLINE 0x00000020U
> #define XSTAT_INFO_AUTOMOUNT 0x00000040U
> #define XSTAT_INFO_AUTODIR 0x00000080U
> #define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U
> #define XSTAT_INFO_HAS_ACL 0x00000200U
> #define XSTAT_INFO_REPARSE_POINT 0x00000400U
> #define XSTAT_INFO_HIDDEN 0x00000800U
> #define XSTAT_INFO_SYSTEM 0x00001000U
> #define XSTAT_INFO_ARCHIVE 0x00002000U
>
> #define __NR_xstat 312
> #define __NR_fxstat 313
>
> static __attribute__((unused))
> ssize_t xstat(int dfd, const char *filename, unsigned flags,
>      unsigned int mask, struct xstat *buffer)
> {
> return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
> }
>
> static __attribute__((unused))
> ssize_t fxstat(int fd, unsigned flags,
>       unsigned int mask, struct xstat *buffer)
> {
> return syscall(__NR_fxstat, fd, flags, mask, buffer);
> }
>
> static void print_time(const char *field, const struct xstat_time *xstm)
> {
> struct tm tm;
> time_t tim;
> char buffer[100];
> int len;
>
> tim = xstm->tv_sec;
> if (!localtime_r(&tim, &tm)) {
> perror("localtime_r");
> exit(1);
> }
> len = strftime(buffer, 100, "%F %T", &tm);
> if (len == 0) {
> perror("strftime");
> exit(1);
> }
> printf("%s", field);
> fwrite(buffer, 1, len, stdout);
> printf(".%09u", xstm->tv_nsec);
> len = strftime(buffer, 100, "%z", &tm);
> if (len == 0) {
> perror("strftime2");
> exit(1);
> }
> fwrite(buffer, 1, len, stdout);
> printf("\n");
> }
>
> static void dump_xstat(struct xstat *xst)
> {
> char buffer[256], ft;
>
> printf("results=%x\n", xst->st_mask);
>
> printf(" ");
> if (xst->st_mask & XSTAT_SIZE)
> printf(" Size: %-15llu", (unsigned long long) xst->st_size);
> if (xst->st_mask & XSTAT_BLOCKS)
> printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
> printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
> if (xst->st_mask & XSTAT_MODE) {
> switch (xst->st_mode & S_IFMT) {
> case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break;
> case S_IFCHR: printf(" character special file\n"); ft = 'c'; break;
> case S_IFDIR: printf(" directory\n"); ft = 'd'; break;
> case S_IFBLK: printf(" block special file\n"); ft = 'b'; break;
> case S_IFREG: printf(" regular file\n"); ft = '-'; break;
> case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break;
> case S_IFSOCK: printf(" socket\n"); ft = 's'; break;
> default:
> printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
> ft = '?';
> break;
> }
> }
>
> sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
> printf("Device: %-15s", buffer);
> if (xst->st_mask & XSTAT_INO)
> printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
> if (xst->st_mask & XSTAT_SIZE)
> printf(" Links: %-5u", xst->st_nlink);
> if (xst->st_mask & XSTAT_RDEV)
> printf(" Device type: %u,%u",
>       xst->st_rdev.major, xst->st_rdev.minor);
> printf("\n");
>
> if (xst->st_mask & XSTAT_MODE)
> printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
>       xst->st_mode & 07777,
>       ft,
>       xst->st_mode & S_IRUSR ? 'r' : '-',
>       xst->st_mode & S_IWUSR ? 'w' : '-',
>       xst->st_mode & S_IXUSR ? 'x' : '-',
>       xst->st_mode & S_IRGRP ? 'r' : '-',
>       xst->st_mode & S_IWGRP ? 'w' : '-',
>       xst->st_mode & S_IXGRP ? 'x' : '-',
>       xst->st_mode & S_IROTH ? 'r' : '-',
>       xst->st_mode & S_IWOTH ? 'w' : '-',
>       xst->st_mode & S_IXOTH ? 'x' : '-');
> if (xst->st_mask & XSTAT_UID)
> printf("Uid: %d   \n", xst->st_uid);
> if (xst->st_mask & XSTAT_GID)
> printf("Gid: %u\n", xst->st_gid);
>
> if (xst->st_mask & XSTAT_ATIME)
> print_time("Access: ", &xst->st_atim);
> if (xst->st_mask & XSTAT_MTIME)
> print_time("Modify: ", &xst->st_mtim);
> if (xst->st_mask & XSTAT_CTIME)
> print_time("Change: ", &xst->st_ctim);
> if (xst->st_mask & XSTAT_BTIME)
> print_time("Create: ", &xst->st_btim);
>
> if (xst->st_mask & XSTAT_GEN)
> printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen);
> if (xst->st_mask & XSTAT_VERSION)
> printf("Data version: %llxh\n", (unsigned long long) xst->st_version);
>
> if (xst->st_mask & XSTAT_IOC_FLAGS) {
> unsigned char bits;
> int loop, byte;
>
> static char flag_representation[32 + 1] =
> /* FS_IOC_GETFLAGS flags: */
> "????????" /* 31-24 0x00000000-ff000000  */
> "????ehTD" /* 23-16 0x00000000-00ff0000  */
> "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00  */
> "AdaiScus" /*  7- 0 0x00000000-000000ff */
> ;
>
> printf("Inode flags: %08x (", xst->st_ioc_flags);
> for (byte = 32 - 8; byte >= 0; byte -= 8) {
> bits = xst->st_ioc_flags >> byte;
> for (loop = 7; loop >= 0; loop--) {
> int bit = byte + loop;
>
> if (bits & 0x80)
> putchar(flag_representation[31 - bit]);
> else
> putchar('-');
> bits <<= 1;
> }
> if (byte)
> putchar(' ');
> }
> printf(")\n");
> }
>
> if (xst->st_information) {
> unsigned char bits;
> int loop, byte;
>
> static char info_representation[32 + 1] =
> /* XSTAT_INFO_ flags: */
> "????????" /* 31-24 0x00000000-ff000000  */
> "????????" /* 23-16 0x00000000-00ff0000  */
> "??ASHRan" /* 15- 8 0x00000000-0000ff00  */
> "dmorkfte" /*  7- 0 0x00000000-000000ff */
> ;
>
> printf("Information: %08x (", xst->st_information);
> for (byte = 32 - 8; byte >= 0; byte -= 8) {
> bits = xst->st_information >> byte;
> for (loop = 7; loop >= 0; loop--) {
> int bit = byte + loop;
>
> if (bits & 0x80)
> putchar(info_representation[31 - bit]);
> else
> putchar('-');
> bits <<= 1;
> }
> if (byte)
> putchar(' ');
> }
> printf(")\n");
> }
>
> if (xst->st_mask & XSTAT_VOLUME_ID) {
> int loop;
> printf("Volume ID: ");
> for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) {
> printf("%02x", xst->st_volume_id[loop]);
> if (loop == 7)
> printf("-");
> }
> printf("\n");
> }
> }
>
> void dump_hex(unsigned long long *data, int from, int to)
> {
> unsigned offset, print_offset = 1, col = 0;
>
> from /= 8;
> to = (to + 7) / 8;
>
> for (offset = from; offset < to; offset++) {
> if (print_offset) {
> printf("%04x: ", offset * 8);
> print_offset = 0;
> }
> printf("%016llx", data[offset]);
> col++;
> if ((col & 3) == 0) {
> printf("\n");
> print_offset = 1;
> } else {
> printf(" ");
> }
> }
>
> if (!print_offset)
> printf("\n");
> }
>
> int main(int argc, char **argv)
> {
> struct xstat xst;
> int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
>
> unsigned int mask = XSTAT_ALL_STATS;
>
> for (argv++; *argv; argv++) {
> if (strcmp(*argv, "-F") == 0) {
> atflag |= AT_FORCE_ATTR_SYNC;
> continue;
> }
> if (strcmp(*argv, "-L") == 0) {
> atflag &= ~AT_SYMLINK_NOFOLLOW;
> continue;
> }
> if (strcmp(*argv, "-O") == 0) {
> mask &= ~XSTAT_BASIC_STATS;
> continue;
> }
> if (strcmp(*argv, "-A") == 0) {
> atflag |= AT_NO_AUTOMOUNT;
> continue;
> }
> if (strcmp(*argv, "-R") == 0) {
> raw = 1;
> continue;
> }
>
> memset(&xst, 0xbf, sizeof(xst));
> ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst);
> printf("xstat(%s) = %d\n", *argv, ret);
> if (ret < 0) {
> perror(*argv);
> exit(1);
> }
>
> if (raw)
> dump_hex((unsigned long long *)&xst, 0, sizeof(xst));
>
> dump_xstat(&xst);
> }
> return 0;
> }
>
> Just compile and run, passing it paths to the files you want to examine:
>
> [root@andromeda ~]# /tmp/xstat /proc/$$
> xstat(/proc/2074) = 160
> results=47ef
>  Size: 0               Blocks: 0          IO Block: 1024    directory
> Device: 00:03           Inode: 9072        Links: 7
> Access: (0555/dr-xr-xr-x)  Uid: 0
> Gid: 0
> Access: 2010-07-14 16:50:46.609336272+0100
> Modify: 2010-07-14 16:50:46.609336272+0100
> Change: 2010-07-14 16:50:46.609336272+0100
> Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------)
> [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm
> xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160
> results=77ef
>  Size: 5413882         Blocks: 0          IO Block: 4096    regular file
> Device: 00:15           Inode: 2288        Links: 1
> Access: (0644/-rw-r--r--)  Uid: 75338
> Gid: 0
> Access: 2008-11-05 19:47:22.000000000+0000
> Modify: 2008-11-05 19:47:22.000000000+0000
> Change: 2008-11-05 19:47:22.000000000+0000
> Inode version: 795h
> Data version: 2h
> Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------)
>
> Signed-off-by: David Howells <[hidden email]>
> ---
>
>  arch/x86/syscalls/syscall_32.tbl |    2
>  arch/x86/syscalls/syscall_64.tbl |    2
>  fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
>  include/linux/fcntl.h            |    1
>  include/linux/fs.h               |    4
>  include/linux/stat.h             |  126 +++++++++++++-
>  include/linux/syscalls.h         |    7 +
>  7 files changed, 461 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index 29f9f05..980eb5a 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -355,3 +355,5 @@
>  346 i386 setns sys_setns
>  347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv
>  348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev
> +349 i386 xstat sys_xstat
> +350 i386 fxstat sys_fxstat
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index dd29a9e..7ae24bb 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -318,6 +318,8 @@
>  309 common getcpu sys_getcpu
>  310 64 process_vm_readv sys_process_vm_readv
>  311 64 process_vm_writev sys_process_vm_writev
> +312 common xstat sys_xstat
> +313 common fxstat sys_fxstat
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
>  # for native 64-bit operation.
> diff --git a/fs/stat.c b/fs/stat.c
> index c733dc5..af3ef33 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -18,8 +18,20 @@
>  #include <asm/uaccess.h>
>  #include <asm/unistd.h>
>  
> +/**
> + * generic_fillattr - Fill in the basic attributes from the inode struct
> + * @inode: Inode to use as the source
> + * @stat: Where to fill in the attributes
> + *
> + * Fill in the basic attributes in the kstat structure from data that's to be
> + * found on the VFS inode structure.  This is the default if no getattr inode
> + * operation is supplied.
> + */
>  void generic_fillattr(struct inode *inode, struct kstat *stat)
>  {
> + struct super_block *sb = inode->i_sb;
> + u32 x;
> +
>   stat->dev = inode->i_sb->s_dev;
>   stat->ino = inode->i_ino;
>   stat->mode = inode->i_mode;
> @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
>   stat->uid = inode->i_uid;
>   stat->gid = inode->i_gid;
>   stat->rdev = inode->i_rdev;
> - stat->size = i_size_read(inode);
> - stat->atime = inode->i_atime;
>   stat->mtime = inode->i_mtime;
>   stat->ctime = inode->i_ctime;
> - stat->blksize = (1 << inode->i_blkbits);
> + stat->size = i_size_read(inode);
>   stat->blocks = inode->i_blocks;
> -}
> + stat->blksize = (1 << inode->i_blkbits);
>  
> + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV;
> + if (IS_NOATIME(inode))
> + stat->result_mask &= ~XSTAT_ATIME;
> + else
> + stat->atime = inode->i_atime;
> +
> + if (S_ISREG(stat->mode) && stat->nlink == 0)
> + stat->information |= XSTAT_INFO_TEMPORARY;
> + if (IS_AUTOMOUNT(inode))
> + stat->information |= XSTAT_INFO_AUTOMOUNT;
> + if (IS_POSIXACL(inode))
> + stat->information |= XSTAT_INFO_HAS_ACL;
> +
> + /* if unset, assume 1s granularity */
> + stat->tv_granularity = sb->s_time_gran ?: 1000000000U;
> +
> + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode)))
> + stat->result_mask |= XSTAT_RDEV;
> +
> + x  = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0];
> + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1];
> + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2];
> + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3];
> + if (x)
> + stat->result_mask |= XSTAT_VOLUME_ID;
> +}
>  EXPORT_SYMBOL(generic_fillattr);
>  
> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> +/**
> + * vfs_xgetattr - Get the basic and extra attributes of a file
> + * @mnt: The mountpoint to which the dentry belongs
> + * @dentry: The file of interest
> + * @stat: Where to return the statistics
> + *
> + * Ask the filesystem for a file's attributes.  The caller must have preset
> + * stat->request_mask and stat->query_flags to indicate what they want.
> + *
> + * If the file is remote, the filesystem can be forced to update the attributes
> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags.
> + *
> + * Bits must have been set in stat->request_mask to indicate which attributes
> + * the caller wants retrieving.  Any such attribute not requested may be
> + * returned anyway, but the value may be approximate, and, if remote, may not
> + * have been synchronised with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry,
> + struct kstat *stat)
>  {
>   struct inode *inode = dentry->d_inode;
>   int retval;
> @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>   if (retval)
>   return retval;
>  
> + stat->result_mask = 0;
> + stat->information = 0;
> + stat->ioc_flags = 0;
>   if (inode->i_op->getattr)
>   return inode->i_op->getattr(mnt, dentry, stat);
>  
>   generic_fillattr(inode, stat);
>   return 0;
>  }
> +EXPORT_SYMBOL(vfs_xgetattr);
>  
> +/**
> + * vfs_getattr - Get the basic attributes of a file
> + * @mnt: The mountpoint to which the dentry belongs
> + * @dentry: The file of interest
> + * @stat: Where to return the statistics
> + *
> + * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
> + * forced to update its files from the backing store.  Only the basic set of
> + * attributes will be retrieved; anyone wanting more must use vfs_getxattr(),
> + * as must anyone who wants to force attributes to be sync'd with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> +{
> + stat->query_flags = 0;
> + stat->request_mask = XSTAT_BASIC_STATS;
> + return vfs_xgetattr(mnt, dentry, stat);
> +}
>  EXPORT_SYMBOL(vfs_getattr);
>  
> -int vfs_fstat(unsigned int fd, struct kstat *stat)
> +/**
> + * vfs_fxstat - Get basic and extra attributes by file descriptor
> + * @fd: The file descriptor refering to the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xgetattr().  The main difference is
> + * that it uses a file descriptor to determine the file location.
> + *
> + * The caller must have preset stat->query_flags and stat->request_mask as for
> + * vfs_xgetattr().
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fxstat(unsigned int fd, struct kstat *stat)
>  {
>   struct file *f = fget(fd);
>   int error = -EBADF;
>  
> + if (stat->query_flags & ~KSTAT_QUERY_FLAGS)
> + return -EINVAL;
>   if (f) {
> - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
> + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat);
>   fput(f);
>   }
>   return error;
>  }
> +EXPORT_SYMBOL(vfs_fxstat);
> +
> +/**
> + * vfs_fstat - Get basic attributes by file descriptor
> + * @fd: The file descriptor refering to the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_getattr().  The main difference is
> + * that it uses a file descriptor to determine the file location.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fstat(unsigned int fd, struct kstat *stat)
> +{
> + stat->query_flags = 0;
> + stat->request_mask = XSTAT_BASIC_STATS;
> + return vfs_fxstat(fd, stat);
> +}
>  EXPORT_SYMBOL(vfs_fstat);
>  
> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
> - int flag)
> +/**
> + * vfs_xstat - Get basic and extra attributes by filename
> + * @dfd: A file descriptor representing the base dir for a relative filename
> + * @filename: The name of the file of interest
> + * @flags: Flags to control the query
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xgetattr().  The main difference is
> + * that it uses a filename and base directory to determine the file location.
> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a
> + * symlink at the given name from being referenced.
> + *
> + * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
> + * flags are also used to load up stat->query_flags.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_xstat(int dfd, const char __user *filename, int flags,
> +      struct kstat *stat)
>  {
>   struct path path;
> - int error = -EINVAL;
> - int lookup_flags = 0;
> + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
>  
> - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> -      AT_EMPTY_PATH)) != 0)
> - goto out;
> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> +      AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
> + return -EINVAL;
>  
> - if (!(flag & AT_SYMLINK_NOFOLLOW))
> - lookup_flags |= LOOKUP_FOLLOW;
> - if (flag & AT_EMPTY_PATH)
> + if (flags & AT_SYMLINK_NOFOLLOW)
> + lookup_flags &= ~LOOKUP_FOLLOW;
> + if (flags & AT_NO_AUTOMOUNT)
> + lookup_flags &= ~LOOKUP_AUTOMOUNT;
> + if (flags & AT_EMPTY_PATH)
>   lookup_flags |= LOOKUP_EMPTY;
>  
> + stat->query_flags = flags & KSTAT_QUERY_FLAGS;
>   error = user_path_at(dfd, filename, lookup_flags, &path);
> - if (error)
> - goto out;
> -
> - error = vfs_getattr(path.mnt, path.dentry, stat);
> - path_put(&path);
> -out:
> + if (!error) {
> + error = vfs_xgetattr(path.mnt, path.dentry, stat);
> + path_put(&path);
> + }
>   return error;
>  }
> +EXPORT_SYMBOL(vfs_xstat);
> +
> +/**
> + * vfs_fstatat - Get basic attributes by filename
> + * @dfd: A file descriptor representing the base dir for a relative filename
> + * @filename: The name of the file of interest
> + * @flags: Flags to control the query
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xstat().  The difference is that it
> + * preselects basic stats only.  The flags are used to load up
> + * stat->query_flags in addition to indicating symlink handling during path
> + * resolution.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
> + int flags)
> +{
> + stat->request_mask = XSTAT_BASIC_STATS;
> + return vfs_xstat(dfd, filename, flags, stat);
> +}
>  EXPORT_SYMBOL(vfs_fstatat);
>  
> -int vfs_stat(const char __user *name, struct kstat *stat)
> +/**
> + * vfs_stat - Get basic attributes by filename
> + * @filename: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xstat().  The difference is that it
> + * preselects basic stats only, terminal symlinks are followed regardless and a
> + * remote filesystem can't be forced to query the server.  If such is desired,
> + * vfs_xstat() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_stat(const char __user *filename, struct kstat *stat)
>  {
> - return vfs_fstatat(AT_FDCWD, name, stat, 0);
> + stat->request_mask = XSTAT_BASIC_STATS;
> + return vfs_xstat(AT_FDCWD, filename, 0, stat);
>  }
>  EXPORT_SYMBOL(vfs_stat);
>  
> +/**
> + * vfs_stat - Get basic attributes by filename, without following terminal symlink
> + * @filename: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xstat().  The difference is that it
> + * preselects basic stats only, terminal symlinks are note followed regardless
> + * and a remote filesystem can't be forced to query the server.  If such is
> + * desired, vfs_xstat() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
>  int vfs_lstat(const char __user *name, struct kstat *stat)
>  {
> - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
> + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
>  }
>  EXPORT_SYMBOL(vfs_lstat);
>  
> @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  {
>   static int warncount = 5;
>   struct __old_kernel_stat tmp;
> -
> +
>   if (warncount > 0) {
>   warncount--;
>   printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
> @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  #if BITS_PER_LONG == 32
>   if (stat->size > MAX_NON_LFS)
>   return -EOVERFLOW;
> -#endif
> +#endif
>   tmp.st_size = stat->size;
>   tmp.st_atime = stat->atime.tv_sec;
>   tmp.st_mtime = stat->mtime.tv_sec;
> @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
>  #if BITS_PER_LONG == 32
>   if (stat->size > MAX_NON_LFS)
>   return -EOVERFLOW;
> -#endif
> +#endif
>   tmp.st_size = stat->size;
>   tmp.st_atime = stat->atime.tv_sec;
>   tmp.st_mtime = stat->mtime.tv_sec;
> @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
>  }
>  #endif /* __ARCH_WANT_STAT64 */
>  
> +/*
> + * Get the xstat parameters if supplied
> + */
> +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer,
> +    struct kstat *stat)
> +{
> + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING
> +
> + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
> + return -EFAULT;
> +
> + stat->request_mask = mask & XSTAT_ALL_STATS;
> + stat->result_mask = 0;
> + return 0;
> +}
> +
> +/*
> + * Set the xstat results.
> + *
> + * If the buffer size was 0, we just return the size of the buffer needed to
> + * return the full result.
> + *
> + * If bufsize indicates a buffer of insufficient size to hold the full result,
> + * we return -E2BIG.
> + *
> + * Otherwise we copy the extended stats to userspace and return the amount of
> + * data written into the buffer (or -EFAULT).
> + */
> +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer)
> +{
> + u32 mask = stat->result_mask, gran = stat->tv_granularity;
> +
> +#define __put_timestamp(kts, uts) ( \
> + __put_user(kts.tv_sec, uts.tv_sec ) || \
> + __put_user(kts.tv_nsec, uts.tv_nsec ) || \
> + __put_user(gran, uts.tv_granularity ))
> +
> + /* clear out anything we're not returning */
> + if (!(mask & XSTAT_IOC_FLAGS))
> + stat->ioc_flags = 0;
> + if (!(mask & XSTAT_BTIME))
> + memset(&stat->btime, 0, sizeof(stat->btime));
> + if (!(mask & XSTAT_GEN))
> + stat->gen = 0;
> + if (!(mask & XSTAT_VERSION))
> + stat->version = 0;
> + if (!(mask & XSTAT_VOLUME_ID))
> + memset(&stat->volume_id, 0, sizeof(stat->volume_id));
> +
> + /* transfer the results */
> + if (__put_user(mask, &buffer->st_mask ) ||
> +    __put_user(stat->mode, &buffer->st_mode ) ||
> +    __put_user(stat->nlink, &buffer->st_nlink ) ||
> +    __put_user(stat->uid, &buffer->st_uid ) ||
> +    __put_user(stat->gid, &buffer->st_gid ) ||
> +    __put_user(stat->information, &buffer->st_information ) ||
> +    __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) ||
> +    __put_user(stat->blksize, &buffer->st_blksize ) ||
> +    __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) ||
> +    __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) ||
> +    __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) ||
> +    __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) ||
> +    __put_timestamp(stat->atime, &buffer->st_atime ) ||
> +    __put_timestamp(stat->btime, &buffer->st_btime ) ||
> +    __put_timestamp(stat->ctime, &buffer->st_ctime ) ||
> +    __put_timestamp(stat->mtime, &buffer->st_mtime ) ||
> +    __put_user(stat->ino, &buffer->st_ino ) ||
> +    __put_user(stat->size, &buffer->st_size ) ||
> +    __put_user(stat->blocks, &buffer->st_blocks ) ||
> +    __put_user(stat->gen, &buffer->st_gen ) ||
> +    __put_user(stat->version, &buffer->st_version ) ||
> +    __copy_to_user(&buffer->st_volume_id, &stat->volume_id,
> +   sizeof(buffer->st_volume_id) ) ||
> +    __clear_user(&buffer->__spares, sizeof(buffer->__spares)))
> + return -EFAULT;
> + return 0;
> +}
> +
> +/*
> + * System call to get extended stats by path
> + */
> +SYSCALL_DEFINE5(xstat,
> + int, dfd, const char __user *, filename, unsigned, flags,
> + unsigned int, mask, struct xstat __user *, buffer)
> +{
> + struct kstat stat;
> + int error;
> +
> + error = xstat_get_params(mask, buffer, &stat);
> + if (error != 0)
> + return error;
> + error = vfs_xstat(dfd, filename, flags, &stat);
> + if (error)
> + return error;
> + return xstat_set_result(&stat, buffer);
> +}
> +
> +/*
> + * System call to get extended stats by file descriptor
> + */
> +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags,
> + unsigned int, mask, struct xstat __user *, buffer)
> +{
> + struct kstat stat;
> + int error;
> +
> + error = xstat_get_params(mask, buffer, &stat);
> + if (error < 0)
> + return error;
> + stat.query_flags = flags;
> + error = vfs_fxstat(fd, &stat);
> + if (error)
> + return error;
> + return xstat_set_result(&stat, buffer);
> +}
> +
>  /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
>  void __inode_add_bytes(struct inode *inode, loff_t bytes)
>  {
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index f550f89..faa9e5d 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -47,6 +47,7 @@
>  #define AT_SYMLINK_FOLLOW 0x400   /* Follow symbolic links.  */
>  #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
>  #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */
>  
>  #ifdef __KERNEL__
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8de6755..ec6c62e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1467,6 +1467,7 @@ struct super_block {
>  
>   char s_id[32]; /* Informational name */
>   u8 s_uuid[16]; /* UUID */
> + unsigned char s_volume_id[16]; /* Volume identifier */
>  
>   void *s_fs_info; /* Filesystem private info */
>   unsigned int s_max_links;
> @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations;
>  extern int generic_readlink(struct dentry *, char __user *, int);
>  extern void generic_fillattr(struct inode *, struct kstat *);
>  extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
> +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *);
>  void __inode_add_bytes(struct inode *inode, loff_t bytes);
>  void inode_add_bytes(struct inode *inode, loff_t bytes);
>  void inode_sub_bytes(struct inode *inode, loff_t bytes);
> @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *);
>  extern int vfs_lstat(const char __user *, struct kstat *);
>  extern int vfs_fstat(unsigned int, struct kstat *);
>  extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
> +extern int vfs_xstat(int, const char __user *, int, struct kstat *);
> +extern int vfs_xfstat(unsigned int, struct kstat *);
>  
>  extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
>      unsigned long arg);
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 611c398..0ff561a 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -3,6 +3,7 @@
>  
>  #ifdef __KERNEL__
>  
> +#include <linux/types.h>
>  #include <asm/stat.h>
>  
>  #endif
> @@ -46,6 +47,117 @@
>  
>  #endif
>  
> +/*
> + * Query request/result mask
> + *
> + * Bits should be set in request_mask to request particular items when calling
> + * xstat() or fxstat().
> + *
> + * The bits in st_mask may or may not be set upon return, in part depending on
> + * what was set in the mask argument:
> + *
> + * - if not available at all, the bit will be cleared before returning and the
> + *   field will be cleared; otherwise,
> + *
> + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the
> + *   server and the field and bit will be set on return; otherwise,
> + *
> + * - if explicitly requested, the datum will be synchronised to a server or
> + *   other medium if out of date before being returned, and the bit will be set
> + *   on return; otherwise,
> + *
> + * - if not requested, but available in approximate form without any effort, it
> + *   will be filled in anyway, and the bit will be set upon return (it might
> + *   not be up to date, however, and no attempt will be made to synchronise the
> + *   internal state first); otherwise,
> + *
> + * - the field and the bit will be cleared before returning.
> + *
> + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they
> + * will have a value installed for compatibility purposes so that stat() and
> + * co. can be emulated in userspace.
> + */
> +#define XSTAT_MODE 0x00000001U /* want/got st_mode */
> +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */
> +#define XSTAT_UID 0x00000004U /* want/got st_uid */
> +#define XSTAT_GID 0x00000008U /* want/got st_gid */
> +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */
> +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */
> +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */
> +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */
> +#define XSTAT_INO 0x00000100U /* want/got st_ino */
> +#define XSTAT_SIZE 0x00000200U /* want/got st_size */
> +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */
> +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */
> +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */
> +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */
> +#define XSTAT_GEN 0x00002000U /* want/got st_gen */
> +#define XSTAT_VERSION 0x00004000U /* want/got st_version */
> +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */
> +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */
> +
> +/*
> + * Extended stat structures
> + */
> +struct xstat_dev {
> + uint32_t major, minor;
> +};
> +
> +struct xstat_time {
> + int64_t tv_sec;
> + uint32_t tv_nsec;
> + uint32_t tv_granularity; /* time granularity (in nS) */
> +};
> +
> +struct xstat {
> + uint32_t st_mask; /* what results were written */
> + uint32_t st_mode; /* file mode */
> + uint32_t st_nlink; /* number of hard links */
> + uint32_t st_uid; /* user ID of owner */
> + uint32_t st_gid; /* group ID of owner */
> + uint32_t st_information; /* information about the file */
> + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */
> + uint32_t st_blksize; /* optimal size for filesystem I/O */
> + struct xstat_dev st_rdev; /* device ID of special file */
> + struct xstat_dev st_dev; /* ID of device containing file */
> + struct xstat_time st_atime; /* last access time */
> + struct xstat_time st_btime; /* file creation time */
> + struct xstat_time st_ctime; /* last attribute change time */
> + struct xstat_time st_mtime; /* last data modification time */
> + uint64_t st_ino; /* inode number */
> + uint64_t st_size; /* file size */
> + uint64_t st_blocks; /* number of 512-byte blocks allocated */
> + uint64_t st_gen; /* inode generation number */
> + uint64_t st_version; /* data version number */
> + uint8_t st_volume_id[16]; /* volume identifier */
> + uint64_t __spares[11]; /* spare space for future expansion */
> +};
> +
> +/*
> + * Flags to be found in st_information
> + *
> + * These give information about the features or the state of a file that might
> + * be of use to ordinary userspace programs such as GUIs or ls rather than
> + * specialised tools.
> + *
> + * Additional information may be found in st_ioc_flags and we try not to
> + * overlap with it.
> + */
> +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */
> +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */
> +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */
> +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */
> +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */
> +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */
> +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */
> +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */
> +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */
> +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */
> +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */
> +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */
> +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */
> +#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */
> +
>  #ifdef __KERNEL__
>  #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
>  #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
> @@ -60,6 +172,12 @@
>  #include <linux/time.h>
>  
>  struct kstat {
> + u32 query_flags; /* operational flags */
> +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC)
> + u32 request_mask; /* what fields the user asked for */
> + u32 result_mask; /* what fields the user got */
> + u32 information;
> + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */
>   u64 ino;
>   dev_t dev;
>   umode_t mode;
> @@ -67,14 +185,18 @@ struct kstat {
>   uid_t uid;
>   gid_t gid;
>   dev_t rdev;
> + unsigned int tv_granularity; /* granularity of times (in nS) */
>   loff_t size;
> - struct timespec  atime;
> + struct timespec atime;
>   struct timespec mtime;
>   struct timespec ctime;
> + struct timespec btime; /* file creation time */
>   unsigned long blksize;
>   unsigned long long blocks;
> + u64 gen; /* inode generation */
> + u64 version; /* data version */
> + unsigned char volume_id[16]; /* volume identifier */
>  };
>  
>  #endif
> -
>  #endif
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 3de3acb..ff9f8d9 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -45,6 +45,8 @@ struct shmid_ds;
>  struct sockaddr;
>  struct stat;
>  struct stat64;
> +struct xstat_parameters;
> +struct xstat;
>  struct statfs;
>  struct statfs64;
>  struct __sysctl_args;
> @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>        unsigned long riovcnt,
>        unsigned long flags);
>  
> +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags,
> +  unsigned mask, struct xstat __user *buffer);
> +asmlinkage long sys_fxstat(unsigned fd, unsigned flags,
> +   unsigned mask, struct xstat __user *buffer);
> +
>  #endif
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

Steve French-2
On Tue, Apr 24, 2012 at 4:29 PM, J. Bruce Fields <[hidden email]> wrote:

> On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote:
>> Add a pair of system calls to make extended file stats available, including
>> file creation time, inode version and data version where available through the
>> underlying filesystem.
>>
>> The idea was initially proposed as a set of xattrs that could be retrieved with
>> getxattr(), but the general preferance proved to be for new syscalls with an
>> extended stat structure.
>>
>> This has a number of uses:
>>
>>  (1) Creation time: The SMB protocol carries the creation time, which could be
>>      exported by Samba, which will in turn help CIFS make use of FS-Cache as
>>      that can be used for coherency data.
>>
>>      This is also specified in NFSv4 as a recommended attribute and could be
>>      exported by NFSD [Steve French].
>>
>>  (2) Lightweight stat: Ask for just those details of interest, and allow a
>>      netfs (such as NFS) to approximate anything not of interest, possibly
>>      without going to the server [Trond Myklebust, Ulrich Drepper].
>>
>>  (3) Heavyweight stat: Force a netfs to go to the server, even if it thinks its
>>      cached attributes are up to date [Trond Myklebust].
>>
>>  (4) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd
>>      Schubert].
>>
>>  (5) Data version number: Could be used by userspace NFS servers [Aneesh Kumar].
>>
>>      Can also be used to modify fill_post_wcc() in NFSD which retrieves
>>      i_version directly, but has just called vfs_getattr().  It could get it
>>      from the kstat struct if it used vfs_xgetattr() instead.
>>
>>  (6) BSD stat compatibility: Including more fields from the BSD stat such as
>>      creation time (st_btime) and inode generation number (st_gen) [Jeremy
>>      Allison, Bernd Schubert].
>>
>>  (7) Extra coherency data may be useful in making backups [Andreas Dilger].
>>
>>  (8) Allow the filesystem to indicate what it can/cannot provide: A filesystem
>>      can now say it doesn't support a standard stat feature if that isn't
>>      available, so if, for instance, inode numbers or UIDs don't exist...
>>
>>  (9) Make the fields a consistent size on all arches and make them large.
>>
>> (10) Store a 16-byte volume ID in the superblock that can be returned in struct
>>      xstat [Steve French].
>>
>> (11) Include granularity fields in the time data to indicate the granularity of
>>      each of the times (NFSv4 time_delta) [Steve French].
>
> It looks like you're including this with *each* time?  But surely
> there's no filesystem with different granularity (say) for ctime than
> for mtime.  Also, nfsd will want only one time_delta, not one for each
> time.
>
> Note also we need to document carefully what this means: I think it
> should be the granularity that the filesystem is capable of
> representing, but people are sometimes surprised to find out that the
> actual time source is usually more coarse-grained than that.

I also would prefer that we simply treat the time granularity as part
of the superblock (mounted volume) ie returned on fstat rather than on
every stat of the filesystem.   For cifs mounts we could conceivably
have different time granularity (1 or 2 second) on mounts to old
servers rather than 100 nanoseconds.


--
Thanks,

Steve
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

Andreas Dilger-7
In reply to this post by J. Bruce Fields
On 2012-04-24, at 4:29 PM, J. Bruce Fields wrote:
> On Thu, Apr 19, 2012 at 03:06:12PM +0100, David Howells wrote:
>> (11) Include granularity fields in the time data to indicate the
>>    granularity of each of the times (NFSv4 time_delta) [Steve French].
>
> It looks like you're including this with *each* time?  But surely
> there's no filesystem with different granularity (say) for ctime than
> for mtime.  Also, nfsd will want only one time_delta, not one for each
> time.

I suspect the main reason for having a separate time_delta per timestamp
is to use the extra 32-bit field in the timestamp structs.  Since those
structs have a 64-bit + 32-bit field, it would be messy to pack them,
and leaving the spare bytes unused and adding an additional field for
the granularity would just increase the struct size.

> Note also we need to document carefully what this means: I think it
> should be the granularity that the filesystem is capable of
> representing, but people are sometimes surprised to find out that the
> actual time source is usually more coarse-grained than that.
>
> --b.
>
>>
>> (12) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>>
>> (13) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>>    Michael Kerrisk].
>>
>> (14) Spare space, request flags and information flags are provided for future
>>    expansion.
>>
>>
>> The following structures are defined for the use of these new system calls:
>>
>> struct xstat_dev {
>> uint32_t major, minor;
>> };
>>
>> struct xstat_time {
>> uint64_t tv_sec;
>> uint32_t tv_nsec;
>> uint32_t tv_granularity;
>> };
>>
>> struct xstat {
>> uint32_t st_mask;
>> uint32_t st_mode;
>> uint32_t st_nlink;
>> uint32_t st_uid;
>> uint32_t st_gid;
>> uint32_t st_information;
>> uint32_t st_ioc_flags;
>> uint32_t st_blksize;
>> struct xstat_dev st_rdev;
>> struct xstat_dev st_dev;
>> struct xstat_time st_atime;
>> struct xstat_time st_btime;
>> struct xstat_time st_ctime;
>> struct xstat_time st_mtime;
>> uint64_t st_ino;
>> uint64_t st_size;
>> uint64_t st_blocks;
>> uint64_t st_gen;
>> uint64_t st_version;
>> uint8_t st_volume_id[16];
>> uint64_t __spares[11];
>> };
>>
>> where st_information is local system information about the file, st_btime is
>> the file creation time, st_gen is the inode generation (i_generation),
>> st_data_version is the data version number (i_version), st_ioc_flags is the
>> flags from FS_IOC_GETFLAGS, st_volume_id is where the volume identified is
>> stored, st_result_mask is a bitmask indicating the data provided and __spares[]
>> are where as-yet undefined fields can be placed.
>>
>> The defined bits in request_mask and st_mask are:
>>
>> XSTAT_MODE Want/got st_mode
>> XSTAT_NLINK Want/got st_nlink
>> XSTAT_UID Want/got st_uid
>> XSTAT_GID Want/got st_gid
>> XSTAT_RDEV Want/got st_rdev
>> XSTAT_ATIME Want/got st_atime
>> XSTAT_MTIME Want/got st_mtime
>> XSTAT_CTIME Want/got st_ctime
>> XSTAT_INO Want/got st_ino
>> XSTAT_SIZE Want/got st_size
>> XSTAT_BLOCKS Want/got st_blocks
>> XSTAT_BASIC_STATS [The stuff in the normal stat struct]
>> XSTAT_IOC_FLAGS Want/got FS_IOC_GETFLAGS
>> XSTAT_BTIME Want/got st_btime
>> XSTAT_GEN Want/got st_gen
>> XSTAT_VERSION Want/got st_data_version
>> XSTAT_VOLUME_ID Want/got st_volume_id
>> XSTAT_ALL_STATS [All currently available stuff]
>>
>> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra flags
>> that might be supplied by the filesystem.  Note that Ext4 returns flags outside
>> of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.  Should
>> {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the extra flags
>> be suppressed?
>>
>> The defined bits in the st_information field give local system data on a file,
>> how it is accessed, where it is and what it does:
>>
>> XSTAT_INFO_ENCRYPTED File is encrypted
>> XSTAT_INFO_TEMPORARY File is temporary (NTFS/CIFS/deleted)
>> XSTAT_INFO_FABRICATED File was made up by filesystem
>> XSTAT_INFO_KERNEL_API File is kernel API (eg: procfs/sysfs)
>> XSTAT_INFO_REMOTE File is remote
>> XSTAT_INFO_OFFLINE File is offline (CIFS)
>> XSTAT_INFO_AUTOMOUNT Dir is automount trigger
>> XSTAT_INFO_AUTODIR Dir provides unlisted automounts
>> XSTAT_INFO_NONSYSTEM_OWNERSHIP File has non-system ownership details
>> XSTAT_INFO_HAS_ACL File has an ACL of some sort
>> XSTAT_INFO_REPARSE_POINT File is reparse point (NTFS/CIFS)
>> XSTAT_INFO_HIDDEN File is marked hidden (DOS+)
>> XSTAT_INFO_SYSTEM File is marked system (DOS+)
>> XSTAT_INFO_ARCHIVE File is marked archive (DOS+)
>>
>> These are for the use of GUI tools that might want to mark files specially,
>> depending on what they are.  I've tried not to provide overlap with
>> st_ioc_flags where something usable exists there.  Should Hidden, System and
>> Archive flags be associated with ioc_flags, perhaps with ioc_flags extended to
>> 64-bits?
>>
>>
>> The system calls are:
>>
>> ssize_t ret = xstat(int dfd,
>>    const char *filename,
>>    unsigned int flags,
>>    unsigned int mask,
>>    struct xstat *buffer);
>>
>> ssize_t ret = fxstat(unsigned fd,
>>     unsigned int flags,
>>     unsigned int mask,
>>     struct xstat *buffer);
>>
>>
>> The dfd, filename, flags and fd parameters indicate the file to query.  There
>> is no equivalent of lstat() as that can be emulated with xstat() by passing
>> AT_SYMLINK_NOFOLLOW in flags.
>>
>> AT_FORCE_ATTR_SYNC can also be set in flags.  This will require a network
>> filesystem to synchronise its attributes with the server.
>>
>> mask is a bitmask indicating the fields in struct xstat that are of interest to
>> the caller.  The user should set this to XSTAT__BASIC_STATS to get the
>> basic set returned by stat().
>>
>> Should there just be one xstat() syscall that does fxstat() if filename is NULL?
>>
>> The fields in struct xstat come in a number of classes:
>>
>> (0) st_dev, st_blksize, st_information.
>>
>>    These are local data and are always available.
>>
>> (1) st_mode, st_nlinks, st_uid, st_gid, st_[amc]time, st_ino, st_size,
>>    st_blocks.
>>
>>    These will be returned whether the caller asks for them or not.  The
>>    corresponding bits in result_mask will be set to indicate their presence.
>>
>>    If the caller didn't ask for them, then they may be approximated.  For
>>    example, NFS won't waste any time updating them from the server, unless as
>>    a byproduct of updating something requested.
>>
>>    If the values don't actually exist for the underlying object (such as UID
>>    or GID on a DOS file), then the bit won't be set in the result_mask, even
>>    if the caller asked for the value and the returned value will be a
>>    fabrication.
>>
>> (2) st_rdev.
>>
>>    As for class (1), but this won't be returned if the file is not a blockdev
>>    or chardev.  The bit will be cleared if the value is not returned.
>>
>> (3) File creation time (st_btime), inode generation (st_gen), data version
>>    (st_version), volume_id (st_volume_id) and inode flags (st_ioc_flags).
>>
>>    These will be returned if available whether the caller asked for them or
>>    not.  The corresponding bits in result_mask will be set or cleared as
>>    appropriate to indicate their presence.
>>
>>    If the caller didn't ask for them, then they may be approximated.  For
>>    example, NFS won't waste any time updating them from the server, unless
>>    as a byproduct of updating something requested.
>>
>> At the moment, this will only work on x86_64 and i386 as it requires system
>> calls to be wired up.
>>
>>
>> =======
>> TESTING
>> =======
>>
>> The following test program can be used to test the xstat system call:
>>
>> /* Test the xstat() system call
>> *
>> * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
>> * Written by David Howells ([hidden email])
>> *
>> * This program is free software; you can redistribute it and/or
>> * modify it under the terms of the GNU General Public Licence
>> * as published by the Free Software Foundation; either version
>> * 2 of the Licence, or (at your option) any later version.
>> */
>>
>> #define _GNU_SOURCE
>> #define _ATFILE_SOURCE
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>> #include <time.h>
>> #include <sys/syscall.h>
>> #include <sys/stat.h>
>> #include <sys/types.h>
>>
>> #define AT_NO_AUTOMOUNT 0x800
>> #define AT_FORCE_ATTR_SYNC 0x2000
>>
>> #define XSTAT_MODE 0x00000001U
>> #define XSTAT_NLINK 0x00000002U
>> #define XSTAT_UID 0x00000004U
>> #define XSTAT_GID 0x00000008U
>> #define XSTAT_RDEV 0x00000010U
>> #define XSTAT_ATIME 0x00000020U
>> #define XSTAT_MTIME 0x00000040U
>> #define XSTAT_CTIME 0x00000080U
>> #define XSTAT_INO 0x00000100U
>> #define XSTAT_SIZE 0x00000200U
>> #define XSTAT_BLOCKS 0x00000400U
>> #define XSTAT_BASIC_STATS 0x000007ffU
>> #define XSTAT_BTIME 0x00000800U
>> #define XSTAT_GEN 0x00001000U
>> #define XSTAT_VERSION 0x00002000U
>> #define XSTAT_IOC_FLAGS 0x00004000U
>> #define XSTAT_VOLUME_ID 0x00008000U
>> #define XSTAT_ALL_STATS 0x0000ffffU
>>
>> struct xstat_dev {
>> uint32_t major;
>> uint32_t minor;
>> };
>>
>> struct xstat_time {
>> uint64_t tv_sec;
>> uint32_t tv_nsec;
>> uint32_t tv_granularity;
>> };
>>
>> struct xstat {
>> uint32_t st_mask;
>> uint32_t st_mode;
>> uint32_t st_nlink;
>> uint32_t st_uid;
>> uint32_t st_gid;
>> uint32_t st_information;
>> uint32_t st_ioc_flags;
>> uint32_t st_blksize;
>> struct xstat_dev st_rdev;
>> struct xstat_dev st_dev;
>> struct xstat_time st_atim;
>> struct xstat_time st_btim;
>> struct xstat_time st_ctim;
>> struct xstat_time st_mtim;
>> uint64_t st_ino;
>> uint64_t st_size;
>> uint64_t st_blksize;
>> uint64_t st_blocks;
>> uint64_t st_gen;
>> uint64_t st_version;
>> uint64_t st_volume_id[16];
>> uint64_t st_spares[11];
>> };
>>
>> #define XSTAT_INFO_ENCRYPTED 0x00000001U
>> #define XSTAT_INFO_TEMPORARY 0x00000002U
>> #define XSTAT_INFO_FABRICATED 0x00000004U
>> #define XSTAT_INFO_KERNEL_API 0x00000008U
>> #define XSTAT_INFO_REMOTE 0x00000010U
>> #define XSTAT_INFO_OFFLINE 0x00000020U
>> #define XSTAT_INFO_AUTOMOUNT 0x00000040U
>> #define XSTAT_INFO_AUTODIR 0x00000080U
>> #define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U
>> #define XSTAT_INFO_HAS_ACL 0x00000200U
>> #define XSTAT_INFO_REPARSE_POINT 0x00000400U
>> #define XSTAT_INFO_HIDDEN 0x00000800U
>> #define XSTAT_INFO_SYSTEM 0x00001000U
>> #define XSTAT_INFO_ARCHIVE 0x00002000U
>>
>> #define __NR_xstat 312
>> #define __NR_fxstat 313
>>
>> static __attribute__((unused))
>> ssize_t xstat(int dfd, const char *filename, unsigned flags,
>>      unsigned int mask, struct xstat *buffer)
>> {
>> return syscall(__NR_xstat, dfd, filename, flags, mask, buffer);
>> }
>>
>> static __attribute__((unused))
>> ssize_t fxstat(int fd, unsigned flags,
>>       unsigned int mask, struct xstat *buffer)
>> {
>> return syscall(__NR_fxstat, fd, flags, mask, buffer);
>> }
>>
>> static void print_time(const char *field, const struct xstat_time *xstm)
>> {
>> struct tm tm;
>> time_t tim;
>> char buffer[100];
>> int len;
>>
>> tim = xstm->tv_sec;
>> if (!localtime_r(&tim, &tm)) {
>> perror("localtime_r");
>> exit(1);
>> }
>> len = strftime(buffer, 100, "%F %T", &tm);
>> if (len == 0) {
>> perror("strftime");
>> exit(1);
>> }
>> printf("%s", field);
>> fwrite(buffer, 1, len, stdout);
>> printf(".%09u", xstm->tv_nsec);
>> len = strftime(buffer, 100, "%z", &tm);
>> if (len == 0) {
>> perror("strftime2");
>> exit(1);
>> }
>> fwrite(buffer, 1, len, stdout);
>> printf("\n");
>> }
>>
>> static void dump_xstat(struct xstat *xst)
>> {
>> char buffer[256], ft;
>>
>> printf("results=%x\n", xst->st_mask);
>>
>> printf(" ");
>> if (xst->st_mask & XSTAT_SIZE)
>> printf(" Size: %-15llu", (unsigned long long) xst->st_size);
>> if (xst->st_mask & XSTAT_BLOCKS)
>> printf(" Blocks: %-10llu", (unsigned long long) xst->st_blocks);
>> printf(" IO Block: %-6llu ", (unsigned long long) xst->st_blksize);
>> if (xst->st_mask & XSTAT_MODE) {
>> switch (xst->st_mode & S_IFMT) {
>> case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break;
>> case S_IFCHR: printf(" character special file\n"); ft = 'c'; break;
>> case S_IFDIR: printf(" directory\n"); ft = 'd'; break;
>> case S_IFBLK: printf(" block special file\n"); ft = 'b'; break;
>> case S_IFREG: printf(" regular file\n"); ft = '-'; break;
>> case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break;
>> case S_IFSOCK: printf(" socket\n"); ft = 's'; break;
>> default:
>> printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
>> ft = '?';
>> break;
>> }
>> }
>>
>> sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
>> printf("Device: %-15s", buffer);
>> if (xst->st_mask & XSTAT_INO)
>> printf(" Inode: %-11llu", (unsigned long long) xst->st_ino);
>> if (xst->st_mask & XSTAT_SIZE)
>> printf(" Links: %-5u", xst->st_nlink);
>> if (xst->st_mask & XSTAT_RDEV)
>> printf(" Device type: %u,%u",
>>       xst->st_rdev.major, xst->st_rdev.minor);
>> printf("\n");
>>
>> if (xst->st_mask & XSTAT_MODE)
>> printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
>>       xst->st_mode & 07777,
>>       ft,
>>       xst->st_mode & S_IRUSR ? 'r' : '-',
>>       xst->st_mode & S_IWUSR ? 'w' : '-',
>>       xst->st_mode & S_IXUSR ? 'x' : '-',
>>       xst->st_mode & S_IRGRP ? 'r' : '-',
>>       xst->st_mode & S_IWGRP ? 'w' : '-',
>>       xst->st_mode & S_IXGRP ? 'x' : '-',
>>       xst->st_mode & S_IROTH ? 'r' : '-',
>>       xst->st_mode & S_IWOTH ? 'w' : '-',
>>       xst->st_mode & S_IXOTH ? 'x' : '-');
>> if (xst->st_mask & XSTAT_UID)
>> printf("Uid: %d   \n", xst->st_uid);
>> if (xst->st_mask & XSTAT_GID)
>> printf("Gid: %u\n", xst->st_gid);
>>
>> if (xst->st_mask & XSTAT_ATIME)
>> print_time("Access: ", &xst->st_atim);
>> if (xst->st_mask & XSTAT_MTIME)
>> print_time("Modify: ", &xst->st_mtim);
>> if (xst->st_mask & XSTAT_CTIME)
>> print_time("Change: ", &xst->st_ctim);
>> if (xst->st_mask & XSTAT_BTIME)
>> print_time("Create: ", &xst->st_btim);
>>
>> if (xst->st_mask & XSTAT_GEN)
>> printf("Inode version: %llxh\n", (unsigned long long) xst->st_gen);
>> if (xst->st_mask & XSTAT_VERSION)
>> printf("Data version: %llxh\n", (unsigned long long) xst->st_version);
>>
>> if (xst->st_mask & XSTAT_IOC_FLAGS) {
>> unsigned char bits;
>> int loop, byte;
>>
>> static char flag_representation[32 + 1] =
>> /* FS_IOC_GETFLAGS flags: */
>> "????????" /* 31-24 0x00000000-ff000000  */
>> "????ehTD" /* 23-16 0x00000000-00ff0000  */
>> "tj?IE?XZ" /* 15- 8 0x00000000-0000ff00  */
>> "AdaiScus" /*  7- 0 0x00000000-000000ff */
>> ;
>>
>> printf("Inode flags: %08x (", xst->st_ioc_flags);
>> for (byte = 32 - 8; byte >= 0; byte -= 8) {
>> bits = xst->st_ioc_flags >> byte;
>> for (loop = 7; loop >= 0; loop--) {
>> int bit = byte + loop;
>>
>> if (bits & 0x80)
>> putchar(flag_representation[31 - bit]);
>> else
>> putchar('-');
>> bits <<= 1;
>> }
>> if (byte)
>> putchar(' ');
>> }
>> printf(")\n");
>> }
>>
>> if (xst->st_information) {
>> unsigned char bits;
>> int loop, byte;
>>
>> static char info_representation[32 + 1] =
>> /* XSTAT_INFO_ flags: */
>> "????????" /* 31-24 0x00000000-ff000000  */
>> "????????" /* 23-16 0x00000000-00ff0000  */
>> "??ASHRan" /* 15- 8 0x00000000-0000ff00  */
>> "dmorkfte" /*  7- 0 0x00000000-000000ff */
>> ;
>>
>> printf("Information: %08x (", xst->st_information);
>> for (byte = 32 - 8; byte >= 0; byte -= 8) {
>> bits = xst->st_information >> byte;
>> for (loop = 7; loop >= 0; loop--) {
>> int bit = byte + loop;
>>
>> if (bits & 0x80)
>> putchar(info_representation[31 - bit]);
>> else
>> putchar('-');
>> bits <<= 1;
>> }
>> if (byte)
>> putchar(' ');
>> }
>> printf(")\n");
>> }
>>
>> if (xst->st_mask & XSTAT_VOLUME_ID) {
>> int loop;
>> printf("Volume ID: ");
>> for (loop = 0; loop < sizeof(xst->st_volume_id); loop++) {
>> printf("%02x", xst->st_volume_id[loop]);
>> if (loop == 7)
>> printf("-");
>> }
>> printf("\n");
>> }
>> }
>>
>> void dump_hex(unsigned long long *data, int from, int to)
>> {
>> unsigned offset, print_offset = 1, col = 0;
>>
>> from /= 8;
>> to = (to + 7) / 8;
>>
>> for (offset = from; offset < to; offset++) {
>> if (print_offset) {
>> printf("%04x: ", offset * 8);
>> print_offset = 0;
>> }
>> printf("%016llx", data[offset]);
>> col++;
>> if ((col & 3) == 0) {
>> printf("\n");
>> print_offset = 1;
>> } else {
>> printf(" ");
>> }
>> }
>>
>> if (!print_offset)
>> printf("\n");
>> }
>>
>> int main(int argc, char **argv)
>> {
>> struct xstat xst;
>> int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
>>
>> unsigned int mask = XSTAT_ALL_STATS;
>>
>> for (argv++; *argv; argv++) {
>> if (strcmp(*argv, "-F") == 0) {
>> atflag |= AT_FORCE_ATTR_SYNC;
>> continue;
>> }
>> if (strcmp(*argv, "-L") == 0) {
>> atflag &= ~AT_SYMLINK_NOFOLLOW;
>> continue;
>> }
>> if (strcmp(*argv, "-O") == 0) {
>> mask &= ~XSTAT_BASIC_STATS;
>> continue;
>> }
>> if (strcmp(*argv, "-A") == 0) {
>> atflag |= AT_NO_AUTOMOUNT;
>> continue;
>> }
>> if (strcmp(*argv, "-R") == 0) {
>> raw = 1;
>> continue;
>> }
>>
>> memset(&xst, 0xbf, sizeof(xst));
>> ret = xstat(AT_FDCWD, *argv, atflag, mask, &xst);
>> printf("xstat(%s) = %d\n", *argv, ret);
>> if (ret < 0) {
>> perror(*argv);
>> exit(1);
>> }
>>
>> if (raw)
>> dump_hex((unsigned long long *)&xst, 0, sizeof(xst));
>>
>> dump_xstat(&xst);
>> }
>> return 0;
>> }
>>
>> Just compile and run, passing it paths to the files you want to examine:
>>
>> [root@andromeda ~]# /tmp/xstat /proc/$$
>> xstat(/proc/2074) = 160
>> results=47ef
>>  Size: 0               Blocks: 0          IO Block: 1024    directory
>> Device: 00:03           Inode: 9072        Links: 7
>> Access: (0555/dr-xr-xr-x)  Uid: 0
>> Gid: 0
>> Access: 2010-07-14 16:50:46.609336272+0100
>> Modify: 2010-07-14 16:50:46.609336272+0100
>> Change: 2010-07-14 16:50:46.609336272+0100
>> Inode flags: 0000000100000000 (-------- -------- -------- -------S -------- -------- -------- --------)
>> [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm
>> xstat(/afs/archive/linuxdev/fedora9/x86_64/kernel-devel-2.6.25.10-86.fc9.x86_64.rpm) = 160
>> results=77ef
>>  Size: 5413882         Blocks: 0          IO Block: 4096    regular file
>> Device: 00:15           Inode: 2288        Links: 1
>> Access: (0644/-rw-r--r--)  Uid: 75338
>> Gid: 0
>> Access: 2008-11-05 19:47:22.000000000+0000
>> Modify: 2008-11-05 19:47:22.000000000+0000
>> Change: 2008-11-05 19:47:22.000000000+0000
>> Inode version: 795h
>> Data version: 2h
>> Inode flags: 0000000800000000 (-------- -------- -------- ----r--- -------- -------- -------- --------)
>>
>> Signed-off-by: David Howells <[hidden email]>
>> ---
>>
>> arch/x86/syscalls/syscall_32.tbl |    2
>> arch/x86/syscalls/syscall_64.tbl |    2
>> fs/stat.c                        |  350 +++++++++++++++++++++++++++++++++++---
>> include/linux/fcntl.h            |    1
>> include/linux/fs.h               |    4
>> include/linux/stat.h             |  126 +++++++++++++-
>> include/linux/syscalls.h         |    7 +
>> 7 files changed, 461 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>> index 29f9f05..980eb5a 100644
>> --- a/arch/x86/syscalls/syscall_32.tbl
>> +++ b/arch/x86/syscalls/syscall_32.tbl
>> @@ -355,3 +355,5 @@
>> 346 i386 setns sys_setns
>> 347 i386 process_vm_readv sys_process_vm_readv compat_sys_process_vm_readv
>> 348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev
>> +349 i386 xstat sys_xstat
>> +350 i386 fxstat sys_fxstat
>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>> index dd29a9e..7ae24bb 100644
>> --- a/arch/x86/syscalls/syscall_64.tbl
>> +++ b/arch/x86/syscalls/syscall_64.tbl
>> @@ -318,6 +318,8 @@
>> 309 common getcpu sys_getcpu
>> 310 64 process_vm_readv sys_process_vm_readv
>> 311 64 process_vm_writev sys_process_vm_writev
>> +312 common xstat sys_xstat
>> +313 common fxstat sys_fxstat
>> #
>> # x32-specific system call numbers start at 512 to avoid cache impact
>> # for native 64-bit operation.
>> diff --git a/fs/stat.c b/fs/stat.c
>> index c733dc5..af3ef33 100644
>> --- a/fs/stat.c
>> +++ b/fs/stat.c
>> @@ -18,8 +18,20 @@
>> #include <asm/uaccess.h>
>> #include <asm/unistd.h>
>>
>> +/**
>> + * generic_fillattr - Fill in the basic attributes from the inode struct
>> + * @inode: Inode to use as the source
>> + * @stat: Where to fill in the attributes
>> + *
>> + * Fill in the basic attributes in the kstat structure from data that's to be
>> + * found on the VFS inode structure.  This is the default if no getattr inode
>> + * operation is supplied.
>> + */
>> void generic_fillattr(struct inode *inode, struct kstat *stat)
>> {
>> + struct super_block *sb = inode->i_sb;
>> + u32 x;
>> +
>> stat->dev = inode->i_sb->s_dev;
>> stat->ino = inode->i_ino;
>> stat->mode = inode->i_mode;
>> @@ -27,17 +39,61 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
>> stat->uid = inode->i_uid;
>> stat->gid = inode->i_gid;
>> stat->rdev = inode->i_rdev;
>> - stat->size = i_size_read(inode);
>> - stat->atime = inode->i_atime;
>> stat->mtime = inode->i_mtime;
>> stat->ctime = inode->i_ctime;
>> - stat->blksize = (1 << inode->i_blkbits);
>> + stat->size = i_size_read(inode);
>> stat->blocks = inode->i_blocks;
>> -}
>> + stat->blksize = (1 << inode->i_blkbits);
>>
>> + stat->result_mask |= XSTAT_BASIC_STATS & ~XSTAT_RDEV;
>> + if (IS_NOATIME(inode))
>> + stat->result_mask &= ~XSTAT_ATIME;
>> + else
>> + stat->atime = inode->i_atime;
>> +
>> + if (S_ISREG(stat->mode) && stat->nlink == 0)
>> + stat->information |= XSTAT_INFO_TEMPORARY;
>> + if (IS_AUTOMOUNT(inode))
>> + stat->information |= XSTAT_INFO_AUTOMOUNT;
>> + if (IS_POSIXACL(inode))
>> + stat->information |= XSTAT_INFO_HAS_ACL;
>> +
>> + /* if unset, assume 1s granularity */
>> + stat->tv_granularity = sb->s_time_gran ?: 1000000000U;
>> +
>> + if (unlikely(S_ISBLK(stat->mode) || S_ISCHR(stat->mode)))
>> + stat->result_mask |= XSTAT_RDEV;
>> +
>> + x  = ((u32*)&stat->volume_id)[0] = ((u32*)&sb->s_volume_id)[0];
>> + x |= ((u32*)&stat->volume_id)[1] = ((u32*)&sb->s_volume_id)[1];
>> + x |= ((u32*)&stat->volume_id)[2] = ((u32*)&sb->s_volume_id)[2];
>> + x |= ((u32*)&stat->volume_id)[3] = ((u32*)&sb->s_volume_id)[3];
>> + if (x)
>> + stat->result_mask |= XSTAT_VOLUME_ID;
>> +}
>> EXPORT_SYMBOL(generic_fillattr);
>>
>> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>> +/**
>> + * vfs_xgetattr - Get the basic and extra attributes of a file
>> + * @mnt: The mountpoint to which the dentry belongs
>> + * @dentry: The file of interest
>> + * @stat: Where to return the statistics
>> + *
>> + * Ask the filesystem for a file's attributes.  The caller must have preset
>> + * stat->request_mask and stat->query_flags to indicate what they want.
>> + *
>> + * If the file is remote, the filesystem can be forced to update the attributes
>> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags.
>> + *
>> + * Bits must have been set in stat->request_mask to indicate which attributes
>> + * the caller wants retrieving.  Any such attribute not requested may be
>> + * returned anyway, but the value may be approximate, and, if remote, may not
>> + * have been synchronised with the server.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_xgetattr(struct vfsmount *mnt, struct dentry *dentry,
>> + struct kstat *stat)
>> {
>> struct inode *inode = dentry->d_inode;
>> int retval;
>> @@ -46,64 +102,184 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>> if (retval)
>> return retval;
>>
>> + stat->result_mask = 0;
>> + stat->information = 0;
>> + stat->ioc_flags = 0;
>> if (inode->i_op->getattr)
>> return inode->i_op->getattr(mnt, dentry, stat);
>>
>> generic_fillattr(inode, stat);
>> return 0;
>> }
>> +EXPORT_SYMBOL(vfs_xgetattr);
>>
>> +/**
>> + * vfs_getattr - Get the basic attributes of a file
>> + * @mnt: The mountpoint to which the dentry belongs
>> + * @dentry: The file of interest
>> + * @stat: Where to return the statistics
>> + *
>> + * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
>> + * forced to update its files from the backing store.  Only the basic set of
>> + * attributes will be retrieved; anyone wanting more must use vfs_getxattr(),
>> + * as must anyone who wants to force attributes to be sync'd with the server.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>> +{
>> + stat->query_flags = 0;
>> + stat->request_mask = XSTAT_BASIC_STATS;
>> + return vfs_xgetattr(mnt, dentry, stat);
>> +}
>> EXPORT_SYMBOL(vfs_getattr);
>>
>> -int vfs_fstat(unsigned int fd, struct kstat *stat)
>> +/**
>> + * vfs_fxstat - Get basic and extra attributes by file descriptor
>> + * @fd: The file descriptor refering to the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xgetattr().  The main difference is
>> + * that it uses a file descriptor to determine the file location.
>> + *
>> + * The caller must have preset stat->query_flags and stat->request_mask as for
>> + * vfs_xgetattr().
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fxstat(unsigned int fd, struct kstat *stat)
>> {
>> struct file *f = fget(fd);
>> int error = -EBADF;
>>
>> + if (stat->query_flags & ~KSTAT_QUERY_FLAGS)
>> + return -EINVAL;
>> if (f) {
>> - error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
>> + error = vfs_xgetattr(f->f_path.mnt, f->f_path.dentry, stat);
>> fput(f);
>> }
>> return error;
>> }
>> +EXPORT_SYMBOL(vfs_fxstat);
>> +
>> +/**
>> + * vfs_fstat - Get basic attributes by file descriptor
>> + * @fd: The file descriptor refering to the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_getattr().  The main difference is
>> + * that it uses a file descriptor to determine the file location.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fstat(unsigned int fd, struct kstat *stat)
>> +{
>> + stat->query_flags = 0;
>> + stat->request_mask = XSTAT_BASIC_STATS;
>> + return vfs_fxstat(fd, stat);
>> +}
>> EXPORT_SYMBOL(vfs_fstat);
>>
>> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>> - int flag)
>> +/**
>> + * vfs_xstat - Get basic and extra attributes by filename
>> + * @dfd: A file descriptor representing the base dir for a relative filename
>> + * @filename: The name of the file of interest
>> + * @flags: Flags to control the query
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xgetattr().  The main difference is
>> + * that it uses a filename and base directory to determine the file location.
>> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a
>> + * symlink at the given name from being referenced.
>> + *
>> + * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
>> + * flags are also used to load up stat->query_flags.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_xstat(int dfd, const char __user *filename, int flags,
>> +      struct kstat *stat)
>> {
>> struct path path;
>> - int error = -EINVAL;
>> - int lookup_flags = 0;
>> + int error, lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
>>
>> - if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
>> -      AT_EMPTY_PATH)) != 0)
>> - goto out;
>> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
>> +      AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
>> + return -EINVAL;
>>
>> - if (!(flag & AT_SYMLINK_NOFOLLOW))
>> - lookup_flags |= LOOKUP_FOLLOW;
>> - if (flag & AT_EMPTY_PATH)
>> + if (flags & AT_SYMLINK_NOFOLLOW)
>> + lookup_flags &= ~LOOKUP_FOLLOW;
>> + if (flags & AT_NO_AUTOMOUNT)
>> + lookup_flags &= ~LOOKUP_AUTOMOUNT;
>> + if (flags & AT_EMPTY_PATH)
>> lookup_flags |= LOOKUP_EMPTY;
>>
>> + stat->query_flags = flags & KSTAT_QUERY_FLAGS;
>> error = user_path_at(dfd, filename, lookup_flags, &path);
>> - if (error)
>> - goto out;
>> -
>> - error = vfs_getattr(path.mnt, path.dentry, stat);
>> - path_put(&path);
>> -out:
>> + if (!error) {
>> + error = vfs_xgetattr(path.mnt, path.dentry, stat);
>> + path_put(&path);
>> + }
>> return error;
>> }
>> +EXPORT_SYMBOL(vfs_xstat);
>> +
>> +/**
>> + * vfs_fstatat - Get basic attributes by filename
>> + * @dfd: A file descriptor representing the base dir for a relative filename
>> + * @filename: The name of the file of interest
>> + * @flags: Flags to control the query
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xstat().  The difference is that it
>> + * preselects basic stats only.  The flags are used to load up
>> + * stat->query_flags in addition to indicating symlink handling during path
>> + * resolution.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>> + int flags)
>> +{
>> + stat->request_mask = XSTAT_BASIC_STATS;
>> + return vfs_xstat(dfd, filename, flags, stat);
>> +}
>> EXPORT_SYMBOL(vfs_fstatat);
>>
>> -int vfs_stat(const char __user *name, struct kstat *stat)
>> +/**
>> + * vfs_stat - Get basic attributes by filename
>> + * @filename: The name of the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xstat().  The difference is that it
>> + * preselects basic stats only, terminal symlinks are followed regardless and a
>> + * remote filesystem can't be forced to query the server.  If such is desired,
>> + * vfs_xstat() should be used instead.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_stat(const char __user *filename, struct kstat *stat)
>> {
>> - return vfs_fstatat(AT_FDCWD, name, stat, 0);
>> + stat->request_mask = XSTAT_BASIC_STATS;
>> + return vfs_xstat(AT_FDCWD, filename, 0, stat);
>> }
>> EXPORT_SYMBOL(vfs_stat);
>>
>> +/**
>> + * vfs_stat - Get basic attributes by filename, without following terminal symlink
>> + * @filename: The name of the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xstat().  The difference is that it
>> + * preselects basic stats only, terminal symlinks are note followed regardless
>> + * and a remote filesystem can't be forced to query the server.  If such is
>> + * desired, vfs_xstat() should be used instead.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> int vfs_lstat(const char __user *name, struct kstat *stat)
>> {
>> - return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
>> + return vfs_xstat(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
>> }
>> EXPORT_SYMBOL(vfs_lstat);
>>
>> @@ -118,7 +294,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>> {
>> static int warncount = 5;
>> struct __old_kernel_stat tmp;
>> -
>> +
>> if (warncount > 0) {
>> warncount--;
>> printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
>> @@ -143,7 +319,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>> #if BITS_PER_LONG == 32
>> if (stat->size > MAX_NON_LFS)
>> return -EOVERFLOW;
>> -#endif
>> +#endif
>> tmp.st_size = stat->size;
>> tmp.st_atime = stat->atime.tv_sec;
>> tmp.st_mtime = stat->mtime.tv_sec;
>> @@ -225,7 +401,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
>> #if BITS_PER_LONG == 32
>> if (stat->size > MAX_NON_LFS)
>> return -EOVERFLOW;
>> -#endif
>> +#endif
>> tmp.st_size = stat->size;
>> tmp.st_atime = stat->atime.tv_sec;
>> tmp.st_mtime = stat->mtime.tv_sec;
>> @@ -412,6 +588,122 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
>> }
>> #endif /* __ARCH_WANT_STAT64 */
>>
>> +/*
>> + * Get the xstat parameters if supplied
>> + */
>> +static int xstat_get_params(unsigned int mask, struct xstat __user *buffer,
>> +    struct kstat *stat)
>> +{
>> + memset(stat, 0xde, sizeof(*stat)); // DEBUGGING
>> +
>> + if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
>> + return -EFAULT;
>> +
>> + stat->request_mask = mask & XSTAT_ALL_STATS;
>> + stat->result_mask = 0;
>> + return 0;
>> +}
>> +
>> +/*
>> + * Set the xstat results.
>> + *
>> + * If the buffer size was 0, we just return the size of the buffer needed to
>> + * return the full result.
>> + *
>> + * If bufsize indicates a buffer of insufficient size to hold the full result,
>> + * we return -E2BIG.
>> + *
>> + * Otherwise we copy the extended stats to userspace and return the amount of
>> + * data written into the buffer (or -EFAULT).
>> + */
>> +static long xstat_set_result(struct kstat *stat, struct xstat __user *buffer)
>> +{
>> + u32 mask = stat->result_mask, gran = stat->tv_granularity;
>> +
>> +#define __put_timestamp(kts, uts) ( \
>> + __put_user(kts.tv_sec, uts.tv_sec ) || \
>> + __put_user(kts.tv_nsec, uts.tv_nsec ) || \
>> + __put_user(gran, uts.tv_granularity ))
>> +
>> + /* clear out anything we're not returning */
>> + if (!(mask & XSTAT_IOC_FLAGS))
>> + stat->ioc_flags = 0;
>> + if (!(mask & XSTAT_BTIME))
>> + memset(&stat->btime, 0, sizeof(stat->btime));
>> + if (!(mask & XSTAT_GEN))
>> + stat->gen = 0;
>> + if (!(mask & XSTAT_VERSION))
>> + stat->version = 0;
>> + if (!(mask & XSTAT_VOLUME_ID))
>> + memset(&stat->volume_id, 0, sizeof(stat->volume_id));
>> +
>> + /* transfer the results */
>> + if (__put_user(mask, &buffer->st_mask ) ||
>> +    __put_user(stat->mode, &buffer->st_mode ) ||
>> +    __put_user(stat->nlink, &buffer->st_nlink ) ||
>> +    __put_user(stat->uid, &buffer->st_uid ) ||
>> +    __put_user(stat->gid, &buffer->st_gid ) ||
>> +    __put_user(stat->information, &buffer->st_information ) ||
>> +    __put_user(stat->ioc_flags, &buffer->st_ioc_flags ) ||
>> +    __put_user(stat->blksize, &buffer->st_blksize ) ||
>> +    __put_user(MAJOR(stat->rdev), &buffer->st_rdev.major ) ||
>> +    __put_user(MINOR(stat->rdev), &buffer->st_rdev.minor ) ||
>> +    __put_user(MAJOR(stat->dev), &buffer->st_dev.major ) ||
>> +    __put_user(MINOR(stat->dev), &buffer->st_dev.minor ) ||
>> +    __put_timestamp(stat->atime, &buffer->st_atime ) ||
>> +    __put_timestamp(stat->btime, &buffer->st_btime ) ||
>> +    __put_timestamp(stat->ctime, &buffer->st_ctime ) ||
>> +    __put_timestamp(stat->mtime, &buffer->st_mtime ) ||
>> +    __put_user(stat->ino, &buffer->st_ino ) ||
>> +    __put_user(stat->size, &buffer->st_size ) ||
>> +    __put_user(stat->blocks, &buffer->st_blocks ) ||
>> +    __put_user(stat->gen, &buffer->st_gen ) ||
>> +    __put_user(stat->version, &buffer->st_version ) ||
>> +    __copy_to_user(&buffer->st_volume_id, &stat->volume_id,
>> +   sizeof(buffer->st_volume_id) ) ||
>> +    __clear_user(&buffer->__spares, sizeof(buffer->__spares)))
>> + return -EFAULT;
>> + return 0;
>> +}
>> +
>> +/*
>> + * System call to get extended stats by path
>> + */
>> +SYSCALL_DEFINE5(xstat,
>> + int, dfd, const char __user *, filename, unsigned, flags,
>> + unsigned int, mask, struct xstat __user *, buffer)
>> +{
>> + struct kstat stat;
>> + int error;
>> +
>> + error = xstat_get_params(mask, buffer, &stat);
>> + if (error != 0)
>> + return error;
>> + error = vfs_xstat(dfd, filename, flags, &stat);
>> + if (error)
>> + return error;
>> + return xstat_set_result(&stat, buffer);
>> +}
>> +
>> +/*
>> + * System call to get extended stats by file descriptor
>> + */
>> +SYSCALL_DEFINE4(fxstat, unsigned int, fd, unsigned int, flags,
>> + unsigned int, mask, struct xstat __user *, buffer)
>> +{
>> + struct kstat stat;
>> + int error;
>> +
>> + error = xstat_get_params(mask, buffer, &stat);
>> + if (error < 0)
>> + return error;
>> + stat.query_flags = flags;
>> + error = vfs_fxstat(fd, &stat);
>> + if (error)
>> + return error;
>> + return xstat_set_result(&stat, buffer);
>> +}
>> +
>> /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
>> void __inode_add_bytes(struct inode *inode, loff_t bytes)
>> {
>> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
>> index f550f89..faa9e5d 100644
>> --- a/include/linux/fcntl.h
>> +++ b/include/linux/fcntl.h
>> @@ -47,6 +47,7 @@
>> #define AT_SYMLINK_FOLLOW 0x400   /* Follow symbolic links.  */
>> #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
>> #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
>> +#define AT_FORCE_ATTR_SYNC 0x2000 /* Force the attributes to be sync'd with the server */
>>
>> #ifdef __KERNEL__
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 8de6755..ec6c62e 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1467,6 +1467,7 @@ struct super_block {
>>
>> char s_id[32]; /* Informational name */
>> u8 s_uuid[16]; /* UUID */
>> + unsigned char s_volume_id[16]; /* Volume identifier */
>>
>> void *s_fs_info; /* Filesystem private info */
>> unsigned int s_max_links;
>> @@ -2470,6 +2471,7 @@ extern const struct inode_operations page_symlink_inode_operations;
>> extern int generic_readlink(struct dentry *, char __user *, int);
>> extern void generic_fillattr(struct inode *, struct kstat *);
>> extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
>> +extern int vfs_xgetattr(struct vfsmount *, struct dentry *, struct kstat *);
>> void __inode_add_bytes(struct inode *inode, loff_t bytes);
>> void inode_add_bytes(struct inode *inode, loff_t bytes);
>> void inode_sub_bytes(struct inode *inode, loff_t bytes);
>> @@ -2482,6 +2484,8 @@ extern int vfs_stat(const char __user *, struct kstat *);
>> extern int vfs_lstat(const char __user *, struct kstat *);
>> extern int vfs_fstat(unsigned int, struct kstat *);
>> extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
>> +extern int vfs_xstat(int, const char __user *, int, struct kstat *);
>> +extern int vfs_xfstat(unsigned int, struct kstat *);
>>
>> extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
>>    unsigned long arg);
>> diff --git a/include/linux/stat.h b/include/linux/stat.h
>> index 611c398..0ff561a 100644
>> --- a/include/linux/stat.h
>> +++ b/include/linux/stat.h
>> @@ -3,6 +3,7 @@
>>
>> #ifdef __KERNEL__
>>
>> +#include <linux/types.h>
>> #include <asm/stat.h>
>>
>> #endif
>> @@ -46,6 +47,117 @@
>>
>> #endif
>>
>> +/*
>> + * Query request/result mask
>> + *
>> + * Bits should be set in request_mask to request particular items when calling
>> + * xstat() or fxstat().
>> + *
>> + * The bits in st_mask may or may not be set upon return, in part depending on
>> + * what was set in the mask argument:
>> + *
>> + * - if not available at all, the bit will be cleared before returning and the
>> + *   field will be cleared; otherwise,
>> + *
>> + * - if AT_FORCE_ATTR_SYNC is set, then the datum will be synchronised to the
>> + *   server and the field and bit will be set on return; otherwise,
>> + *
>> + * - if explicitly requested, the datum will be synchronised to a server or
>> + *   other medium if out of date before being returned, and the bit will be set
>> + *   on return; otherwise,
>> + *
>> + * - if not requested, but available in approximate form without any effort, it
>> + *   will be filled in anyway, and the bit will be set upon return (it might
>> + *   not be up to date, however, and no attempt will be made to synchronise the
>> + *   internal state first); otherwise,
>> + *
>> + * - the field and the bit will be cleared before returning.
>> + *
>> + * Items in XSTAT_BASIC_STATS may be marked unavailable on return, but they
>> + * will have a value installed for compatibility purposes so that stat() and
>> + * co. can be emulated in userspace.
>> + */
>> +#define XSTAT_MODE 0x00000001U /* want/got st_mode */
>> +#define XSTAT_NLINK 0x00000002U /* want/got st_nlink */
>> +#define XSTAT_UID 0x00000004U /* want/got st_uid */
>> +#define XSTAT_GID 0x00000008U /* want/got st_gid */
>> +#define XSTAT_RDEV 0x00000010U /* want/got st_rdev */
>> +#define XSTAT_ATIME 0x00000020U /* want/got st_atime */
>> +#define XSTAT_MTIME 0x00000040U /* want/got st_mtime */
>> +#define XSTAT_CTIME 0x00000080U /* want/got st_ctime */
>> +#define XSTAT_INO 0x00000100U /* want/got st_ino */
>> +#define XSTAT_SIZE 0x00000200U /* want/got st_size */
>> +#define XSTAT_BLOCKS 0x00000400U /* want/got st_blocks */
>> +#define XSTAT_BASIC_STATS 0x000007ffU /* the stuff in the normal stat struct */
>> +#define XSTAT_IOC_FLAGS 0x00000800U /* want/got FS_IOC_GETFLAGS */
>> +#define XSTAT_BTIME 0x00001000U /* want/got st_btime */
>> +#define XSTAT_GEN 0x00002000U /* want/got st_gen */
>> +#define XSTAT_VERSION 0x00004000U /* want/got st_version */
>> +#define XSTAT_VOLUME_ID 0x00008000U /* want/got st_volume_id */
>> +#define XSTAT_ALL_STATS 0x0000ffffU /* all supported stats */
>> +
>> +/*
>> + * Extended stat structures
>> + */
>> +struct xstat_dev {
>> + uint32_t major, minor;
>> +};
>> +
>> +struct xstat_time {
>> + int64_t tv_sec;
>> + uint32_t tv_nsec;
>> + uint32_t tv_granularity; /* time granularity (in nS) */
>> +};
>> +
>> +struct xstat {
>> + uint32_t st_mask; /* what results were written */
>> + uint32_t st_mode; /* file mode */
>> + uint32_t st_nlink; /* number of hard links */
>> + uint32_t st_uid; /* user ID of owner */
>> + uint32_t st_gid; /* group ID of owner */
>> + uint32_t st_information; /* information about the file */
>> + uint32_t st_ioc_flags; /* as FS_IOC_GETFLAGS */
>> + uint32_t st_blksize; /* optimal size for filesystem I/O */
>> + struct xstat_dev st_rdev; /* device ID of special file */
>> + struct xstat_dev st_dev; /* ID of device containing file */
>> + struct xstat_time st_atime; /* last access time */
>> + struct xstat_time st_btime; /* file creation time */
>> + struct xstat_time st_ctime; /* last attribute change time */
>> + struct xstat_time st_mtime; /* last data modification time */
>> + uint64_t st_ino; /* inode number */
>> + uint64_t st_size; /* file size */
>> + uint64_t st_blocks; /* number of 512-byte blocks allocated */
>> + uint64_t st_gen; /* inode generation number */
>> + uint64_t st_version; /* data version number */
>> + uint8_t st_volume_id[16]; /* volume identifier */
>> + uint64_t __spares[11]; /* spare space for future expansion */
>> +};
>> +
>> +/*
>> + * Flags to be found in st_information
>> + *
>> + * These give information about the features or the state of a file that might
>> + * be of use to ordinary userspace programs such as GUIs or ls rather than
>> + * specialised tools.
>> + *
>> + * Additional information may be found in st_ioc_flags and we try not to
>> + * overlap with it.
>> + */
>> +#define XSTAT_INFO_ENCRYPTED 0x00000001U /* File is encrypted */
>> +#define XSTAT_INFO_TEMPORARY 0x00000002U /* File is temporary (NTFS/CIFS) */
>> +#define XSTAT_INFO_FABRICATED 0x00000004U /* File was made up by filesystem */
>> +#define XSTAT_INFO_KERNEL_API 0x00000008U /* File is kernel API (eg: procfs/sysfs) */
>> +#define XSTAT_INFO_REMOTE 0x00000010U /* File is remote */
>> +#define XSTAT_INFO_OFFLINE 0x00000020U /* File is offline (CIFS) */
>> +#define XSTAT_INFO_AUTOMOUNT 0x00000040U /* Dir is automount trigger */
>> +#define XSTAT_INFO_AUTODIR 0x00000080U /* Dir provides unlisted automounts */
>> +#define XSTAT_INFO_NONSYSTEM_OWNERSHIP 0x00000100U /* File has non-system ownership details */
>> +#define XSTAT_INFO_HAS_ACL 0x00000200U /* File has an ACL of some sort */
>> +#define XSTAT_INFO_REPARSE_POINT 0x00000400U /* File is reparse point (NTFS/CIFS) */
>> +#define XSTAT_INFO_HIDDEN 0x00000800U /* File is marked hidden (DOS+) */
>> +#define XSTAT_INFO_SYSTEM 0x00001000U /* File is marked system (DOS+) */
>> +#define XSTAT_INFO_ARCHIVE 0x00002000U /* File is marked archive (DOS+) */
>> +
>> #ifdef __KERNEL__
>> #define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
>> #define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
>> @@ -60,6 +172,12 @@
>> #include <linux/time.h>
>>
>> struct kstat {
>> + u32 query_flags; /* operational flags */
>> +#define KSTAT_QUERY_FLAGS (AT_FORCE_ATTR_SYNC)
>> + u32 request_mask; /* what fields the user asked for */
>> + u32 result_mask; /* what fields the user got */
>> + u32 information;
>> + u32 ioc_flags; /* inode flags (FS_IOC_GETFLAGS) */
>> u64 ino;
>> dev_t dev;
>> umode_t mode;
>> @@ -67,14 +185,18 @@ struct kstat {
>> uid_t uid;
>> gid_t gid;
>> dev_t rdev;
>> + unsigned int tv_granularity; /* granularity of times (in nS) */
>> loff_t size;
>> - struct timespec  atime;
>> + struct timespec atime;
>> struct timespec mtime;
>> struct timespec ctime;
>> + struct timespec btime; /* file creation time */
>> unsigned long blksize;
>> unsigned long long blocks;
>> + u64 gen; /* inode generation */
>> + u64 version; /* data version */
>> + unsigned char volume_id[16]; /* volume identifier */
>> };
>>
>> #endif
>> -
>> #endif
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 3de3acb..ff9f8d9 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -45,6 +45,8 @@ struct shmid_ds;
>> struct sockaddr;
>> struct stat;
>> struct stat64;
>> +struct xstat_parameters;
>> +struct xstat;
>> struct statfs;
>> struct statfs64;
>> struct __sysctl_args;
>> @@ -858,4 +860,9 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>      unsigned long riovcnt,
>>      unsigned long flags);
>>
>> +asmlinkage long sys_xstat(int dfd, const char __user *path, unsigned flags,
>> +  unsigned mask, struct xstat __user *buffer);
>> +asmlinkage long sys_fxstat(unsigned fd, unsigned flags,
>> +   unsigned mask, struct xstat __user *buffer);
>> +
>> #endif
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [hidden email]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas





Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/6] xstat: Add a pair of system calls to make extended file stats available

David Howells
In reply to this post by Andreas Dilger-7
Andreas Dilger <[hidden email]> wrote:

> > The idea was initially proposed as a set of xattrs that could be
> > retrieved with getxattr(), but the general preferance proved to be
> > for new syscalls with an extended stat structure.
>
> I would comment that it was the opposite.  It was originally a
> stat()-like extension that degraded into a messy getxattr() mess.

Ummm...  No, my first attempt was definitely through getxattr().  You even
commented on it.

> > The fields in struct xstat come in a number of classes:
> >
> > (0) st_dev, st_blksize, st_information.
> >
> >     These are local data and are always available.
>
> For the extra two bits it would cost us, I don't think st_blksize
> and st_information should always be returned.

Fair enough.

> st_blksize may be variable for a distributed filesystem,

I wonder if there's a way to make this explicit - or is it something that if
the bit isn't set, you can't use the value in st_blksize.  I wonder if this
value always has to be non-zero to make sure existing stat() doesn't explode.

> and some of the fields in st_information (offline) may not be free to access
> either.

True.

David
12345