Re: Why is PCE not set in CR4?



[ First posted to freebsd-questions and freebsd-ia32 ]
[ Add freebsd-hackers which I hope is appropriate    ]

The References: and In-Reply-To: headers are missing from this message. If your mail client does not thread it correctly, please accept my apologies. Before mailman, I could display messages in raw format, with full headers, e.g.

docs.freebsd.org/cgi/getmsg.cgi?fetch=2771331+0+archive/2003/freebsd-questions/20030928.freebsd-questions+raw

I've been playing with my Athlon's timestamp counter for a while,
and I would like to experiment with the performance-monitoring
counters now.

I can execute the RDTSC instruction from ring 3 because the TSD
(TimeStamp Disable) bit in CR4 (Control Register 4) is cleared.

However, I am not allowed to use the RDPMC instruction from ring 3
because the PCE (Performance-monitoring Counters Enable) bit is not set.
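
To make the two CR4 bits concrete, here is a minimal sketch of the access rule as I understand it from the IA-32 manuals: TSD is bit 2 and PCE is bit 8 of CR4. The function names are made up for illustration and are not part of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4 flag positions (bit 2 and bit 8). */
#define CR4_TSD (1u << 2) /* Time Stamp Disable: when set, RDTSC is ring 0 only */
#define CR4_PCE (1u << 8) /* Performance-monitoring Counter Enable: when set,
                             RDPMC is allowed at any privilege level */

/* Hypothetical helper: may RDTSC run at privilege level cpl (0..3)? */
bool rdtsc_allowed(uint32_t cr4, unsigned cpl)
{
    assert(cpl <= 3);
    return cpl == 0 || (cr4 & CR4_TSD) == 0;
}

/* Hypothetical helper: may RDPMC run at privilege level cpl (0..3)? */
bool rdpmc_allowed(uint32_t cr4, unsigned cpl)
{
    assert(cpl <= 3);
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```

With both bits clear, which is the situation described above, rdtsc_allowed(0, 3) is true and rdpmc_allowed(0, 3) is false. Note that the two bits have opposite polarity: TSD disables when set, PCE enables when set.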

You can do it with /dev/perfmon. man 4 perfmon.

I have read the perfmon documentation and source code. For several reasons, I do not think it suits my situation.

It was designed in 1996 with the Pentium Pro in mind, which, apparently, only has two performance counters:

  #define NPMC 2
  if (pmc < 0 || pmc >= NPMC) return EINVAL;

I mentioned kernel modules because I want to avoid having to recompile my kernel. Even if I did set NPMC to 4 and recompiled, I am not convinced that perfmon would still work.

void perfmon_init(void)
{
  ...
  case CPUCLASS_686:
    perfmon_cpuok = 1;
    msr_ctl[0] = 0x186;
    msr_ctl[1] = 0x187;
    msr_pmc[0] = 0xc1;
    msr_pmc[1] = 0xc2;
    writectl = writectl6;
    break;

/* If NPMC > 2, msr_ctl[] and msr_pmc[] are not completely
 * initialized. Is this a problem? */
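
One way to avoid the partially initialized arrays would be to describe each CPU family by its counter count and base MSR numbers, and derive every register from those. This is only a sketch, not how perfmon is written: the struct and function names are hypothetical, and the K7 numbers assume AMD's documented layout (PerfEvtSel0-3 at 0xC0010000-0xC0010003, PerfCtr0-3 at 0xC0010004-0xC0010007). The P6 numbers are the ones from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-family description of the counter MSRs. */
struct pmc_layout {
    unsigned npmc;     /* number of counters */
    uint32_t ctl_base; /* MSR number of the first event-select register */
    uint32_t pmc_base; /* MSR number of the first counter register */
};

/* P6: values taken from perfmon_init() above. */
const struct pmc_layout p6_layout = { 2, 0x186u, 0xc1u };
/* K7: assumed from AMD's documentation, see the note above. */
const struct pmc_layout k7_layout = { 4, 0xc0010000u, 0xc0010004u };

/* Look up the MSR pair for counter `pmc`.
 * Returns 0 on success, -1 if pmc is out of range for this layout. */
int pmc_msrs(const struct pmc_layout *l, int pmc, uint32_t *ctl, uint32_t *ctr)
{
    assert(l != NULL && ctl != NULL && ctr != NULL);
    if (pmc < 0 || (unsigned)pmc >= l->npmc)
        return -1;
    *ctl = l->ctl_base + (uint32_t)pmc;
    *ctr = l->pmc_base + (uint32_t)pmc;
    return 0;
}
```

Because the range check uses the layout's own npmc, raising the counter count for one family cannot leave entries uninitialized for another.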

Assume I get perfmon to work with my K7's 4 performance-monitoring counters. Since PCE is not set, I am not allowed to call RDPMC from ring 3. I have to make a system call, just to read the counters.

I will pay the overhead of a full system call instead of a single instruction. More importantly, the call will disturb the cache, and possibly the TLB.

There is no point in monitoring an event if the monitoring tools disturb the environment too much.

Is there a reason (security? performance? other?) why FreeBSD does
not set PCE at boot time, or is it just an oversight?

I can provide a patch if nobody objects to the idea, or write a kernel module that sets PCE when loaded.

On a related subject, is there a way for a kernel module to catch a
general-protection fault caused by an application trying to execute
RDMSR or WRMSR, and have the kernel module execute the instruction
for the application? Or is it cleaner to register two new system
calls to achieve the same thing?

That would (probably) require adding superuser-configurable permissions
to read/write to a specific MSR, as some of them are critical. I doubt
it's worth creating extra device nodes, and I wonder if there's a
"cleaner" way to do that.

My intent is to allow an application access to the 4 performance monitoring control registers ONLY. The application would try to execute WRMSR (a privileged instruction) which would cause a GPF. The kernel module would catch the fault, sanity-check the arguments, and proceed with the WRMSR when the arguments are valid.
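
The sanity check I have in mind could look roughly like this. It is a sketch of the decision only, not a fault handler: all names are hypothetical, the opcodes are the standard encodings (WRMSR is 0F 30, RDMSR is 0F 32), instruction prefixes are ignored, and the MSR range assumes the K7 event-select registers sit at 0xC0010000-0xC0010003.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define K7_PERFEVTSEL0 0xc0010000u /* assumed first event-select MSR */
#define K7_NPMC        4u          /* four counters on the K7 */

enum msr_op { MSR_OP_NONE, MSR_OP_WRMSR, MSR_OP_RDMSR };

/* Classify the bytes at the faulting instruction pointer. */
enum msr_op decode_msr_op(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len < 2 || insn[0] != 0x0f)
        return MSR_OP_NONE;
    if (insn[1] == 0x30)
        return MSR_OP_WRMSR;
    if (insn[1] == 0x32)
        return MSR_OP_RDMSR;
    return MSR_OP_NONE;
}

/* ECX selects the MSR; accept only the four control registers. */
bool msr_is_perf_ctl(uint32_t ecx)
{
    return ecx >= K7_PERFEVTSEL0 && ecx - K7_PERFEVTSEL0 < K7_NPMC;
}

/* Should the handler perform the WRMSR on the application's behalf? */
bool should_emulate_wrmsr(const uint8_t *insn, size_t len, uint32_t ecx)
{
    return decode_msr_op(insn, len) == MSR_OP_WRMSR && msr_is_perf_ctl(ecx);
}
```

Anything that fails the check would be handled as an ordinary fault, so an application could never reach the critical MSRs this way. A real handler would also need to validate the value in EDX:EAX before writing it.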

Could you point me to some documentation, or is the source the only documentation available in this situation? :-)

--
Shill (shill at free dot fr)