Is power meter consistency important?

Shane Miller’s (@GPLama) recent Instagram post reminded me of something I’ve long been asking about: Is power meter accuracy important? Or is it only important that it’s consistent against itself?

I used to think that a powermeter just needs to be consistent, and be relatively close. But after years of using a powermeter, going back to my first PowerTap in the early 2000s, I now have 25 years of data that I’ve collected. Now, with a long-term view of my own peak power levels, when I look back I have no idea whether I’ve improved or deteriorated in some areas (I can only assume I’m getting worse with age). As Shane states in his post, the head unit is another factor in the data’s accuracy.

I’m curious about other perspectives on this topic and how to interpret longterm performance trends when various power meters have been used, as well as head units. When I see that my FTP is roughly the same as it was 20yrs ago, it makes me happy. But I also doubt it’s true…

4 Likes

It depends on what you’re using the data for but, mostly, most riders don’t do much with their data so they don’t need much accuracy or precision. Most riders use power data only for training, and training is one of the least demanding things you can do with power data: after all, riders trained effectively with a wristwatch and a known road segment or hill, or later with HRMs. There are things that do require accuracy and precision, but most riders don’t yet do them. Most riders use power meters to train almost exactly the way they used to use HRMs to train: they train for time in zones, and if that’s the way you’re using power data you shouldn’t be surprised if your results look pretty similar to your results if you trained using HR.

The thing is, if you collect power data long enough, as you have, you’ll eventually have some question or puzzle or conundrum that you can’t easily resolve (if it were easily resolvable, you wouldn’t be puzzled). In those cases, you don’t want yet another question: “was my power meter (or my power meters) giving reliable information?” That’s why we often read pleas from desperate riders with some puzzle about their single-sided power meter: they initially bought it because they were told that consistency was all that mattered, but eventually they got a question that couldn’t be solved by consistency alone; or they buy a smart trainer and then can’t figure out which one is right, or how it affects their historical data.

Power meters are expensive so if all you’re doing with one is training, you’re leaving capability on the table. To measure drag, you actually do need both accuracy and precision. Lots of riders think they don’t need to worry about drag reduction because they’ll make it up with increasing power – but the stop watch and finish line don’t care whether you got faster because of more power or less drag, and the rules allow you to do both.

The simplest possible case is for riders who think they’ll never do anything other than train, and think their first power meter will be their only power meter forever. But if you ever buy a second power meter, or a smart trainer, or in my case, have to replace a stolen bike and stolen power meter, you’ll want to know that your second power meter will agree with your first–and the easiest way to do that is to be sure that both are accurate.

7 Likes

Thanks Robert! Very interesting points. I’m one of those riders who simply trains with a powermeter, gives the data to my coach so he can make sense of it (shoutout to Neal Henderson currently), and frequently tells him grand stories about how much better I used to be.

I didn’t consider powermeter accuracy in the context of aero testing. I can see why accuracy matters there.

1 Like

We’ve long known what it takes to get to the finish line in front of the next guy: more power, less drag, or smarter tactics. A good accurate power meter can help you with all three, and two of those are things you can’t do with just a wristwatch or a HRM. If you’re only using power data for training FTP, you don’t need much accuracy but then you’re not doing everything you could to get faster.

I’m much less powerful than I used to be, and also heavier. I know this because I check the accuracy of my power meters and bathroom scales. The good news is I have accurate power records since the last millennium. The bad news is I have an accurate record of my decline. So I’m old, fat, and slow. But if I do everything I can, there are moments when I’m just old and fat. When those moments arrive they’re pretty sweet.

4 Likes

I think we also discussed this recently in geek-warning. Internal consistency is definitely more important for training & performance monitoring, at least until we start switching power sources.

Accuracy is important if we’re interested in monitoring energy expenditure or metabolic efficiency (metabolic watts, kJ, kcal), or any of the external modelling Robert talks about.

Curious @Jonathan_Baker do you have any thoughts from the Pro side?

2 Likes

I have power data going back just over 20 years and HR data about 30. Through migration most of it is on WKO5. What I find interesting is to look at the time vs power on segments. I had known lap start and stop points pre-GPS and through a small amount of spreadsheet geekery can include those. The big picture is that I can see a steeper power decline than time decline. This is likely due to greatly improved aerodynamics and rolling resistance as the years progressed.

That’s very possibly true; though remember that power scales with the cube of speed so you will ordinarily see a much larger decline in power than decline in speed even if aerodynamics and rolling resistance hadn’t improved. In order to figure out whether aero and/or rolling resistance really had improved, you need accuracy of both speed and power.
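To put a rough number on that, here is a minimal sketch. It assumes power is purely proportional to the cube of speed, which is only approximately true when aero drag dominates (rolling resistance is closer to linear in speed, so the real exponent is somewhat lower). The function name and the 5% figure are just illustrations.

```python
def power_ratio_for_speed_ratio(speed_ratio: float, exponent: float = 3.0) -> float:
    """Power ratio implied by a speed ratio, assuming power ~ speed**exponent."""
    return speed_ratio ** exponent


# A 5% drop in speed with nothing else changed (same drag, same road, no wind):
speed_ratio = 0.95
power_ratio = power_ratio_for_speed_ratio(speed_ratio)
print(f"speed down {1 - speed_ratio:.1%}, power down {1 - power_ratio:.1%}")
# -> speed down 5.0%, power down 14.3%
```

So a power decline roughly three times the size of the speed decline is what you would expect with no equipment change at all; only a gap bigger or smaller than that hints at a change in drag.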

Is accuracy needed for aero testing if I’m always using the same device to measure power? In my mind staying on the same bike and testing a bunch of different helmets/positions/ whatever I should at least be able to find what’s fastest for me. Is that correct or have I missed something about how the Chung method works?

1 Like

Hmmm. That’s a deep question. The reason why the approach I use is relatively fast and precise is because we can distinguish rolling and aero resistance, taking into account acceleration (and deceleration) and changes in slope. Being able to distinguish between the two relies on being able to distinguish between something that varies linearly with speed and something that varies with the cube of speed, while removing the effect of acceleration and slope. Distinguishing between those two depends both on a wide range of power and speed, and enough accuracy so you don’t get fooled by something that looks like it can vary with square of speed or maybe the 4th power of speed. If you do see a square or quartic component, then that means there was something odd about the test, so you want to chase that down.
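For readers who haven't seen it written out, the power balance described above can be sketched roughly like this. It is a simplified illustration and not a reference implementation: it assumes no wind, no drivetrain loss, a small-angle slope, evenly spaced samples, and a fixed air density, and the function and parameter names are made up for the example.

```python
G = 9.80665  # standard gravity, m/s^2


def virtual_elevation(power_w, speed_ms, mass_kg, cda, crr, rho=1.2, dt=1.0):
    """Cumulative virtual elevation (m), one value per sample.

    Per step, solve the power balance for slope:
        P = 0.5*rho*CdA*v^3 + Crr*m*g*v + m*g*v*slope + m*v*a
    then integrate slope * distance.
    """
    elevation = [0.0]
    for i in range(1, len(speed_ms)):
        v_prev, v = speed_ms[i - 1], speed_ms[i]
        v_mid = 0.5 * (v_prev + v)
        if v_mid <= 0:
            elevation.append(elevation[-1])  # stationary: no change
            continue
        accel = (v - v_prev) / dt
        p_mid = 0.5 * (power_w[i - 1] + power_w[i])
        aero = 0.5 * rho * cda * v_mid ** 3      # varies with cube of speed
        rolling = crr * mass_kg * G * v_mid      # varies linearly with speed
        kinetic = mass_kg * v_mid * accel        # power going into acceleration
        slope = (p_mid - aero - rolling - kinetic) / (mass_kg * G * v_mid)
        elevation.append(elevation[-1] + slope * v_mid * dt)
    return elevation


# Synthetic check: steady 10 m/s on a flat road, power exactly matching the model.
mass, cda, crr, rho, v = 80.0, 0.30, 0.005, 1.2, 10.0
p = 0.5 * rho * cda * v ** 3 + crr * mass * G * v
speeds, powers = [v] * 60, [p] * 60
flat = virtual_elevation(powers, speeds, mass, cda, crr, rho)      # correct CdA
tilted = virtual_elevation(powers, speeds, mass, 0.35, crr, rho)   # CdA guessed too high
print(round(flat[-1], 3), round(tilted[-1], 3))
```

With the right CdA and Crr the virtual profile matches the real one (flat here); with a wrong guess it tilts. A power meter that reads a few percent off tilts the profile in much the same way, which is why power error ends up looking like a drag difference.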

So, it turns out that one of the methods I developed was high precision coast down tests, where you don’t need a power meter at all: if you’re coasting, you know the power is zero. The trade-off for this is that coast down testing takes longer, and the longer you’re out there testing, the more things can go wrong: the air density can change, or the temperature can change, or the wind can pick up, or you can get tired and lose your concentration and ability to hold your position steady. In addition, if you know power is zero, then all of the ability to distinguish changes in equipment or position depends on accuracy and precision in speed measurement, which is why I recommend a wheel speed sensor rather than GPS speed. Years ago when I first came up with this, a guy said it was too much trouble and he figured he’d just go out to a long steep hill, coast down it and part way up the other side, and make a chalk mark where he rolled to a stop: if he rolled further, that meant he had less drag. This is essentially what Jan Heine did for many years. That turns out to not be very high precision, and if the weather changed or there was a puff of wind, or you couldn’t hold your position exactly constant, you couldn’t compare across different days. That’s why Jan’s claims about things that made him fast were always a tad suspect.

It turns out that shortly after the Stages single-sided PM came out, riders started asking me if they really needed accuracy. The complicated answer is that if they’re looking for big changes in drag, like the difference between a road bike and a TT bike, they could easily tell that one was slipperier than the other; but if they were looking for small differences in drag, like the difference between two pairs of socks, the results were noisy enough that they couldn’t reliably tell them apart. Virtual elevation is sort of like a microscope that makes details look bigger: it enlarges the differences between different configurations so you can detect them much more easily than other methods. So if you’re looking to resolve small differences, you need accuracy and precision in a host of measurements: power and speed mostly, but to lesser extent also in total mass and air density.

I guess the bottom line is that without accurate power (and speed) you might be able to tell the difference between two changes in configuration, but it will take you many more runs, so much more time, and you may have difficulty distinguishing between small changes. So there are situations where you can get away with not having accuracy in your input measures, but it’s so much faster and easier if you do.

I tell people that if their power meters are suspect, they should just do variable speed coast downs on a shallow hill with a wheel speed sensor. That’s a case where you know the power with certainty.
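As a rough illustration of why no power meter is needed there: with power at zero, the deceleration alone carries the drag information. The sketch below is a generic least-squares fit, not a description of any particular protocol. It assumes a known constant slope, no wind, constant air density, and clean wheel-sensor speed at a fixed sample interval; all names are hypothetical.

```python
G = 9.80665  # standard gravity, m/s^2


def fit_coastdown(speed_ms, dt, mass_kg, rho=1.2, slope=0.0):
    """Estimate (CdA, Crr) from coasting speeds sampled every dt seconds.

    While coasting:  -dv/dt = (0.5*rho*CdA/m) * v^2 + Crr*g + g*slope
    so (deceleration - g*slope) is a straight line in v^2.
    """
    xs, ys = [], []
    for v0, v1 in zip(speed_ms, speed_ms[1:]):
        v_mid = 0.5 * (v0 + v1)
        decel = -(v1 - v0) / dt
        xs.append(v_mid ** 2)
        ys.append(decel - G * slope)  # remove the known gravity term
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx              # = 0.5*rho*CdA/m
    c = mean_y - k * mean_x    # = Crr*g
    return 2.0 * mass_kg * k / rho, c / G
```

The fit only separates the two terms if the run covers a wide range of speeds, and it depends entirely on the quality of the speed data, which is the reason for preferring a wheel speed sensor over GPS.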

Aren’t you sorry you asked this question?

1 Like

Thanks Robert! Not sorry I asked at all - that’s exactly the sort of detail I was hoping for. Glad you singled out Stages too as that’s the PM I have :grinning_face:

In that case, why don’t you ask on the “Performance – Aero Testing” topic about coast down protocols. That’d be a good place to put it so people can find it later.

@Jem_Arnold - thanks for the tag, and apologies for the delayed reply. I took a brief post-TDF holiday to, incidentally, visit my good friend Uli Schoberer (the inventor of the [modern] powermeter).

Before I start, @Robert_Chung has covered a lot of good stuff already, so I’ll try not to overlap too much.

From professional cycling, I think there’s an acceptance that power meters do have a little variation between devices, even of the same model. And we (reluctantly) accept this variation.

Why? Because when a team has around 80 riders (Men’s, Women’s, WT, development, etc.) and, I’d expect, 600-800 power meters (maybe more), it’s almost impossible to be super fussy about the last few watts. They’re either “close enough” or clearly “wrong”. When one looks wrong, basic checks are performed, including batteries and zero-offset. If it’s still wrong, it’s replaced.

My personal view is that for general training, “close enough” is good enough. I appreciate the day-to-day variation in both measurement and human ability. Indeed, I triangulate a few things to guide training – typically, power, heart rate and perceived effort.

For power and heart rate, I have “zones” set up on my Wahoo head unit, and it displays both the number and the zone as a colour. I use the colours as my guide for training. E.g., for “endurance training” I want to see blue or green (from HR), for “threshold intervals” it’s orange (from power). I also look at both power and HR, and how they’re tracking relative to each other (and perceived effort) and adjust based on that feedback too. I don’t want to get into the weeds here, but this helps to manage effort based on fatigue, temperature, fuelling etc.

I have another page set up with a histogram for time in zone by power and heart rate. I get a “quick look” into whether I have been doing what I intended to do in this session.

So while riding, I’m not micromanaging things too much. I’m using the numbers and colours to guide my efforts, and I’m letting my brain and legs figure the rest out. I’m the same with “time” too. While I may plan to do 10-min intervals, they’re never to the exact second. +/- 20% is fine.

Where I do want accuracy and precision is in the saved data. As discussed earlier, I want to be able to answer those deep questions and if the data is clearly wrong, that’s bad. For this reason, I’m lucky (thanks Uli) to have 4x SRM powermeters which all get checked for offset and slope calibrations*.

I currently have a Garmin in my jersey pocket as a data logger collecting GPS, power, HR, ventilation, core temperature and (sometimes) muscle oxygen data. I do not look at this daily, but I might want to look at it over time and so I ensure that everything records well.

So in short: Day-to-day: Accuracy matters less to me as a training tool. Long-term: Accuracy is super important, as it enables us to answer questions from our data.

*the biggest source of inaccuracy, in my opinion, is people calibrating their power meters. It should be performed very carefully, and only when something is clearly wrong. NOT before each ride, or interval. I think over-calibration adds more uncertainty than an inaccuracy in a power meter itself. If it’s working fine – leave it alone.

5 Likes

Thanks! Very informative.

Yeah, I think the best power meter is the one that is mostly right, and when it’s wrong, it’s very obviously wrong. And the worst PM is the one that is always very subtly wrong in unpredictable ways. I’ve owned both of those PMs :smile:

Interested in your last point about over-calibration. And just to clarify, do you mean the common pre-ride single-point zero offset? Or a “full” intercept & slope calibration, like you’re describing for SRM?

I’ve always just accepted the common advice to zero-offset before each ride. Is the concern about a directional drift over time each time we re-zero? Or that we are randomising the intercept and thus adding a bit of uncertainty when comparing any two rides to each other? What about the PMs which frequently auto-“calibrate” during rides?

Cheers

2 Likes

Just speaking in theory, what might a team do if their sponsor’s power meter is wrong most of the time, and the sponsor requires them to use only that PM in races? Like, imagine the sponsor is a large component manufacturer.

@Weiwen_Ng it’s normal for teams to use whatever material a sponsor supplies. This goes from frames, to wheels, to groupsets, to power meters. Accuracy and consistency have been discussed earlier in the thread. It goes a bit further than simply right, or wrong.

@Jem_Arnold I like the idea of right, and obviously wrong. No grey areas, please. That makes life easier.

I can’t talk to all other power meters, as I’ve only used SRMs for the last 12 years, but my zero-offset is very stable. They have an auto temperature correction built in which seems to do a good job. If I went to check my bike in the hallway, I’d guess the offset is 555 +/- 2-3 mV (as it always is). That’s about a watt difference.

My comments about over-calibration were about zero-offset. I think many people think it’s a magic button that turns wrong into right. But it’s more nuanced than that.

Getting absolutely zero load on the strain gauge, when you have a chain attached to a wheel (and the chainring), is not so easy. Especially when outside in the real world.

Is the floor flat, so the bike weight is not tensioning the chain? Are the cranks perfectly top/bottom dead centre? Are the pedal bearings free enough so they hang at the same angle? Is there nothing touching anything that shouldn’t be touched? Have you waited long enough for the strain gauges to reach environmental temperatures? Is it colder first thing when you calibrate it, than later when you might do intervals?

I prefer to leave it alone, and only check and maybe adjust things very carefully.

4 Likes

Case in point below. Same ride. Same power meter. Very different data:

1 Like

This reminds me of long ago when each PM manufacturer also had to make a proprietary head unit to record the data. It was clear that some head unit/PM combinations were doing oddball things with the data but there wasn’t a simple way to identify whether the problem was in the data that were being sent or the data as they were being received and stored. I had hoped that when the ANT+ standard was being widely adopted and head units started getting standardized that it would be easier to figure out what was happening. This seems like a step backward, but interesting from an analytical point of view. I’d like to run them through a VE comparison, if you get a chance.

That said, there were certain speeds you never saw in an old SRM file (like, you’d see 22.7 km/h and 22.9 km/h but you’d never ever see 22.8 km/h), and this was true across many many different riders and head units and ride lengths and generations of SRM models.

2 Likes

@GPLama - Interesting. I assume this is two different head units?

Again, I don’t know how all power meters work (that’s for people like yourself), but my older SRM PM7 sends torque and cadence data to the head unit, and the head unit calculates the power (using stored offset and slope values).
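A minimal sketch of that head-unit calculation, and of how an error in the stored zero offset feeds through to power. The linear raw-signal model, the slope of 20 raw units per Nm, and the function names are assumptions for illustration only, not SRM's actual firmware or units.

```python
import math


def power_from_raw(raw, zero_offset, slope_per_nm, cadence_rpm):
    """Power (W) from a raw torque signal, a stored zero offset and slope, and cadence."""
    torque_nm = (raw - zero_offset) / slope_per_nm
    angular_velocity = cadence_rpm * 2.0 * math.pi / 60.0  # rad/s
    return torque_nm * angular_velocity


def power_error_from_offset_error(offset_error, slope_per_nm, cadence_rpm):
    """Power error (W) caused by a zero offset that is wrong by offset_error raw units."""
    return offset_error / slope_per_nm * cadence_rpm * 2.0 * math.pi / 60.0


# Hypothetical numbers: offset wrong by 2.5 units, slope 20 units/Nm, 90 rpm.
print(round(power_error_from_offset_error(2.5, 20.0, 90.0), 2))  # about 1.18 W
```

Under this model an offset error shifts torque by a constant amount, so the power error grows with cadence but not with how hard you are pushing. That makes a bad zero relatively more damaging at low power than at high power.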

My newer SRM PM9s compute torque and power internally and send the power value directly to the head unit, which simply displays that value. I’d assume this would / should remove the error you see in that image.

Except, maybe, a small offset due to not being able to press “start” for recording on two headunits at exactly the same moment.

Edit: I have some data dual-recorded on a PM9 and sent to two different devices (a Wahoo Roam 3 and a Garmin 840). It’s certainly close, but not as perfect as I’d expect. Interesting.

These files were downloaded from Strava, and TrainingPeaks (the latter being my preferred storage location, as I use WKO5 a fair bit). I note it would be better to compare the FIT files taken directly from the device, to remove any processing errors from the software used. Note the Wahoo file size difference between data exported from Strava (256 kb) vs TrainingPeaks (442 kb). The Garmin file is likely larger because I have more things paired to that device (on this day, a Tymewear ventilation strap).
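For anyone wanting to try the same comparison, one way to take the start-button offset out of the picture is to slide one recording against the other and keep the shift where they agree best. This is only a sketch: it assumes both files have already been parsed into per-second power lists with no dropped samples, and the function name is made up.

```python
def best_lag(a, b, max_lag=10):
    """Find the shift (in samples) that best lines up two power series.

    Returns (lag, mean_abs_diff) where b[i + lag] is paired with a[i].
    A positive lag means recording b started earlier than recording a.
    """
    best = (float("inf"), 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            pairs = list(zip(a, b[lag:]))
        else:
            pairs = list(zip(a[-lag:], b))
        if not pairs:
            continue
        mean_abs_diff = sum(abs(x - y) for x, y in pairs) / len(pairs)
        if mean_abs_diff < best[0]:
            best = (mean_abs_diff, lag)
    return best[1], best[0]
```

Whatever difference remains after alignment is then down to how each device or platform handles the data (dropped packets, smoothing, zeros, rounding) rather than to when "start" was pressed.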

So I think the discussion is about whether the “error in power” is due to a measurement error in the power meter itself, how it’s calculated/displayed on a head unit, and then on a software platform. It’s not simply X power meter is good/bad. There are additional layers to consider. It’s the whole power recording and displaying ecosystem.

1 Like

@Robert_Chung - Interesting about the 22.8 km/h. Was that with a speed sensor, or a GPS chip? The latter are improving all the time. I have a 10 Hz (and had a 100 Hz) GPS sensor sitting on my desk.

I’m not a protocol expert. But with ANT+ (potentially) going away, maybe BT will improve things. But let’s not hold our breath.

I think using the same technology, in a consistent way, is probably good enough.