Carrier IQ: A look inside the brave new world of carrier phone tracking

Posted by Oliver Khademi


You may have heard of the “internet of things,” a vision of the future where cheap sensors are everywhere, and they allow machines to automatically track everything at all times. Over the last few days, we got an eye-opening look into that future thanks to a company called Carrier IQ. Founded in 2005, Carrier IQ provides remote tracking data to cellular network operators including AT&T, Sprint and T-Mobile, and its software has been loaded on over 141 million phones, primarily in the United States. You’d expect a cellular operator to have access to your phone number, name, address, and billing information, and even be able to see your calls and text messages while you’re connected to the network, no? Well, Carrier IQ takes things a step further by tracking your device even when it’s not connected, and can deliver things you might not expect it to, such as the apps you’re using and the secure URLs you visit in your cellphone browser.

There are plenty of accusations flying around, and plenty of confusion about just what it is that the company does with this data, what kinds of data it collects, and why Carrier IQ’s partners secretly bury the software deep within the operating system rather than asking users to opt into the program. That’s why we sat down face-to-face with Carrier IQ at the company’s Mountain View offices this weekend, where we had a surprisingly open and detailed two-hour conversation with VP Andrew Coward about nearly everything the company does.

It may not surprise you to learn that Carrier IQ claims to not have final control over the data it collects for cellular carriers, but what you might not know is that the 112-employee company actually has two different business models. One merely provides anonymous radio data to the carriers about dropped calls and the like, to help the networks troubleshoot issues… but the other, combined with the data a cellular carrier collects by itself, can uniquely identify a user so that the carrier can individually troubleshoot their phone’s performance and battery life by suggesting, for instance, which particular apps a user should uninstall.

It might also surprise you to know that Carrier IQ may be installed on more devices than have already been uncovered. The company actually has two different models for collecting data: the first is built directly into the operating system, while the second is more of an aftermarket solution that can be installed by the OEM or carrier. It’s only the latter that has seen widespread investigation, but Carrier IQ has been around for six years and has been installed on over 141 million devices in that time. Which devices? Carrier IQ literally won’t say: the company cites its contracts with carriers as the reason it cannot tell you whether or not its software is installed on your phone. Even so, it’s seriously troubling to hear a company flat-out refuse to tell you on which phones its tracking software is installed and with which carriers and OEMs it has partnered. All too often, on issues of disclosure, data privacy, and technical implementation, Carrier IQ shifted responsibility onto its un-named partners.

As we revealed over the weekend, Carrier IQ claims that it is not the source of the insecure log files discovered on HTC devices. Other technical details — including how exactly Carrier IQ stores and transmits its data and how carriers utilize it — are both comforting and disquieting by turns. Although more secure and less nefarious than originally feared, there may still be ample opportunity for malware to access its data. At the very least, how Carrier IQ’s software is implemented on various devices needs wider scrutiny from both security experts and regulators.

You can read the entire transcript for yourself below, but in our opinion, the biggest takeaways are that Carrier IQ and its client operators have logical reasons for taking most of the information they do — and mind you, many forms of personal data, like the contents of SMS and emails, aren’t being tracked at all, and no data is tracked in real time — but by the same token, it feels like there may be a lack of oversight when it comes to mobile privacy.

In terms of deciding what your software does and does not track, is that something you develop and pitch to the carriers, or is that something the carrier comes to you and asks for?

Andrew Coward: It’s a combination of the two. Operators don’t necessarily know what’s available, or what can be gathered, but as they start thinking through what kind of business problem they have, they’ll often ask us “Can you do this?” Some of that will be about collecting new types of information from the device, but oftentimes it’s how you process it on the backend and what you’re doing with it.

For example, knowing where dropped calls are happening, and where you’re roaming from one operator to another — when you shouldn’t be roaming at all — came to be really interesting use cases for a number of our customers.

Where is this information actually processed?

Some of the larger operators prefer to have their own systems, in their own data centers… we’re pretty ambivalent. What’s the fastest way to get you results? You set it up, we set it up, in your data center or in our data center… we don’t care.

You definitely keep your own data centers, but what’s your policy for keeping user data?

First of all, we have no rights to the data that’s gathered, and that’s a pretty important point. We’re acting on behalf of our customer, the operator. Data ownership is a very complex area, so talking about rights is much easier, and who owns the data tends to become a philosophical question if you’re not careful.

Under the contracts we have with operators, we are not allowed to do anything ourselves with the data that is gathered. We are not allowed to resell it, we cannot process it in different ways, we can only do what’s been asked for. There’s no sort of third use, if you like, for the data.

How long do you keep it?

That’s a function of the agreement we have with the operator, and on average, it’s about 30 days. What’s interesting is that the data degrades really quickly, so the fact that you had a dropped call yesterday or last week is kind of interesting, but if you had a dropped call or an application crash six months ago, it’s just not important. The individual data records about what happened have no relevance over a very short period of time.

“OUR CUSTOMERS ARE EXTREMELY SENSITIVE TO WHAT THEY’RE ALLOWED TO COLLECT”

Now, our understanding is that this data isn’t strictly anonymous…

There are two use cases for our software: anonymous or non-anonymous. In an anonymous mode… by the way, the two things that come off your phone to identify you are the IMEI number and the MSID number.

Do you as Carrier IQ ever get anything more than the IMEI or MSID?

We’ll see phone numbers of things that are dialed, but we don’t see your name or other details. Obviously the carrier can marry those two things together, but that’s an important construct because if those two don’t match… if you’re swapping the SIMs out, you’re no longer a customer.

In an anonymous mode, we don’t send to the operator the IMEI or MSID number, they get obfuscated in a one-way hash so they cannot be reverse engineered. And that use case is around network management and all the things that are around the performance of the network, but obviously you can’t do customer care if you don’t know who someone is. Where things become interesting is in the customer care scenario. You’d say “My phone battery lasts for three hours, what’s wrong?” and from that information that we’ve been able to gather off the phone, we’d be able to provide an understanding of why you’re having a bad experience.
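The one-way obfuscation described here can be sketched in a few lines. This is only an illustration, assuming a keyed SHA-256 hash (HMAC); the interview does not say which hash function or key handling Carrier IQ uses, and the names `anonymize_id` and `SECRET_KEY`, as well as the sample IMEI, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; the interview does not describe any key handling.
SECRET_KEY = b"example-operator-key"

def anonymize_id(device_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for a device identifier using a keyed one-way hash."""
    return hmac.new(key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()

# The same identifier always maps to the same token, so records from one
# device can still be grouped, but the token does not contain the identifier.
token = anonymize_id("490154203237518")  # sample IMEI, illustrative only
print(token)
```

A keyed hash is used in this sketch because a 15-digit IMEI has a small enough search space that a plain, unkeyed hash could be brute-forced; that design choice is ours, not something stated in the interview.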

Do you have a code of ethics for your company, or was there ever a moment where a carrier said “We want this thing,” and you said no, we’re not comfortable tracking that?

Our customers are extremely sensitive to what they’re allowed to collect and what they’re not allowed to collect — you’re talking about huge companies that have their entire public image at stake.

Could we go change our software to go get things that cross that line? That line for us is around content, and it’s never been crossed, and it’s never been asked to be crossed, and that’s why we were so outraged by the accusations, because getting a URL, for example, is very different to looking at the content of a webpage.

Why do networks need Carrier IQ at all? Which things are carriers unable to do themselves with the information they already collect by virtue of having a network?

When we founded the company — and I wasn’t one of the founders, I’ve only been here about a year and a half — there was an understanding that there was a huge mismatch between what the operators were saying about dropped calls, maybe one percent, and the surveys they were quoting from Nielsen or JD Power which would measure user perception of about 8 or 10 dropped calls for every hundred calls. So you have this massive disconnect, and looking at this problem, we wondered where was the best place to be to understand the real user experience.

Providing that absolute knowledge, and being able to say not just what happened, but kind of why it happened, became really important.

There are a couple of places, obviously, where the network just never knows that you’re having a problem: when you have no service, when there’s congestion on the network and you can’t even reach the towers at all, and the handoff between towers is incredibly complex and often goes wrong. For example, if you cross a bridge and pick up a radio tower 20 miles away, not even close to you, when you get to the other end of the bridge the call drops.

How does your software actually get onto a phone? Who puts it onto the phone, and at what stage of the development process?

So there are two types of software deployment that we have. There’s what we call the integrated model, where we work together with the handset manufacturer. Every manufacturer is different, meaning that we say “here is the list of things that this operator wants, let’s work together to implement them,” and we have a standard reference model for our software.

So a carrier says we want information A, B, and C, and so you go to the manufacturer and say “let’s work together to get that information for the carrier.”

Yes, that’s right, and we have a standard list of metrics, and we try to avoid a swiss cheese problem where different devices can obtain different pieces of data. We have a reference architecture, a reference porting layer where we say “we suggest you do this” if you’re using a particular chipset, so in that way we help OEMs deliver that information to us through the porting layer. So that’s one model of deployment.

The second, which we developed about 18 months ago now, was an aftermarket model. Obviously, if a phone ships and we sign a new contract with its manufacturer tomorrow, we’ve got to wait for new phones to ship because the integration process takes time. So we can be waiting a long time for the first customer to be turned on. So what we did was we produced an aftermarket product, primarily for Android, and we did some other ones too, but the one that got the most traction is Android. You can download it aftermarket. The other thing you can do if you’re an operator and you want to get going quickly is you can preload it on the phone. The aftermarket software uses standard APKs, so we’re not getting anything that isn’t available to any app that runs on the phone. To get it going, it can be pre-installed on the phone before it ships, but it’s not the same as the embedded system variety.

With the downloadable, we know what happened. With the embedded version, we know why it happened.

Within the Android APK framework, for example, you get to know there’s a dropped call and you see that. We get to know application changes and things like that, but you don’t get to know the deep radio things that sit behind that, so the crown jewels, the golden nuggets for the operators, if you like, are in that radio data. If you started deploying our software now, you’d go “My goodness, we’ve got this huge problem, but I don’t know how to fix it.” Getting to that radio is the key.

Is the downloadable version something that could be rolled out with an over-the-air update, or would it have to be pre-loaded by the manufacturer or carrier on a physical device? For that matter, does such a thing happen at the OEM or the carrier?

Either. The operator might decide that, but the OEM is going to pre-load it, or there might be a staging place where it gets pre-loaded.

So it’s easy. Is it easy enough that it could be an over-the-air update?

We have maintenance releases — MR — and an MR completely changes the system image of the phone, and the radio interface too, but it is possible aftermarket to introduce our software in either mode. For example, if there’s a problem with the phone and our software doesn’t get shipped in the first release, it might get placed in a subsequent update.

Have you ever had a customer that wasn’t a carrier? Say, a customer telling you, “I’m just an OEM, I made these phones and I want to get this information to make better phones, but I don’t care about AT&T and Sprint; I just want to have this data myself.”

Yes, we have. It’s a tiny portion of our business, but I guess…

And I imagine you can’t name them…

No.

Fair enough. So I guess we need to start asking you about things that users are freaking out about over this past week, and one of the big ones has been “I didn’t agree to this kind of data collection, I never signed a terms of service, I never clicked a checkbox.” I know many people don’t read those anyways, but it seems like there have been cases with certain carriers and certain phones where this data is getting collected without any disclosure to the user about that happening. Is that true? Your name does not appear in these Terms of Service or privacy policy agreements.

Yes, you’re right. We trust the operators to ensure that they have the correct policies in place with their users.

If it’s a customer who is using your software-as-a-service solution where you’re the one who’s actually collecting the data, do you have a policy that requires some sort of disclosure to tell the customer that you’re collecting data, or is it still up to the operator?

We don’t see any difference between our software operating inside the data center of the operator versus in our own facilities. Essentially the two are the same from a customer perspective. The results are the same, our relationship with the carrier is the same, it’s just the semantics of where the hardware is. We’ve even got operators who own their own hardware in our data center.

It’s mentioned in one of your videos that Carrier IQ would be opt-in, similar to the Google location policy: when you start up your Android phone, if you want better navigation and to help Google improve navigation data, you check the box — but there’s nothing like that with Carrier IQ. Was this the way you wanted it to go? Was it supposed to be there, and at some point an OEM or a carrier said, “We don’t want to do that, we just want to collect this anonymously and let our privacy policy take care of it?”

We have a very open framework about how our software gets implemented, and as operators want to use opt-in, if they want to, that’s something that can be implemented. They can do that on the web service side, and they can do that on the agent. It’s not something that’s a problem because it’s within the framework.

So would you pull for the software becoming opt-in, especially given the recent confusion about what Carrier IQ does? Is this something where you would prefer that it be opt-in from now on, or would that be detrimental to the model if people heard about this, decided they weren’t going to opt-in, and suddenly you had no data?

Yeah, from our perspective we agree that the user should understand what information is being collected, and the form that takes. There should be a good understanding of what information is being collected from the device.

Do you feel that’s happening now, that users have a good understanding of what information is being collected?

Well, given the press that’s taken place over the past week, it suggests not…

(all laugh)

It’s that “My goodness, you know where I am?” Well, we’re a mobile phone company, and we have to route the call to you. If you think about it from the operator’s perspective, they’re kind of stuck between a rock and a hard place. If the phone calls drop, or you call customer service and spend ten minutes telling them the make and model of the phone you’ve got, they’re not giving good customer service. So when they try to solve for that, they end up getting shouted at for getting information that enables them to deliver a better service and customer care. So you can’t really win.

One of the things that’s gotten a lot of attention is how deeply Carrier IQ is built into the phone, and the fact that there’s no opt-out, no ability to stop or remove the software, and there’s this vague sense that there’s something nefarious going on here. Is it important for the software to be buried beneath the surface to function properly?

Let me explain the history. When we started out six years ago, we were on feature phones, and there wasn’t this concept of an app — everything ran in the background. As it turns out, a lot of services run in the background on mobile devices now, not just ours (such as manufacturer debug software). There are all kinds of things that are there. We ran in that mode, but the expectation has kind of changed as we moved forward, and now people are looking at their phones really, really carefully, and asking “What are all the things that are on there?”

The other thing that’s shifted over time is that when a feature phone came out it was just an extension of the network from a service provider’s perspective. Ninety-nine percent of it was about delivering the service, and one percent was the user, if you like. Obviously, people having spent $300-$400 on a smartphone feel that they own it, and everything about it. The operators respect that, but at the same time they’re saying that to deliver service, they have to be in control of a piece of it, otherwise all bets are off as to whether they can do that or not. That is why they certify phones before they go on the network, why they pre-load software, and why they do all these things. I think we’re just seeing this shift in the industry to a lot more transparency about what’s actually on these devices, and we just happen to be at the vanguard of that.

Do you as a company have an opinion about whether your software should be that deeply hooked in?

It isn’t about whether it’s easy to delete or not delete, it’s about getting information from multiple aspects of the system. There are two chips inside most smartphones, there’s the radio chip and the app chip. We have to get information from the radio chip in order to function.

Do you feel that users ought to be able to opt out of providing certain pieces of information to the carrier, with the knowledge that it might degrade their own service, or do you feel that breaks the whole model because everyone will opt out?

We don’t think that’s for us to decide. We’re in the business of providing a technology service that is capable of delivering this; we provide the framework where opt-in or opt-out could be delivered, and how our clients choose to implement that is a function of the confidence and trust that they have with their customers.

But if it were up to you…

It’s not up to me. (laughs)

Still, as a company delivering a service, you inherently have some ability to negotiate these kinds of things, especially if it’s going to lead to… questionable publicity. We’re just wondering if you’re weighing the ability of your service to deliver this data consistently and reliably to carriers, and the interest of users in being able to trust you because they have that choice whether to use it or not. Is this something where you believe it should be opt-in, or deletable, or stoppable, if the user chose to? If it were up to you?

I would say, as Carrier IQ, if we had direct relationships with consumers, if we were a normal application company, we’d have to build up trust and say “do you mind if we do A, do you mind if we do B, do you mind if we do C.” But in the service provider world, that question just hasn’t really come up… until recently. Since the telephone was invented, there’s just been this enormous trust between the consumer and operator. For instance, our software doesn’t see content, but within the network, you surely can.

Let’s talk about data collection. What’s the minimum amount of data you collect, and what’s the maximum amount of data you collect?

Sure. It really comes down to use cases, and the minimal one is to provide an anonymous understanding of network performance and quality. If there’s a dropped call, where was the dropped call, and what were the radio conditions around that, and we submit that data anonymously.

What’s interesting about that is there’s this frequency of reporting which can vary from a day up to a week, and there’s a processing dilemma for us here. If you report in every two days, rather than every day, the difference between that is literally double the amount of information that comes in. We’re still talking about relatively small amounts of information, though: it varies, but these packages that come out of these devices add up to about 200kb a day per device. There’s a cost penalty in hardware and processing for collecting more data. One of the technology benefits we deliver is really to throw away as much data as early as we can in the cycle, so if we can summarize what happened on the phone before it leaves the phone, that actually reduces what has to happen later. If the carrier just wants a heat map for San Francisco of good coverage and bad coverage, instead of saying “give me every call record and what happened,” they can just get a summary of that data instead.
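The idea of summarizing on the phone before anything is uploaded can be illustrated with a small sketch. It assumes each call record carries a location and a dropped flag, and it collapses records into per-grid-cell counts of the kind a coverage heat map would need. The `CallRecord` type, the `summarize` function, and the 0.01-degree cell size are hypothetical and are not taken from Carrier IQ's software.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class CallRecord:
    lat: float
    lon: float
    dropped: bool

def summarize(records, cell_deg=0.01):
    """Collapse call records into (total_calls, dropped_calls) per grid cell."""
    totals = Counter()
    drops = Counter()
    for r in records:
        # Bin each call into a square cell roughly cell_deg degrees wide.
        cell = (math.floor(r.lat / cell_deg), math.floor(r.lon / cell_deg))
        totals[cell] += 1
        if r.dropped:
            drops[cell] += 1
    return {cell: (totals[cell], drops[cell]) for cell in totals}

records = [
    CallRecord(37.7749, -122.4194, dropped=False),
    CallRecord(37.7745, -122.4199, dropped=True),
    CallRecord(37.8044, -122.2712, dropped=False),
]
summary = summarize(records)
print(len(summary), "cells instead of", len(records), "records")
```

The payload then grows with the number of cells visited rather than the number of calls made, which is the trade-off the answer describes.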

Processing on the phone rather than the data center… does that impact performance on users’ devices?

No, and that’s a great question. Six months ago, we had some worries about possible battery drain issues and performance hits, and “What if I could root my phone and get more battery life with Cyanogen” questions. We don’t get on phones if we impact battery, because that’s a condition that the operators and handset manufacturers place. Does it drain battery a little bit? Of course it does, it takes some cycles, but we test the phones side by side with and without our product, and it’s a challenge to spot the difference.

Does that testing happen at Carrier IQ, or at the operator level?

We do it ourselves when we produce new versions of our code, and the OEMs have to do it to qualify the product for the operator. The operator’s view of how long the battery lasts isn’t with some vanilla version of the phone, it’s the phone with all the things that they want to go out on it, and there’s a massive difference between that and vanilla Android.

The time when we are using a little bit of CPU power is obviously when we do the upload, and so understanding that the device is not being used at the time when the upload takes place is kind of important, so we don’t impact the device experience or the broadband experience that you’re having at the time.

Do you wait to see if it’s idle, or schedule it for 2AM or something?

Again, with compute resources, you don’t want all the phones on the network reporting in at two in the morning, all at the same time, so you stagger them to keep the load balanced at certain points during the day.

I imagine you may not have an answer for this, but are users having to pay for those 200kb per day?

Actually, the answer, as far as I know, is no. We use the data connection, but what’s happening — and I don’t know if this is widely known — is that there are destinations within the operator’s network that are non-billable. There’s another angle on that too, and that’s if you fly to London this weekend, and it’s time to do an upload, we know that you’re roaming, and we won’t do the upload until you get back.
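Taken together, the last few answers describe three scheduling rules: upload only when the device is idle, stagger devices across the day, and hold off while roaming. The sketch below shows one way such rules could be combined, assuming each device derives a fixed daily slot from a hash of its own token. The function names, the one-hour window, and the hashing scheme are hypothetical, since the interview does not explain how the staggering is actually done.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60

def upload_offset(device_token: str) -> int:
    """Map a device token to a fixed second-of-day in [0, 86400)."""
    digest = hashlib.sha256(device_token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SECONDS_PER_DAY

def should_upload(device_token, second_of_day, is_idle, is_roaming, window_s=3600):
    """Allow an upload only inside the device's slot, when idle and not roaming."""
    if is_roaming or not is_idle:
        return False
    elapsed = (second_of_day - upload_offset(device_token)) % SECONDS_PER_DAY
    return elapsed < window_s

slot = upload_offset("device-a")
print(should_upload("device-a", slot, is_idle=True, is_roaming=False))
print(should_upload("device-a", slot, is_idle=True, is_roaming=True))
```

Because the slot comes from a hash, devices spread roughly evenly over the day without the server having to assign times.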

“MOBILE OPERATORS DON’T LIKE THAT THEY HAVE TO PROVIDE ALL THIS CARE FOR SMARTPHONES, WHICH IS AN INCREDIBLE COST FOR THEM”

So that’s the minimum use case. What’s the maximum amount of data you collect?

That’s really the customer care scenario. Mobile operators don’t like that they have to provide all this care for smartphones, which is an incredible cost for them, and interestingly enough, users hate calling up customer care and would probably rather swap service providers first. You can take literally ten minutes on the phone with an operator before the point where they actually get to understanding what your problem is, and so providing this additional information, about not just the status of the phones but the conditions that exist, allows us to collect a lot of additional analytics that sit behind that.

Perhaps the two most sensitive ones are application usage and web URLs.

We’d definitely like to go in-depth with those, but first, we also wanted to ask about text messages.

Yes, so we’ve publicly said that we record the number of SMSs you’ve sent and received, we record when they’ve failed and why they’ve failed, and through that we also capture the phone numbers when you’re sending and receiving — so if you send me a text message, and it fails, was it on your end, or on my end? Understanding those two things is important. What we do not capture is the content of that SMS message.

So this has been an issue of contention, because some people are saying that you do capture those messages, and there appears to be some video evidence that keys are being logged as well. We’ve really got three questions here: How do you collect information on the device, what is stored on the device, and what is being transmitted?

Let’s take the keylogging thing, first of all. We don’t need to read the keyboard to understand what, say, a URL is. The operating system will give us what we call “metrics,” and the metrics will contain the URLs that are seen, and that’s separate from looking at the keyboard or anything else.

And by saying the OS gives you metrics, is that something where you’re asking manufacturers to make sure their OS talks to your software?

Yes, we have an internal… API, if you like, between the operating system and us. Our standard porting layer expects these metrics, and the operating system, through this API, provides them. And then there is, and I don’t know if you guys have caught on to this yet, but those log files shouldn’t really be there. We have this internal API that should never show up in a log file, that basically gives us the information. It’s extremely inefficient for us to troll through a log file looking for information that’s already meant for us. In software, you never do that, you write the API and so on.

So you’ve got an API, the OS gives you which apps are used via that API… you are collecting that information, but does your software specifically watch for apps, or do you ask which apps are running at a given time?

When an app comes in foreground, we get an event notification, and when it goes in background, we get an event notification, and we can look at pure CPU usage. The thing we’re looking for that ties all this together is how much battery drain is occurring because of app usage. In the carrier environment, if you call in and say “My battery’s only lasting three hours,” they can say, “Well, you’ve downloaded 20 apps, and all of them are running in the background, each consuming five percent of CPU time,” basically we’ve got the answer for the consumer about which app they’ll probably want to uninstall, and that becomes useful information. We can do the same for apps that crash in the foreground. I listened in on a care call where a customer called up and said, “I got my fourth phone from you last week, and it’s just crashed again.” We were able to establish with our software exactly which app was causing the issue. “Madam, would you mind uninstalling this app, because we think it’s the one causing the damage.” There are a lot of valid use cases for that.
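As a rough illustration of how foreground and background notifications could be turned into per-app usage, the sketch below pairs each foreground event with the next background event for the same app and adds up the time in between. The event tuple format and the `foreground_seconds` function are hypothetical; the interview does not describe the real metric format.

```python
from collections import defaultdict

def foreground_seconds(events):
    """Sum foreground time per app.

    events: iterable of (timestamp_s, app_name, state) tuples, where state
    is "foreground" or "background".
    """
    started = {}
    totals = defaultdict(float)
    for ts, app, state in sorted(events):
        if state == "foreground":
            started[app] = ts
        elif state == "background" and app in started:
            totals[app] += ts - started.pop(app)
    return dict(totals)

events = [
    (0, "mail", "foreground"),
    (30, "mail", "background"),
    (30, "maps", "foreground"),
    (100, "maps", "background"),
    (120, "mail", "foreground"),
    (150, "mail", "background"),
]
print(foreground_seconds(events))
```

A care agent looking at totals like these, alongside CPU and battery figures, would have a starting point for suggesting which app to uninstall.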

So again, when it comes to how your software is getting this information, is it that you’re helping the manufacturer provide it by giving them these APIs, or are you actually writing the software?

We give them what we call a reference porting layer and a long list of metrics, and say “here’s what you should be able to implement, you can use our reference porting layer or not, but this is what you should be giving us.”

So in the best-case scenario where you’re collecting the maximum amount of data, you would get a bunch of information from the OS where you’re using the reference porting layer, you would package it, do some processing on it to minimize the amount of data to what’s actually useful and necessary, and you’d upload it when it’s appropriate for the network and won’t bug the user, and everybody’s happy, but it seems like that ideal scenario isn’t what’s happening right now on some phones. Is that a fair statement?

What’s happening on some phones that shouldn’t be happening is that information, not just information that is sent to us but other things as well, is showing up in this log file. We should discuss that separately; that’s not a Carrier IQ issue, specifically, but it is something that’s very interesting.

I agree with you, we need to discuss that particular log file, because other software might be able to get a hold of that information, and it’s great to hear that’s separate from you… but do you keep log files of your own on the phone, and if so, for how long, and is it protected and encrypted in some manner?

So let’s say you’ve just launched an application in the foreground. We’ll place that event within the memory of the phone, and we’re not going to tell you how we do that, but the events that we want to capture and do preprocessing on are then held for a period of time, and then at the time that the upload occurs, an SSL connection is made to the server and the data uploaded.

Once it gets to the data center, the package sits in a queue for a little while in its format and then gets disassembled, if you like, and then processed depending on the problem that’s being solved, and at that point, once the data has been uploaded, it gets deleted and the buffer starts running again. It’s a transient, RAM-based solution.

So it collects some data, and you send the data off, and it gets replaced by the next batch of data, and you’re not keeping any persistent logs on any of your products?

Yeah, yes. No, no. “Persistent” for us is the time between receiving the information and uploading the information, which in the worst case would be as long as seven days.

And would it be longer if, say, I went to Europe for a month?

The log would start wrapping. We might just count the number of dropped calls you had, versus recording your vital information, that way. We’re fairly intelligent about what we start dropping at that point in time. Like a DVR that guesses which programs you don’t want to watch, because you haven’t watched any of them.
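A wrapping, in-memory log of this kind is commonly built as a fixed-size ring buffer. The sketch below is one possible reading of the description, not Carrier IQ's implementation: once the buffer is full, the oldest event is dropped and only a per-type count of what was lost is kept. The `EventBuffer` class and its method names are hypothetical.

```python
from collections import Counter, deque

class EventBuffer:
    """Fixed-size in-memory event log that degrades to counts when it wraps."""

    def __init__(self, capacity=1000):
        self._events = deque(maxlen=capacity)
        self.evicted_counts = Counter()

    def record(self, kind, detail):
        # When full, the oldest event is about to be overwritten:
        # remember only its type, not its detail.
        if len(self._events) == self._events.maxlen:
            old_kind, _ = self._events[0]
            self.evicted_counts[old_kind] += 1
        self._events.append((kind, detail))

    def drain(self):
        """Return buffered events plus eviction counts, then start over."""
        batch = list(self._events)
        counts = dict(self.evicted_counts)
        self._events.clear()
        self.evicted_counts.clear()
        return batch, counts

buf = EventBuffer(capacity=2)
buf.record("dropped_call", {"cell": 17})
buf.record("app_crash", {"app": "mail"})
buf.record("dropped_call", {"cell": 42})
print(buf.drain())
```

After `drain()` the buffer is empty again, which matches the "gets deleted and the buffer starts running again" behavior described earlier in the interview.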

That is not your log file?

That log file is not our log file. It’s just a standard, Android system logfile. What goes in that logfile is up to the manufacturer… So, you would hope in a shipping device, you wouldn’t get very much information to go in there.

Do you make any recommendations about logging to manufacturers? Do you say “you shouldn’t log this after you give it to us,” or something like that?

We have a standard list of things that we will log when our software doesn’t work properly, or things happen… It’s like “the application stopped,” or “it restarted.” It’s up to the manufacturer to decide whether to place that in the log file of a shipping device.

I’m trying to understand why a manufacturer, in order to give you certain information, is actually logging keystrokes. I want to separate those two things. It’s logging it, putting it into this file, and then giving it to you?

What should be happening, is it should just be giving it to us through the API. What appears to be happening is that it’s giving it to us and making a copy of what it gave to us in the log file.

Well, I guess you’re not the ones we should be asking about that particular log file, then…

But there are two very good questions that sit behind it, because it does demonstrate that keystrokes are coming into our software, and that information is coming in through our API, it just happens to be recorded out to this log file.

So you do receive keystrokes.

We do receive keystrokes, yes.

Do you log those keystrokes?

No. What we do with them…

Then why do you bother receiving them?

There are short codes that can get dialed by the user… we have half-a-dozen codes that will cause an upload, or cause things… I don’t particularly want the entire world going out to try and figure out what those codes are or what they do.

And that’s why you’re logging keystrokes, to keep an eye out for those…

Logging is the wrong word. We are filtering keys that get pressed to pattern match.

That’s why you’re listening for keystrokes, then. And that all happens on the phone, or after they get uploaded through the encrypted channel?

All on the phone. The keystrokes are never sent off the device.

Do you listen to all keystrokes?

It depends on implementation. All we care about are the dial codes. Whatever stream gets sent to us, we just read that. SMS is also a control channel discussion. “It’s said you guys listen to the content of SMSs.” The backend system can send SMS messages to a phone to cause it to check in, or cause it to upload; again, it’s a very standard way of doing things. So we have pattern matching for the SMS string that’s ours.

So these aren’t short codes for particular carrier functions, but for Carrier IQ specifically?

Yes, but the operators know what they are. Our software has its own set of short codes that cause our software to take certain actions.

Most of it is diagnostic — say a phone hasn’t reported in for a while, for whatever reason, you don’t have any recent information — you could say to the consumer, please dial “*8080##” (not a real code) to cause the phone to do an upload.
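As a rough illustration of filtering a key stream for dial codes without logging it, here is a minimal Python sketch. The class `DialCodeMatcher` and the action name `"upload"` are hypothetical, and the code `*8080##` is the placeholder from the interview, not a real code. The sketch assumes keys arrive one character at a time and keeps only the last few keys in memory, just enough to compare against the longest known code.

```python
from collections import deque


class DialCodeMatcher:
    """Hypothetical on-device matcher for diagnostic dial codes."""

    def __init__(self, codes):
        # codes maps a dial code string to an action name (both illustrative).
        self.codes = dict(codes)
        # Keep only as many recent keys as the longest code needs.
        self.window = deque(maxlen=max(len(code) for code in self.codes))

    def feed(self, key):
        """Consume one key press; return an action name on a match, else None."""
        self.window.append(key)
        recent = "".join(self.window)
        for code, action in self.codes.items():
            if recent.endswith(code):
                self.window.clear()  # forget the keys once the code fires
                return action
        return None
```

Older keys fall out of the fixed-size window as new ones arrive, so nothing in this sketch is written to a log or sent anywhere. Whether the real software behaves this way is not something the interview establishes.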

So we understand that there are these certain messages you’re looking for, but why did you implement your software in such a way that it is listening to all text messages? It seems nefarious to say “Oh, well of course we listen to every text message that comes in.” Well really, that’s how you had to do it? (laughs)

It’s an implementation discussion. In some devices, we only get the text messages passed through this channel that are destined for us. In other devices we get everything, we’re just doing a pattern match. So we’re not fussy.

It’s amazing the difference between the OEMs. Some OEMs are really strict and say “Right, you’re only getting the SMSs that have got your name on it” and others are “Yeah, whatever.”
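The "pattern match" on incoming SMS can be pictured with a short Python sketch. The message format `//DIAG:UPLOAD` and the function name `match_control_sms` are invented for illustration; the interview does not reveal what the real control strings look like. The point is only that a filter can recognize its own control messages and ignore everything else, whether the platform hands it every message or just the ones addressed to it.

```python
import re

# Hypothetical control-message format; the real strings are not public.
CONTROL_PATTERN = re.compile(r"^//DIAG:(CHECKIN|UPLOAD)$")


def match_control_sms(body):
    """Return 'checkin' or 'upload' for a control SMS, or None for anything else."""
    match = CONTROL_PATTERN.match(body.strip())
    if match is None:
        # Not a control message: the body is neither stored nor returned.
        return None
    return match.group(1).lower()
```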

Even though you have a log that is not persistent beyond seven days and gets constantly overwritten, what sort of data protections are you putting on that log? You’re uploading over SSL, so the data channel is secure when you’re sending it. But on the phone itself, sitting in volatile RAM or wherever it sits. Is it encrypted? Is it plaintext? The data that you’re logging?

It’s not in plain text.

Is it encrypted?

I don’t want to go down the path of describing what it is, because I think the guys that reverse engineered that code over the last few days will probably talk all about it. I don’t want to talk about exactly the format of that file, but I will tell you that it is not in plain text and it’s not readable if you don’t have our tools. I don’t want to go down that path of “Let’s all go figure out what that is” and maybe that’s already happened, but the idea is that I don’t want to challenge anyone.

What we’re concerned about is the security of that data… even if you aren’t logging keystrokes, you’re listening for them. Even if you aren’t logging SMS, you’re listening for them. If someone could get access…

They still wouldn’t see them.

Okay, that’s clear, but they still would have access to information that a customer might not be comfortable with, like their app usage or URLs. What sort of measures are you taking to protect that data from malicious people? Is it just security through obscurity?

I don’t actually understand all the details, but suffice it to say, to date what we’ve been doing has been more than sufficient, and it’s very hard to do that. I don’t rule it out, though, and I don’t ever want to create a challenge for someone to say, “Let’s go hack this.”

So we spent a while discussing SMS. What about HTTPS? We’d imagine that carriers might know which webpages you’re going to, since they’re providing your internet connection, but HTTPS is potentially a step beyond that. Sometimes you’ll find website usernames, passwords and geolocation embedded in an HTTPS URL, things that we wouldn’t normally expect our carrier to be looking at.

It’s a good question, and it’s important to highlight that kind of information is available.

How do you feel about that information being available?

There are a lot of interesting things that have come out of the discussions over the last couple of weeks, and we obviously have to evaluate all of those things, and we’re not at the end of that path.

Perhaps we can rephrase the question: You talked about how you don’t want to cross the line into logging and transmitting content — content is the line — and it turns out that content’s kind of a fuzzy line. I would consider that visiting even a fairly insecure website that puts a hash of your password in the URL or something, I would consider that to be content, personally. I would consider which apps I’m using to be content. I think it says something about me if I downloaded this app about whatever creepy hobby I might have. For me, the line for content has been crossed by collecting URLs and app usage… do you agree, and is there something you might consider requiring or changing to make that sort of disclosure opt-in?

Again, whether a service provider does or does not have an opt-in, we can’t have that discussion, but clearly we recognize that this is sensitive data, and clearly we recognize that it falls under the agreement that the consumer has with the operator about whether or not that should or shouldn’t be collected… and also it clearly has a massive use in delivering the business value that we have in customer care and so on. It’s an information discussion around whether people are aware that it’s being collected and whether or not they have agreements about that.

But do you feel it crosses the line into collecting content?

I think that’s a philosophical question.

Fair enough. Was this something where you and/or the carrier had recognized that HTTPS in particular would provide those things, and are any of those being used or monitored or stored or anything like that?

Not specifically, to the first part of that question, and to the second part, the operator’s intentions for that data are wrapped up in the care scenario, and we don’t think anyone’s really thought beyond that point at this time.

Regardless of their intentions for the data, do you feel like your software has been used to violate customers’ privacy?

Again I think we’re into a philosophical discussion. There’s a high level of trust between the operator and the consumer, and that extends into “Why the hell is my phone not working properly,” and not being able to ask those questions… the tradeoff for sharing information enables those answers to be provided quickly and efficiently.

Your customers, the carriers, what can they actually see? Do they have a Carrier IQ client on their screen where they can see all the metrics that you do provide them or is it built into their management system?

It can go both ways. We have our own portals, and they’re graphically based. So you can see a map of a city and be able to see where all the dropped calls were, they’ll light up red or whatever. So say you notice that over a bridge, there’s all these dropped calls. You can zoom in and say “Show me this bridge, and I can see twenty dropped calls. Here’s one dropped call, why was there a dropped call there?” That’s the network management piece.

Could the representative of the carrier actually see things like the examples of URLs, for instance, where they see “This is the URL you typed” while talking to the customer on the phone, “Oh, well you mistyped that.”

There’s a carrier dashboard, and the carrier dashboard provides information on an individual. So while the network management console is not focused on the individual, you can get to that.

[…] If you look at the carrier environment, what’s interesting is that, over time, the care systems have just kind of sprung up. So it’s not unusual for these guys to have like fifteen or twenty different carrier apps. It just blows my mind. When you’ve got three or four, who thought it would be a good idea to bring in a fifth or a sixth?

The preference that we have is to drive our data into existing systems. The concept of alt-tabbing through twenty other apps to get to ours is just mind-boggling.

What about the fact that these carrier employees have the ability to see this much personal information from their users?

It’s a philosophical question. By the same token, at some point in the network somebody could be looking at your SMS traffic, at some point in the network somebody could be listening to your conversation. So this trusted relationship is perhaps underrepresented and poorly understood. Again, I’d argue that ever since the invention of the telephone, we’ve been placing this huge trust with every employee of that telephone company in securing and not interfering with what we do.
