
Can a computer be uniquely identified? Computer experts

Discussion in 'General' started by SGVRider, Jun 20, 2014.

  1. SGVRider

    SGVRider Well-Known Member

    http://it.slashdot.org/story/05/03/04/1355253/tracking-a-specific-machine-anywhere-on-the-net

    I've been doing some research on browser fingerprinting, it's quite clever stuff. Yes, the article is very old. What non-technical person would even conceive of this, though?

    I'm very interested in how machines can be individually tracked. Obviously clock skew data can't uniquely identify a machine in isolation. When combined with other data, though, it could be very powerful in narrowing sessions down to a unique device.
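A rough way to see why individually weak signals add up: if the attributes are treated as statistically independent (an assumption that rarely holds exactly), the identifying information they carry, measured in bits, simply adds. The attribute names and frequencies below are made-up illustrations, not measurements.

```python
import math


def surprisal_bits(fraction_sharing_value: float) -> float:
    """Bits of identifying information gained by observing a value
    that this fraction of all devices shares."""
    if not 0.0 < fraction_sharing_value <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -math.log2(fraction_sharing_value)


def combined_bits(fractions) -> float:
    """Total bits across attributes, ASSUMING they are independent."""
    return sum(surprisal_bits(f) for f in fractions)


def expected_matches(population: int, fractions) -> float:
    """Expected number of devices in the population sharing every value."""
    return population * math.prod(fractions)


# Hypothetical frequencies: each value is the share of devices with the
# same observed value for that attribute.
observed = {
    "clock_skew_bucket": 1 / 64,
    "screen_and_fonts": 1 / 1024,
    "user_agent": 1 / 256,
}

print(combined_bits(observed.values()))                    # 6 + 10 + 8 bits
print(expected_matches(1_000_000_000, observed.values()))  # roughly 60 devices
```

With these invented numbers, a clock-skew bucket that only splits devices 64 ways still cuts the candidate set by a factor of 64 once it is stacked on the other attributes.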

    If you were using multiple virtual machines on the same physical computer, each with different user agents and setups, and different IPs etc., could a machine still be uniquely identified using some technique like clock skew?

    I'm not talking about the NSA installing chips with the collusion of the manufacturer. Would a Google level corporation be able to do it?
     
  2. tophyr

    tophyr Grid Filler

    Absolutely. MAC addresses are unique enough that they make this fairly trivial - they're not guaranteed to be unique but they were designed to be so, and combined with other data it's not hard to develop a persistent unique identifier from browser-accessible information.
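As a minimal sketch of "combining other data into a persistent identifier": collect whatever attributes are observable, serialize them in a canonical order, and hash the result. The attribute names here are hypothetical placeholders, and this is not any particular tracker's method.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a stable identifier from a set of observed attributes.

    Keys are sorted so the same attributes always produce the same ID
    regardless of the order in which they were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute set; a real collector would decide what to include.
session = {
    "user_agent": "Mozilla/5.0 (example)",
    "screen": "1920x1080x24",
    "timezone_offset_min": -480,
    "fonts": ["Arial", "Courier New", "Verdana"],
}
print(fingerprint(session))
```

The weakness is visible in the design: change any single attribute and the hash changes completely, so real schemes tend to compare attributes individually rather than rely on one exact hash.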

    Clock skew wouldn't be one of the pieces of info, however. Application code only has access to the clock through the OS, and the OS smooths things out to make it as close to an ideal clock as possible.
     
  3. SGVRider

    SGVRider Well-Known Member

    Thanks for the response. Isn't a machine's MAC address only available to others on an internal network? I thought only a router's MAC could be traced by a remote network, not an individual device's MAC. The white papers and research papers I've read suggest that timing can be inferred through some kind of TCP/IP timestamp. It seems like it's also commercially available technology, though obviously nothing is clear about fingerprinting.
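The timing idea can be sketched as follows: an observer records pairs of (its own receive time, the sender's TCP timestamp value), converts the sender's ticks to seconds, and fits a line to the growing offset between the two clocks. The slope of that line is the skew. This sketch uses an ordinary least-squares fit for simplicity, which is a swapped-in technique and not necessarily what the published work uses; the tick rate `remote_hz` is an assumed parameter that a real observer would have to infer.

```python
def estimate_skew_ppm(samples, remote_hz=1000.0):
    """Estimate a remote clock's skew in parts per million.

    samples: list of (local_seconds, remote_ticks) pairs, where
             remote_ticks is the sender's timestamp counter value.
    remote_hz: ASSUMED tick rate of the sender's timestamp counter.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0, ticks0 = samples[0]
    # x: elapsed time on the observer's clock
    xs = [t - t0 for t, _ in samples]
    # y: how far the remote clock has run ahead of (or behind) the observer
    ys = [(ticks - ticks0) / remote_hz - x for (_, ticks), x in zip(samples, xs)]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span a non-zero time interval")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 1e6  # slope of the offset line, scaled to ppm


# Synthetic example: a remote clock running 50 ppm fast, sampled every 10 s.
samples = [(float(t), t * 1000.0 * (1 + 50e-6)) for t in range(0, 600, 10)]
print(estimate_skew_ppm(samples))
```

Real measurements would carry network jitter on top of the trend, which is why many samples over a long window are needed before the slope means anything.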

    Would a remote website be able to detect that several sessions were originating from the same machine, assuming A) only technical means are used, B) every session originates from a different IP and router, C) every session uses its own dedicated virtual machine isolated from the main system, D) the VM instances aren't using the site at the exact same time.

    Bottom line, what are possible technical means of correlation if someone doesn't want to be identified or tracked?
     
    Last edited: Jun 20, 2014
  4. tophyr

    tophyr Grid Filler

    The server wouldn't get your MAC address directly via network packets - the only MAC it'd see in those is the MAC of the router closest to it, you're correct. However, the browser may or may not have access to the MAC, and even without direct read access it can see several other values that are derived from the MAC(s) (or similar identifiers). MAC addresses specifically aren't something to get hung up on either way - you can actually set your MAC to be any value you want as long as it doesn't conflict with another device on your network segment. The point is that they are one of several ubiquitous, nearly-unique identifiers that are readily available to applications that can (and do) send those identifiers to a server somewhere. Those identifiers are almost always modified in some fashion, however - 99.99% of the world isn't actually interested in tracking you specifically outside of the context of their app.
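To illustrate the point about native applications (as opposed to web pages) being able to read such identifiers: Python's standard `uuid.getnode()` returns the machine's hardware address as a 48-bit integer, falling back to a random value if none can be found. The salted hash below is one plausible reading of "modified in some fashion"; the salt value and function names are assumptions for illustration.

```python
import hashlib
import uuid


def local_mac() -> str:
    """Return this machine's hardware address as aa:bb:cc:dd:ee:ff.

    uuid.getnode() may return a random 48-bit number if no real
    address can be determined, so the value is not guaranteed stable.
    """
    node = uuid.getnode()
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def app_scoped_id(mac: str, app_salt: str) -> str:
    """Hash the MAC with an app-specific salt so the ID is only
    meaningful inside one application (hypothetical scheme)."""
    return hashlib.sha256(f"{app_salt}:{mac}".encode("utf-8")).hexdigest()


mac = local_mac()
print(mac)
print(app_scoped_id(mac, app_salt="example-app"))  # salt is a placeholder
```

Two applications using different salts would derive unrelated IDs from the same MAC, which fits the observation that most apps only care about recognizing a device within their own context.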

    Clock drift via TCP timestamps is something that I'd doubt, but I haven't read the papers you're referring to so I won't say it's impossible. TCP timestamps use the same OS clock APIs that I referred to above, however - unless there's some sort of weird hardware TCP processing going on (i.e., dedicated network routers, not consumer computers), they don't draw timestamps directly from the hardware clock.

    Given all four A/B/C/D conditions you specified, it would be pretty difficult to determine that two unique sessions came from the same piece of hardware. A and B don't make it particularly complicated, but C would (probably) do a very good job of obfuscating/changing identifying values and D would make session correlation much more difficult. The extent to which C (different virtual machines) would make it more difficult would largely depend on how much hardware is virtualized vs emulated: a virtualized CPU will return the same CPUID as the host CPU (modern OSes virtualize things like that already, even when actually not running inside what most ppl consider a "VM"), but an emulated CPU is a truly separate processing entity from the host CPU. Same goes for network cards, sound cards, graphics processors and RAM: more virtualization = closer to bare metal, more emulation = more abstract. For what you're talking about, you want as much abstraction as possible. Unfortunately, more abstraction also equals more overhead and worse computational performance.

    </geek>
     
  5. beac83

    beac83 "My safeword is bananna"

    Some Intel processors (the Pentium III, notably) shipped with an electronic Processor Serial Number that application-level processes could read; Intel dropped the feature from later processors.
     
  6. SGVRider

    SGVRider Well-Known Member

    AWESOME stuff, gentlemen. I've been trying to figure out this stuff on my own, but it's all very unclear using just Googlefu. I love this /geek stuff, I guess I'm one at heart, too.

    I have a couple of questions:

    1) Would a website have a way of reading my CPU ID if my VM software didn't emulate it as a unique value in every instance? Assume they're extremely sneaky, very highly motivated to correlate sessions, have Google level technical ability and resources, but are also overwhelmed so likely won't spend a huge amount of resources per user.

    2) Some data is available to application level processes. Could a clever designer for a remote website get or infer that data using JavaScript, Flash, whatever?

    3) Is there any way to determine what is emulated vs. virtualized by my VM software? I'm currently just using Oracle VirtualBox, though I know there are a thousand different packages.

    4) Would there be an advantage to using different VM software for each session?

    5) Say we use software to spoof the user agent to make it look like the most common type of UA, is that in itself detectable? Would someone be able to determine that Mozilla / Windows 7 was actually Mozilla / Ubuntu? I assume spoofing the browser identity is very detectable, if you have knowledge of how each browser phrases its requests and compare it to the declared UA.
     
    Last edited: Jun 21, 2014
  7. notbostrom

    notbostrom DaveK broke the interwebs

    OK spill it... who are you stalking?
     
  8. SGVRider

    SGVRider Well-Known Member

    Haha, this is way too much effort for stalking. I prefer old fashioned binoculars and hand cream for that.
     
  9. In Your Corner

    In Your Corner Dungeonesque Crab AI Version

    And to think that the most intricate problem I am currently considering is "what can I cook for supper that has bacon in it?".
     
  10. Steeltoe

    Steeltoe What's my move?

    That was my thought. My bios has an option to disable it. It was supposedly an anti-piracy measure.
     
  11. Knotcher

    Knotcher Well-Known Member

    Do you care if this information can be used to identify multiple sessions from the same hardware or if this information can be used to identify a user?


    As to the first scenario, this is extremely doubtful.
     
  12. Venom51

    Venom51 John Deere Equipment Expert - Not really

    I'll go with the simple answer: not with any real accuracy, because so much of the information could be altered in transit between the originating machine and its destination. You can of course hardware fingerprint a machine, but that's only good up to the point the hardware configuration gets altered.

    With the information in that article I can already tell you that their concept is flawed. They are working under the assumption that no one is altering the payload in transit. I can sit in the middle and alter traffic in both directions. The fingerprint they would end up with would be altered to the point of being unreliable to identify a machine with absolute certainty.
     
  13. SGVRider

    SGVRider Well-Known Member

    I have to prevent both. I do know that analytic means can be used to suss out patterns and identify supposedly distinct users who are likely the same person.

    Each different VM / VPN combination has to look like a distinct user from any other VM / VPN combination employed on the same machine. We can change operating systems, browsers, and anything else for each VM.

    I need to know if someone (other than NSA) has the technical means to identify different VMs as originating from the exact same machine. The solution has to be able to resist a more than casual technical analysis from someone who is actively trying to correlate a user with another.

    As Tophyr said, VM software emulates some functions and virtualizes others.

    Is it possible for a remote website to have the right type and quantity of data to identify a distinct piece of hardware if:

    1) Two users have already been identified as potentially correlated through non-technical means
    2) One user is singled out for further investigation but no second correlated user was identified through non-technical analysis
     
  14. iomTT

    iomTT Well-Known Member

    some threads where people prove they really do have brains really piss me off
     
  15. tophyr

    tophyr Grid Filler

    If I were smart enough to come up with a working unique identification scheme, why wouldn't I choose to send all my data via SSL? I won't go as far as saying MITM attacks are a thing of the past, but for intelligent developers they're pretty avoidable.
     
  16. caferace

    caferace No.

    I'm not sure I get this conversation. If I don't want to be tracked (like, in a life threatening way) there are tools for that. If my motivation is below that, there are plenty of simple ways to track me.

    What are your goals? That's always the question. What are you trying to accomplish, and why? OK, here is how you do it.

    -jim
     
  17. tophyr

    tophyr Grid Filler

    +1 to Knotcher (edit: and Jim, apparently).. you're getting specific enough that it may be helpful to take a step back and describe the overall goal. Are you looking to avoid session correlation to the same piece of hardware, or same user? You can have fifteen different users connecting to the same website on the same piece of hardware, or you could have one user connecting to the same website on fifteen different pieces of hardware. The tracking/correlation tactics would be very different depending on which scenario you're going for - if I wanted to track the machine, I'd come up with a snippet of code that generates a hardware-dependent ID and runs on the client. If I wanted to track the user, I'd come up with usage pattern heuristics and run them on the server.

    1. Websites generally won't be able to read CPUID - maybe if you allowed them to install an ActiveX control or something - but that's not required for development of a unique identifier, either. Either way, if your VM software emulates the processor instead of virtualizes it, guest applications inside the VM won't be able to read the CPUID of the host processor no matter what. Hope that answers your #1 - not entirely sure I understood the question.

    2. Yes. JavaScript itself is run in a VM inside the browser and so doesn't have direct access to machine information but there are a lot of various APIs exposed that can reveal machine information. Flash, likewise, runs in a VM but has more direct machine access (and has a shitload of security exploits, to boot). ActiveX controls are the worst from this standpoint - they can contain compiled native code and will run natively in the same process as the browser (aka, with its permissions).

    3. I've only tinkered a bit with VirtualBox but it definitely virtualizes the x86 processor, at the very least. Pretty much every VM package in existence will try to virtualize the processor if the guest machine isn't specified to be a different proc architecture; emulating modern processors is insanely slow compared to virtualizing them. (Emulation vs Virtualization) VB is quite configurable however, so it probably has options to control how much it virtualizes.

    4. I'm assuming you mean would it be harder to correlate your sessions if each one came from not only different VMs, but each VM running on a different platform (ie one on VMware, one on Parallels, one on QEMU, one on VBox)? Yes, it would be at least as hard, and likely much harder, to figure out that each session was coming from the same physical machine than if each session was run from the same VM platform. Once you're already into the different VMs level, however, you've already made it really, really hard.

    5. UA spoofing is easy to detect; all I would do is try a series of browser-unique JavaScript code snippets and see which ones worked and which ones failed.
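A sketch of what the server side of that check might look like: the page runs browser-specific probes and reports which ones worked, and the server compares the results against what the claimed browser family should support. The feature names and the expectation table are hypothetical placeholders, not a real compatibility database.

```python
# Hypothetical table: which probes should succeed for each browser family.
EXPECTED_FEATURES = {
    "firefox": {"moz_prefixed_api": True, "webkit_prefixed_api": False},
    "chrome": {"moz_prefixed_api": False, "webkit_prefixed_api": True},
}


def claimed_family(user_agent: str):
    """Very rough family detection from the declared User-Agent string."""
    ua = user_agent.lower()
    if "firefox" in ua:
        return "firefox"
    if "chrome" in ua:
        return "chrome"
    return None


def ua_looks_spoofed(user_agent: str, probe_results: dict) -> bool:
    """Return True if any reported probe contradicts the claimed family.

    probe_results maps probe name -> whether it worked in the client.
    Unknown families and unreported probes are not counted as evidence.
    """
    family = claimed_family(user_agent)
    if family is None:
        return False
    expected = EXPECTED_FEATURES[family]
    return any(
        probe_results[name] != should_work
        for name, should_work in expected.items()
        if name in probe_results
    )


# Claims Firefox, but behaves like a WebKit/Blink browser.
print(ua_looks_spoofed(
    "Mozilla/5.0 (Windows NT 6.1; rv:30.0) Gecko/20100101 Firefox/30.0",
    {"moz_prefixed_api": False, "webkit_prefixed_api": True},
))
```

The same comparison generalizes to the operating system: a declared Windows 7 client whose font list or line-ending behavior looks like Ubuntu would fail an analogous table.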
     
  18. Venom51

    Venom51 John Deere Equipment Expert - Not really

    Correct you are but 98% of the net is still not SSL protected. You and I are not the targets of their concept.
     
  19. tophyr

    tophyr Grid Filler

    Well, what I was saying is that anyone looking to track someone the way SGV is describing is the developer of a website or app - and is thus in charge of whether their data is encrypted in transit. Unless you're imagining a different scenario somehow.
     
  20. SGVRider

    SGVRider Well-Known Member

    Assume the following conditions
    1. Each user has to present a login credential.
    2. User 1 on VM 1 can't be identified as being the same as User 2 on VM 2 or User 3 on VM 3 running on the same hardware.
    3. Analytical correlation is a strong possibility (precautions can be taken but may not be effective). Say their algorithms say there's a 50% probability that user 1 and user 2 are related. Also say their processes require that a definite link be established before taking action against correlated user accounts.

    The goal is to deny them any possibility of establishing a non-analytical link between user 1 and user 2 and breaching the "boot user" threshold.

    We also want to deny them the possibility of linking user 1 and user 2 directly through technical processes.
     
