
9.7.10

Special Look: Face Time (part 3: Call Connection Initialization)

Introduction

In part 1 of this series evaluating the FaceTime protocol, we established that the FaceTime network traffic exchange looks like this:

  • Unknown TCP protocol starts the conversation (TCP/5223);
  • Unknown UDP traffic between the iPhone and two hosts with similar IP addresses (UDP/16385 and UDP/16386);
  • Certificate validation through an Akamai server (HTTP);
  • HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup, negotiation and authentication;
  • UDP stream data for video/audio (RTP streaming H.264 with AAC audio).

In part 2 we looked at the SIP and RTP traffic in more depth, identifying what I believe is a proprietary authentication protocol in the SIP MESSAGE verb, along with H.264 video and AAC audio data in an RTP stream, which we extracted with videosnarf.  Jason Ostrom, one of the authors of videosnarf, has even indicated that they plan to work on video extraction so we can record and play back FaceTime calls.

In this installment of the series we’ll look at the unknown protocol that starts the FaceTime conversation over TCP/5223.

Traffic Analysis

Wireshark does a great job evaluating a packet capture and applying heuristics or standard port designations when applying packet dissectors.  Sadly, the FaceTime traffic over TCP/5223 is not interpreted any further than the TCP layer, as shown below (due to some lost traffic during my 888-Facetime packet capture, I’ve switched to a different capture which was more complete):

888-facetime-tcp5442-wireshark-default

We’ll have to apply our own creativity to evaluate this traffic further.  First, Wireshark’s wonderful TCP stream reassembly feature gives us the ability to view the TCP exchange in a hexadecimal view, with the option to save the data in binary format (“Raw”), ASCII, hex-dump or even C Arrays (great for taking data and dumping it into a C tool for manipulation, or otherwise modifying it to work with Python or other popular languages).
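As a rough illustration of moving a "C Arrays" export into Python, the sketch below pulls the hex bytes out of each array declaration.  It assumes the export uses declarations of the form `char name[] = { 0x.., ... };` with optional `/* ... */` comments; the function name and the sample bytes are placeholders of mine, not data from the capture.

```python
import re

def c_array_to_bytes(c_source: str) -> dict[str, bytes]:
    """Map each C array name found in the text to the bytes it declares."""
    arrays = {}
    # Match "name[] = { ... };" and capture the name and the brace body.
    for name, body in re.findall(r"(\w+)\s*\[\]\s*=\s*\{(.*?)\};", c_source, re.S):
        # Drop /* ... */ comments so text inside a comment is never parsed as data.
        body = re.sub(r"/\*.*?\*/", "", body, flags=re.S)
        hex_bytes = re.findall(r"0x([0-9a-fA-F]{2})", body)
        arrays[name] = bytes(int(h, 16) for h in hex_bytes)
    return arrays

# Illustrative input only: placeholder bytes, not taken from the FaceTime capture.
sample = """
char peer0_0[] = { /* Packet 1 */
0x16, 0x03, 0x01, 0x00, 0x02,
0xde, 0xad };
"""
arrays = c_array_to_bytes(sample)
print(arrays["peer0_0"].hex())
```

Keeping one `bytes` object per array preserves the per-direction, per-packet split of the export, which is handy when only one side of the conversation is of interest.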

888-facetime-tcp5442-stream-reassembly

Although obviously a binary protocol (i.e. non-ASCII based) we can see plaintext strings that look similar to certificate content.  This is a common characteristic of SSL-based protocols, though Wireshark wasn’t able to identify this automatically.  Fortunately, Wireshark is also an extremely flexible tool with a little know-how.  Using the “Analyze | Decode As” feature, we can tell Wireshark to treat this traffic as SSL-encrypted to gather a bit more information from the protocol.
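Before reaching for a dissector, a quick check of the first five bytes of a reassembled stream can hint at whether it is SSL/TLS.  The following is a minimal sketch of that heuristic, assuming SSL 3.0/TLS-style record framing (content type, two version bytes, two-byte length); the function name is mine, and an SSL 2.0 style hello would not match this layout.

```python
import struct

TLS_CONTENT_TYPES = {
    20: "ChangeCipherSpec",
    21: "Alert",
    22: "Handshake",
    23: "ApplicationData",
}

def parse_tls_record_header(data: bytes):
    """Return (content type, (major, minor), length), or None if it does not look like TLS."""
    if len(data) < 5:
        return None
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    # SSL 3.0 and TLS 1.x records all carry a major version of 3.
    if content_type not in TLS_CONTENT_TYPES or major != 3 or minor > 4:
        return None
    # Record payloads are limited to 2^14 bytes plus a bounded expansion.
    if length > 2**14 + 2048:
        return None
    return TLS_CONTENT_TYPES[content_type], (major, minor), length

# A TLS 1.0 handshake record header announcing 42 bytes of payload.
print(parse_tls_record_header(bytes.fromhex("160301002a")))
```

A match is only a hint; random binary data will occasionally pass the same test, so it is a reason to try "Decode As", not proof of the protocol.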

First, select one of the packets of the exchange that you want to decode using an alternate protocol and click Analyze | Decode As.  From the Wireshark: Decode As menu, select the Transport tab.  Specify that both ports should be decoded as SSL, as shown below:

888-facetime-wireshark-tcp5442-decode-as-dialog

Clicking “Apply” will cause Wireshark to reload the capture data, applying the SSL decoder to the specified port pair, as shown.

888-facetime-wireshark-tcp5442-ssl-decode

One of the great features of the Wireshark SSL dissector is that it will do stream reassembly for us, giving us the option to extract data even if it is transmitted across multiple TCP segments.  For example, in the screen-shot above I’ve selected the certificate information, highlighting the bytes in the hex view below.  For any highlighted data in Wireshark, we can export it to a binary file by selecting “File | Export | Selected Packet Bytes”.  In the Export Raw Data dialog, save the data with the filename extension “.der” to allow Windows to open it as a certificate.

888-facetime-wireshark-tcp5442-export-selected-packet-bytes-cert1

Double-clicking on the file with the “.der” (or “.cer”) extension will open the certificate viewer.  We can navigate the certificate details to gather some additional information about the server service.
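For tools that expect a PEM file rather than DER, the exported bytes can be re-encoded with Python's standard `ssl` module.  This is a small sketch; the file name in the usage comment is a placeholder, and the helper only re-encodes the bytes without validating that they are a well-formed certificate.

```python
import ssl

def der_to_pem(der_bytes: bytes) -> str:
    """Wrap DER-encoded certificate bytes in PEM armor (base64 plus header and footer)."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)

def pem_to_der(pem_text: str) -> bytes:
    """Reverse of der_to_pem: strip the PEM armor and decode the base64 body."""
    return ssl.PEM_cert_to_DER_cert(pem_text)

# Usage with a hypothetical file name for the bytes exported from Wireshark:
#   with open("courier-push.der", "rb") as f:
#       print(der_to_pem(f.read()))
```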

888-facetime-tcp5442-cert-general 888-facetime-tcp5442-cert-detail-keyusage 888-facetime-tcp5442-cert-detail-eku-client server_auth

A few points of interest from this certificate:

  • Issued to courier.push.apple.com by Entrust on April 13, 2010;
  • Key use is for Digital Signatures and Key Encipherment (i.e. key encryption);
  • Enhanced Key Usage indicates that it is valid for Server and Client authentication (i.e. mutual authentication).

Other certificates are also delivered through this exchange, including the root certificate for Entrust.

A Gentle Tap

Curiosity getting the better of me, I decided to give the Apple server at 17.149.37.6 a “gentle tap” to find out more about the authentication requirements here.  One of my favorite tools is “openssl”, the binary that ships with the OpenSSL suite.  We can use this tool to connect to SSL services, extracting debug information as shown:

$ openssl s_client -msg -connect 17.149.37.6:5223 | grep -v "^ "
CONNECTED(00000003)
>>> SSL 2.0 [length 008c], CLIENT-HELLO
<<< TLS 1.0 Handshake [length 002a], ServerHello
<<< TLS 1.0 Handshake [length 0dc7], Certificate
depth=2 /O=Entrust.net/OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.)/OU=(c) 1999 Entrust.net Limited/CN=Entrust.net Certification Authority (2048)
verify error:num=19:self signed certificate in certificate chain
verify return:0
<<< TLS 1.0 Handshake [length 000a], CertificateRequest
<<< TLS 1.0 Handshake [length 0004], ServerHelloDone
>>> TLS 1.0 Handshake [length 0007], Certificate
>>> TLS 1.0 Handshake [length 0086], ClientKeyExchange
>>> TLS 1.0 ChangeCipherSpec [length 0001]
>>> TLS 1.0 Handshake [length 0010], Finished
<<< TLS 1.0 Alert [length 0002], fatal handshake_failure
52865:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:/usr/src/secure/lib/libssl/../../../crypto/openssl/ssl/s3_pkt.c:1052:SSL alert number 40
52865:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/usr/src/secure/lib/libssl/../../../crypto/openssl/ssl/s23_lib.c:226:

I’ve filtered out the hex-dump data with grep, leaving us just the informational messages in this output.  The traffic marked with “>>>” is from my system to the Apple server, “<<<” is from the Apple server to my client.


First, my system attempts an SSL 2.0 negotiation by sending a CLIENT-HELLO message.  Apple’s server responds with a TLS 1.0 ServerHello response, followed by the certificate information (such as we saw earlier).  Following this delivery, Apple’s server sends a CertificateRequest to my client.  My client sends an empty certificate response (as indicated by a length of 7 bytes) and tries to complete the ClientKeyExchange without the use of a client-side certificate.  The Apple server rejects this with a fatal “handshake_failure” and terminates the connection.
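The 7-byte figure follows from the TLS framing: a 4-byte handshake header (1-byte type, 3-byte length) plus a 3-byte certificate list length that is zero.  The sketch below encodes a TLS 1.0 Certificate handshake message to show the arithmetic; the function name is my own.

```python
HANDSHAKE_CERTIFICATE = 11  # TLS handshake type value for Certificate

def build_certificate_message(certs: list[bytes]) -> bytes:
    """Encode a TLS 1.0 Certificate handshake message for the given DER certificates."""
    # Each certificate is prefixed with its own 3-byte length.
    cert_list = b"".join(len(c).to_bytes(3, "big") + c for c in certs)
    # The list as a whole also gets a 3-byte length prefix.
    body = len(cert_list).to_bytes(3, "big") + cert_list
    # Handshake header: 1-byte type followed by a 3-byte body length.
    return bytes([HANDSHAKE_CERTIFICATE]) + len(body).to_bytes(3, "big") + body

empty = build_certificate_message([])
print(len(empty), empty.hex())
```

The same accounting explains the ServerHelloDone length of 4 in the trace: that message has no body, so only the handshake header is counted.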


From this exchange we can see that this TLS protocol uses mutual certificate authentication; a certificate on the Apple server from Entrust and a certificate on the iPhone to complete the exchange.  This is interesting since Apple has stated that FaceTime will be an open protocol, but will apparently require a client-side certificate to connect to the Apple server, which gives them a grant/deny option for all connections on a per-device basis.  Steve Papa Esteban is no dummy (here’s looking at you, Android users!)


Client-Side Certificate


Returning to the Wireshark capture decoding SSL traffic over TCP/5223, we can extract the client certificate sent from the iPhone to the Apple server using the technique detailed above.


888-facetime-tcp5442-cert-client-issuer 888-facetime-tcp5442-cert-client-subject  888-facetime-tcp5442-cert-client-keyusage


More interesting observations are now possible:



  • The iPhone client certificate is issued by the “Apple iPhone Device CA”;
  • The iPhone client certificate common name (CN) is a GUID, likely generated at the factory;
  • Key constraints are for authenticating the iPhone as a device entity;
  • Key usage is similar to the Apple Server certificate, intended for digital signatures and key encipherment.

I Probably Should Have Started Here


I probably should have started here, but it would have been much less fun.  Apple's list of well-known TCP and UDP ports used by Apple products indicates that TCP/5223 is used for XMPP over SSL.  XMPP is the Extensible Messaging and Presence Protocol, the formal name for Jabber.  Apple indicates that TCP/5223 is used for authentication in unencrypted Jabber conversations, as well as for authentication and data exchange for SSL-protected Jabber sessions.


From this analysis, we can determine that FaceTime uses XMPP to authenticate and establish a connection to an Apple “Jabber” server.  Although I don’t have a packet capture for the remote session, I imagine that some kind of GSM message is sent from the initiating device to the responding device to have both devices join the Jabber server, authenticate and exchange data that initiates the FaceTime conversation including the subsequent SIP exchange.  Due to the use of certificate-based mutual authentication, it’s unlikely that anyone will be sufficiently reproducing the FaceTime protocol on another device without Apple’s assistance for certificate issuance.


Evil Thinking For Future … Evil


I’ll leave you with a final thought to consider for future evildoing.  The private key for the certificate used for XMPP authentication by the iPhone is stored on the iPhone device; unless the iPhone uses a TPM, it is probably stored somewhere on the file system.  If you were to jailbreak your iPhone 4g and extract that certificate and key, you could likely use a standard Jabber client to connect to the Apple Jabber server and monitor the activity there, including who is joining and leaving the network.  Maybe even set up a Jabber Bot and automate your evil manipulation of Apple’s server.


There’s someone knocking very loudly at my door, so that’s it for me today.  Next time we’ll catch up on the HTTPS traffic and more FaceTime analysis fun.


-Josh

6.7.10

Special Look: Face Time (part 1: Introduction)

Facetime Introduction

With the iPhone 4g, video chat through Facetime is a reality in a mobile device. As a frequent traveler, I use Skype on my laptop or netbook to stay in touch with family and friends, but it usually requires some planning and coordination. With Facetime, we can initiate a voice call over the cellular network, then switch to video on demand when WiFi service is also available (which will hopefully not be a requirement in the future).

As a packet junkie, I find the concept of Facetime very interesting. The intended usage for Facetime, as described by SteveEsteban, is for a user to place a call over the cellular network with the freedom to switch to video, then back and forth as desired. Focusing on the network protocol components, there are several interesting challenges:


  • Device capabilities negotiation and call setup over WiFi;
  • Video content streaming between devices;
  • Authorization to accept the video stream by recipient;
  • NAT traversal for users behind a WiFi NAT interface;
  • Binding between GSM and WiFi traffic to mitigate spoofing attacks.

Knowing this, a lot of interesting questions come to mind. How is the management and streaming traffic protected? How is the call authorized by the end-user? What can we deduce by sniffing the WiFi-side of a Facetime transaction?

In this multi-part series, we'll look at how the Facetime protocol works, answering these and other questions while looking at tools and techniques for network protocol analysis. It's my hope that you'll learn about the Facetime protocol by reading this series, and furthermore, be able to apply these techniques to other protocols as well.

High-Level Assessment

To assess the protocol, I've taken several packet captures from my unencrypted wireless network, calling 888-Facetime (Apple's service for customers to try out Facetime) and a colleague at the SANS Institute. Most of the analysis will be on the call to 888-Facetime, though I'll introduce other packet captures as needed.

The Facetime call with 888-Facetime was initiated by Apple's representative, which I'll herein refer to as an "inbound" session, due to the differences in Facetime calls in the role of initiator or responder. The details of my iPhone 4g are as follows:







iOS Version:  4.0 (8A293)
IP Address:   172.16.0.114
MAC Address:  5c:59:48:02:8a:65

My AP was running in 802.11b mode (for simplifying the packet capture process), also acting as a NAT at 172.16.0.1.

Loading up the packet capture in Wireshark, I applied a display filter to include traffic only from or to my address:

ip.addr eq 172.16.0.114

Using Wireshark's Protocol Hierarchy summary (Statistics | Protocol Hierarchy), we can get a quick look at all the protocols in this 28,034 packet capture file, as shown.


Besides the low-layer protocols, we can see different activity here:

  • UDP DNS traffic (to be expected);
  • Session Traversal Utilities for NAT (STUN);
  • Session Initiation Protocol (SIP);
  • Lots of unrecognized UDP data packets;
  • HTTP traffic transmitting XML data;
  • HTTPS traffic;
  • Unrecognized TCP traffic;
  • ICMP.

Wireshark doesn't give us the option to sort this traffic view by time, but we can switch to the Conversations view (Statistics | Conversations) to view time-relative data by protocol, as shown (TCP first, then UDP):

We can see a few nodes are involved here:


Address          Name                                Note
17.149.36.103    No DNS Name                         Apple, Inc system in the 17/8 netblock
72.215.224.43    init.ess.apple.com.edgesuite.net    An Akamai server, a239.da1.akamai.net
199.7.52.190     crl.verisign.net                    Verisign's CRL server
17.155.4.14      No DNS Name                         Apple, Inc system in the 17/8 netblock
17.155.5.251     No DNS Name                         Apple, Inc system in the 17/8 netblock
17.155.5.252     No DNS Name                         Apple, Inc system in the 17/8 netblock
68.105.28.11     cdns1.cox.net                       My ISP's DNS server
17.109.28.227    No DNS Name                         Apple, Inc system in the 17/8 netblock

Using the timing and address information, we can construct a timeline of what happens in this session:



Step  Nodes                            Description
1     172.16.0.114 -> 17.149.36.103    The iPhone 4g initiates a TCP session to the remote host over TCP/5223. Wireshark does not have a dissector for this protocol, though it believes the port number is associated with the HP Virtual Group protocol.
2     172.16.0.114 -> 17.155.5.251     Several UDP connections from the iPhone 4g to Apple's server over UDP/59007.
3     172.16.0.114 -> 17.155.5.252     More UDP traffic over UDP/59007 to a host whose 4th octet is the next value up.
4     172.16.0.114 -> 72.215.224.43    HTTP traffic carrying XML to the Akamai server, retrieving certificate information from Apple's servers.
5     172.16.0.114 -> 17.155.4.14      HTTPS traffic to an Apple server.
6     172.16.0.114 -> 17.109.28.227    UDP STUN traffic to an Apple server for NAT traversal.
7     17.109.28.227 -> 172.16.0.114    UDP SIP traffic from Apple revealing phone numbers, among other details.
8     17.155.5.14 -> 172.16.0.114      UDP traffic over port 16402; making up the majority of the packet capture data, this is likely the video stream information, which continues until a SIP BYE message is observed.
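The same kind of timeline can be built programmatically from per-packet records.  Below is a minimal sketch that groups packets into conversations and orders them by the first packet seen; the record layout and function name are assumptions of mine, and the timestamps in the sample are invented placeholders (only the addresses and ports come from the capture described above).

```python
def conversation_timeline(packets):
    """Group (time, src, dst, proto, port) records into conversations ordered by first packet."""
    first_seen = {}
    for time, src, dst, proto, port in packets:
        # Treat both directions of a flow as one conversation.
        key = (frozenset((src, dst)), proto, port)
        if key not in first_seen or time < first_seen[key][0]:
            # Remember who sent the earliest packet; that side is listed as the initiator.
            first_seen[key] = (time, src, dst)
    ordered = sorted(first_seen.items(), key=lambda item: item[1][0])
    return [
        (step, src, dst, f"{key[1]}/{key[2]}")
        for step, (key, (_, src, dst)) in enumerate(ordered, start=1)
    ]

# Placeholder timestamps; addresses and ports are the ones listed in the tables above.
packets = [
    (0.0, "172.16.0.114", "17.149.36.103", "TCP", 5223),
    (1.5, "172.16.0.114", "17.155.5.251", "UDP", 59007),
    (0.4, "17.149.36.103", "172.16.0.114", "TCP", 5223),
    (2.1, "172.16.0.114", "17.155.5.252", "UDP", 59007),
]
timeline = conversation_timeline(packets)
for row in timeline:
    print(row)
```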


Summary

Based on this analysis we can determine several critical pieces of how Facetime works:

  • Unknown TCP protocol starts the conversation, likely initiated following an event that starts on the GSM network;
  • Unknown UDP traffic between two hosts with similar IP addresses;
  • Certificate validation through an Akamai server, followed by an HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup and negotiation;
  • UDP stream data for video/audio.
In the next part of this series, we'll spend some more time looking at the SIP and video/audio streaming traffic, and at some tools we can use to extract that data. Stay tuned!

-Josh

7.7.10

Special Look: Face Time (part 2: SIP and Data Streams)

Introduction

In part 1 of this series we looked at the protocols involved in a Facetime call. The basic outline of the Facetime network exchange is as follows:
  • Unknown TCP protocol starts the conversation (TCP/5223);
  • Unknown UDP traffic between the iPhone and two hosts with similar IP addresses (UDP/16385 and UDP/16386);
  • Certificate validation through an Akamai server (HTTP);
  • HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup and negotiation;
  • UDP stream data for video/audio.
In this installment of the series we'll look at the last two components: SIP and the UDP stream information.

Examining SIP

SIP is the Session Initiation Protocol, used for controlling the setup and establishment of audio and video calls over TCP or UDP. As a text-based protocol, it looks a lot like HTTP (verbs like INVITE and BYE and numeric response codes), with a little SMTP love thrown in there as well.
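Because SIP is text-based, the first line of a payload is enough to tell a request from a response, regardless of port.  The sketch below shows one way to classify that start line; the function name and the sample URI (using a documentation address) are placeholders of mine, and this is not how Wireshark's own heuristic is implemented.

```python
import re

# Request line: METHOD request-URI SIP/x.y    Status line: SIP/x.y code reason
REQUEST_LINE = re.compile(r"^([A-Z]+) (\S+) SIP/(\d\.\d)$")
STATUS_LINE = re.compile(r"^SIP/(\d\.\d) (\d{3}) (.*)$")

def classify_sip_start_line(payload: bytes):
    """Return ('request', method, uri), ('response', code, reason), or None."""
    first_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    match = REQUEST_LINE.match(first_line)
    if match:
        return ("request", match.group(1), match.group(2))
    match = STATUS_LINE.match(first_line)
    if match:
        return ("response", int(match.group(2)), match.group(3))
    return None

print(classify_sip_start_line(b"INVITE sip:user@192.0.2.10:16402 SIP/2.0\r\n\r\n"))
print(classify_sip_start_line(b"SIP/2.0 200 OK\r\n\r\n"))
```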

Wireshark does a great job of identifying SIP traffic, even on non-standard ports. While SIP is typically done over port 5060 (or 5061 for SIP over TLS), Facetime is using UDP/16402. Wireshark gives us a summary of SIP activity in the packet capture by selecting the Telephony | SIP option, as shown.

Of note here is the absence of the SIP request method REGISTER, which would be used with digest authentication to authenticate the device. This isn't a statement of vulnerability, but it indicates that Apple is not using the standard SIP authentication method, instead relying on an alternate exchange to authenticate the devices.

Also interesting here is the use of the SIP MESSAGE verb. According to RFC3428, the MESSAGE verb is used for instant messaging as part of the SIP exchange.

Otherwise, the SIP exchange is straightforward, as follows:

  • INVITE from the initiator to the responder;
  • ACK from the responder;
  • Several MESSAGE frames back and forth;
  • After a few minutes (the duration of the video call), a BYE from the responder to terminate the session.
The SIP exchange is shown in Wireshark packet list form below:



A few IP addresses worth clarifying here:
  • 17.109.28.227: The remote iPhone from Apple's 888-Facetime service;
  • 172.16.0.114: My iPhone's IP address on my open WiFi network;
  • 68.9.119.102: My NAT address from my ISP, previously negotiated with STUN.
In the packet list we see that Apple is using user@address:port for the SIP address (URI). Looking at the detail of the INVITE frame we can gather additional detail. First we'll look at the message header content (content has been omitted to protect the privacy of the remote caller):


More interesting stuff here:
  • The Display component in the To and From fields reveals the cell phone number of both parties. In the first "To:" field shown, my cell phone number is listed "4015242911" followed by an unknown "570". This is interesting since the 888-Facetime caller's phone number was blocked from my phone display, but accessible to me from a packet capture.
  • The User-Agent of the 888-Facetime caller is "Viceroy 1.4/GK", which is similar to the User-Agent used by the iChat video client ("Viceroy 1.3", or "1.2" in older iChat clients).
Looking at the message body detail reveals more details about the session:


The message body carries the Session Description Protocol (SDP) content, including the SDP session owner "GKVoiceChatService", which is documented in Apple's iPhone SDK. We can also see the RTP Control Protocol (RTCP) negotiated for UDP/16402, as well as multiple negotiated media attributes, essentially reduced to AAC for audio and X-H.264 for video.

Later in the SIP exchange, we see several of the MESSAGE verbs. Although intended for use in instant messaging applications, the MESSAGE verb is used by Facetime to exchange arbitrary data between the two iPhone devices. The MESSAGE verb payload data repeats the "User-Agent: Viceroy 1.4/GK" information, then includes the message "Content-Type: application/ske", similar to an HTTP exchange. Following this tag we have a Content-Length tag and "SKESeq: 1;0" for the first of the 4 MESSAGE verbs. Each subsequent MESSAGE verb also includes this content, changing the numeric identifier "1" for the successive packets (e.g. "SKESeq: 2;0", "SKESeq: 3;0" and "SKESeq: 4;0").
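To pull the SKESeq value and payload out of these frames in bulk, a small header parser is enough.  The sketch below is illustrative: the sample message is reconstructed from the headers described above with a placeholder request URI, the function names are mine, and the meaning of the two SKESeq numbers is unknown, so they are simply returned as a pair of integers.

```python
def parse_sip_message(raw: bytes):
    """Split a SIP message into (start line, lowercase header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii", errors="replace").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

def ske_sequence(headers):
    """Return the two SKESeq numbers as integers, or None if the header is absent."""
    value = headers.get("skeseq")
    if value is None:
        return None
    first, _, second = value.partition(";")
    return int(first), int(second)

# Reconstructed example: placeholder URI, headers as described in the text.
raw = (
    b"MESSAGE sip:user@192.0.2.10:16402 SIP/2.0\r\n"
    b"User-Agent: Viceroy 1.4/GK\r\n"
    b"Content-Type: application/ske\r\n"
    b"Content-Length: 4\r\n"
    b"SKESeq: 2;0\r\n"
    b"\r\n"
) + bytes.fromhex("61f4279f")

start_line, headers, body = parse_sip_message(raw)
print(ske_sequence(headers), body.hex())
```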

We can apply the display filter "sip.Request-Line contains "MESSAGE"" to focus the Wireshark display on these MESSAGE frames, as shown below.



A quick Google search doesn't turn up anything about the SKE protocol, though I'll speculate here that it is some kind of authentication negotiation mechanism. A summary of the 4 payloads following the SKESeq header is as follows:
  • SKESeq 1: A large-ish payload commonly around 785 bytes which appears to include certificate-looking information.
  • SKESeq 2: Always 4 bytes of payload: "61 f4 27 9f" (in one capture)
  • SKESeq 3: A consistent payload length of 170 bytes, no significant ASCII strings.
  • SKESeq 4: Always 4 bytes of payload: "53 a0 8e a3" (in one capture)
This data requires further analysis, possibly representing a proprietary authentication protocol used by Facetime through SIP MESSAGE verbs.  I'll devote further analysis to a later article so we can move on to the good stuff.

Data Streams

Following the SIP exchange we see an RTP exchange over UDP/16402 with a reflexive source port.  To evaluate this stream we'll turn to the videosnarf tool by Arjun Sambamoorthy and Jason Ostrom.  Videosnarf and the parent tool ucsniff are really impressive, and Jason and Arjun are really cool guys as well.

Videosnarf can read from a libpcap file, but the current version of the tool does not properly accommodate wireless packet capture link types other than native 802.11 (e.g. it cannot interpret PPI or Radiotap headers), with the following error:

# videosnarf -i 4g-inbound-888FACETIME-session-1.pcap
Starting videosnarf 0.63
[+]Starting to snarf the media packets
[+] Please wait while decoding pcap file...
[-] Invalid IP header length: 0 bytes
[-] Invalid IP header length: 0 bytes
[omitted]
[-]No RTP media stream found
[+]Snarfing Completed
#

My packet capture uses the PPI header, so I added support to handle this link type with videosnarf.  Download and apply the patch as shown (against videosnarf 0.63, future versions will hopefully integrate this functionality and not require patching):

# cd videosnarf-0.63
# wget -q http://www.willhackforsushi.com/code/videosnarf-wifi-ppiheader.diff
# patch -p1 <videosnarf-wifi-ppiheader.diff
patching file src/videosnarf.c
patching file src/videosnarf.h
# ./configure && make && make install
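To illustrate what handling the PPI link type involves, here is a short Python sketch (not the code in the patch above): it reads the link-layer type from a classic pcap global header and skips a PPI header using its length field.  The function names are mine; the link-type numbers (105 for 802.11, 127 for Radiotap, 192 for PPI) are the registered values, and nanosecond-resolution pcap and pcapng files are not handled.

```python
import struct

LINKTYPE_NAMES = {105: "IEEE 802.11", 127: "IEEE 802.11 + Radiotap", 192: "PPI"}

def pcap_link_type(global_header: bytes) -> int:
    """Return the link-layer type from a classic 24-byte pcap global header."""
    magic = global_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")
    # Fields: magic, version major, version minor, thiszone, sigfigs, snaplen, network.
    return struct.unpack(endian + "IHHiIII", global_header[:24])[6]

def strip_ppi_header(frame: bytes) -> bytes:
    """Skip the PPI header and return the encapsulated frame."""
    # PPI header: version (1 byte), flags (1), total header length (2), DLT (4), little-endian.
    _version, _flags, header_len, _dlt = struct.unpack("<BBHI", frame[:8])
    return frame[header_len:]

# Usage with the capture file named above:
#   with open("4g-inbound-888FACETIME-session-1.pcap", "rb") as f:
#       print(LINKTYPE_NAMES.get(pcap_link_type(f.read(24)), "other"))
```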

Once videosnarf includes the ability to read from wireless packet captures with the PPI header, we can run it against the packet capture again:

# videosnarf -i 4g-inbound-888FACETIME-session-1.pcap
Starting videosnarf 0.63
[+]Starting to snarf the media packets
[+] Please wait while decoding pcap file...
[-] Invalid IP header length: 16 bytes
[omitted]
Protocol: Unsupported
[-] Invalid IP header length: 16 bytes
[-] Invalid IP header length: 16 bytes
[+]Stream saved to file H264-media-1.264
[+]Stream saved to file H264-media-2.264
[+]Stream saved to file H264-media-3.264
[+]Stream saved to file H264-media-4.264
[+]Number of streams found are 4
[+]Snarfing Completed
# ls -l H264-media-*
-rw-r--r-- 1 root root  413160 Jul  5 18:24 H264-media-1.264
-rw-r--r-- 1 root root  272459 Jul  5 18:24 H264-media-2.264
-rw-r--r-- 1 root root 3765017 Jul  5 18:24 H264-media-3.264
-rw-r--r-- 1 root root 1761492 Jul  5 18:24 H264-media-4.264

Videosnarf was able to extract four H.264 data streams, saving them to files.  We can quickly evaluate the contents of the files to determine if the content itself is encrypted using the "ent" tool:

# ent H264-media-3.264
Entropy = 4.509034 bits per byte.

Optimum compression would reduce the size
of this 3765017 byte file by 43 percent.

Chi square distribution for 3765017 samples is 298830527.55, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 55.8586 (127.5 = random).
Monte Carlo value for Pi is 3.626079279 (error 15.42 percent).
Serial correlation coefficient is 0.622531 (totally uncorrelated = 0.0).

Ent applies several tests to evaluate the entropy and randomness of a given file.  In this example, entropy is fairly low at 4.5 bits per byte.  Compare this to a data stream collected from the Linux /dev/urandom device:

# dd if=/dev/urandom of=rand bs=4096 count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 1.64117 s, 2.5 MB/s
# ent rand
Entropy = 7.999961 bits per byte.

Optimum compression would reduce the size
of this 4096000 byte file by 0 percent.

Chi square distribution for 4096000 samples is 224.01, and randomly
would exceed this value 90.00 percent of the times.

Arithmetic mean value of data bytes is 127.5652 (127.5 = random).
Monte Carlo value for Pi is 3.142760882 (error 0.04 percent).
Serial correlation coefficient is 0.000133 (totally uncorrelated = 0.0).

Or an encrypted file of all 0's:

# dd if=/dev/zero of=zero bs=4096 count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 0.0211891 s, 193 MB/s
# openssl enc -aes-128-cfb -in zero -out zero.enc
enter aes-128-cfb encryption password:
Verifying - enter aes-128-cfb encryption password:
# ent zero.enc
Entropy = 7.999947 bits per byte.

Optimum compression would reduce the size
of this 4096016 byte file by 0 percent.

Chi square distribution for 4096016 samples is 300.32, and randomly
would exceed this value 5.00 percent of the times.

Arithmetic mean value of data bytes is 127.4714 (127.5 = random).
Monte Carlo value for Pi is 3.137092793 (error 0.14 percent).
Serial correlation coefficient is 0.000067 (totally uncorrelated = 0.0).
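The "bits per byte" figure that ent reports is the Shannon entropy of the byte distribution, which is simple to reproduce.  The sketch below computes that single metric in Python; it is not a reimplementation of ent's other tests, and the function name is mine.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum -p * log2(p) over every byte value that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A file using every byte value equally often reaches the 8-bit maximum,
# while a file of a single repeated byte carries no information at all.
print(shannon_entropy(bytes(range(256))))
print(shannon_entropy(b"\x00" * 4096))

# Usage on one of the extracted files:
#   with open("H264-media-3.264", "rb") as f:
#       print(shannon_entropy(f.read()))
```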

Clearly the output from the Facetime video stream as extracted by videosnarf is not encrypted. Sadly, it does not appear that the extracted data is viable to play with mplayer:

# mplayer H264-media-3.264 -fps 17
MPlayer 1.0rc2-4.3.2 (C) 2000-2007 MPlayer Team
CPU: Intel(R) Core(TM)2 Duo CPU     L7100  @ 1.20GHz (Family: 6, Model: 15, Stepping: 11)
CPUflags:  MMX: 1 MMX2: 1 3DNow: 0 3DNow2: 0 SSE: 1 SSE2: 1
Compiled with runtime CPU detection.
mplayer: could not connect to socket
mplayer: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.

Playing H264-media-3.264.
H264-ES file format detected.
xscreensaver_disable: Could not find XScreenSaver window.
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffh264] vfm: ffmpeg (FFmpeg H.264)
==========================================================================
Audio: no sound
FPS forced to be 17.000  (ftime: 0.059).
Starting playback...
[h264 @ 0x896a290]illegal POC type 5
[h264 @ 0x896a290]sps_id out of range
[h264 @ 0x896a290]sps_id out of range
[omitted]
[h264 @ 0x896a290]decode_slice_header error
[h264 @ 0x896a290]concealing 12 DC, 12 AC, 12 MV errors


MPlayer interrupted by signal 11 in module: decode_video
- MPlayer crashed by bad usage of CPU/FPU/RAM.
  Recompile MPlayer with --enable-debug and make a 'gdb' backtrace and
  disassembly. Details in DOCS/HTML/en/bugreports_what.html#bugreports_crash.
- MPlayer crashed. This shouldn't happen.

It appears the reconstructed file is close to an H.264 file, but has some errors preventing it from being played back.  This is still positive from an attack perspective though, since we know the content is not encrypted; hopefully the videosnarf developers will release an updated version soon that can address any problems with reconstructing and saving the H.264 stream.
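One way to see how far the reconstructed file is from a well-formed stream is to list its NAL units.  The sketch below assumes the file is an Annex B elementary stream (units separated by 00 00 01 start codes), which the output format of videosnarf may or may not follow exactly; the function name and the sample bytes are placeholders of mine.

```python
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}

def list_nal_units(stream: bytes):
    """Return (start code offset, nal_unit_type) for each Annex B start code found."""
    units = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(stream):
        # The low 5 bits of the byte after the start code hold the NAL unit type.
        units.append((pos, stream[pos + 3] & 0x1F))
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    return units

# Placeholder stream: an SPS, a PPS and an IDR slice header byte with dummy payloads.
sample = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce" + b"\x00\x00\x01\x65\x88"
for offset, nal_type in list_nal_units(sample):
    print(offset, NAL_TYPE_NAMES.get(nal_type, "other"))
```

A playable stream would normally present an SPS and PPS before the first slice; a listing that lacks them, or shows unexpected types early on, would be consistent with the "sps_id out of range" errors above.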

Summary

Let's summarize what we learned today:

  • While Facetime uses SIP, it does not use the standard authentication mechanisms;
  • Phone number information is disclosed in the SIP exchange, even if it is blocked on the phone itself;
  • Facetime uses the SIP MESSAGE verb for passing arbitrary data between iPhone devices involved in a Facetime call.  This could be a proprietary authentication mechanism;
  • Videosnarf with a minor patch can extract video and audio stream data;
  • The video and audio content of a Facetime conversation are NOT encrypted, leaving them susceptible to eavesdropping attacks if the underlying WLAN infrastructure is weak or otherwise compromised;
  • Mplayer is unable to play back this stream data today; hopefully fixes can be applied by the videosnarf team to resolve this in the future.

Next time we'll spend some time looking at the initial TCP exchange between the iPhone 4g and the authorization process that initiates the connection.  Comments and questions are welcome, thanks!

-Josh