In this Instructable, you'll learn how to intercept the video, microphone, and controls of the $30 Kaicong SIP1602 wireless pan-tilt camera on Windows, Linux, or OSX! Everything is rolled neatly into Python scripts; you can use the output data for things like voice transcription, computer vision, and automated directional control. If you're feeling truly adventurous, keep reading and you'll learn the methods we used to discover and reverse engineer wireless cameras!
Installation time: ~30 minutes
You will Need:
For anything other than just installing and running the camera code, intermediate-level experience in Python and OpenCV will also be very useful. Let's get to it!
If you like this hack, don't forget to follow us on Instructables, Facebook or Twitter, and check out our other projects on our website!
Step 1: Setting up the Camera
On the box that contains the camera is Kaicong's motto: "Nothing Important Than Safty". And it shows - they really made the manual secure, because anyone who can't read Mandarin is going to have a pretty hard time understanding it! That said, installation is surprisingly simple.
By default, these cameras are viewable by anyone on the internet who guesses your .kaicong.info address - which can be awesome for projects, but not so awesome for security and privacy. To solve this, you can either change your DDNS username and password, or simply set both of them to blank (thereby making it impossible to access your camera from outside your local network).
Step 2: Installing Python Controls
With the camera setup complete, we'll need to install a few libraries before we can run our scripts.
For Windows: Here are links to Windows installation tutorials, or pages where you can find the Windows installer.
For Ubuntu: setup can be done via this command: sudo apt-get install python python-opencv python-pyaudio python-pygame
For OSX: First install OpenCV and Homebrew - I had to additionally install eigen (brew install eigen) to prevent compiler errors.
Then run the following:
brew install python
brew install gcc
brew install homebrew/python/pygame
brew install portaudio
Then download the pyaudio wrapper for OSX and install that as well.
Now that we've got the dependencies out of the way, head over to the git repository where this project is hosted, download it, and extract the files. Open up a command window or terminal in the directory with the extracted files, and run each script with the following commands, replacing 192.168.1.19 with the IP address of your camera:
python KaicongAudio.py 192.168.1.19
This script pulls audio from the mic and plays it on your speakers.
python KaicongVideo.py 192.168.1.19
This script pulls video from the camera and displays it in an OpenCV window.
python KaicongMotor.py 192.168.1.19
This script opens up a black Pygame window. Click it with the mouse so it can capture your keyboard, then use the WASD keys to pan and tilt the camera!
At this point, we've successfully hooked up the camera and can intercept audio, video, and motor control from it via programming. But how did we do this? Read on to find out...
Step 3: How We Did It: Hacking Motion
We started out with a camera with a web page interface and wanted to control it programmatically, so what better way to figure out how it works than to inspect the code?
We saved the webpage to disk and looked at monitor.htm. It was there that we found some interesting-looking variables, such as PTZ_UP and PTZ_STOP, which appeared to be motion control constants. Keeping that in mind, we opened up the web inspection console (Ctrl+Shift+C in Chrome) and inspected the network traffic while clicking the camera motion buttons. We found several calls to a decoder_control.cgi page with a "command=" argument matching the constants we found earlier in the HTML - one whenever a click begins, and another whenever a click ends. So the controls are simple start/stop commands, sent as HTTP GET requests? Let's find out!
We copied the URL we saw:
into the browser and loaded the page, and sure enough the camera began moving! From there it was a matter of throwing the constants and a formattable URL string into Python to complete the controller. Done.
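As a rough sketch of the "constants plus a formattable URL string" idea, something like the following would do. The numeric values of PTZ_UP and PTZ_STOP below are hypothetical placeholders (the real ones come from monitor.htm), and the function names and the extra_params argument are our own inventions for illustration. Any login parameters your camera expects are also left as an assumption for you to fill in.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder values: substitute the PTZ_* constants
# found in your camera's monitor.htm.
PTZ_UP = 0
PTZ_STOP = 1


def control_url(host, command, extra_params=None):
    """Build a decoder_control.cgi URL carrying a command= argument."""
    params = {"command": command}
    # Anything else the camera expects (for example credentials) can be
    # passed in by the caller; the parameter names are not assumed here.
    params.update(extra_params or {})
    return "http://%s/decoder_control.cgi?%s" % (host, urlencode(params))


def press_and_release(host, start_command, stop_command, hold_seconds=0.5):
    """Mimic a button click: one GET when it begins, one when it ends."""
    urlopen(control_url(host, start_command), timeout=5).close()
    time.sleep(hold_seconds)
    urlopen(control_url(host, stop_command), timeout=5).close()
```

With the real constants filled in, press_and_release("192.168.1.19", PTZ_UP, PTZ_STOP) would nudge the camera for half a second and then stop it.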
But what about video? A camera's not a camera without it, after all...
Step 4: How We Did It: Hacking Video
As it turns out, video hacking was actually pretty simple - we looked in the network requests and found a lot of requests to snapshot.cgi. Entering one of these into Chrome produced a still image every time the page was loaded. Neat!
But we wanted something a bit more efficient: the streamed video that the ActiveX object seemed to receive. The ActiveX object itself didn't seem too useful to disassemble (reversing assembly code is way overrated), so instead we opened up Wireshark. We filtered the capture down to the IP of our camera (Capture->Options->Capture Filter) and started the capture, before reloading the ActiveX control page in our browser. What we found were two GET requests for audiostream.cgi and livestream.cgi, presumably for the audio and video.
Putting aside the audio URL for now, we turned to Google to see if anyone had decoded an IP camera video stream before. A search for "IP camera HTTP stream" turned up a handy little Python script to get everything running in OpenCV. All it took was replacing the script's URL with ours, and we were in business!
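For the curious, here is a minimal sketch of how scripts like that commonly pull frames out of an HTTP video stream. It assumes the stream is a sequence of JPEG images separated by extra header bytes, which is typical of IP cameras but is our assumption here, not something we verified byte for byte. The function name extract_frames is made up for this example.

```python
JPEG_START = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_END = b"\xff\xd9"    # JPEG end-of-image marker


def extract_frames(buffer):
    """Split complete JPEG images out of a chunk of stream bytes.

    Returns (frames, remainder). The remainder holds trailing bytes of a
    frame that has not fully arrived yet, to be prepended to the next read.
    """
    frames = []
    while True:
        start = buffer.find(JPEG_START)
        if start == -1:
            # Keep the last byte in case a marker is split across reads.
            return frames, buffer[-1:]
        end = buffer.find(JPEG_END, start + 2)
        if end == -1:
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

Each returned frame can then be handed to OpenCV, for example with cv2.imdecode(numpy.frombuffer(frame, dtype=numpy.uint8), cv2.IMREAD_COLOR). Note that this simple marker search would trip over JPEGs that embed thumbnails, so treat it as a starting point.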
Next, it was time to intercept the audio.
Step 5: How We Did It: Hacking Audio
Getting video wasn't too hard. Hopefully audio would be just as easy, right? After a few hours of Google searching, it looked like no one else had ever managed to successfully pull out and decode the audio stream of an IP camera. We were on our own.
Going back to the audiostream.cgi URL we found via Wireshark, we captured a few bytes of audio with Ubuntu:
Then we hit Ctrl+C to cut off the stream. Raw audio in hand, we marched over to Audacity to attempt to play it via File->Import->Raw Data. Most attempts sounded like noise, but we found that using the VOX ADPCM encoding at 8kHz produced something recognizable!
There was still the matter of removing that weird pattern of clicks. I figured it had something to do with the packets, as with the video stream we had to remove some headers at the start and end. Maybe the same was true with audio?
We looked a bit more closely at each packet, and noticed that the data started with the same 0x55aa15a8... bytes, plus a value that looked to be counting upwards each packet, and a long stream of zeroes, for a total of 32 bytes. Presumably, Audacity was taking these packet headers as audio data and trying to decode them, which is what made the nasty clicking sounds.
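Here's a minimal sketch of that header-stripping idea, not our exact script. It treats every occurrence of the leading 0x55aa15a8 bytes as the start of a 32-byte header and keeps whatever sits between headers. The function name is made up for this example, and the code assumes those four bytes never show up by chance inside the audio data itself, which a more careful version would need to handle.

```python
HEADER_MAGIC = bytes.fromhex("55aa15a8")  # leading bytes seen on every packet
HEADER_LENGTH = 32                        # magic + counter + zero padding


def split_payloads(capture):
    """Return the audio payload of each packet in a raw capture."""
    payloads = []
    start = capture.find(HEADER_MAGIC)
    while start != -1:
        # The next header marks where this packet's payload ends.
        next_start = capture.find(HEADER_MAGIC, start + HEADER_LENGTH)
        end = len(capture) if next_start == -1 else next_start
        payloads.append(capture[start + HEADER_LENGTH:end])
        start = next_start
    return payloads
```

Joining the payloads with b"".join(split_payloads(raw)) gives a header-free file that can be imported into Audacity as before.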
A few experimental Python scripts later, we removed the headers and passed the result through the ADPCM decoder in Audacity - most of the clicks were removed! But there were a few left over, specifically during the noisier parts of the audio.
So we read up on how ADPCM works - apparently it encodes audio as the difference between samples, and caches the previous audio state so that it can add the two and produce a new sample. After a few more Python scripts, we managed to capture the packets directly and reset this state at the start of each packet. Clicks were completely removed, and nothing but camera audio remained. Success!
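To make the "reset the state per packet" trick concrete, here's a small pure-Python sketch of a Dialogic/OKI style ("VOX") ADPCM decoder, the variant that sounded right in Audacity. Treat the details as assumptions rather than a description of the camera's exact codec: we assume the high nibble of each byte is decoded first, and that the state resets to a predictor of 0 and a step index of 0. The function names are made up for this example.

```python
# Dialogic/OKI ("VOX") ADPCM step sizes and step-index adjustments.
STEP_SIZES = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def decode_packet(payload):
    """Decode one header-free packet, starting from a fresh decoder state."""
    predictor = 0  # previous output sample (assumed reset value)
    index = 0      # position in STEP_SIZES (assumed reset value)
    samples = []
    for byte in payload:
        for code in (byte >> 4, byte & 0x0F):  # high nibble first (assumed)
            step = STEP_SIZES[index]
            # Rebuild the difference from the three magnitude bits.
            diff = step >> 3
            if code & 4:
                diff += step
            if code & 2:
                diff += step >> 1
            if code & 1:
                diff += step >> 2
            if code & 8:  # sign bit
                diff = -diff
            # Add the difference to the cached sample, clamped to 12 bits.
            predictor = max(-2048, min(2047, predictor + diff))
            index = max(0, min(len(STEP_SIZES) - 1,
                               index + INDEX_ADJUST[code & 7]))
            samples.append(predictor)
    return samples


def decode_stream(payloads):
    """Decode packets one by one so errors cannot carry across packets."""
    samples = []
    for payload in payloads:
        samples.extend(decode_packet(payload))
    return samples
```

Because decode_packet builds its state from scratch on every call, a glitch in one packet can't snowball into the next, which is the effect we were after. The samples come out as 12-bit values, so they need scaling (for example, multiplying by 16) before being played as 16-bit audio.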
Step 6: The Future
It's awesome to have such a complex device completely controllable via Python. We plan on using our camera for person detection and room occupancy tracking as well as spoken voice commands, but we can think of a few other uses for a camera like this one, such as:
We'll be hacking on these cameras with at least a few such projects in mind, so be sure to follow us on Facebook and Instructables if you're looking for inspiration!