Hive (part 2)


This is a continuation of the previous post about Hive. The first part focused on the developers, their environment and tools. This time I’m going to look deeper into the code.

Erratum

In the first post about Hive I wrote about accounts. Each commit contains author information like User #140 <Account.155@devlan.net>. It looks a bit strange, doesn’t it? This unusual form niggled me. I couldn’t imagine developers using only IDs and talking about code this way: “Did you see what #140 did in foo.c? What a bummer!” I dug deeper, and after a few minutes everything was clear: the data had been altered.

Someone from WikiLeaks obfuscated the data. A small proof, from .git/logs/HEAD:

b6ac4fd7cdeac91bb8425b9f9df49bda4411186b b6ac4fd7cdeac91bb8425b9f9df49bda4411186b fix <fix@wikileaks.org> 1508578973 -0400  checkout: moving from master to master
b6ac4fd7cdeac91bb8425b9f9df49bda4411186b b6ac4fd7cdeac91bb8425b9f9df49bda4411186b fix <fix@wikileaks.org> 1508579017 -0400  checkout: moving from master to b6ac4fd7cdeac91bb8425b9f9df49bda4411186b
b6ac4fd7cdeac91bb8425b9f9df49bda4411186b b6ac4fd7cdeac91bb8425b9f9df49bda4411186b fix <fix@wikileaks.org> 1508579044 -0400  checkout: moving from b6ac4fd7cdeac91bb8425b9f9df49bda4411186b to master
b6ac4fd7cdeac91bb8425b9f9df49bda4411186b 9a039f9fe67cefeed9042a8493b9e7f39f504c4c fix <fix@wikileaks.org> 1508579082 -0400  checkout: moving from master to armv5
9a039f9fe67cefeed9042a8493b9e7f39f504c4c b6ac4fd7cdeac91bb8425b9f9df49bda4411186b fix <fix@wikileaks.org> 1508579116 -0400  checkout: moving from armv5 to master
b6ac4fd7cdeac91bb8425b9f9df49bda4411186b 55de783e964f4cae02dc8530af8b24c3e5d93edd fix <fix@wikileaks.org> 1508579253 -0400  checkout: moving from master to mt6
55de783e964f4cae02dc8530af8b24c3e5d93edd 06e887de8037537af39ac5adf65c652536d6230f fix <fix@wikileaks.org> 1508579320 -0400  checkout: moving from mt6 to dhm
06e887de8037537af39ac5adf65c652536d6230f 1d93a9417e2bf263a73c019f7cd93c248e4f76b6 fix <fix@wikileaks.org> 1508579345 -0400  checkout: moving from dhm to debug
1d93a9417e2bf263a73c019f7cd93c248e4f76b6 1273b362f69f2ab0b4c4c7c25f3ccee65c0da0fd fix <fix@wikileaks.org> 1508579351 -0400  checkout: moving from debug to autotools
1273b362f69f2ab0b4c4c7c25f3ccee65c0da0fd 5c7b3f32dc88ed01d9c573bd4ef98158d614a66b fix <fix@wikileaks.org> 1508579380 -0400  checkout: moving from autotools to makemods
5c7b3f32dc88ed01d9c573bd4ef98158d614a66b 2b2e540698c7ffa760a1eae0fa6ecaf20a17ed30 fix <fix@wikileaks.org> 1508579390 -0400  checkout: moving from makemods to solarisbug
2b2e540698c7ffa760a1eae0fa6ecaf20a17ed30 46658be8101e799ef514516e2742c668a15ebe4b fix <fix@wikileaks.org> 1508579401 -0400  checkout: moving from solarisbug to ubiquiti
46658be8101e799ef514516e2742c668a15ebe4b 7450d4019017ba311c2b92167b21996dd447861c fix <fix@wikileaks.org> 1508579417 -0400  checkout: moving from ubiquiti to polar-1.3.4
7450d4019017ba311c2b92167b21996dd447861c b6ac4fd7cdeac91bb8425b9f9df49bda4411186b fix <fix@wikileaks.org> 1508846640 -0400  checkout: moving from polar-1.3.4 to master

It took roughly three days:

dirdival@pld:~$ python3
>>> import time
>>> time.gmtime(1508578973)
time.struct_time(tm_year=2017, tm_mon=10, tm_mday=21, tm_hour=9, tm_min=42, tm_sec=53, tm_wday=5, tm_yday=294, tm_isdst=0)
>>> time.gmtime(1508846640)
time.struct_time(tm_year=2017, tm_mon=10, tm_mday=24, tm_hour=12, tm_min=4, tm_sec=0, tm_wday=1, tm_yday=297, tm_isdst=0)

BTW, do you recognize the time zone? If not, see the previous post. Did he obfuscate the data well? Fortunately, he doesn’t know git as well as I do. Even if you remove or rewrite something, the original data stays in the git objects until they are garbage-collected. I checked them all and found the hidden details.

Let’s now look under the hood:

dirdival@pld:/tmp/hive$ git cat-file -p  54bb023b7b95908455dddbc1b4ca984cd484e095
tree 8392a8140d924018067ef4ece3c2260c7427784d
parent 0265ca26c2c7f1765ab7c87b60a06a8a5587d2bd
author User #140 <Account.155@devlan.net> 1377009112 -0400
committer User #140 <Account.155@devlan.net> 1377009112 -0400

Added files and directories related to Eclipse IDE.

Nailed it: the same tree, timestamp and message, but with the original author:

dirdival@pld:/tmp/hive$ git cat-file -p a4bddadd3678dcdd96c075a96fa725f5b92c685a
tree 8392a8140d924018067ef4ece3c2260c7427784d
parent 6683eecc36487d8a23834a623a1f74daec899695
author Jack M <jack@neutrino.edb.devlan.net> 1377009112 -0400
committer Jack M <jack@neutrino.edb.devlan.net> 1377009112 -0400

Added files and directories related to Eclipse IDE.

It wasn’t hard to collect all the data and uncover the real email addresses. The mapping:

User #140 <Account.155@devlan.net> - Jack M <jack@neutrino.edb.devlan.net>
User #142 <Account.156@devlan.net> - Jack M <jackmc@devlan.net>
User #217 <Account.227@devlan.net> - miker <user@hive-builder.edb.devlan.net>
User #226 <Account.233@devlan.net> - miker <miker@localhost.localdomain>
User #226 <Account.234@devlan.net> - miker <miker@stash.devlan.net>
User #226 <Account.235@devlan.net> - miker <miker@devlan.net>

As you can see, two developers work on this repo, but they commit from different machines. According to the email addresses, both of them are members of the Embedded Devices Branch (EDB). You can find more information about this department on WikiLeaks, along with a chart showing the structure of the CIA.

Extra speculations about developers and environment

Miker has access to at least two machines:

  • hive-builder.edb.devlan.net,
  • stash.devlan.net.

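The first one, hive-builder, looks like a machine for building binary releases. The second one, stash, is a repository server; the host name and the SSH URL on port 7999, the default SSH port of Atlassian Stash (now Bitbucket Server), point that way. I have seen many commits with this description: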

Merge branch 'idkey' of ssh://stash.devlan.net:7999/hive/hive into idkey

Since miker has access to both the build machine and the repository server, we can speculate that he ranks higher in EDB than jackmc.

I also compared the number of commits made by each developer:

dirdival@pld:/tmp/hive$ cat out.txt | grep miker | wc -l
37
dirdival@pld:/tmp/hive$ cat out.txt | grep jackmc | wc -l
308

Jackmc made far more commits, but most of them suggest a sloppy developer. A small sample of the changelog:

Author: User #142 <jackmc@devlan.net>
Date:   Tue Oct 7 11:02:27 2014 -0400

    Numerous fixes. Still debugging...

commit c8fb7630c77c7e8699268b1f7fa945bb229461de
Author: User #140 <jack@neutrino.edb.devlan.net>
Date:   Mon Oct 6 15:57:19 2014 -0400

    ILM-client working, but with xterm. To change see Command.cpp:320

commit 4f0e918cd8c89aab12939bc01f850226a42d40a6
Author: User #142 <jackmc@devlan.net>
Date:   Mon Oct 6 15:01:48 2014 -0400

    Fix oops.

commit 0a37703c10d4381c8bb18bb4d7deaf2796b0d674
Author: User #142 <jackmc@devlan.net>
Date:   Mon Oct 6 14:22:12 2014 -0400

    Tweaks

commit 5f53fb84421a2c822b1cd98f475eb1e9a6e2d038
Author: User #140 <jack@neutrino.edb.devlan.net>
Date:   Mon Oct 6 09:31:18 2014 -0400

    Fix ILM-client debugging.

This is rapid development, and these descriptions are in the main branch. His machine, neutrino.edb.devlan.net, is interesting, but I haven’t found any more information about it.

Code

First, the sad news: the history of the repo starts at version 2.6.1. According to the initial commit, the repo was moved from Subversion to git in the simplest possible way, without preserving any earlier history.

I found more details about releases in the Hive 2.9 User's Guide (git object 1315c15d006e6783b208643602d85f423086e013, created on May 21, 2015). It lists the first release as: 10/26/2010, Initial Release v1.0, Authority TDR. Version v1.0.2 was released on 02/14/2011 and, according to the notes, it should work on all supported architectures/systems:

  • Windows,
  • Linux/MikroTik MIPS,
  • Linux/MikroTik PPC,
  • Solaris 9-10 x86 & Linux x86.

Repository hierarchy

The common directory contains the bzip2, polarssl and sslOpenSouce libraries. The client lives in the client directory and the server, as you can guess, in server. There are also infrastructure and documentation directories (the documentation even includes lyrics). The honeycomb directory holds a server written in Python, and its author is definitely not a native Python developer. The user guide describes it like this: Honeycomb acts much like a traditional iterative server that handles incoming beacon connections one-by-one. I didn’t focus on this code.

Finally, there are snapshot_* directories. In my opinion, storing binary releases in a git repo is a bad idea. It’s worth mentioning that client/ctHive/ILMSDK contains beautiful C++ code, mostly an XML parser, which obviously wasn’t written by this team. Last but not least, there are no tests here. If you are a programmer, you know what that means: trouble.

Client and server review

The Hive developers disappointed me. I expected something interesting, but most places are a mess. A small example from the master branch, client/misc.c:

void DisplayStatus(struct proc_vars* info) {
   int argc = 0;
   char* message;

//   fprintf(stdout, "\n****************************************************************\n\n");
   //fprintf(stdout, "\n %sSession configuration parameters:%s\n", BLUE, RESET);
   fprintf(stdout, "\n %s%s:%s\n", BLUE, sessionConfigParamString, RESET);

/*
   if (info->listen == NO) {
      fprintf(stdout, "   TCP socket type = connect (active)\n");
   } else {
      fprintf(stdout, "   TCP socket type = listen (passive)\n");
   }
*/

   if (info->interactive == YES) {
      //fprintf(stdout, "  . Interactive mode established\n");
//      fprintf(stdout, "%s", interactiveModeString);
   } else {
      if (info->ignore == NO) {
         //fprintf(stdout, "   Automatic mode established (not ignoring errors)\n");
         fprintf(stdout, "%s", automaticMode1String);
      } else {
         //fprintf(stdout, "   Automatic mode established (ignoring errors)\n");
         fprintf(stdout, "%s", automaticMode2String);
      }
      fprintf(stdout, "%s", message);
      free(message);

The variable message is never initialized, yet we print it and then (sic!) free it. Both are undefined behavior. Madness. Let’s assume it was an accident. Maybe the server looks better? Part of server/client_session.c:

//******************************************************************
/*!
 * Download a file from the local system to the command post
 * @param path - complete path and filename
 * @param size - size of file
 * @param sock - socket
 * @return
 */
int DownloadFile(char *path, unsigned long size, int sock)
{
	REPLY ret;		// Reply struct
	unsigned char data[DATA_BUFFER_SIZE];
	struct stat buf;
	FILE *fd;
	int	bytes_read, bytes_written;

	//TODO: Review and fix/remove this.
	// to silence compiler warnings. this var no longer needed because of the 
	// ssl_context declared global to this file
	sock = sock;

Just take a look at the TODO comment near sock and at this assignment:

sock = sock;

Could it be even worse? Yes, this is just the beginning. server/netstat_an.c:

//Called by beacon.c to release data 
void release_netstat_an(unsigned char* netstat_an)
{
	if(netstat_an != NULL)
	{
		free(netstat_an);
		netstat_an = NULL;
	}
	return;
}

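They don’t seem to know that assigning NULL here has no effect outside the function, because netstat_an is only a local copy of the caller’s pointer. The NULL check is redundant too, since free(NULL) does nothing. And that return at the end: beautiful.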

Yet another example of poor development, server/persistence.c:

int EnablePersistence(char* beaconIP, int beaconPort)
{
//TODO: just to silence the compiler warning
beaconIP++;
beaconPort++;

	return 0;
}

I’m curious why people keep code like this, from server/decode_dns.h:

#if 0
// Broken code that isn't used anyway
typedef struct {
	char		name[NS_MAXDNAME];    // <--- this doesn't work here
	DNS_rr_data	rrmetadata;
	const u_char 	*rdata;
} DNS_rr;
#endif

Does the author think that anybody will ever uncomment it?

server/survey_mac.c:

// lower three byte are psuedo-random
mac[3] = (unsigned char) (htonl(rand()) >> 24);
mac[4] = (unsigned char) (htonl(rand()) >> 24);
mac[5] = (unsigned char) (htonl(rand()) >> 24);

Definitely, calling htonl() on rand() makes the result more random.

And my favorite one from server/daemonize.c:

int daemonize( void )
{
	int	i;
	pid_t	pid;
	char	devnull[10];

#ifdef SOLARIS
	// set process max core file size to zero.
	struct rlimit 	corelimit = { 0, 0 };
#endif

	devnull[0] = '/'; devnull[5] = 'n';
    devnull[1] = 'd'; devnull[6] = 'u';
    devnull[2] = 'e'; devnull[7] = 'l';
    devnull[3] = 'v'; devnull[8] = 'l';
    devnull[4] = '/'; devnull[9] = '\0';

I h a v e n o i d e a w h a t I ' m d o i n g h e r e. Clear evidence that EDB doesn’t do any code review.

Conclusion

We can laugh at poor code, but the truth is that this software works. Within one year they built a product able to run on the most widely used systems and architectures. According to the user guide, Solaris and Linux on MIPS caused them the most problems; quote: As of Hive version 2.9, Solaris and MIPS little-endian architectures are no longer supported. Moreover, the repo dump is from 2015, so we can assume that newer versions have been released since. I suppose both developers are now only maintaining this code: they fix bugs rather than add new features, and they have to keep an eye on new releases of the supported systems and architectures.


One thing scared me, though. I expected much more from WikiLeaks: they didn’t anonymize the data properly. That is a real problem for whistleblowers.
