Sorting in EnScript – Sorting Arrays and NameListClass / NameValueClass

Every language has its own quirks when it comes to sorting data. In this post, I’ll take an introductory look at some of the most basic methods available for sorting data in EnScript. First, we need a list of some type of data that we want to sort. Our first example is going to use the ulong type, which, in EnScript, is a 64-bit unsigned integer. You might use a ulong if you were storing a list of file sizes, such as those presented by EntryClass::LogicalSize().

We’re going to declare a ulong array type by using the typedef statement as seen in line 3 below. On line 18, we’ll create our array with five values. One of the nicest things about arrays in EnScript is that they have a built-in Sort function, and it takes options from the built-in class ArrayClass. Lines 22, 25, and 28 demonstrate the differences in those options. Using ArrayClass::SORTENABLED performs a default sort of the array in ascending order (smallest to largest). Adding ArrayClass::SORTDESCENDING sorts in descending order (largest to smallest). Finally, adding ArrayClass::SORTNODUPE removes duplicates from the array. This could be useful if you were trying to generate a list of unique values. Take a look through the example below and then check out the output section below it.

ulong Example:

class MainClass {

  typedef ulong[] ulongArray;

  void printArray(ulongArray array) {
    ulong curr;
    forall (ulong u in array) {
      Console.Write(u);
      if (++curr < array.Count()) {
        Console.Write("\t");
      }
    }
    Console.Write("\n");
  }

  void Main() {
    SystemClass::ClearConsole();
    ulongArray array {548, 23, 164, 87, 164};
    //Original order
    printArray(array);
    //Sort ascending
    array.Sort(ArrayClass::SORTENABLED);
    printArray(array);
    //Sort descending
    array.Sort(ArrayClass::SORTENABLED | ArrayClass::SORTDESCENDING);
    printArray(array);
    //Sort ascending, remove duplicates
    array.Sort(ArrayClass::SORTENABLED | ArrayClass::SORTNODUPE);
    printArray(array);
  }
}

Output:

548	23	164	87	164
23	87	164	164	548
548	164	164	87	23
23	87	164	548

Take a look at the output of our code above. Are the results what you expected? The first line contains the list in its original order. The second line shows the list after an ascending sort, the third after a descending sort, and the fourth shows the list in ascending order with duplicates removed. Pretty straightforward, right? The options are the same for sorting arrays of other numeric types, such as int and char, and even DateClass.
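
As a quick sketch of that last point, here is the same pattern applied to an int array. The name intArray and the values are made up for illustration, and combining SORTDESCENDING with SORTNODUPE in a single call is my assumption (the example above only uses them separately), so treat this as something to try in your own environment and not as a guaranteed result.

```enscript
class MainClass {

  typedef int[] intArray;

  void Main() {
    SystemClass::ClearConsole();
    // Hypothetical values, including one duplicate (42)
    intArray array {42, 7, 42, 19};
    // Assumption: the options can be OR'd together to request a
    // descending sort with duplicates removed in one call
    array.Sort(ArrayClass::SORTENABLED | ArrayClass::SORTDESCENDING | ArrayClass::SORTNODUPE);
    // Print one value per line
    forall (int i in array) {
      Console.Write(i);
      Console.Write("\n");
    }
  }
}
```

If the options combine the way I expect, each value should be printed once, largest first.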

Next, let’s take a look at sorting a String array. When everything is in the same case, the options look much the same as they did above. You’ll note that I threw in a string of a different length just so you can see how it’s sorted. Scroll down to see the output.

String Example:

class MainClass {

  typedef String[] StringArray;

  void printArray(StringArray array) {
    ulong curr;
    forall (String s in array) {
      Console.Write(s);
      if (++curr < array.Count()) {
        Console.Write("\t");
      }
    }
    Console.Write("\n");
  }

  void Main() {
    SystemClass::ClearConsole();
    StringArray array {"abc", "abb", "aab", "aacd"};
    //Original order
    printArray(array);
    //Sort ascending
    array.Sort(ArrayClass::SORTENABLED);
    printArray(array);
    //Sort descending
    array.Sort(ArrayClass::SORTENABLED | ArrayClass::SORTDESCENDING);
    printArray(array);
  }
}

Output:

abc	abb	aab	aacd
aab	aacd	abb	abc
abc	abb	aacd	aab

You’ll see that it sorts on the first letter, then moves on and sorts on the second letter, and so on. You can see that even though "aacd" is a longer string than "abb" and "abc", it is sorted before them, because the first and second letters "aa" come before "ab" of the latter two strings.

So what happens when we have mixed case strings? You can see on line 19 that all of the array members have the letters "abc" in various mixed case. On line 23 we’re doing a standard ascending sort just like before. On line 26, however, we see a new option: ArrayClass::SORTCASE. This option will turn on case sensitive sorting for strings. You can see its effect in the output below our code.

String Case Sensitive Example:

class MainClass {

  typedef String[] StringArray;

  void printArray(StringArray array) {
    ulong curr;
    forall (String s in array) {
      Console.Write(s);
      if (++curr < array.Count()) {
        Console.Write("\t");
      }
    }
    Console.Write("\n");
  }

  void Main() {
    SystemClass::ClearConsole();
    //Create a new array with mixed case
    StringArray array {"abc", "Abc", "aBC", "abC", "ABc", "AbC"};
    //Original order
    printArray(array);
    //Sort default - case insensitive
    array.Sort(ArrayClass::SORTENABLED);
    printArray(array);
    //Sort case sensitive
    array.Sort(ArrayClass::SORTENABLED | ArrayClass::SORTCASE);
    printArray(array);
  }
}

Output:

abc	Abc	aBC	abC	ABc	AbC
Abc	aBC	abC	ABc	AbC	abc
ABc	AbC	Abc	aBC	abC	abc

As usual, the first line shows the original order. The second line shows a case insensitive sort. Since every string here becomes "abc" once case is ignored, they all compare as equal, so the order you see among them doesn’t tell you much. The third line shows our new option, the case sensitive sort. You’ll quickly notice that the first three strings start with a capital "A" and the last three start with a lowercase "a": uppercase letters sort before their lowercase counterparts.

The last basic sorting functionality I’ll show you is for NameListClass and NameValueClass. Both of these types inherit from NodeClass, and thus can take advantage of the NodeClass::INSERTSORTED option when inserting a new node. You can see on lines 11 through 16 that we’ve used this option when inserting our NameValueClass objects. This costs an extra line of code for each new object we insert into the list, but it gives us a sorted list instead of one that simply reflects the order in which we came upon the values. Both NameListClass and NameValueClass are sorted on the string value of the Name() property.

NameListClass and NameValueClass Example:

class MainClass {
  void Main() {
    SystemClass::ClearConsole();
    NameValueClass stringList();
    NameValueClass foo1(null, "abc", 0, "foo1");
    NameValueClass bar(null, "abb", 0, "bar");
    NameValueClass foo2(null, "abc", 0, "foo2");
    NameValueClass baz(null, "aab", 0, "baz");
    NameValueClass qux(null, "aac", 0, "qux");
    NameValueClass foo3(null, "abc", 0, "foo3");
    stringList.Insert(foo1, NodeClass::INSERTSORTED);
    stringList.Insert(bar, NodeClass::INSERTSORTED);
    stringList.Insert(foo2, NodeClass::INSERTSORTED);
    stringList.Insert(baz, NodeClass::INSERTSORTED);
    stringList.Insert(qux, NodeClass::INSERTSORTED);
    stringList.Insert(foo3, NodeClass::INSERTSORTED);
    forall (NameValueClass n in stringList) {
      Console.WriteLine(n.Name() + "(" + n.Value() + ")");
    }
  }
}

Output:

aab(baz)
aac(qux)
abb(bar)
abc(foo3)
abc(foo2)
abc(foo1)

You can see that this works in the same manner as our string sort example. It’s very interesting to note, however, the order in which the items with the same value for Name() (foo1, foo2, and foo3) ended up. They are sorted in reverse of the order in which they were inserted: each new node is placed just before the existing nodes with an equal Name(). I actually find this quite annoying, though as long as we know the behavior we can work around it.
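
One possible workaround, sketched here using only the calls from the example above, is to insert the equal-named nodes in the reverse of the order you want them to appear. This relies entirely on the behavior observed in the output above (a new node landing before existing nodes with the same Name()), which I’m treating as an assumption and not as documented behavior.

```enscript
class MainClass {
  void Main() {
    SystemClass::ClearConsole();
    NameValueClass stringList();
    NameValueClass foo1(null, "abc", 0, "foo1");
    NameValueClass foo2(null, "abc", 0, "foo2");
    NameValueClass foo3(null, "abc", 0, "foo3");
    // Insert last-to-first: if each new node is placed before its
    // equal-named neighbors (as observed above), foo1 should end up first
    stringList.Insert(foo3, NodeClass::INSERTSORTED);
    stringList.Insert(foo2, NodeClass::INSERTSORTED);
    stringList.Insert(foo1, NodeClass::INSERTSORTED);
    forall (NameValueClass n in stringList) {
      Console.WriteLine(n.Name() + "(" + n.Value() + ")");
    }
  }
}
```

This only helps when you control the insertion order up front. If the values arrive in an arbitrary order, you would need a different approach.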

By now you should have a good understanding of the basics of sorting lists in EnScript. In the next post, I’ll show you how to sort arrays of user-defined class objects, and we’ll do some performance tests to see how the built-in array sorting algorithms hold up with large lists.

SANS DFIR Summit 2012 – Evidence is Data: Your Secret Advantage

Fellow Lightboxer Jon Stewart will be presenting at the SANS Forensics and Incident Response Summit 2012 on Wednesday, June 27 from 2 – 3 PM in the Senate Room. Make sure to ask Jon for a trial version of Lightgrep Search for EnCase after the presentation, as he’ll have Lightbox thumb drives with installers onsite. Please stop by for a great presentation and grab a free thumb drive.

You can read more about the presentation at the Lightbox Technologies blog.

Going to CEIC 2012? Ping me for a free Lightgrep trial!

I’m proud to announce that our company, Lightbox Technologies, will be launching Lightgrep Search for EnCase just in time for CEIC. We’ll have free thumb drives with trial versions of Lightgrep on them, so please come find us. Be sure to follow us or ping us on Twitter while you’re there! You can reach me at @geoff_black and Jon at @codeslack.

I’ll also be doing a redux of last year’s presentation, Statistical Analysis and Data Sampling, at this year’s CEIC with Jon. We’re on at 4:30 PM on Monday in the eDiscovery Lab track. You can find the description on the CEIC website:

Ever worked on a matter where you wanted to validate that the search terms were working correctly? What about when a judge requests that you testify on your procedures for this validation process? This session will show you how to take culled evidence from the EnCase eDiscovery solution and create a representative random set of data to be used in the validation process. The options demonstrated will be: the number of items to review and the percentage of accuracy. Once a random sub-set has been created, this session will show how the EnCase eDiscovery solution can be used to manually tag the items and provide reporting.

The presentation will be updated with some new features on predictive coding and recent rulings. If you’re interested in how sampling can be used to reduce review time and improve keyword results, you should come check us out.

Unfortunately, we’re up against Craig Ball and Chris Dale, who will be rockin’ with “The Future of Social Media in E-Discovery.” Craig recently wrote a good piece for Law Technology News entitled “Gold Standard,” which argues that a true gold standard for keyword search incorporates both precise inclusion and defensible exclusion. He touches on keyword precision in the article, and that’s one of our primary goals with our talk: how to get the best bang for your buck with a little extra testing.

Association of Certified E-Discovery Specialists (ACEDS) Conference 2012

The Association of Certified E-Discovery Specialists (ACEDS) is a groundbreaking organization that is seeking to build the eDiscovery community through training and certification. ACEDS offers education in the form of live training seminars, access to recorded training online, and an annual conference. ACEDS wants to make sure certified individuals are proficient not in just one area of the EDRM, but in all of them. The certification covers a wide range of topics which are all important in the eDiscovery process: legal hold, collection, processing, project management and review planning, and everything surrounding them. ACEDS has partnered with organizations such as ARMA, ALSP, and ILTA. These organizations see the value in a vendor neutral certification for eDiscovery, and so do I. A lot has been written in every industry about the pros and cons of certification, and eDiscovery is no exception.

Most technical and semi-technical fields have two basic types of certifications: vendor application specific and vendor neutral. I’ve seen several vendor exams for eDiscovery certifications. While those can be important for users of a specific application, they’re not always portable between organizations as there are so many different products prevalent in the market. Many of them are very specific to the functionality of the tools, and less focused on overall eDiscovery knowledge. Passing means you know how to run the application, but not that you necessarily understand the reasoning behind the actions you perform. I currently hold a vendor certification for forensics which attempts to remain balanced between general industry knowledge and tool-specific information, but the focus is definitely on the tool.

Vendor neutral certifications for eDiscovery, such as CEDS from ACEDS, don’t worry about how any specific tool tries to tackle eDiscovery. They aim to verify knowledge across multiple areas in a given discipline, without relying on how one tool functions. If you’re interested in learning more about the certification, check out the ACEDS website: What The Exam Is About. For other sources: Gabe Acevedo with Above The Law has a great analysis written just after last year’s ACEDS Conference. Dennis Kiker with LeClairRyan also wrote a well-reasoned article describing eDiscovery certification as the logical next step, and rebutting some recent criticism.

I can say from my own experience hiring forensic and eDiscovery professionals that certification is not a panacea or guarantee when choosing a candidate. What it does demonstrate, though, is that someone is interested in investing time in themselves and their chosen career field. In the case of CEDS, it shows that they care about advancing in the field of eDiscovery.

ACEDS is prepping for their annual conference at the beautiful Westin Diplomat in Hollywood, FL, April 2 – 4. The line-up is absolutely stellar. Topics include: addressing catastrophic eDiscovery events; timely items such as social media; often overlooked project management; eDiscovery malpractice risks; and of course, exam prep courses.

If you’re planning on attending the conference, enter discount code “BLACK” when you register to receive $150 off the already very reasonable conference fee. Don’t wait too long, though – the discount code expires soon!

Full disclosure: I serve on the ACEDS Advisory Board, lending my perspective on technology in eDiscovery and the intersection of eDiscovery and Forensics.