Grep versus Select-String Speedtest

How fast is grep? Reasonably fast. Over the weekend, we were discussing a post by Mike Haertel on Twitter. Mike was the original developer of GNU grep. In the post, titled “why GNU grep is fast”, Mike described the algorithm grep uses (Boyer-Moore). He also offered this excellent advice: “#1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE. #2 trick: GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it *does* look at.” And: “The key to making programs fast is to make them do practically nothing.”
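Haertel’s post explains that GNU grep is built on Boyer-Moore, and this is how it avoids looking at every byte. On a mismatch, a lookup table tells the search how far the pattern can slide. The sketch below is not grep’s code. It is a minimal Python version of the simpler Boyer-Moore-Horspool variant, assuming a byte-string pattern. It also counts how many alignments it actually examined.

```python
def horspool_search(text: bytes, pattern: bytes) -> tuple[list[int], int]:
    """Boyer-Moore-Horspool: return (match offsets, alignments examined)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Bad-character table: how far the pattern may slide when the text byte
    # under the pattern's last position is b. Bytes that do not occur in the
    # pattern (apart from its last byte) allow a full jump of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    matches, alignments, pos = [], 0, 0
    while pos <= n - m:
        alignments += 1
        last = text[pos + m - 1]  # look at one byte first
        if last == pattern[-1] and text[pos:pos + m] == pattern:
            matches.append(pos)
        pos += shift[last]        # skip ahead without reading the bytes in between
    return matches, alignments
```

In one example, b"key" sits at the end of 1,000 bytes of filler. The search examines roughly a third of the possible alignments, because most probes let the three-byte pattern jump its full length.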
This had me wondering how PowerShell’s Select-String stacks up. Richard Minerich (@rickasaurus) raised a good point: compiled C code is generally faster than C# code running on .NET. Because PowerShell is built on .NET, we might assume that grep would be faster than Select-String. Mark Boltz (@mtezna) suggested running each test several times and averaging the results to get a fair sense of how Select-String compares.

If Select-String were significantly slower, a good weekend project might be to write a faster parser. I do have the occasional free weekend, and I was very curious. Today, I ran such a test. Read on to find out what I learned.
Test Parameters

I generated sample files using a sample dictionary file. Each file contained sentences of random length (5-25 random words). One in ten sentences contained the word “key” at a random location within the sentence. There were eleven sample files: 1,000 sentences, 10,000 sentences, 20,000 sentences, and so on to 100,000 sentences. (You can download the resulting test files here: grep-select-string-test.zip).
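Files like these can be generated with a short script. This is a sketch under stated assumptions, not the generator actually used. The word list is hypothetical because the post does not name its dictionary file. “One in ten” is read as every tenth sentence, and “key” replaces one randomly chosen word so that sentences stay within 5 to 25 words.

```python
import os
import random

# 1,000 sentences, then 10,000 to 100,000 in steps of 10,000: eleven files.
SIZES = [1_000] + list(range(10_000, 100_001, 10_000))


def make_sentences(words, count, rng, key="key"):
    """Build `count` sentences of 5-25 random words; every tenth contains `key`."""
    sentences = []
    for i in range(count):
        length = rng.randint(5, 25)  # inclusive on both ends
        sentence = [rng.choice(words) for _ in range(length)]
        if i % 10 == 0:  # assumption: "one in ten" means every tenth sentence
            sentence[rng.randrange(length)] = key  # "key" at a random position
        sentences.append(" ".join(sentence) + ".")
    return sentences


def write_sample_files(words, directory=".", seed=2012):
    """Write file1000.txt, file10000.txt, ... file100000.txt (names assumed)."""
    rng = random.Random(seed)  # fixed seed so the files can be regenerated
    for size in SIZES:
        path = os.path.join(directory, f"file{size}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(make_sentences(words, size, rng)) + "\n")
```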

Each search was performed seven times. System.Diagnostics.Stopwatch was used as the time source. The total milliseconds elapsed was used as the time measure. The minimum time and the maximum time were dropped. The time recorded was the average of the remaining five tests.
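The post used System.Diagnostics.Stopwatch. As a rough Python analogue, `time.perf_counter` can time each run, and the trimming rule above can be applied afterwards. This is an assumption, not the harness the post used. Note that timing an external command this way also counts the time needed to start the process.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=7):
    """Run `cmd` `runs` times and return the elapsed wall-clock milliseconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def trimmed_average(samples):
    """Drop the single fastest and slowest run, then average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to trim")
    ordered = sorted(samples)
    return statistics.mean(ordered[1:-1])
```

For example, `trimmed_average(time_runs(["grep", "key", "file1000.txt"]))` averages the middle five of seven runs.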
I used the latest GNU grep for Windows, version 4.2.1, released 2012-12-18. The command executed for grepping the file was: grep "key" "file1000.txt"
For PowerShell, I used version 3 (build 6.2.9200.16398). The PowerShell equivalent of the grep command was: Select-String -Pattern "key" -Path .\file1000.txt

The host operating system was Windows Server 2008 R2 SP1 with the latest hotfixes.

Results

In the following table, the first column is the number of lines (sentences) in each sample file. The other two columns give the average time to search that file, in milliseconds.

  Lines     Grep (ms)   Select-String (ms)
  1,000      248.2245              29.8712
 10,000    1,907.8156             299.4792
 20,000    4,013.5332             643.2678
 30,000    6,689.0545           1,036.1867
 40,000    8,419.1654           1,319.9755
 50,000   10,870.3179           1,662.6931
 60,000   12,487.7127           1,955.2525
 70,000   15,048.1311           2,344.9599
 80,000   16,623.6946           2,594.3496
 90,000   16,775.1033           2,995.7644
100,000   18,697.6675           3,303.2918
The bottom line? In these tests, Select-String was significantly faster than GNU grep on Windows Server 2008 R2: between about 5.6 and 8.3 times faster, depending on file size. PowerShell is closing the gap between Linux and Windows shell environments.
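The size of the gap can be computed directly from the averages in the table above. This short Python sketch divides each grep time by the matching Select-String time.

```python
# (lines, grep ms, Select-String ms), copied from the results table above.
RESULTS = [
    (1_000, 248.2245, 29.8712),
    (10_000, 1_907.8156, 299.4792),
    (20_000, 4_013.5332, 643.2678),
    (30_000, 6_689.0545, 1_036.1867),
    (40_000, 8_419.1654, 1_319.9755),
    (50_000, 10_870.3179, 1_662.6931),
    (60_000, 12_487.7127, 1_955.2525),
    (70_000, 15_048.1311, 2_344.9599),
    (80_000, 16_623.6946, 2_594.3496),
    (90_000, 16_775.1033, 2_995.7644),
    (100_000, 18_697.6675, 3_303.2918),
]


def speedups(rows):
    """Map file size (lines) to how many times faster Select-String was than grep."""
    return {lines: grep_ms / select_ms for lines, grep_ms, select_ms in rows}


if __name__ == "__main__":
    for lines, ratio in speedups(RESULTS).items():
        print(f"{lines:>7,} lines: Select-String {ratio:.1f}x faster")
```

The smallest ratio, about 5.6, occurs at 90,000 lines. The largest, about 8.3, occurs at 1,000 lines.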

Out and About: Incident Management with PowerShell

Matt Johnson and I will be presenting on incident management and PowerShell at next month’s Motor City ISSA. This is part of the PoshSec initiative.

Incident Management with PowerShell

Have you seen the latest scare? The Java 0-day exploit that allows attackers to execute code on your computer? Now scares come and scares go. But let’s suppose for a moment your servers were infected using this exploit. How could your administrators detect the attack? How would you recover? Even better, what could have been done beforehand and how could you prevent this from happening again?

Incident Management, of course, is the security practice that seeks to answer these questions. In Windows server environments, PowerShell is the way Incident Management gets put into practice. This session will introduce InfoSec professionals and systems administrators to PowerShell’s security features. We will provide an overview of Incident Management and PowerShell. Then, using the Java 0-day exploit as a driver, we will walk through the lifecycle of an incident. The audience will leave with information on the policy and practice of managing security incidents in Windows with PowerShell.

Biography:

J Wolfgang Goerlich is the information systems and security manager for a Michigan-based financial institution. He is responsible for managing the software development and network operations teams. Wolfgang’s background is in architecting new systems, securing existing systems, and optimizing performance and recovery. With over a decade of experience, Mr. Goerlich has a solid understanding of both the IT infrastructure and the business it enables.

Matt Johnson is a Systems Analyst from the Metro Detroit area. As an avid technologist and tinkerer, he is always looking to understand and improve the world around him. Matt has a strong interest in automation and the use of PowerShell. Matt founded the SE Michigan PowerShell User Group and has been a judge for the Microsoft Scripting Games for the last two years. He holds numerous certifications and writes a blog at http://www.mwjcomputing.com. You can follow him on Twitter at @mwjcomputing.

Motor City ISSA. February 21st, 2013. Livonia, MI.

Privilege management at CSO

Least Privilege Management (LPM) is in the news …

The concept has been around for decades. J. Wolfgang Goerlich, information systems and information security manager for a Michigan-based financial services firm, said it was, “first explicitly called out as a design goal in the Multics operating system, in a paper by Jerome Saltzer in 1974.”

But it appears that, so far, it still has not gone mainstream. Verizon’s 2012 Data Breach Investigations Report found that, of the breaches it surveyed, 96% were not highly difficult for attackers and 97% could have been avoided through simple or intermediate controls.

“In an ideal world, the employee’s job description, system privileges, and available applications all match,” Goerlich said. “The person has the right tools and right permissions to complete a well-defined business process.”

“The real world is messy. Employees often have flexible job descriptions. The applications require more privileges than the business process requires,” he said. “[That means] trade-offs to ensure people can do their jobs, which invariably means elevating the privileges on the system to a point where the necessary applications function. But no further.”

Read the full article at CSO: Privilege management could cut breaches — if it were used

Considerations when testing Denial of Service

Stress-testing has long been a part of every IT Operations toolkit. When a new system goes in, we want to know where the weaknesses and bottlenecks are. Stress-testing is the only way.

Now, hacktivists have been providing stress-tests for years in the form of distributed denial of service (DDoS) attacks. Such DDoS attacks accompany just about any news event. As moves are underway to make DDoS a form of free speech, we can expect more in the future.

With that as a background, I have been asked recently for advice on how to test for a DDoS. Here are some considerations.

First, test from the farthest router that you own. The “you own” part is essential. Let’s not run a DDoS across the public Internet or even across your hosting provider’s network. That is a quick way to run afoul of terms of service and, potentially, the law. Moreover, it is not a good test. A DDoS from, say, home will be bottlenecked by your ISP connection and the Internet path (1-10 Mbps). A better test runs directly off the router interface (100-1000 Mbps).

Second, use a distributed test. Distributed testing is common practice when stress-testing. It is also required to get a D in DDoS. Alright, that was a bad joke. The point is that you want to keep limits of individual devices, such as a bottleneck within the OS or the testing application, from affecting the test. My rule of thumb is 5:1. So if you are testing one router interface at 1 Gbps, you would want to send 5 Gbps of data via five separate computers.
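The 5:1 rule of thumb is simple arithmetic. The sketch below assumes that each test machine can push traffic at its full network interface rate. That is optimistic, so treat the resulting host count as a minimum.

```python
import math


def sources_needed(link_gbps, host_nic_gbps=1.0, ratio=5.0):
    """Offered load for a link under the 5:1 rule, and the minimum hosts to generate it."""
    offered_gbps = link_gbps * ratio
    hosts = math.ceil(offered_gbps / host_nic_gbps)  # assumes each host saturates its NIC
    return offered_gbps, hosts
```

For the example above, `sources_needed(1)` gives 5 Gbps from five 1 Gbps machines.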

Third, use a combination of traditional administration tools and the tools in use for DDoS. Stress-test both the network layer and the HTTP layer of the application. If I were to launch a DDoS test, I would likely go with iperf, LOIC, and HOIC. Also check for tools specific to the web server, such as ab (ApacheBench) for Apache. Put together a test plan with test scripts and repeat this plan consistently.

Fourth, test with disposable systems. The best test machine is one with a basic installation of the OS, the test tools, and the test scripts. This minimizes variables in the test. Also, while rare, it is not unheard of for tools like LOIC and HOIC to be bundled with malicious software. Once the test is complete, the systems used for testing should be re-imaged before being returned to service.

Let’s summarize by looking at a hypothetical example. Assume we have two Internet routers, two core routers, two firewalls, and two front-end web servers. All are on 1 Gbps network connections. I would re-image five notebooks with a base OS and the DDoS tools. With all five plugged into the network switch at the Internet routers, I would execute the DDoS test and collect the results. Then I would repeat the exact same test (via script) on the core router network, on the firewall network, and on the web server network. The last step is to review the entire data set to identify bottlenecks and make recommendations for securing the network against DDoS.
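One way to make the plan repeatable is to generate it. Every segment then gets exactly the same commands from the same machines. The sketch below is hypothetical throughout. The notebook names, the segment names, the target address (an RFC 5737 documentation address), and the iperf and ab parameters are all illustrative. LOIC and HOIC are GUI tools, so they are left out. The sketch only builds the command lists and runs nothing.

```python
# Hypothetical names: five re-imaged notebooks, the four segments from the
# example above, and a web front end at an RFC 5737 documentation address.
NOTEBOOKS = [f"notebook{i}" for i in range(1, 6)]
SEGMENTS = ["internet-routers", "core-routers", "firewalls", "web-servers"]
TARGET = "192.0.2.80"


def build_script(target, seconds=60):
    """The same commands every time; only the point where notebooks plug in changes."""
    return [
        # Network layer: iperf client; assumes an `iperf -s` listener on the target.
        ["iperf", "-c", target, "-t", str(seconds), "-P", "4"],
        # HTTP layer: ApacheBench, 10,000 requests with 100 in flight.
        ["ab", "-n", "10000", "-c", "100", f"http://{target}/"],
    ]


def build_plan(segments=SEGMENTS, notebooks=NOTEBOOKS, target=TARGET):
    """Return (segment, notebook, command) tuples in the order the runs happen."""
    script = build_script(target)
    return [(segment, notebook, command)
            for segment in segments
            for notebook in notebooks
            for command in script]
```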

That’s it. These are simple considerations that reduce the risk and increase the effectiveness of DDoS testing.

Incog: past, present, and future

I spent last summer tinkering with covert channels and steganography. It is one thing to read about a technique. It is quite another to build a tool that demonstrates it. To do the thing is to know the thing, as they say. It is like the art student who spends time duplicating the work of past masters.

And what did I duplicate? I started with the favorites: bitmap steganography and communication over ping packets. I did Windows-specific techniques such as NTFS ADS, shellcode injection via Kernel32.dll, mutexes, and RPC. I also replicated Dan Kaminsky’s Base32 over DNS. Then I tossed in a few evasion techniques like numbered sets and entropy masking.
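Base32 suits DNS tunneling because DNS names are case-insensitive. Base32 uses a single-case alphabet, so it survives, while case-sensitive Base64 would not. Each label is limited to 63 characters, and a whole name to roughly 253 characters in text form. The sketch below illustrates the encoding side only. It is not Incog’s code, since Incog is written in C#. The tunnel domain is hypothetical.

```python
import base64

MAX_LABEL = 63   # maximum length of one DNS label
MAX_NAME = 253   # practical maximum for a full name in text form


def to_dns_names(data: bytes, domain: str) -> list[str]:
    """Encode bytes as Base32 and pack the text into DNS query names under `domain`."""
    encoded = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    names, current = [], []
    for label in labels:
        # Start a new name if adding this label would exceed the name limit.
        if current and len(".".join(current + [label, domain])) > MAX_NAME:
            names.append(".".join(current + [domain]))
            current = []
        current.append(label)
    if current:
        names.append(".".join(current + [domain]))
    return names


def from_dns_names(names: list[str], domain: str) -> bytes:
    """Reverse to_dns_names: strip the domain, rejoin the labels, restore padding."""
    suffix = "." + domain
    encoded = "".join(name[:-len(suffix)].replace(".", "") for name in names)
    encoded = encoded.upper() + "=" * (-len(encoded) % 8)
    return base64.b32decode(encoded)
```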

Incog is the result of this summer of fun. Incog is a C# library and a collection of demos which illustrate these basic techniques. I released the full source code last fall at GrrCon. You can download Incog from GitHub.

If you would like to see me present on Incog, including my latest work with new channels and full PowerShell integration, I am up for consideration for Source Boston 2013.

Please vote here: https://www.surveymonkey.com/s/SRCBOS13VS

This year SOURCE Boston is opening up one session to voter choice. Please select the session you would like to see at SOURCE Boston 2013. Please only vote once (we will be checking) and vote for the session you would be the most interested in seeing. Voting will close on January 15th.

OPTION 5: Punch and Counter-punch with .Net Apps, J Wolfgang Goerlich, Alice wants to send a message to Bob. Not on our network, she won’t! Who are these people? Then Alice punches a hole in the OS to send the message using some .Net code. We punch back with Windows and .Net security configurations. Punch and counter-punch, breach and block, attack and defend, the attack goes on. With this as the back story, we will walk thru sample .Net apps and Windows configurations that defenders use and attackers abuse. Short on slides and long on demo, this presentation will step thru the latest in Microsoft .Net application security.

Write-up of the 29c3 CTF “What’s This” Challenge

Subtitled: “How to capture a flag in twelve easy days”

The 29th Chaos Communication Congress (29C3) held an online capture the flag (CTF) event this year. There were several challenges, which you can see at the CTF Time page for the 29c3 CTF. I spent most of the time on the “What’s This” challenge. The clue was a USB packet capture file named what_this.pcap.

The first thing we did was run strings what_this.pcap and look at the ASCII and Unicode strings in the capture. ASCII: CASIO DIGITAL_CAMERA 1.00, FAT16, ACR122U103h. Unicode: CASIO QV DIGITAL, CASIO COMPUTER, CCID USB Reader.
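The `strings` step finds runs of printable characters. GNU strings needs `-e l` to show 16-bit little-endian (“Unicode”) text. The sketch below is a rough Python equivalent for both cases, assuming the default minimum run length of four characters.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Runs of printable ASCII at least `min_len` long, like plain `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def utf16le_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Printable ASCII stored as UTF-16LE (each character followed by 0x00), like `strings -e l`."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

To use it on the capture, read the file as bytes, for example `open("what_this.pcap", "rb").read()`, and pass the result to both functions.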

The second thing we did was to open the capture in Wireshark 1.8.4. (Using the latest version of Wireshark is important, as its USB packet parser is still being implemented.) We knew Philip Polstra had covered USB forensics in the past, such as at GrrCon, and Philip pointed us to http://www.linux-usb.org/usb.ids for identifying devices. We see a Genesys Logic USB-2.0 4-Port HUB (frame.number==2), a Linux Foundation USB 2.0 root hub (frame.number==4), a Holtek Semiconductor Shortboard Lefty (frame.number==32, 42), a Casio Computer device (frame.number==96, 106), and another Casio Computer device (frame.number==1790).
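In usb.ids, vendor lines start in column 0 with a four-digit hex ID, two spaces, and a name. Device lines under a vendor are indented with one tab, and interface lines with two. Later sections, such as device classes, use a different prefix (for example “C ”). A minimal Python lookup under those assumptions is below. The sample text in the test is a made-up excerpt in that format.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_usb_ids(text):
    """Return ({vendor_id: name}, {(vendor_id, product_id): name}) from usb.ids text."""
    vendors, devices = {}, {}
    vendor = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                       # blank lines and comments
        if line.startswith("\t\t"):
            continue                       # interface lines: not needed here
        ident, _, name = line.strip().partition("  ")
        if line.startswith("\t"):
            if vendor is not None:         # device line under the current vendor
                devices[(vendor, ident.lower())] = name.strip()
        elif len(ident) == 4 and set(ident) <= HEX_DIGITS:
            vendor = ident.lower()         # new vendor line
            vendors[vendor] = name.strip()
        else:
            vendor = None                  # class, HID, or language sections
    return vendors, devices


def describe(vendors, devices, vid, pid):
    """Human-readable name for an idVendor/idProduct pair seen in Wireshark."""
    vid, pid = vid.lower(), pid.lower()
    vendor = vendors.get(vid, f"unknown vendor {vid}")
    device = devices.get((vid, pid), f"unknown device {pid}")
    return f"{vendor} {device}"
```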

Supposition? The person is running Linux with a keyboard, a Casio camera, and a smart card (CCID) reader attached over USB. A Mifare card is swiped on the CCID reader (ACR122U103 identifies an ACS ACR122U reader). The camera is mounted (FAT16) and one or more files are read from the device.

Next, we extracted the keystrokes. I had previously written a simple keystroke analyzer for the CSAW CTF Qualification Round 2012. It simply took the second byte in each USB keyboard packet (URB_INTERRUPT) and added 93, since HID usage 0x04 is “a” (ASCII 97). This meant the alphabetical characters were correct; however, all special characters and linefeeds were lost. The #misec 29c3 CTF captain, j3remy, passed along a lookup table. Using this lookup table, we found the following keystrokes:

dmeesg
mmouunt t vfaat //ddev/ssdb1 /usb
ccd usb
llss —l
fiille laagg
dmmeessg
nfc-lsisst
ccat flaag \ aespipe -p 3 -d 3,,, nffs-llisst \c-llisst \ grrep uuid \ cut =-d -f 10\ dd sbbs=113 couunnt=2

There are a number of problems with this method of analyzing keystrokes. First, when a key is held down too long, we get repeated letters (dmeesg). Second, special keys like shift and backspace are ignored. I rewrote my parser to read bytes 1, 2, and 3. The first byte in a keyboard packet records whether shift is held. The second byte is the key (including keys like enter and backspace). The third byte is the error code for repeating characters. Using this information, I mapped the HID usage tables to Microsoft’s SendKeys documentation and replayed the packet file into Notepad.

dmesg
mount -t vfat /dev/sdb1 usb
cd usb
ls -l
file  lag
dmesg
nfc-list
cat flag | aespipe -p 3 -d 3<<< "`nfc-list | grep UID | cut -d  " " -f 10-| dd bs=13 count=2`"
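The replay approach above went through SendKeys and Notepad. As an alternative, the decoding can be sketched in plain Python. This is not the parser used in the write-up. It assumes the standard 8-byte HID boot keyboard report: a modifier bitmap, a reserved byte, and six key slots. The capture’s keyboard may lay its reports out differently. The sketch remembers which keys were already down in the previous report. That is one way to avoid doubled letters, because a key that stays down, or overlaps the next key, appears in several consecutive reports. Caps Lock and non-US keys are not handled.

```python
# Standard boot keyboard report: [modifiers, reserved, key1 .. key6].
LEFT_SHIFT, RIGHT_SHIFT = 0x02, 0x20
BACKSPACE = 0x2A
ERROR_ROLL_OVER = 0x01

# (unshifted, shifted) characters for non-letter usages on HID page 0x07.
SYMBOLS = {
    0x1E: ("1", "!"), 0x1F: ("2", "@"), 0x20: ("3", "#"), 0x21: ("4", "$"),
    0x22: ("5", "%"), 0x23: ("6", "^"), 0x24: ("7", "&"), 0x25: ("8", "*"),
    0x26: ("9", "("), 0x27: ("0", ")"), 0x28: ("\n", "\n"), 0x2B: ("\t", "\t"),
    0x2C: (" ", " "), 0x2D: ("-", "_"), 0x2E: ("=", "+"), 0x2F: ("[", "{"),
    0x30: ("]", "}"), 0x31: ("\\", "|"), 0x33: (";", ":"), 0x34: ("'", '"'),
    0x35: ("`", "~"), 0x36: (",", "<"), 0x37: (".", ">"), 0x38: ("/", "?"),
}


def decode_reports(reports):
    """Turn a sequence of 8-byte keyboard reports into the text that was typed."""
    typed = []
    held = set()  # keys already down in the previous report
    for report in reports:
        slots = report[2:8]
        if ERROR_ROLL_OVER in slots:
            continue  # too many keys down: this report carries no usable keys
        shift = 1 if report[0] & (LEFT_SHIFT | RIGHT_SHIFT) else 0
        keys = [k for k in slots if k > 0x03]  # 0x00 = no key, 0x01-0x03 = error codes
        for key in keys:
            if key in held:
                continue  # still held from the last report: not a new keystroke
            if 0x04 <= key <= 0x1D:  # letters a..z
                letter = chr(ord("a") + key - 0x04)
                typed.append(letter.upper() if shift else letter)
            elif key == BACKSPACE:
                if typed:
                    typed.pop()
            elif key in SYMBOLS:
                typed.append(SYMBOLS[key][shift])
        held = set(keys)
    return "".join(typed)
```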

Supposition? The person at the keyboard plugged in the Casio camera and mounted it to usb. He listed the folder contents, then scanned for the Mifare card (nfc-list, from libnfc, lists the NFC targets in range, here an ISO14443A card). Once confirmed, he read the flag from the camera and decrypted it with AES 128-bit encryption in CBC mode (man aespipe). The passphrase was the card’s UID as printed by nfc-list, trimmed to its first 26 bytes (nfc-list | grep | cut | dd). To find the flag, we need both the UID and the flag file.

The hard work of finding the UID was done by j3remy. He followed the ACR122U API guide and traced the calls and responses. For example, frame.number==1954 reads Data: ff 00 00 00 02 d4 02. This is a Direct Transmit pseudo-APDU (ff 00 00 00, two bytes of payload) wrapping the PN532 GetFirmwareVersion command (d4 02). The response in frame.number==1961, Data: 61 08, says that 8 response bytes are waiting. Then frame.number==1966, Data: ff c0 00 00 08, is a Get Response (ff c0 00 00) for those 8 bytes (08). The card reader returns the firmware, d5 03 32 01 06 07 90 00, in frame.number==1973. j3remy likewise parsed the rest of the communications and found frame.number==3427, which reads: d54b010100440007049748ba3423809000
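The request and response framing traced above can be sketched in a few Python helpers. This is an illustration based on the ACR122U pseudo-APDUs and PN532 commands named above, not code from the challenge. The helpers build the Direct Transmit and Get Response commands, read the 61 xx “bytes waiting” status, and decode a GetFirmwareVersion reply. In that reply, IC 0x32 indicates a PN532.

```python
def direct_transmit(pn532_command: bytes) -> bytes:
    """Wrap a PN532 command in the ACR122U Direct Transmit pseudo-APDU (FF 00 00 00 Lc)."""
    if len(pn532_command) > 255:
        raise ValueError("payload too long for a one-byte length")
    return bytes([0xFF, 0x00, 0x00, 0x00, len(pn532_command)]) + pn532_command


def get_response(length: int) -> bytes:
    """ACR122U Get Response pseudo-APDU (FF C0 00 00 Le) to fetch pending bytes."""
    return bytes([0xFF, 0xC0, 0x00, 0x00, length])


def pending_length(status: bytes):
    """For a 61 xx status, return xx (bytes waiting); otherwise None."""
    if len(status) == 2 and status[0] == 0x61:
        return status[1]
    return None


def firmware_info(reply: bytes) -> dict:
    """Decode d5 03 IC Ver Rev Support, followed by the 90 00 status word."""
    if reply[:2] != b"\xd5\x03" or reply[-2:] != b"\x90\x00":
        raise ValueError("not a successful GetFirmwareVersion reply")
    ic, ver, rev, support = reply[2:6]
    return {"ic": ic, "version": f"{ver}.{rev}", "support": support}
```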

d5 4b == response header (PN532 InListPassiveTarget response)
01 == number of tags found
01 == target number
00 44 == SENS_RES (ATQA)
00 == SEL_RES (SAK)
07 == length of UID
04 97 48 ba 34 23 80 == UID
90 00 == operation finished
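The byte-by-byte breakdown above can be expressed as a small parser. This is a sketch that assumes a single ISO14443A target. Any ATS bytes that some cards send after the UID are ignored.

```python
def parse_in_list_passive_target(frame: bytes) -> dict:
    """Parse a one-target ISO14443A InListPassiveTarget reply plus the 90 00 status."""
    if frame[:2] != bytes.fromhex("d54b"):
        raise ValueError("not an InListPassiveTarget response (d5 4b)")
    if frame[-2:] != bytes.fromhex("9000"):
        raise ValueError("reader did not report success (90 00)")
    uid_len = frame[7]
    return {
        "targets": frame[2],          # number of tags found
        "target_number": frame[3],
        "atqa": frame[4:6],           # SENS_RES
        "sak": frame[6],              # SEL_RES
        "uid": frame[8:8 + uid_len],
    }
```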

The next step was to properly format the UID as the nfc-list command would display it. This took some doing. Effectively, there are 4 blank spaces before ATQA, 7 blank spaces before UID, and 6 spaces before SAK. There is one space after : and before the hexadecimal value. Each hexadecimal value is double-spaced. With that in mind, we created an nfc-list.txt file:

    ATQA (SENS_RES): 00  44
       UID (NFCID1): 04  97  48  ba  34  23  80
      SAK (SEL_RES): 00
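The spacing described above amounts to right-justifying each label to 20 characters, then one space, then the bytes separated by two spaces. The Python formatter below reproduces the file under that assumption. The width of 20 is inferred from the spacing in this post, not taken from libnfc’s source.

```python
LABEL_WIDTH = 20  # inferred from the spacing described above (an assumption)


def hex_pairs(data: bytes) -> str:
    """Lowercase hex bytes separated by two spaces, e.g. '00  44'."""
    return "  ".join(f"{b:02x}" for b in data)


def nfc_list_text(atqa: bytes, uid: bytes, sak: int) -> str:
    """Render the three ISO14443A lines in the layout of nfc-list.txt."""
    rows = [
        ("ATQA (SENS_RES):", hex_pairs(atqa)),
        ("UID (NFCID1):", hex_pairs(uid)),
        ("SAK (SEL_RES):", f"{sak:02x}"),
    ]
    return "\n".join(f"{label:>{LABEL_WIDTH}} {value}" for label, value in rows) + "\n"
```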

Determining the spacing took some time. Once we had it, we could run the cat | grep | cut | dd pipeline and correctly return 26 bytes of ASCII characters.

$ echo "`cat nfc-list.txt | grep UID | cut -d  " " -f 10-| dd bs=13 count=2`"
2+0 records in
2+0 records out
26 bytes (26 B) copied, 6.2772e-05 s, 414 kB/s
04  97  48  ba  34  23  80
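The `grep UID | cut -d " " -f 10- | dd bs=13 count=2` pipeline can be mirrored in Python. This makes it clear why the spacing matters. Every single space is a cut delimiter, so the seven leading spaces shift the UID into field 10. dd then keeps exactly 26 bytes. The sketch is simplified and does not model cut’s handling of lines that contain no delimiter.

```python
def passphrase_from_nfc_list(text: str) -> bytes:
    """Mimic: grep UID | cut -d " " -f 10- | dd bs=13 count=2."""
    cut_output = []
    for line in text.splitlines():
        if "UID" not in line:                      # grep UID
            continue
        fields = line.split(" ")                   # cut -d " ": every space splits
        cut_output.append(" ".join(fields[9:]) + "\n")  # -f 10- (fields are 1-based)
    return "".join(cut_output).encode("ascii")[:26]     # dd bs=13 count=2 -> 26 bytes
```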

To recap: we have the UID, we have correctly converted it into the aespipe passphrase, and we are now ready to decrypt the file. How do we find the file, though? Rather than reverse engineering FAT16 over USB, we took a brute force approach. We exported all USB packets with data into files named flagnnnn (where nnnn was the frame.number). We then ran the following script:

FILES=flag*
for f in $FILES
do
    echo -e "\nProcessing $f file...\n"
    cat "$f" | aespipe -p 3 -d 3<<< "$(cat nfc-list.txt | grep UID | cut -d " " -f 10- | dd bs=13 count=2 2>/dev/null)"
done

There it was. In the file flag1746 (frame.number==1746), sent from the Casio camera (device 26.1) to the computer (host), we found a byte array that decrypted properly to:

29C3_ISO1443A_what_else?

What else, indeed? Well played.

Special thanks to Friedrich (aka airmack) and the 29c3 CTF organizers for an enjoyable challenge. Thanks to j3remy for captaining the #misec team and helping make sense of the ACR122 API. Helping us solve this were Philip Polstra and PhreakingGeek. It took a while, but we got it!

Happy New Year 2013

We did it. We beat the Mayans. Welcome to 2013.

Read less, do more. That is my New Year’s Resolution. It might sound cynical or uninformed. After all, a good book can tell you a good deal about anything. Moreover, I have been and continue to be a proponent of continued learning. And yet I think it is time to put down the books and get to work.

There are many reasons.

The first reason is the wide gulf between reading about a thing and doing a thing. That first dawned on me while shivering in the mountains, wearing wet clothes and lacking sufficient food. Hey, I read about hiking! Why is this so hard? A more recent example was an OWASP hacker challenge that I completed on cross-site scripting. I read about cross-site scripting. I know this. It took me three hours. I mentioned it to the founder of OWASP Detroit who, after much prodding, revealed how long it took him. Five minutes. The difference between doing and reading is wide and deep.

The second reason is found in the old saying: writers write. They don’t read books about writing. They don’t attend workshops about writing. They don’t talk about writing. You can readily identify a group of people in writing or any field who are procrastinating by reading, talking, planning, preparing. But not doing. Writers write. Coders code. Security professionals secure.

I have therefore queued up some exciting projects for this year. (Read that Wolfgang exciting, not normal exciting, which is an entirely different form of excitement.)

Professionally, my team and I are architecting and purchasing equipment for our third generation of private cloud computing. We are also revamping our business intelligence platform and adding self-service features.

Personally, I have two development projects in the queue. I released #incog last year for covert channels and steganography. This year, I will release an update adding new channels and a PowerShell interface. I am also working on a hacker capture-the-flag toolset called Botori. I plan to release Botori mid-year along with several example CTF challenges.

Collaboratively, I have been invited to work on the PoshSec project. PoshSec is a PowerShell Information Security project started by Will Steele, who sadly passed away from cancer this past Christmas. The project lead is Matt Johnson, and other members of the team include Rich Cassara. I look forward to working with these sharp people and contributing to Will Steele’s legacy.

As I said, I will be doing more in 2013. There is lots to do and little time. But before wrapping up this article, let’s take a look back.

2012: A Year in Review

  • This blog celebrated its tenth anniversary. The website saw its highest readership to date: 35,361 unique visitors and 46,853 page views in 2012.
  • I did two case studies: a Microsoft case study on my firm’s second generation private cloud, and another case study on our new reporting SaaS.
  • I was mentioned in the press a few times on topics like cloud computing, risk management, and DevOps.
  • I spoke at a few different conferences and user groups on topics like — you guessed it! — cloud computing, risk management, and DevOps. I also did a handful of talks on covert channels and steganography.
  • I volunteered for BSides Detroit and collaborated on everything from sponsors to speakers, as well as recording a 23-episode podcast series for the conference.
  • I was recognized with an InfoWorld 2012 Technology Leadership Award for my firm’s private cloud and DevOps initiatives.
  • And I read a lot of books.

Done. Now, onward!

Tip: Running aespipe on Cygwin

The Linux command aespipe is used to encrypt and decrypt on the Bash pipeline. It uses AES encryption in CBC mode. During the 29c3 CTF, one challenge required decrypting using aespipe. Being a Windows guy, I naturally turned to Cygwin. The latest aespipe supports Cygwin and installs easily enough.

$ wget http://loop-aes.sourceforge.net/aespipe/aespipe-v2.4c.tar.bz2
$ tar xvfj aespipe-v2.4c.tar.bz2
$ cd aespipe-v2.4c
$ ./configure
$ make
$ make install

Done.

North Oakland ISSA and Motorcity ISSA

I will be presenting at North Oakland ISSA on September 12th and at Motor City ISSA on September 20th.

Turtles all the way Down — .Net Software Security. Peel back the layers of abstraction, what do you find? Software. Feel through the fog of cloud computing and what is there? Software. What powers our devices? Handles our protocols? Drives our cars? What ties us all together? Software. Every layer of our technology stack is software. It is turtles all the way down. Few things are as germane to security as software security. We will delve into software security in this session. Using C# as an example, we will see how software in general breaks and how to protect Microsoft .Net in particular. So how do we protect software? Come find out.

North Oakland ISSA. September 12, 2012. Auburn Hills, MI.

Whispering on the Wires. The Internet opened communications and enabled this flat world where everything is but one click away. These complex networks make possible rich exchanges of thoughts and ideas, goods and services. But there is, of course, a dark side. Not all communications are productive. Not all communications are visible. Some are destructive, hidden, invisible. Some messages are whispered in secret. In this session, we will delve into ways attackers can hide their traffic using steganography and covert channels. Examples will be demonstrated and potential controls will be discussed.

Motor City ISSA. September 20, 2012. Livonia, MI.

Parkour styled IT management

What can IT management learn from Parkour?

What? You have not heard of Parkour? It is a free-running sport that you can see in action on YouTube (youtu.be/WEeqHj3Nj2c). Also, check out the Parkour episode of Fight Science (youtu.be/RBNaiNnNRfU). In short, Parkour devotees develop an awareness and nimbleness that allow quick, fluid movements over uneven urban terrain.
Parkour has three takeaways that can be applied to leading an IT team:

  • There are no obstacles, only objects used to achieve an objective
  • Decrease shocks and distribute forces
  • Conserve energy and release it at the right time

There are no obstacles, only objects used to achieve an objective. A friend of mine who practices Parkour talks about raising awareness of the environment, and transitioning from seeing only obstacles to seeing only opportunities. I like that approach. The transition begins by pausing after anything happens and asking: how can this further our objective, build our brand, improve our services?

Decrease shocks and distribute forces. It is fair to say that not everything that happens to an IT team or on an IT infrastructure is positive. So what do you do then? In the Fight Science clip, Ryan Doyle jumps fourteen feet and then lands in such a way that he dissipates the force. The resulting impact on Doyle’s body is “similar to what a normal person feels doing jumping jacks.” IT teams can and should build systems and processes that are likewise capable of dissipating political and technical impacts.

Conserve energy and release it at the right time. Parkour experts can leap across several lanes of traffic. IT leaders can fund, execute, and secure large-scale IT projects. The commonality? Both build energy, save it, and release it at just the right moment. Of course, in IT, that energy is measured by political capital, brag bags, the team’s skill sets, potential cost-saving ideas, and so on. The trick is knowing when and how to use this energy.

Sure, we may not look as impressive when carrying out an IT project as Ryan Doyle or Daniel Ilabaca. Still, I see a lot of Parkour in the IT leaders that I know. It is the CISO who rolls the right way during an outage, thus dissipating the forces on the organization and the security team. It is the guy who leverages an outage to build a business continuity program. It is the manager who cuts the budget at just the right time to obtain funding for a new business-critical initiative. I see it all the time. Applying these three simple lessons improves IT.