Why Agile Isn’t So Agile

Project managers are process-driven people. In fact, without the processes they put in place, project managers would be out of jobs. When it comes to software development, these guys make sure developers stay on track and don't run down paths that have nothing to do with customer requirements. Sadly, developers are known for doing exactly that. Something about personalities… I still don't get that one. If more developers learned how to understand and speak to customers, project managers would be obsolete…

I digress – the point here is that the Agile software development methodology, while it has its benefits, is a tool for the project managers of the world to help enforce a process that is not actually as "agile" as it sounds.

Don't get me wrong – for the initial delivery of a project, the agile methodology has its benefits, but when customers hear the term "agile" they are thinking of something a little different from what your local IT organization is thinking. A customer hears a PM sell them on the "agile" development team and thinks, "Once we've got our application, bug fixes and improvements will come immediately. This team is dedicated to my project and I can call upon them at a whim for my needs."

As a member of several of these development teams, let me break it down for you. With the agile methodology, any single customer is only as important as their initial delivery. Every development team in any decent-sized IT organization has several customers to deal with. Each one of those customers is just as important as the next, unless this is the first time you have encountered that customer.

The first time you encounter a customer, that customer is Priority 1. Your team (or usually just the PM and lead developer) sits down with the customer, gathers requirements, and sets a schedule for the development team to follow. This schedule is usually designed for delivery of software within three to six weeks. From that point until the end of the period, that customer's requirements are the only priority, because the goal is to win them over with the speed at which your group can deliver product. You'll have your daily scrums to discuss any issues your testers have found, you'll prioritize specific features against user requirements, you'll have two or three meetings on the design/architecture of any particular feature, and somewhere in between, you'll find time to actually write code. Usually, in a last-minute push, your team somehow miraculously pulls the delivery off just under the wire and the customer is elated.

This will last right up until your customer finds the first problem with the product you've delivered – so usually about two hours. Congrats – you've just finished your first iteration of the product and now you've been tasked with more to fix or improve. Lucky for you, your PM is there to block all of that nonsense. Now that the system is in production, a bug or new feature has to be put into your issue-tracking system, which will later be prioritized and scheduled against every other issue from every other customer.

This is the point where the agile method ends up being not quite so agile, at least for the customers. Sure, your development team is running through about 50 features every six-week Sprint. And yes, you are spending 50+ hours getting all of the tasks in this Sprint done, but no single customer is feeling the love because you're not delivering in an agile enough way on their specific project.

Look at it from the customer's perspective. You have moved from developing and delivering a full-blown production system in six weeks to delivering 2-3 bug fixes and 2-3 feature improvements (of the 30 they've asked for) every six weeks. You no longer appear to be the agile team they were sold on. In a lot of cases, the application is turned over to an operations team that doesn't know what it takes to keep the application running in production and can't fix bugs as quickly or as effectively as the development team.

The new and emerging trend is the concept of a "Dev-Ops" team. "Dev-Ops" teams are development teams that are integrated with the operations team to effectively manage issues as they occur in production. I find the trend interesting, as a few of the teams I have been involved with have been doing this for a very long time now. We're able to mitigate production issues quickly and effectively because our development team is also the operations team. We are not bound by the order of the Scrum and Sprint. It's an approach that seems to work well… that is, until someone decides to form a process around it, which I do not doubt will happen.

Process is important, and the Agile software development methodology is not all bad, but looking at it from the customer's perspective, it could be more "agile". I'm in favor of these dev-ops teams, mostly because in my experience they work more effectively for the customer, and if you ask me, how the customer feels matters more than any process that makes an IT organization look good.


1 Millisecond Is Too Slow

"640K ought to be enough for anybody" – this quote from the 1980s, which may or may not be correctly attributed to Bill Gates, is quite laughable these days. We live in a world where even exabytes aren't enough to describe the amount of data the world will consume in the next five years. For the more non-technical readers out there, an exabyte is roughly 10^15 kilobytes. To break it down just a little more, that's about 1,000,000 terabytes of information.

Twitter has about 100M updates per day. We won't even begin to guess how many updates Facebook has. Blogging is ubiquitous, so there are plenty of posts out there too. What I'm getting at is that there is a lot of data out there just waiting to be analyzed, and analyzing data at these volumes is no trivial task.

Before I end up in a rant about big data, let me get to the point. Analyzing data at those volumes takes time. Let's look at Twitter, for instance – 100 million tweets per day. Tweets are relatively small records – 140 characters plus any additional metadata about the user, retweets, and such. Let's say it takes about 1 millisecond to process a single tweet. One millisecond, 100 million times over, is 100,000 seconds, which is about 1,667 minutes, which is about 27 hours. So at 1 millisecond per record it would take more than a day to process one day's worth of tweets. My point – 1 millisecond is too slow.
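Here's that back-of-envelope math as a quick sketch, along with what the same workload looks like when you can split it across workers (assuming, optimistically, that the work divides evenly with no coordination overhead):

```python
# Back-of-envelope math for the single-threaded case above, plus what happens
# if the same work is spread across N workers with zero overhead.

TWEETS_PER_DAY = 100_000_000
SECONDS_PER_RECORD = 0.001  # 1 millisecond per tweet

def wall_clock_hours(records, seconds_per_record, workers=1):
    """Total processing time in hours, assuming the work splits evenly
    across workers (an optimistic assumption)."""
    return records * seconds_per_record / workers / 3600

print(wall_clock_hours(TWEETS_PER_DAY, SECONDS_PER_RECORD))              # ~27.8 hours on one worker
print(wall_clock_hours(TWEETS_PER_DAY, SECONDS_PER_RECORD, workers=64))  # ~0.43 hours across 64 workers
```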

Enter "the cloud". Cloud is probably one of the most overloaded terms in the technology space today, so let me apologize for using it and explain what I mean by the term. I'm talking about horizontally scaling your architecture in order to process these large volumes of data in parallel.

You have to be smart about how you do this. If you have a web service that can only handle two requests at a time and takes 600ms to process data, scaling out to hundreds or thousands of servers does you no good. The bottleneck still exists at the web service.
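To make that concrete, here's a small illustrative calculation (the two-request, 600 ms service is the hypothetical one above, not a measurement): no matter how many workers you add, throughput tops out at whatever the shared service can handle.

```python
# Throughput is capped by the slowest shared dependency, no matter how many
# workers sit in front of it. Numbers mirror the hypothetical service above.

def max_records_per_second(worker_count, service_concurrency=2, service_latency_s=0.6):
    # The shared web service only runs `service_concurrency` requests at once,
    # so it tops out at service_concurrency / service_latency_s requests/second.
    service_cap = service_concurrency / service_latency_s
    # Each worker could push 1/latency requests per second if the service kept up.
    worker_cap = worker_count / service_latency_s
    return min(worker_cap, service_cap)

for n in (1, 10, 1000):
    print(n, max_records_per_second(n))  # caps at ~3.3 requests/second once n > 2
```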

We are no longer living in the days when 1 millisecond is considered fast. We live in a world of instant information. Twenty-seven hours to process yesterday's data is unacceptable.

So how do you fix this? You optimize your processing algorithms. This is my call to software engineers everywhere: start optimizing your code and preparing for scale so that your code meets the demands of today and tomorrow. Eliminate bottlenecks in your code and make the ones you cannot remove scale horizontally. Stop accepting "good enough".

It's a tall order coming from a small fish low on the totem pole. There is a completely different mindset that software engineers have to switch to in order to achieve this. You have to stop thinking about solving problems in a single-threaded manner and start thinking about problems in parallel.

It's a new age with new rules. Take the advice or leave it; I'm just a guy who deals with big data on a daily basis.


Anonymous

I’m going to start this one off by saying that I know I am setting myself up here for a full onslaught of attacks, but there are some things that cannot be left alone. That being said, please go watch this video before reading the remainder of this article.

To members of the "hacker" community, this post may come off as a little sacrilegious, but I ask that you hear me out before making any quick judgments. It is very easy to be sympathetic to this group given what it claims it stands for. This video is propaganda at its finest. It appeals to every soul that sees the government as the big bad attacking some small, innocent group that just wants to be heard. It's brilliant, in all honesty, but it is nothing more than propaganda.

I'm not going to sit here and argue about the legality of what they do – that would just be silly. What I will contest is the questionable morality and hypocrisy of what is being done. I have no problem with a group or person taking a political stance and discussing or openly debating it. I have no problem with them enabling others to speak out for themselves (all things that Anonymous has done as a hacktivist organization). Where I begin to take issue is when any organization decides to take action that is damaging to others, whether that is the opposition or innocent bystanders caught in the crossfire.

The organization has been known to help out in cases where people would not have been heard had Anonymous not given them the ability to speak, but Anonymous supporting an organization like Wikileaks is where I first began taking issue. While Wikileaks claims to be all about exposing the truth that "the people deserve to know," what it is actually doing is putting lives at stake. Without getting into the politics behind it all, I'll just say this: when what you are doing puts innocent lives at stake, you are no longer acting for "the greater good". At that point, you are self-focused on a goal. Morality says this is wrong.

Anonymous has been known in the past for its use of DDoS attacks and website defacement to shut down the voices and services of others. They silence the opposition. The justification they use is just silly. Claiming that "Arresting somebody for taking part in a DDoS attack is exactly like arresting somebody for attending a peaceful demonstration in their hometown" is a wildly false statement. Performing a DDoS attack, or defacing a website over comments that are not favorable to your cause, is more like bombing an abortion clinic, or spray-painting defamatory remarks on your neighbor's home.

In some cases this can be acceptable (though ill-advised and illegal) – for instance, when you are defending yourself from being attacked, as they did with HBGary Federal. But attacking groups that have no means of defending themselves from such attacks, and no means of retaliating other than involving the law (which you then attack them for), is hypocritical and immoral.

Why did I feel the need to talk about this tonight? It is no secret that I support the hacker community. I even support the ideals they claim to stand for. I support the enabling of groups that have no ability to speak out for themselves. What I do not support is the hypocrisy and immorality of the actions the organization tends to take. As it turns out, they end up being no better than those they claim to stand against.

That’s my venting for the evening. Bring on the attacks.


What Users Don’t Know Will Hurt Them

There's an old saying, "ignorance is bliss", that I'd like to add an addendum to today. The quote should be "Ignorance is bliss, until that ignorance hurts you". In the IT world we have a tendency to build systems to the specifications provided by the "customer". I put customer in quotes because, although there is usually a requirements group that provides the specifications to us, the real customers of the applications we build are the end users. (As a complete aside – I've never been a fan of requirements groups. They rarely get the customer requirements right, and can never really explain why the end user "needs" something. Another conversation for another day.)

I bring this up because what usually happens is that a number of requirements are defined, the system is built to those specifications, the end users are given training on the most common features, and they are pointed to documentation they will never read for the more advanced features. Even worse is when a user is "voluntarily" enrolled in some system as part of a contract they signed. The worst case is when a user signs up for a system and is completely unaware that their information is also being used by several other systems. Users are often harmed by not knowing how to protect themselves in these systems, or by not knowing what is in the realm of the possible.

Here's just one specific example. Recently, I was planning on getting a birthday present for a friend of mine from college. I knew she was going to be at her parents' place for her birthday, so I wanted to have the present sent to her home. The problem was that I didn't know her home address. Virginia Tech has a system that students typically refer to as "Hokie Stalker". You can search for a person by name and it returns their local address, home address, major, phone numbers, and e-mail address if they have not elected to suppress that information. The system is a public one, so anyone can go to the Virginia Tech website, search for any student, and get all of that same information.

Needless to say, she got her present, but she was curious as to how I got her home address. I explained it to her, and then explained that she could suppress it by clicking a checkbox in her account. The problem here was twofold: she was unaware that I could even get that information, and also unaware that she could hide it. Luckily, I was a friend just trying to send a gift, but the situation could have been a lot worse. Just by having a name, I could launch a very effective social-engineering attack on some unsuspecting student. Knowing a major, a home address, the school they attend, and an e-mail address, I can make myself sound like a valid authority and request additional information.

A more interesting example deals with security while browsing the web. It is common these days for users to know to look for the little lock in their browser before entering personal information or credit card details, but they don't really understand what that lock means. They assume that if the lock is there, then the site is secure and they can safely enter information. They also know to look for the "https" in the URL bar of their browser. While they know to make these checks, one thing users are still very bad about is reading pop-ups about security certificates. A user is trying to get to a site and this annoying pop-up prevents them from getting there – the automatic reaction is to click "Confirm Security Exception". The user does not understand that a website can sign its own certificates, and that if they accept those certificates, the browser will do as they say and treat the site as trusted, thus showing them the lock that makes them feel all warm and fuzzy inside.
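For the curious, here is a minimal sketch of what the browser is doing behind that warning, using Python's ssl module. The hostnames are just illustrative test targets; the point is simply that default verification rejects an untrusted (for example, self-signed) certificate unless someone explicitly chooses to trust it.

```python
import socket, ssl

def check_certificate(host, port=443):
    """Attempt a TLS handshake with normal verification. A self-signed or
    otherwise untrusted certificate fails here -- which is exactly what the
    browser warning is telling the user before they click through it."""
    context = ssl.create_default_context()  # verifies against the system CA store
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                issuer = dict(item[0] for item in tls.getpeercert()["issuer"])
                print(f"{host}: certificate verified, issued by {issuer.get('organizationName', '?')}")
    except ssl.SSLCertVerificationError as err:
        print(f"{host}: certificate NOT trusted ({err.verify_message})")

check_certificate("example.com")              # a site with a CA-signed certificate
check_certificate("self-signed.badssl.com")   # a deliberately self-signed test host
```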

Browsers have done their part in attempting to explain to users what they are doing, but unless the user is security-conscious, they don't bother reading it. Some things are just beyond our control. Sure, we can provide and require security training on the job, which hopefully employees will take and apply in their personal lives, but not every user of the Internet is granted these learning experiences. There are several other examples of users being unaware of how systems actually function and how that can hurt them. Facebook privacy is one that we'll leave alone today because it's almost like beating a dead horse with a stick, but the point is that users' lack of awareness can and will hurt them.

Whether it is someone using information the user could have hidden for a malicious social-engineering attack or a website claiming to be a user's bank by providing a self-signed SSL cert, users can and will be attacked when they are unaware of what is possible. The question is: how do we protect them from every threat? Productivity would certainly be lost if we explained every system in full detail to every user. That is just not a feasible solution. Perhaps the answer lies in how we present documentation to users. If documentation is hidden underneath layers of pages, then we can expect that users will not find it. Should we make documentation a part of the entire user experience, with hints and tip boxes? Would that deter users from using systems? It's an interesting question that I do not have the answer to. I do know, however, that as long as users remain ignorant of certain features of the systems they use, they are more likely to be attacked.


Cyberwar or Cyberhype?

Over the past week, the blogosphere of the computer security world has been ambushed with some serious discussions about whether all of the talk going on in the nation about the threat of cyberwar is nothing more than hype. Some would argue it is being used as a scare tactic to push political and agency agendas, while others would argue that it is a valid and prevalent issue. In the world of information technology, this is often a problem – important concepts, ideas, or issues are overhyped and then dismissed. In some cases dismissing technology hype completely is valid (see NoSQL), while in others it could be very dangerous.

Richard Bejtlich over at TaoSecurity nicely summarizes the argument of those who believe it's all just cyberhype:

Their argument is simple.

  1. The government wants to control the people, or obtain a resource, or pursue some objective that could not be reasonably achieved if transparently presented to the citizenry.

  2. The government “propaganda machine,” sometimes in coordination with “the media” and “big business,” “manufactures” a “crisis” whose only solution is increased government power.

  3. The people acquiesce in order to preserve their safety, and the government achieves its objective

It's not too far-fetched to believe that politicians and intelligence agencies have agendas of their own. It is also not too far-fetched to believe that the government uses propaganda and scare tactics to push those agendas – but that does not mean a threat does not in fact exist. Bejtlich goes on in his article to state that the cyberwar is in fact real, regardless of whether it follows the traditional definition of "war". In a follow-up article, he continues to support his argument by using a variety of political frameworks for defining what actually constitutes war.

While Bejtlich obviously believes that cyberwar is real, others, such as renowned security professional Bruce Schneier, have a different take on it all. In one of his recent articles, Schneier argues that the threat has been exaggerated. A number of government officials have been quoted as saying that cyberwar is a real and prevalent threat. According to Schneier, "…the entire national debate on cyberwar is plagued with exaggerations and hyperbole." Schneier goes on to give several examples of the overuse and misuse of the term cyberwar and states that we are in fact not in a cyberwar. He believes that we should have a Cyber Command and be prepared for war by improving cybersecurity, but says that there is no more of a threat of a cyberwar than there is of a ground invasion.

While Schneier presents a few valid and convincing points, I largely agree with Bejtlich that cyberwar is in fact a real and important threat that most certainly needs to be addressed. Yes, politicians use rhetoric to sell the public on the need for policy change. And yes, agencies do oversell the threats to push their own agendas. With no real definition of who has power in the case of cyber attacks, it is no surprise that everyone wants control. But even if the threats are somewhat exaggerated, that does not mean the threats do not exist.

Attacks on classified networks, whether they are denial-of-service attacks or attacks used purely to obtain information, are real threats. If it is known that our networks are vulnerable and not defended, a foreign agency can use this to its benefit to prevent communications when we really need them. It is surprising that Schneier would dismiss the attack on Estonian websites in 2007 as "simple hacking". A denial-of-service attack, while simple in execution, can cause a tremendous amount of damage when mission-critical services are interrupted. Even if networks are being attacked as simple proofs of concept, that poses a real threat. Reconnaissance is the first step in covert warfare. This fact does not change in a cyber arena.

I feel it is hard to argue, knowing that our networks are being attacked on a regular basis (regardless of where the attacks originate), that there is not a cyberwar going on. It is imperative that we defend our networks, and imperative that we understand the consequences of failure. This is not to say I fully agree with the media and politicians who exaggerate the actual threat. There is no question that the threat has been exaggerated, but that does not mean we should dismiss it entirely or conclude that a cyberwar does not exist.


Are We Witnessing The Death Of Privacy?

While the death of privacy may seem like a far-fetched concept, particularly in the United States, it really is an idea that we should be paying attention to. To be perfectly honest, it was an idea that even I dismissed just a few months ago, but it has slowly begun to scare me a little more as I pay closer attention to the generational differences.

What really sparked this shift in my thinking about the security of privacy was a conversation I had with my good friend John a few months ago. We were walking through our local Kroger late at night, as college students tend to do, and discussing the uselessness (or usefulness, as my friend argued) of Facebook Chat (since then, FB Chat can be integrated with chat clients, so I use it all the time… they were listening to my complaints). I was explaining how I never use it because it requires me to be locked in the browser, and there is no real way of being notified of a new message if I happen to be on another desktop or window. I've always been a big fan of clients for services.

My friend responded that younger generations find tremendous value in it because they do not see any reason anyone would ever use a "screen name" to talk to their friends. For those of you reading this and wondering what I'm talking about, screen names were a trend made popular by AOL's chat service back in the 90s. I brought up the anonymity-on-the-net argument, and his response was a simple one that caught me off guard: "People don't care about that anymore." I was unable to respond, mostly because after thinking about it for a few minutes I realized it was true. Even back in my high school days, I had friends who would post everything about their lives on the internet without thinking about the potential repercussions.

The Social Networking Problem

The whole idea really bothers me. With social networking sites like MySpace and Facebook, and blogging on the rise, people have this tendency to share everything. Then we have Twitter, and now you have people constantly posting about their lives. Don't get me wrong, these tools are great, but is it really okay… is it really safe for us to be so willing to share everything about ourselves with the world?

I personally hide myself as much as possible on these sites. I use them for keeping in touch with people I know, not for meeting random people on the internet. You still can't trust that the person on the other end is who they say they are. Even though I only add friends and share information with them, I still limit what I share, for several reasons: 1) my prior statement remains true – I can't verify that a friend's account hasn't been hacked, or that it isn't being used by someone they shared a password with (another point we'll come back to); 2) by putting information on these sites, I'm putting a lot of trust in the site hosting that information. Facebook openly sells information. At one point, any Facebook employee had access to the information of ANY user. That's too much trust.

Another little-known fact about Facebook – they literally track and keep a history of everything you do while on the site. Every page view, picture view, wall post, message sent, event attended, group started, ad clicked, and chat conversation had is logged and stored. With the right kind of analysis on this information, you could generate a pretty accurate profile of a person. To be honest, I wouldn't be surprised if federal agencies are already doing such things. Big Brother isn't the government; it's Facebook.

Location Based Services

Facebook isn't the only criminal here though… Let's talk about Twitter, Google, and the iPhone for a minute. Perhaps it's just me, but location-based services seem like the most unsafe idea ever. Yes, they provide a level of convenience and context to situational events, but there is one major problem with the implementations we've seen in the applications produced so far – they give people the ability to stalk you. Think about it. Google Latitude is built for broadcasting your location to your friends (or the world, if you want). Twitter has location-based services, so when you tweet, your location can be sent off with it (don't worry, it's an opt-in system… which is even scarier considering the number of people who use it). The biggest criminal, however, has got to be Foursquare.

Foursquare, for those of you who don't know, is an application that asks users to share their location. The real crime is the way they convince users to do this. If you share your location, every time you go back to a particular store or spot, you "check in". If you check in at a particular location more times than anyone else, you can become the "Mayor" of that location! How fun! Except now that you're broadcasting your location and where you spend most of your time, I can build a nice profile of when you're not at home so I can rob you, or stalk you, without ever having to leave my home. Congrats!

Grocery Stores

Grocery stores are also adding to the privacy problem. Particularly in the current economy, it is really easy for grocery stores to get you to sign up for these free cards that give you great discounts on items you buy in their stores. It is very uncommon to find a grocery store that doesn't offer them. It wasn't really apparent to me what kind of implications this had for privacy, however, until about a year ago. I received a phone call from my local Kroger informing me that Nestle Toll House had recalled a number of its products (cookies) due to some issue with them (I don't remember the specifics) and that I was receiving the phone call because I had purchased these products in the past few months. My train of thought went something like this: "Oh wow, that's awesome that they called me to let me know… I hope I don't get sick… wait a second, how did they know I bought those cookies, and how did they know how to get in touch with me…"

Then it hit me. I signed up for one of those cards when I moved into the area because I wanted those discounts. Part of signing up is providing your phone number (which they say is so you can still receive the discounts when you don't have the card on you), but it actually serves multiple purposes. They want to be able to contact you. You receive ads in the mail because you also provided your address. They're also selling your information to advertisers. We don't care, though, because we get those discounts.

Generation Z

For starters, this isn't my label. This is the label you were given based on when you were born. Generation Z refers to all of those born between the mid-1990s and 2009. There's a reason the theme at last year's Defcon was "blame the 90s". It's funny… I have younger siblings who were born during this time frame and who (at least for the moment) seem to know better than to share everything about themselves on the internet, and better than to give a boyfriend or girlfriend their passwords. That could also be due to the fact that I shove security down their throats on a regular basis, but that can't be proven.

The fact of the matter is, a lot of these Gen-Zers are out there doing exactly the things I mentioned. They do so without thinking about the repercussions of sharing everything about themselves with the world, and without thinking about the damage that can be done by some disgruntled friend or ex. They're being led by bad models of privacy and just accepting them because they simply do not know any better. Is this due to a lack of education by my generation? Generation Z is following along with these bad models of privacy, which are slowly but surely killing the concept.

CEOs and Privacy

Know what's really scary? When CEOs don't think privacy matters. Especially CEOs who run companies that pretty much own every piece of data shared on the internet. I'm looking at you, Eric Schmidt and Mark Zuckerberg. Let's start with Zuckerberg… Mark is a young twenty-something CEO who started the most-used social networking site ever. The site has exploded since its inception and now gets more traffic than even Google. The site I'm referring to, of course, is Facebook. Facebook has been under a lot of heat in the past (and even today) over its privacy policies. It keeps changing the policy so that more information is shared, and can be sold. As I said in another post, this is nothing we shouldn't expect from Facebook as a company, because it is just that – a company. My issue comes when CEOs such as Zuckerberg say things like "We view it as our role in the system to constantly be innovating and be updating what our system is to reflect what the current social norms are" when his view of those social norms is slanted towards the benefit of his company.

Perhaps that is a little harsh. Let me phrase it differently. Zuckerberg claims that the social norms around what people will share, and with whom they will share it, have changed, but the fact of the matter is that Facebook has led that change. Over the years, every time Facebook updated its privacy policy, there was an uproar of sorts from its users (or at least the ones who cared to pay attention). Leading the masses of sheep who aren't paying attention into a dark hole and claiming that it is the social norm is a tad twisted.

Even worse than Zuckerberg, however, would have to be Eric Schmidt. Schmidt is Google's CEO, and in an interview earlier this year, in response to a question about whether people should trust Google as much as they do, he said, "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place." Now, that would almost be a valid statement, except for the fact that it just isn't. There are a million and one different ways I could blow that statement out of the water, but we'll save time and skip that. [Insert your own example here]. I gotta give Schmidt credit though – at least he doesn't lie. "But if you really need that kind of privacy, the reality is that search engines – including Google – do retain this information for some time." When the CEO of the company that literally owns every piece of your life (think about it – maps, e-mail, chat, code, everything…) is bold enough to just tell you, "we keep your information, your privacy doesn't exist as far as we're concerned," maybe this whole notion of privacy is becoming a novelty.

Is Privacy Dying Before Our Eyes?

I would like to think that at some point people will begin to realize how much they are really exposing to the world and how dangerous it is. I would like to think that these past few years will be something we look back on as a brief slip. What scares me the most is that I know what I personally could do with the information people share out there on these sites… and I'm no Kevin Mitnick. I'm just some guy who happens to think about things from a security standpoint. If I were a worse person, lives could be destroyed and identities stolen very easily.

By all accounts, privacy does seem to be slowly dying. I hope – for all our sakes – that there is some major awakening that reminds people why privacy has existed in the past, and why it is not always best to share everything about your life with the world. I am not saying we need to "fight the power" and destroy Facebook, Google, Twitter, and the like. They are all great tools – but only when used safely. If privacy dies, we're welcoming a world of chaos with open arms.


NoSQL – Not The End Of RDBMS

There has been a lot of noise on the web recently regarding the death of relational database management systems. This is not the first time there has been such sacrilegious chatter, but it is the first time that developers as a whole are really starting to pay attention to it. There is good reason for everyone to start paying attention to the NoSQL movement, but it is not the end of relational database management systems. That being said, in this article we are going to take a look at what exactly NoSQL is all about and how it can be beneficial.

Before we jump into NoSQL, let's talk about relational database management systems (RDBMSes) and why they are, and have been, used. In an RDBMS, a database is composed of tables, which are made up of rows and columns. Each row of a table is a record with a value for each of the columns (though some of those values may be blank or NULL). Relational databases currently run the world, whether you are talking about an online e-commerce site, the next big Web 2.0 social networking fling, or major enterprise applications. They are a great way to keep unrelated information separate while preserving the ability to link to semi-related information pertaining to a specific user. In short, relational databases are great for structured data. There is, of course, one major caveat… your data has to be structured.

Anyone out there who has spent more than five minutes designing a database knows the pains of building a schema that is efficient for the task at hand. Data modeling, despite what many may think, is a non-trivial task. If you are handed a large set of data about customers, products, and sales for an online retailer, with a relational database you do not want to store all of that information in the same table. You want to keep your product information separate from your customer information, and both separate from the sales transactions. There are a number of reasons to do this: avoiding duplicate data, giving data context, scalability, and security of information, to name a few.
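As a rough sketch of that separation, here is the same idea in plain Python structures with hypothetical field names; a sale references its customer and product by key instead of repeating their details on every row:

```python
# A rough illustration of normalization using plain Python structures and
# hypothetical field names. Each sale references a customer and a product by
# key instead of repeating their details on every row.

customers = {1: {"name": "Ada", "email": "ada@example.com"}}
products  = {10: {"title": "Laptop", "price": 999.00}}

sales = [
    {"customer_id": 1, "product_id": 10, "quantity": 1},
    {"customer_id": 1, "product_id": 10, "quantity": 2},
]

# Denormalized alternative: customer and product details duplicated per sale.
sales_denormalized = [
    {"name": "Ada", "email": "ada@example.com", "title": "Laptop", "price": 999.00, "quantity": 1},
    {"name": "Ada", "email": "ada@example.com", "title": "Laptop", "price": 999.00, "quantity": 2},
]

# Changing Ada's email in the normalized layout is one update. In the
# denormalized layout it is one update per sale, with a risk of inconsistency.
```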

So what’s the problem?

One of the major problems with relational databases is their limited ability to scale. You typically have to scale "up" instead of "out" to get better performance with databases that are hit on a very consistent basis. That is to say, you have to throw more RAM, faster processors, and hard drives with better IO at the database server to get optimal performance, rather than spreading the load across multiple servers. Granted, there are things like Memcached out there to assist with scaling out, but that is not always going to be the most optimal solution.
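For reference, the way something like Memcached helps is the cache-aside pattern. Here is a minimal sketch with a plain dict standing in for the cache and a hypothetical fetch_from_database placeholder for the real query:

```python
# A minimal cache-aside sketch of how a cache like Memcached takes read load
# off the database. The dict stands in for the cache; fetch_from_database is a
# hypothetical stand-in for an expensive query against the primary database.

cache = {}

def fetch_from_database(user_id):
    # Placeholder for a real (and comparatively slow) SQL query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    if user_id in cache:                      # cache hit: no database round trip
        return cache[user_id]
    record = fetch_from_database(user_id)     # cache miss: pay the full cost once
    cache[user_id] = record                   # later reads are served from memory
    return record

get_user(42)  # miss, hits the database
get_user(42)  # hit, served from the cache
```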

Performance is another big one that needs to be addressed. Just about every relational database out there is stored on disk, and as everyone knows, disk IO operations (unless you're using SSDs) are really expensive. With transactions occurring constantly, those hits will eventually wear the disk down and require replacement, or at the very least be slow for large concurrent user bases.

There is also, of course, the issue of the relational model not always being the right model for the job. Think business intelligence and reporting tools. These tools just want a view of the data for analytical purposes. In order to get the information they want, large queries that run across multiple tables are written with various joins and special-case rules for missing information. At the end of the day this is a cumbersome process that puts a large hit on the database just to feed an analytics tool.

Enter “NoSQL”

I want to start by saying NoSQL is a terrible name for this movement. To be honest, this movement is not really anything new; it's a rehashing of old ideas that is making leaps and bounds due to the current tech buzz: cloud computing. The idea is to move away from relational databases and move toward unstructured databases. For a lot of DBAs out there this is going to sound excessively sacrilegious and you may want to hang yourselves while reading it, but give me a few minutes and I promise you'll regain your bearings.

Here goes: unstructured databases lack the concept of tables. In fact, they lack the concept of columns, or schema in any sense. There is no data modeling with unstructured databases. You have one big table filled with records, with varying numbers and names of fields in each record. (I told you, give me a few minutes, keep breathing, you can get through this.) The idea is that data does not always need to have a specific structure. There is no point in having fields in a record that have no value. That's just taking up space (in a relational database, space is still reserved for a record's columns whether they are used or not).

Indexing essentially becomes a hash map. Key-value pairs. You give it a key, and it returns a record (or document) that has whatever fields it has and nothing more. Again, this is all in one table. Think about this from a large-dataset perspective. I need to get the information in a single record. I know exactly where that record is in my dataset thanks to my key. Searching for it is a trivial task. We're not doing the unnecessary lookups for data as done with B-trees (how most indexes in relational databases are implemented).
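A toy version of that idea in plain Python looks something like the following; this is an illustrative sketch, not the API of any particular NoSQL product:

```python
# A toy key-value document store in the spirit described above. Records are
# schemaless: each document carries whatever fields it happens to have, and a
# lookup is a single hash-map access by key.

store = {
    "user:1001": {"name": "Ada", "email": "ada@example.com", "twitter": "@ada"},
    "user:1002": {"name": "Grace"},                      # no email, no twitter -- and no NULL columns
    "order:77":  {"user": "user:1001", "total": 42.50},  # a different "shape" in the same store
}

def get(key):
    """O(1) average-case lookup: no B-tree traversal, no joins."""
    return store.get(key)

print(get("user:1002"))   # {'name': 'Grace'}
print(get("order:77"))    # {'user': 'user:1001', 'total': 42.5}
```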

"NoSQL" databases are being designed to reside on multiple servers. Think Amazon's EC2. Large datasets in the "cloud" for processing. Replication is literally built into these systems, so no more of the master/slave type deal. Most of these NoSQL databases are being built to run in memory with the ability to persist to disk. That means fewer disk IO operations, thus saving you money in the long run. Virtual servers with shared data on a SAN, anyone?

Another big benefit that I see with NoSQL is from an application design perspective. When designing applications, you can be a bit more generic. There is no need to know the schema of a database. You build the application generically based on the data you receive from a particular record. Some app developers out there might be bothered by this concept, but if you really start to think about it, it saves you time in the long run. Reusable code for varying datasets.
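Here is a small sketch of what that schema-agnostic style can look like; the function below renders whatever fields a record happens to carry, so the same code handles documents of different shapes:

```python
# Schema-agnostic application code: render whatever fields a record carries
# instead of hard-coding a fixed set of columns.

def render(document):
    """Produce a simple display for any record, whatever its fields happen to be."""
    return "\n".join(f"{field}: {value}" for field, value in document.items())

print(render({"name": "Ada", "email": "ada@example.com", "twitter": "@ada"}))
print(render({"user": "user:1001", "total": 42.50}))  # same code, different shape
```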

This is not the end of RDBMS

All that said – this is not the end for relational databases. Not even by a long shot. What we have here is an opportunity to look at a different way to handle large datasets. A way to really take advantage of cloud computing. Should people be paying attention to the “NoSQL” movement? Yes, but let us make sure we are paying attention to it for the right reasons. From a development standpoint, this is another tool we can add to our arsenal. It is a powerful tool, but one that comes with a huge responsibility.

That responsibility is knowing when to use it. "NoSQL" databases are not always the answer. Relational databases will, more often than not, solve the problems you are looking to solve. It happens a little too often that we hop on the bandwagon of new technologies just to be early adopters. I don't want to see a ton of "NoSQL" fanboys out there throwing it at everything they see. Be aware that "NoSQL" exists, and that it can potentially be very useful in the right situations.


Can Policy and Power Be Mutually Exclusive?

Two nights ago, while snowed in, one of my roommates and I got into one of those interesting political discussions that you always seem to have while in college. It started off as a simple enough debate about whether or not capitalism is fair, and if not, what type of economic system would work better. As is the case with every political discussion, it did not end with just the discussion of economic systems. At some point we found ourselves discussing the United Nations and its authority to enforce any agreements made among the nations involved. My roommate made an interesting point: while the UN may have authority, it has no real power. The UN itself does not have a body it can use to enforce its policies; it requires the participation of UN members to actually enforce them. Additionally, those members can elect not to participate in enforcing certain policies. Without the participation of the major members, the UN has virtually no power. The question then becomes: can you have the authority to create policy without the power to enforce it?

The question itself is not a new one at all, and is in fact one that I have considered in other scopes (particularly in the security policy arena), but this discussion got me thinking about the issue in a different way. I want to start by briefly discussing authority and power as completely separate ideas.

Authority implies having the right or authorization to do something, whether that be the authorization to grant someone else permission to do something or the authorization to actually perform a task yourself. This can be seen in a wide range of examples, but to give a specific one, consider a DBA working with a group that requires limited access to information. That DBA may have the authority to grant CRUD permissions on specific tables, but may lack the authority to perform those CRUD operations himself or herself.

Power, on the other hand, is about someone having the ability to do something, with or without the appropriate authorization. For instance, continuing with the prior example, a data analyst may have the ability to perform those CRUD operations once granted the authorization by the DBA; however, a hacker may also have the ability to perform those very same actions without having the appropriate authorization to do so. To be clear, power has nothing to do with how the ability was obtained, just with the fact that the ability exists. Instead of using a hacker, we could have used the DBA, who may have the power to perform those CRUD operations while lacking the authority to use them.
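To make the distinction concrete, here is a toy model in Python; the roles and operations are hypothetical, but it captures the idea that a policy violation is possible exactly where power exists without authority:

```python
# A toy model of the authority-vs-power distinction. Authority is what the
# policy says an actor may do; power is what the actor can actually do. The
# roles and operations here are hypothetical, chosen to mirror the DBA example.

authority = {                        # what the policy grants
    "dba":     {"grant"},
    "analyst": {"read", "update"},   # granted by the DBA
    "hacker":  set(),                # no authorization at all
}

power = {                            # what each actor is technically able to do
    "dba":     {"grant", "read", "update", "delete"},
    "analyst": {"read", "update"},
    "hacker":  {"read", "update", "delete"},   # ability obtained illegitimately
}

def violates_policy(actor, operation):
    """An action is a potential policy violation when power exists without authority."""
    return operation in power[actor] and operation not in authority[actor]

print(violates_policy("analyst", "read"))   # False: authorized and able
print(violates_policy("dba", "delete"))     # True: able, but not authorized
print(violates_policy("hacker", "update"))  # True: able, but not authorized
```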

With a better understanding of authority and power, let's take a look at policy. A non-technical manager decides to implement a security policy for data access which restricts the DBA from performing any operations other than granting permissions to users approved by the manager. The manager is implementing this policy because any unauthorized access to the information could be detrimental to the organization. Bear in mind that, being non-technical, the manager has no way of enforcing this policy other than entrusting the DBA with the job of ensuring that only authorized users have access to this information. The DBA has the power and the authority to grant permissions. The DBA also has the power to access this information, but according to the security policy he does not have the authority to access it. How does the non-technical manager go about enforcing the policy effectively without having to just inherently trust the DBA?

An old saying is heard time and time again in the security world: "Trust, but verify." So the manager decides that he will do regular audits of who has accessed the data and require the DBA to provide the access logs on a weekly basis to ensure that the policy is indeed being enforced. The DBA has the ability to alter those audit logs prior to sending the report, so this becomes an ineffective strategy very quickly. The only other option the manager has is to hire someone whose job is to monitor access to the data by all users, including the DBA. This will allow the manager to be informed of any unauthorized accesses and handle each case, but it only does so after the breach has occurred. If this were mission-critical or otherwise highly sensitive data, the damage would already have been done before the manager could take administrative action.

In this example, the real power to enforce the policy actually lies with the DBA, not the manager. The manager must trust some third party to ensure that his policy is enforced. Now, this example has a solution, though it may not always be a feasible one: the manager should only hire a DBA he trusts absolutely to enforce the policy the way he sees fit. What if the issue were instead that users had a way to circumvent a policy? A perfect example would be an organization that restricts users from loading executable files from external media onto their machines. The policy is enforced by including, in the mandatory set of programs installed on every machine, an antivirus program that scans any attached media and quarantines any file that meets a specific set of criteria. A user, intent on having a certain executable file loaded on their machine, circumvents the policy by turning off the antivirus software until they can get the program loaded (a real example). The obvious solution is to prevent the user from being able to turn the antivirus software off, but the user needs the ability to do this in special cases without having to jump through hoops, so you cannot include that in your policy. This is the Catch-22 that many policy makers find themselves in on a daily basis: having to give users power without authorization, or deny access without the power to prevent it.

I want to close the discussion from a technical standpoint with a few thoughts on how the issue can be addressed. As technologists, we should be creating technology that allows policy makers to enforce whatever policies they want to make. If that means getting as granular as the technology will allow, then that is what needs to be done. There is no reason, from a technical standpoint, that policy makers should be unable to enforce the policies they make due to a lack of technical knowledge. That being said, we must also be wary of allowing non-technical policy makers to make uninformed policy decisions in regard to technology.

More generally, it is still difficult to find a solution. If an organization has no direct power to enforce a policy, does it actually matter? At that point, do the policies become mere suggestions? If no consequence can be imposed, because the power to enforce the policy is lacking, then what stops those affected by a policy from breaking it? The technical issue can be solved by giving policy makers power in a way that makes sense to them, but you cannot create power from thin air. If the power to enforce a policy does not exist, the policy is nothing more than words.

This conversation can be taken somewhat literally, so I want to be clear on a few things. I do not advocate breaking policy just because policy can be broken. To do so would be to support the breaking of laws that cannot be enforced and to promote illegal actions such as piracy. What I am suggesting, however, is that policy makers take a harder look at the policies they are trying to implement and only implement those which can be enforced. This does not mean laws and policies have to be thrown out, just that they must adapt to the powers that exist.

I’m interested to hear your thoughts on power and policy. Let me know what you think.


Whose Responsibility Is Privacy?

The one thing Facebook has consistently pissed users off about over the years is privacy. The Electronic Privacy Information Center (EPIC) filed a 29-page complaint with the Federal Trade Commission (FTC), claiming that Facebook misled its users with the recent updates to its privacy settings. The complaint essentially says that the changes are confusing, so instead of keeping their information safe, users end up losing jobs, being embarrassed, etc… While the social media giant has made some tremendous screw-ups in the realm of privacy in the past, I think it's about time we cut Facebook some slack (just a little, though…).

I read through the complaint, which covers the history of Facebook's privacy changes fairly accurately (albeit with a pretty heavy bias). I encourage you to read it on your own. I'm going to skip over all of that and go right to the basis on which EPIC is filing this complaint (toward the bottom of page 23):

98. Facebook is engaging in unfair and deceptive acts and practices. Such practices are prohibited by the FTC Act, and the Commission is empowered to enforce the Act’s prohibitions. These powers are described in FTC Policy Statements on Deception and Unfairness.


99. A trade practice is unfair if it “causes or is likely to cause substantial injury to consumers which is not reasonably avoidable by consumers themselves and not outweighed by countervailing benefits to consumers or to competition.”

One of the major complaints has been that Facebook's new privacy settings reveal too much of a user's personal information without giving them adequate controls to manage it effectively. This is one point that I have to immediately disagree with. Facebook has always given you very granular control over who can access every piece of information you post. In fact, it gives you the ability to set specific settings for specific friends… So if you want to allow your college friends to see certain pictures, but not your boss, you can do that. The argument has been made that these settings are too confusing or too hard for users to find or modify… To that I say: no, not really… and if they are, then too bad.

Alright, that may have been a little bit harsh, but hear me out. I've been using Facebook for a good four years, and one of the first things I did when I started was modify my privacy settings so that I was pretty much invisible. My friends hated it because they couldn't find me easily, and if they somehow could, they couldn't even add me as a friend, let alone see any of my information. This also meant that no one I didn't want finding me or seeing my information could do so either. So, to be perfectly honest, when this recent migration occurred, I was fine. The system prompted me to "share my information with everyone" or keep my old settings. I kept my old settings and I was fine.

What I'm getting at is that if users are going to get on the internet and share their information with websites such as Facebook, they should understand how to control such tools. Facebook is a company. Companies exist to make money. This particular company makes money by selling information (or advertising to you). While they haven't made the best decisions in the past in regard to privacy, they've done a pretty good job of giving you control over who can actually access this information. So if you want to post pictures of yourself getting plastered on the company dollar, or engaging in illicit activities, then it is your job to make sure you control who has access to that information. If you decide to post on a friend's wall about some illicit activity you engaged in, and they don't have their information blocked, then you're the one that's really at fault… not Facebook.

I really do not see this complaint going too far, because the amount of benefit the site provides (as many users will attest) outweighs the injuries its users incur from it. Additionally, the injuries are self-inflicted. The argument comes up about the API and its access… If you have your controls set right, the most the API can obtain about you is your name, profile picture, gender, current city, networks, friend list, and pages. Keep your profile picture clean. Other than that, the rest is publicly available information. Any quick Google search could give me most of that and more "damaging" information.

The fact of the matter is, the responsibility for personal privacy resides with the user. If you have a problem with the way a site operates, then do not post your information on it. If you will not read the FAQs posted on a site that tell you how to protect your information, do not post it. Social networking sites were not built for privacy. They were built to allow users to network, and they do the best they can to facilitate this… okay, while trying to make money on the side, but can you really blame them? Here's a thought: if you have such an issue with how Facebook handles privacy, stop using the site and build your own that handles privacy in the most effective way.

I am not writing this because I firmly agree with all of Facebook's privacy policies (or its other policies, for that matter), nor do I work for Facebook or support it 100%. I'm writing this because users need to start taking responsibility for the privacy of their own information on the internet. You can expect a bank not to release your current balance to public sources, or a hospital not to release your medical records, but when you post information on a social networking website that has specific terms and agreements about what can and cannot be done with the information you post, and how you control it, the responsibility lies with you.


The Importance of Engineering in Undergraduate Computer Science Programs

Recently I've been thinking heavily about the Computer Science program at Tech due to a number of changes that are quickly making their way into the curriculum. One of the more interesting decisions the Computer Science Department at Virginia Tech made in changing the program was moving the department into the College of Engineering. While the full potential of this move has not yet been realized, it has tremendous advantages not only for the department and its students, but also for industry and academia as a whole.

The advantages gained from such a move primarily surround the principles of Software Engineering. Software Engineering is a term that unjustly gets little to no credit among academics in the field. A large number consider it to be an abomination of sorts with no real meaning or value. They take it to be just another of those buzzwords that get thrown about, as "Web 2.0" and the like have been in the past. The fact of the matter is that Software Engineering is a term that is far too often overlooked, particularly in academia, and that is a trend that needs to stop if we would like to see growth in the field of Computer Science as a whole.

The industry has changed substantially since the early 1960s. We are no longer in an era where the field of Computer Science is completely dissociated from the rest of the world. Every business and organization out there sees the tremendous value in having technology available to make jobs more efficient, increasing productivity by eliminating complex or tedious tasks from the agendas of workers. It has thus become more important that the gurus of the Computer Science field fall into professions that require they understand business and customer needs. The backbone of our economy rests on the efficiency and productivity of our businesses and, by the transitive property, at the fingertips of those gurus.

All this being said, it is a wonder that members of academia refuse to accept software engineering as a part (let alone a major component) of the Computer Science discipline. In fact, there are a number of papers and articles that have written off Software Engineering as a "pseudoscience". In his article titled "What Is Software Engineering"[1], William Curran, an Associate Professor of Computer Science at Southeastern Louisiana University, states, "A software engineer is no more an engineer than a novelist is a word engineer." This statement is wildly false. An explanation of this claim requires an answer to the fundamental question that Curran asks in the title of his article: what is software engineering?

Answering the question of what Software Engineering actually is requires a firm definition of what engineering is in its broadest terms. Engineering is a multifaceted discipline in which science and mathematics are applied to practical problems. This definition states fairly explicitly that engineering is applied science. As software is a product of Computer Science, Software Engineering is unquestionably the application of Computer Science to practical problems. It is important to define Software Engineering deliberately in terms of Computer Science in order to establish Software Engineering as a subset of Computer Science. Establishing this hierarchy prevents the "tainting" of the field that some believe occurs when the term Software Engineering is used.

This structure leaves us with two branches of Computer Science. One branch is for those who focus on theory and dive into research, developing the foundation that is Computer Science, while the other branch focuses on the more practical side of the field. A more complete understanding of this requires a more in-depth look at what a Software Engineer actually does. A Software Engineer is one who develops software to make something more efficient or to solve a particular problem that could not feasibly be solved by a human in a reasonable amount of time. It would be a false assumption to say that the Software Engineer just jumps straight into developing this software. That is what "code monkeys" are for. The engineering part of the Software Engineer's job is to define and solve a problem. This is done through standard engineering methods, which include defining the problem, designing a potential solution (without actually implementing it), considering the implications, and redesigning the solution until the best possible solution is reached.

A Software Engineer does all of these things the same way any other engineer would: by reaching back to the science. There are, of course, factors beyond the pure science that the Software Engineer has to consider, such as risk management and human interaction, but this is no different from a chemist designing a vaccine for a particular disease. At the end of the day all of these products are meant to benefit people, and if there is more loss than gain, then the engineer has failed to solve the problem they sought to tackle. Software Engineering is therefore not a pseudoscience, but a practical science. Every technique that a Software Engineer employs to actually develop the software and solve the problem at hand reaches back to the science. It does not cheapen the work of those in the field of Computer Science or the field itself, but in fact enhances both. Knowledge without application is useless. This is not to cheapen the value of the science by any means. Software Engineering depends on the science, but the science also requires some form of application to be beneficial.

The flaw in most Computer Science programs is that they produce two types of students: students who can code until their fingers fall off, and students who appreciate the value of theory and research and decide to continue developing the field. There is absolutely nothing wrong with these two products, but the fault is that these programs fail to produce a third type of student. That is to say, they do not create Software Engineers. The value of a Software Engineer is that they can efficiently solve problems and implement the solutions. You can give a developer any specification for a product and they can churn out code and produce a product that works, but it is the Software Engineer you can hand a problem to and trust to develop a specification for a product and implement a solution that not only works, but works in the most efficient manner.

A significant number of undergraduates who receive their Bachelor's degree in Computer Science head straight into industry. At present, the industry is flooded with developers who write brilliant code but lack the ability to solve the problems that industry hands to them. System Architect roles and similar positions are reserved for those who have gone on to higher education and received their Master's or Doctorate degrees in Computer Science, because they are the ones who know how to solve problems. Computer Science programs at universities need to shy away from this trend. Every single Computer Science graduate, whether in an undergraduate or a graduate program, should leave with the ability not only to develop software, but also to solve problems. This is achieved by teaching engineering methods in CS programs.

Some would argue that this would flood the market with engineers who disagree on ideas, or that it would cheapen the value of a graduate degree. What it actually does is provide greater opportunity for advancement in the field of Computer Science. The more challenges that are solved, the harder the challenges become. Having great minds in the industry makes it possible for these challenges to be solved. Additionally, fostering an engineering mindset throughout a Computer Science curriculum will also increase the number of students who remain on the academic side due to their commitment to tackling the most challenging problems the field faces at any given time.

Simple changes can be made to Computer Science programs to focus more on the practical application of the knowledge gained through analysis and research. Furthermore, an engineering approach to research and analysis enhances the value of the knowledge obtained. If members of academia remove the mindset that applying engineering methodology to Computer Science devalues the Science, the programs will begin to produce better engineers to face not only the problems of today, but the problems of tomorrow as well. The Computer Science Department at Virginia Tech has made a great first step in this direction, but there needs to be more of a movement by the entire academic community for the benefits to truly be realized.

[1] – http://www.acm.org/ubiquity/views/b_curran_1.html
