TAG | cloud

Boris Livshutz

An Introduction to the Data Cloud

As data has grown exponentially at many sites, companies have been forced to scale their data horizontally.  Some have turned to sharding databases such as Postgres or MySQL, while others have switched to newer NoSQL data systems.  There have been many debates in the last few years about SQL vs. NoSQL data management systems and which is better.  What many have failed to grasp, though, is how similar these systems are and how complex both are to run in production at high scale.

Both of these systems represent what I call a Data Cloud. A Data Cloud is a logical data set spread across many nodes.  While developers have heated debates about which system is better and how to design code around it, those in DevOps usually struggle with very similar issues because the two systems are mostly the same.  Both systems:

  • Run across many nodes with large amounts of data flowing between them and from/to the application
  • Strain both the hardware of all nodes, and the network connecting them
  • Maintain duplicate data across nodes for fault tolerance, and must have failover ability
  • Must be tuned on a per-node and cluster-wide basis
  • Must allow for growth by adding additional nodes.

Running this Data Cloud in production presents a new set of challenges for DevOps, many of which are not well understood or addressed.  One of the main challenges is the management and monitoring of these systems, for which few (if any) tools or products exist at this time.

When systems were smaller and you ran a single database in production, you probably had all the necessary systems in place.  With a plethora of products for management, monitoring, data visualization, and backups, it was not hard to be successful and meet your SLAs.

But now all this is much more complex once you move into the world of the Data Cloud.  Now you have a large number of nodes, all representing the same system and still needing to meet the same SLAs as the old simple DB from before.  Let us look at the challenges for running a production Data Cloud successfully.

Capacity Planning

Do you know how many nodes you need?  How many nodes do you put in each replica set?  How much latency and throughput do you need in your network for the nodes to communicate fast enough?  What is the ideal hardware to use for each node to balance performance with costs?
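
A rough first pass at the node-count question can be sketched in a few lines. This is only a back-of-the-envelope estimate for storage, not a sizing method from any particular product: the function name `nodes_needed`, its parameters, and the default 30% headroom are all assumptions chosen for illustration, and it ignores throughput, latency, and cost entirely.

```python
import math


def nodes_needed(data_gb, replication_factor, usable_gb_per_node, headroom=0.3):
    """Estimate how many nodes are needed to hold the replicated data set.

    All parameters are illustrative assumptions:
    - data_gb: size of one logical copy of the data
    - replication_factor: number of copies kept for fault tolerance
    - usable_gb_per_node: disk capacity available for data on each node
    - headroom: fraction of each node kept free for growth and maintenance
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    total_gb = data_gb * replication_factor
    effective_gb_per_node = usable_gb_per_node * (1 - headroom)
    nodes = math.ceil(total_gb / effective_gb_per_node)
    # Each copy of the data needs its own node, so never go below
    # the replication factor even for a small data set.
    return max(nodes, replication_factor)


if __name__ == "__main__":
    # 10 TB of data, 3 copies, 2 TB usable per node, 25% kept free.
    print(nodes_needed(10_000, 3, 2_000, headroom=0.25))
```

Storage is usually the easiest dimension to estimate; the same style of calculation would have to be repeated for throughput and network capacity, and the largest answer wins.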

Monitoring

How do you monitor dozens, hundreds or even thousands of nodes all at once?   How do you get a unified view of your data cloud, and then drill down to the problem nodes?   Are there even any off-the-shelf monitoring tools that can help?  Your old monitoring tool won’t be very useful anymore unless you are willing to look at every node one by one to see what is going on there.
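
To make the "unified view, then drill down" idea concrete, here is a minimal sketch. It assumes you already collect one latency figure per node by some means; the function name `cluster_summary`, the node names, and the "twice the median" rule are hypothetical choices for illustration, not features of any monitoring tool.

```python
from statistics import median


def cluster_summary(latency_ms_by_node, factor=2.0):
    """Summarize a per-node metric and list the nodes worth drilling into.

    A node is treated as a suspect when its latency exceeds `factor`
    times the cluster median (an assumed rule of thumb).
    """
    if not latency_ms_by_node:
        raise ValueError("no nodes reported")
    values = list(latency_ms_by_node.values())
    mid = median(values)
    # Worst offenders first, so staff know where to look.
    suspects = sorted(
        (node for node, value in latency_ms_by_node.items() if value > factor * mid),
        key=lambda node: latency_ms_by_node[node],
        reverse=True,
    )
    return {
        "nodes": len(values),
        "median_ms": mid,
        "max_ms": max(values),
        "suspects": suspects,
    }


if __name__ == "__main__":
    sample = {"node-1": 10, "node-2": 12, "node-3": 11, "node-4": 50}
    print(cluster_summary(sample))
```

Comparing each node against the cluster median, rather than against a fixed number, keeps the view meaningful as the cluster grows or the workload shifts.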

Alerting

How do you set up a common set of alerts across all nodes?  And how do you keep your alert thresholds in sync as you add nodes and remove them?   More importantly, even assuming you have alerting in place,  once staff receives critical alerts, how will they know where to find the troubled node in the massive cloud, or whether it’s a node level  issue or more global in nature?  This must be done quickly during critical outages.
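
One way to think about both questions (shared thresholds, and node-level versus global problems) is sketched below. The threshold values, metric names, and the "half the cluster" cutoff are assumptions made up for this example; a real setup would pull them from your own alerting configuration.

```python
# A single shared threshold table (hypothetical values), so adding or
# removing nodes never requires touching per-node alert settings.
THRESHOLDS = {"cpu_pct": 90.0, "disk_pct": 85.0}


def evaluate(metrics_by_node, thresholds=THRESHOLDS, global_fraction=0.5):
    """Apply the same thresholds to every node and classify the scope.

    Returns (scope, breaches) where scope is "ok", "node-level", or
    "cluster-wide", and breaches maps each troubled node to the
    metrics that exceeded their limits.
    """
    breaches = {}
    for node, metrics in metrics_by_node.items():
        over = sorted(
            name for name, limit in thresholds.items()
            if metrics.get(name, 0.0) > limit
        )
        if over:
            breaches[node] = over
    if not breaches:
        scope = "ok"
    elif len(breaches) / len(metrics_by_node) >= global_fraction:
        # Many nodes failing at once suggests a shared cause.
        scope = "cluster-wide"
    else:
        scope = "node-level"
    return scope, breaches


if __name__ == "__main__":
    sample = {
        "node-1": {"cpu_pct": 40.0, "disk_pct": 60.0},
        "node-2": {"cpu_pct": 95.0, "disk_pct": 60.0},
        "node-3": {"cpu_pct": 35.0, "disk_pct": 70.0},
        "node-4": {"cpu_pct": 50.0, "disk_pct": 65.0},
    }
    print(evaluate(sample))
```

Returning the scope alongside the list of troubled nodes gives on-call staff the first answer they need during an outage: whether to look at one machine or at the whole cluster.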

Data Visualization

How does your staff view the data when it is distributed?  In case of data inaccuracy, how can they quickly identify the faulty nodes and fix up the data?

Performance Tuning

As performance degrades, how do you troubleshoot and identify the bottlenecks?  How do you find which nodes may be the cause of the problem?  How do you improve performance across all the nodes?

Data Cloud Management

How do you back up all the data while consistently tracking which nodes were backed up successfully and when?  How do you make schema changes across all the nodes in one consistent step without breaking your app? And how do you make configuration changes on various nodes or across all nodes?  And how do you track the configurations of each node and keep them consistent across your system?
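
The configuration-tracking question lends itself to a small sketch. The idea here, which is my own simplification rather than a feature of any specific system, is to treat the most common value of each setting as the expected one and report every node that differs. The function name `config_drift` and the setting names are hypothetical, and the values are assumed to be simple hashable types such as numbers or strings.

```python
from collections import Counter


def config_drift(configs_by_node):
    """Report settings that differ from the majority value in the cluster.

    Returns {node: {setting: (actual, expected)}} for drifting nodes only.
    A missing setting is reported with an actual value of None.
    """
    keys = set()
    for cfg in configs_by_node.values():
        keys.update(cfg)
    drift = {}
    for key in sorted(keys):
        counts = Counter(cfg.get(key) for cfg in configs_by_node.values())
        # The most common value is assumed to be the intended one.
        expected, _ = counts.most_common(1)[0]
        for node, cfg in configs_by_node.items():
            if cfg.get(key) != expected:
                drift.setdefault(node, {})[key] = (cfg.get(key), expected)
    return drift


if __name__ == "__main__":
    sample = {
        "node-1": {"max_conn": 500, "cache_mb": 1024},
        "node-2": {"max_conn": 500, "cache_mb": 1024},
        "node-3": {"max_conn": 200, "cache_mb": 1024},
    }
    print(config_drift(sample))
```

Majority voting is only a heuristic: if a bad change has already reached most nodes, the healthy minority will be flagged instead, so comparing against a version-controlled reference configuration would be the more reliable choice.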

By now you should see that there is a lot to think about before endeavoring to launch a production Data Cloud.  Too many companies focus all their energies on deciding which DB or NoSQL system to use and developing their apps for it.  But that might turn out to be the lesser of your challenges once you struggle to put the system into production.  Be sure you can answer all the questions I have listed above before your launch.

Boris.

There’s been a generational shift in how Java enterprise applications are created: they have been broken down from a monolithic architecture into multiple services, and they’re highly interconnected and distributed. How can Java developers and Operations teams adapt to these changes?

This keynote will discuss the 4 Big Things that Java professionals need to design for now:

  • Cloud: Most applications built will have some part of their service in the cloud
  • Big Data: With the advent of NoSQL, Hadoop, and distributed caches, how should we now approach the data layer?
  • Agile Development & Operations: Developers won’t just be responsible for the code, but how it’s deployed. How does that affect the DevOps relationships?
  • Failure is an option: Distributed systems won’t just invite but demand failure, so how can failure become part of the initial design?

This talk will present recommended strategies and approaches for these new design imperatives.

You can watch the keynote here

App Man

App Man has Beers with Santa Claus

Hey Santa Claus, what’s happening man?

Well, it’s that time of year again where unfortunately I have to work. Gift orders are up 50% and my elves are working their socks off right now stocking my warehouses for the big day. I honestly don’t know how I would have coped without my new ElfOps team–they’ve been elastic and fantastic thanks to our new AWS hosted applications. The bad news is that my applications have processed record orders in 2011 so I’ll have to work harder. Frankly, this sucks.

Come again? Santa is using the cloud?

Isn’t everyone these days? I mean what else can I possibly do when my orders and subscriptions double each year? If Santa can’t scale, I spend my vacation reading angry letters from parents about how I made their kids cry on Christmas Day. Forget that. I’d rather sit on a beach in Hawaii drinking Mojitos and catching a nice tan. In fact, I should be able to upgrade myself to the Four Seasons in Hawaii in 2012 due to the money I’ve saved by migrating to the cloud. You simply wouldn’t believe how much my own data center was costing me, not to mention those expensive IT Elf consultants I had to bring in. I’m now a lean mean parcel delivering machine and it’s all thanks to the cloud.

Read the Full Post…

We recently finished conducting our annual Application Performance Management survey. Over 250 IT professionals participated, and they shared insights such as:
- Many Ops and Dev teams are anticipating growth in their applications by 20% or more
- Over 50% are planning to move to the cloud, and are architecting brand-new applications to be cloud-ready
- Most teams are using log files to monitor application performance, rather than an Application Performance Management (APM) tool.

We’ll release the full report soon, but here’s an infographic that summarizes some of the main findings:

[Infographic: AppDynamics - Storm Clouds in 2012]

What I found personally surprising was the heavy reliance on log files. When you’re troubleshooting distributed architectures, time is of the essence–and there’s no way to cut your MTTR down when you’re relying on log files to identify root cause.

In fact, there’s only one guy who ever made using a log file look cool:

And I think we can all agree that’s a pretty unique use case.

We’ll have the full survey results available soon.

I’m fed up of reading about Cloud outages, largely because all applications are created and managed by the most dangerous species on the planet – the human being. Failure is inevitable with regard to everything the human being creates or touches, and for this reason alone I see no news in seeing the word “outage” in IT articles with or without Cloud mentioned.

What gets me the most is that applications, infrastructure and data centers were slowing down and blowing up long before “Clouds” became fashionable. They just didn’t make the news every other week when applications resided in “data-centers”–ah, the good old days. Just ask anyone who works in operations or help desk/app support whether they’ve worked a 38 hour week; I guess the vast majority will either laugh or slap you. If everything worked according to plan, IT would be a really dull place to work, help desk would be replaced with OK desk, and we’d have nothing to talk about in the office or pub.

Read the Full Post…

If you are running mission-critical applications and do not have a strategy to deal with failure, you are putting your whole organization at risk. You may think that your application cannot fail, but at some point everything fails.  It may not be the software running your application that fails – it could be the hardware, the network, or even a natural disaster in your area that causes your application to go down. In case of such failure, no matter how rare, your customers will still expect the same level of service, not to mention preservation of their data.  Without a failover strategy and a tested backup infrastructure you will be out of service for an unknown period of time, which will lead to angry customers and loss of revenue.

Most of you have some failover strategy in place. I’m sure many of you have spent large amounts of time and money ensuring that your application is resilient to failure because you understand your app’s importance. But you may still be missing one key component, without which you are still at risk. That component is monitoring, or more specifically, Application Performance Management (APM). While mission-critical applications rely on an APM system to help monitor application performance and health, APM is often forgotten on the failover systems. APM needs to go hand in hand with any failure testing plan to ensure that your company’s strategy will work in case of a real emergency.

Read the Full Post…

Welcome Jeremy, let’s start with a quick introduction of who you are and what you do at EMC.
I run marketing at EMC working for Joe Tucci, our CEO.  Been there about 18 months.

And what beer will you be drinking tonight?
That new Bud in bottles that includes Lime – why did no one think of that until now?  I hate putting that real lime in my beer and squirting it all over my shirt.  American innovation leads the way again.

So this Cloud meets Big Data stuff, what’s all that about?
Cloud has emerged as the biggest disruptive force in IT for at least the last decade.  And maybe ever.  Complexity in IT departments is at a breaking point, so they are re-transforming their infrastructure around virtualized servers, storage, and networking, transforming their applications using frameworks like Spring and Ruby and transforming access using a myriad of consumer devices such as the iPad.   Once this transformation is complete, IT will be able to run the way God intended it to run – as an agile, efficient service.

Read the Full Post…

“APM Game Changer” was the headline yesterday with the news that Compuware acquired Dynatrace Software for $256 million. With the APM market worth an estimated $6 billion, it’s not really a surprise to see such a transaction (no pun intended) within the APM market. In fact, over the past ten years, there have been several large transactions made by established IT vendors trying to claim a piece of the APM market.

If we take a look at several of those APM acquisitions (specifically Deep Dive) and present the facts, you can draw some interesting conclusions.

Acquirer     Acquired Company   Acquired Product   Year Founded   Year Acquired
Compuware    Dynatrace          Dynatrace          2005           2011
Oracle       ClearApp           QuickVision        2002           2008
BMC          Identify           AppSight           1996           2006
CA           Wily Tech          Introscope         1998           2006
Compuware    DevStream          Vantage Analyzer   2002           2004
IBM          Cyanea             ITCam              2001           2004
OpNet        Altaworks          Panorama           1999           2004
Veritas      Precise            I3                 1990           2003
HP/Mercury   Performant         Diagnostics        2000           2003
Quest        Sitraka            Foglight           1989           2002

Read the Full Post…

App Man

NoSQL meets SQL

App Man

Web Ops to Dev Ops in Mountain View

After Velocity came DevOpsDays in Mountain View. I was hoping to give my feet and liver a rest after two consecutive days at Velocity. When I woke up at 7:30 a.m. the next day my body was throwing lots of OutOfEnergyExceptions, and it did cross my mind whether another two-day event on Dev and Ops might be overkill. I got out of bed, hopped on a Caltrain, and made it to DevOpsDays just in time for John’s “State of Union” speech.

John gave a great intro into how the DevOps movement started and its progress since inception back in 2009. Within just two years, Gartner has come to recognize the movement and its relevance within IT. John then talked about agile companies like Flickr, who were doing a wacky 10 deployments a day back in 2009, and more recently Wealthfront, who are now up to 50-100 deployments a day. It’s becoming clear that Agile Development is driving the need for Agile Infrastructure and Operations. However, whilst the Cloud concept helps organizations become more agile, the idea that Ops will simply go away or be taken over by Dev is just nonsense (John put it a bit more bluntly than that, which was well received). The Business and Dev still need Ops, no matter what.

Read the Full Post…
