11/4/09

Cloud Computing: A New Concept of Distributed Computing


Cloud Computing is a new concept of distributed computing and a major revolution in the world of IT. Within the last few years it has become one of the most talked-about emerging technology buzzwords. Gartner (www.gartner.com), a leading research and advisory firm, released "Gartner's Hype Cycle Special Report 2008" as part of its 2008 annual reporting. In the emerging technologies category, Gartner estimated that Cloud Computing will need about 2 to 5 more years to reach maturity. Based on that report, Cloud Computing is, for now, still regarded as a technology buzzword and still faces the problem of market adoption.

In general, Cloud Computing can be described as follows:
• Computing and information technology no longer needs to be owned by the company or hosted in the company's own environment. This means a company can use business applications without buying software licenses, servers and storage, database licenses, or data-communication and networking equipment. The company simply contacts a cloud computing provider and orders the applications it needs, signs a contract, registers the users who will use the applications, and those users can then access the applications over the Internet through a properly configured browser.

Despite the problems of market adoption and the fact that these solutions are not yet mature, cloud computing offers promises that many companies find irresistible.

When it comes to technology ownership, a company really has two choices: owning the technology or renting it. Renting produces what we call OPEX (Operational Expenditure), costs that are routine and tend to stay constant. Buying produces CapEx (Capital Expenditure), a one-time cost at purchase that is usually followed by other costs such as maintenance and capital depreciation. Many companies prefer OPEX over CapEx, and based on these cost concepts, cloud computing leans strongly toward the OPEX cost model.

There are several reasons for this:
1. A company that adopts this technology does not need to buy or own technology products such as servers and storage.

2. A company that buys cloud-based business application solutions does not need to buy software licenses.

3. The company also does not need an IT department to monitor servers, storage, networking, and business applications.

In general, cloud computing is the latest combination of three technology concepts: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Cloud Computing also shares characteristics with three other technology concepts: Grid Computing, Utility Computing, and Autonomic Computing. Cloud-based IT services are usually delivered as online business applications, which also illustrates that the service infrastructure behind cloud computing tends to be complex.

More Info:
• Wikipedia – Cloud Computing
• Gruman, Galen (2008-04-07). "What cloud computing really means". InfoWorld

MapReduce with Hadoop


Hadoop was introduced in late 2005 as part of the Nutch (web search engine) and Lucene (text search library) projects at the Apache Software Foundation. In March 2006, Hadoop became a separate project. Hadoop itself was inspired by the MapReduce technique and the Google File System used by Google.
Hadoop is a software framework that enables distributed computing over large data sets. Hadoop is designed to be reliable (tolerating failures of compute and storage elements), efficient (processing data in parallel), and scalable (reaching thousands of machines and petabytes of data).


MapReduce
MapReduce is a framework and programming model for processing large data sets in parallel. MapReduce consists of two main functions: Map and Reduce.

• The Map function takes the input data and turns it into a list of key/value pairs. In essence, Map aims to map the input into a set of key/value pairs.

• The Reduce function combines the key/value pairs that have been grouped by key. In essence, Reduce aims to reduce each key's values into a smaller set of values, as sketched below.
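
As a minimal sketch of these two functions, here is the classic word-count example written against Hadoop's Java MapReduce API (the class names are illustrative, not taken from this article): Map emits a (word, 1) pair for every word it sees, and Reduce sums the values grouped under each word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: turn each line of input into (word, 1) key/value pairs.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit one key/value pair per word
        }
    }
}

// Reduce: sum the values that the framework has grouped under each word (key).
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));  // one output value per unique key
    }
}

Between the two functions, the framework sorts and groups the intermediate pairs by key, so the Reducer only ever sees one key at a time together with all of its values.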

In the library's implementation, a MapReduce operation can be divided into seven stages (a toy end-to-end sketch follows the list):

1. The MapReduce library splits the input file into M pieces (typically around 64 MB each) and distributes copies of the program to all of the machines.

2. One copy of the program acts as the master; the rest are workers that receive tasks from the master. The master assigns the M map tasks and R reduce tasks to idle workers.

3. A worker assigned a map task reads its piece of the input file, parses out key/value pairs, and passes them to the Map function; the resulting intermediate key/value pairs are buffered in memory.

4. Periodically, the buffered results are written to local disk, partitioned into R regions, and their locations are reported to the master. The master then forwards these locations to the workers in charge of reduce tasks.

5. A reduce worker uses remote procedure calls to read the map workers' output. The key/value results are sorted by key and combined into groups, which makes the Reduce function's job easier, since it generally works per key.

6. The reduce worker then iterates over the sorted key/value results and, for every unique key, passes the key and its set of values to the Reduce function. The output of the Reduce function is appended to the final output file.

7. After all map and reduce tasks are complete, the master returns control to the user program; at this point the MapReduce call returns its result.
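
To make the flow above concrete, here is a toy, single-process Java sketch of the same pipeline: split the input into pieces, map, partition the intermediate pairs into R regions, group and sort them by key, then reduce. It only illustrates the stages; the real library runs them across many machines, and all names here are made up.

import java.util.*;

// Toy, in-memory walk-through of split -> map -> partition -> group -> reduce.
public class MiniMapReduce {

    public static void main(String[] args) {
        List<String> inputPieces = Arrays.asList("a b a", "b c");   // stage 1: M input pieces
        int R = 2;                                                  // number of reduce regions

        // Stages 3-4: run Map over each piece, emit (word, 1), and partition by hash(key) % R.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < R; r++) {
            partitions.add(new TreeMap<>());                        // TreeMap keeps keys sorted
        }
        for (String piece : inputPieces) {
            for (String word : piece.split(" ")) {
                int r = Math.abs(word.hashCode()) % R;
                partitions.get(r).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Stages 5-6: each "reduce worker" reads one region, already grouped and sorted
        // by key, and applies Reduce (here: a sum) to every unique key.
        for (Map<String, List<Integer>> partition : partitions) {
            partition.forEach((word, ones) -> {
                int sum = ones.stream().mapToInt(Integer::intValue).sum();
                System.out.println(word + "\t" + sum);              // stage 7: final output
            });
        }
    }
}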

Hadoop uses one JobTracker and many TaskTrackers spread across the compute nodes. Applications submit MapReduce jobs to the JobTracker, which coordinates the work and hands tasks off to the TaskTrackers. Hadoop also optimizes network usage by moving computation closer to the data.

Hadoop also provides a distributed storage system called the Hadoop Distributed File System (HDFS). HDFS splits files into data blocks and distributes them across the machines in the cluster.
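
As a small illustration of that block structure, the sketch below (using Hadoop's Java FileSystem API; the class name and path are hypothetical) asks HDFS for the blocks of one file and the machines that hold each block.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List the blocks of one HDFS file and the hosts holding each block.
public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                 // picks up cluster settings
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path(args[0]);                            // e.g. an HDFS file path

        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                + " length " + block.getLength()
                + " hosts " + Arrays.toString(block.getHosts()));
        }
    }
}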

The Hadoop framework is written for and runs on the Java platform. Most of its programs and libraries are built in Java, apart from a few non-Java components. Programs for Hadoop are also written against the Application Programming Interface (API) that the framework provides.
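
For example, a driver program written against that API might configure and submit the word-count job sketched earlier. This is only an outline, assuming a recent Hadoop release (older releases construct the Job object slightly differently):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver that wires the word-count Mapper and Reducer together and submits the job.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory, must not exist yet

        System.exit(job.waitForCompletion(true) ? 0 : 1);         // block until the job finishes
    }
}

Such a driver would typically be packaged into a jar and run with something like "hadoop jar wordcount.jar WordCountDriver /input /output" (names and paths here are illustrative).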

Real-world use of Hadoop can be found at major online service providers such as Yahoo!, Facebook, Baidu, and America Online. Yahoo! uses Hadoop for its Webmap application to index web sites, reaching roughly 1 trillion links. Facebook has also been using Hadoop for data processing and storage.

More info:
http://hadoop.apache.org
http://en.wikipedia.org/wiki/Hadoop