There are two ways to handle this case. The first way is to find the largest key y from the left subtree.
Replace the contents of node x with y and delete node y. Note that node y can have at most one child. In the tree of Figure 2, say we desire to delete 25. The largest key in the left subtree of 25 is 17; in fact there is only one node in this left subtree. We replace 25 with 17 and delete node 17, which happens to be a leaf. The second way to handle this case is to identify the smallest key z in the right subtree of x, replace x with z, and delete node z. In either case, the algorithm takes O(h) time.

The operation Find-Min can be performed as follows. We start from the root and always go to the left child until we cannot go any further. The key of the last visited node is the minimum. In the tree of Figure 2, we start from 12, go to 9, and then go to 7.
We realize 7 is the minimum. This operation also takes O(h) time. If we have a binary search tree with n nodes in it, how large can h get? The value of h can be as large as n. Consider a tree whose root has the value 1, its right child has the value 2, the right child of 2 is 3, and so on. This tree has a height of n. Thus we realize that in the worst case even the binary search tree may not be better than an array or a linked list.
But fortunately, it has been shown that the expected height of a binary search tree with n nodes is only O(log n). This is based on the assumption that each permutation of the n elements is equally likely to be the order in which the elements get inserted into the tree. Thus we arrive at the following Theorem.

Theorem 2 The expected height of a binary search tree on n nodes is O(log n), so each of the operations takes O(log n) expected time. In the worst case, however, the operations might take O(n) time each.

There are a number of other schemes based on binary trees which ensure that the height of the tree does not grow very large.
These schemes will maintain a tree height of O(log n) at any time and are called balanced tree schemes. Examples include red-black trees, AVL trees, and others. These schemes achieve a worst case run time of O(log n) for each of the operations of our interest. We state this Theorem without proof, and just illustrate one example of the use of these data structures.
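The plain (unbalanced) binary search tree operations described above can be sketched as follows. This is our own illustrative rendering, not code from any particular text; the deletion uses the first of the two ways described (replace with the largest key of the left subtree).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root. Takes O(h) time."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def find_min(root):
    """Follow left children until none remains; that key is the minimum."""
    while root.left is not None:
        root = root.left
    return root.key

def delete(root, key):
    """Delete key and return the new root. Takes O(h) time."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:      # zero or one child: splice the node out
            return root.right
        if root.right is None:
            return root.left
        # Two children: find the largest key y in the left subtree,
        # copy it into this node, then delete y (it has at most one child).
        y = root.left
        while y.right is not None:
            y = y.right
        root.key = y.key
        root.left = delete(root.left, y.key)
    return root
```

Building the tree of our running example (12, 9, 25, 7, 17) and calling find_min walks 12, 9, 7 exactly as described in the text.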
Consider the problem of sorting. Given a sequence of n numbers, the problem of sorting is to rearrange this sequence in non-decreasing order. This comparison problem has attracted the attention of numerous algorithm designers because of its applicability in many walks of life. We can use a priority queue to sort. Let the priority queue be empty to begin with. We insert the input keys one at a time into the priority queue.
This involves n invocations of the Insert operation and hence will take a total of O(n log n) time. Followed by this we apply Delete-Min n times to read out the keys in sorted order. This will take another O(n log n) time as well. Thus we have an O(n log n)-time sorting algorithm. This algorithm can be specified as follows.

It is natural to describe any algorithm based on divide-and-conquer as a recursive algorithm, i.e., one that invokes itself on smaller subproblems.
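The priority-queue sort just described can be sketched as follows; Python's standard-library heapq stands in for the priority queue (its push and pop each take O(log n) time), and the function name is our own.

```python
import heapq

def pq_sort(keys):
    """Sort by n Inserts followed by n Delete-Mins: O(n log n) overall."""
    pq = []
    for k in keys:                 # n Inserts, O(log n) each
        heapq.heappush(pq, k)
    # n Delete-Mins read the keys out in sorted order.
    return [heapq.heappop(pq) for _ in range(len(pq))]
```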
The run time of the algorithm will be expressed as a recurrence relation which upon solution will indicate the run time as a function of the input size. These multiplications are performed recursively. Coppersmith and Winograd have proposed a matrix multiplication algorithm that takes only roughly O(n^2.376) time. This is a complex algorithm, details of which can be found in the reference supplied at the end of this article.

Consider next the problem of searching: given an array a[ ] and an element x, the problem is to check if x is a member of a[ ].
If so, the problem has been solved. We have already seen one such algorithm in Section 2. We assume that the elements to be sorted are from a linear order. If no other assumptions are made about the keys to be sorted, the sorting problem will be called general sorting or comparison sorting. In this section we consider general sorting as well as sorting with additional assumptions. The first algorithm is called the selection sort. Let the input numbers be in the array a[1 : n].
We first find the minimum of these n numbers by scanning through them. Let this minimum be in a[i]. We exchange a[1] and a[i]. We then repeat the process on a[2 : n], and so on; this takes O(n^2) time in all. An asymptotically better algorithm can be obtained using divide-and-conquer. This algorithm is referred to as the merge sort. Partition the input into two halves, sort each half recursively, and finally merge the two sorted subsequences.
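The two algorithms just described might be sketched as follows (a minimal Python rendering of our own; the merge step is detailed in the text that follows).

```python
def selection_sort(a):
    """Repeatedly exchange the minimum of a[i:] into position i: O(n^2)."""
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of the minimum
        a[i], a[m] = a[m], a[i]
    return a

def merge(x, y):
    """Merge two sorted sequences by always emitting the smaller current
    minimum; each comparison outputs one element, so this is O(len(x)+len(y))."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:           # <= keeps the merge stable
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]     # one sequence is empty; append the rest

def merge_sort(a):
    """Split into halves, sort each recursively, and merge: O(n log n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```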
The problem of merging is to take as input two sorted sequences and produce a sorted sequence of all the elements of the two sequences. Let X = q1, q2, . . . and Y = r1, r2, . . . be the two sorted input sequences. Compare q1 and r1. Clearly, the minimum of q1 and r1 is also the minimum of X and Y put together. Output this minimum and delete it from the sequence it came from. In general, at any given time, compare the current minimum element of X with the current minimum of Y, output the smaller of these two, and delete the output element from its sequence.
Proceed in this fashion until one of the sequences becomes empty. At this time output all the elements of the remaining sequence in order. Whenever the above algorithm makes a comparison, it outputs one element either from X or from Y, so merging takes time linear in the total length of the two sequences.

Theorem 4 Two sorted sequences of total length n can be merged in O(n) time; as a result, merge sort runs in O(n log n) time.

We now consider sorting with additional assumptions on the keys. In particular, we assume that the keys are integers in the range [1, n^c], for any constant c. This version of sorting is called integer sorting. We make use of an array a[1 : m] of m lists, one for each possible value that a key can have. These lists are empty to begin with.
We look at each input key and put it in the appropriate list of a[ ]. We have basically grouped the keys according to their values. Next, we output the keys of list a[1], then the keys of list a[2], and so on. This bucket sort takes O(n + m) time; when m is much larger than n, this may not be acceptable, since we can do better using the merge sort. Say we are interested in sorting n two-digit numbers.
One way of doing this is to sort the numbers with respect to their least significant digits and then to sort with respect to their most significant digits. This approach works provided the algorithm used to sort the numbers with respect to a digit is stable.
We say a sorting algorithm is stable if equal keys remain in the same relative order in the output as they were in the input. Note that the bucket sort as described above is stable. If the input integers are in the range [1, n^c], we can think of each key as a (c log n)-bit binary number.
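A sketch of this stable, digit-by-digit radix sort, in decimal for readability, as a stand-in for the (log n)-bit stages described in the text. The function names are ours.

```python
def bucket_sort_by_digit(keys, d, base=10):
    """Stable bucket sort of keys by digit d (0 = least significant).
    Scanning in input order and reading buckets in order preserves the
    relative order of equal digits, which is exactly stability."""
    buckets = [[] for _ in range(base)]
    for k in keys:
        buckets[(k // base ** d) % base].append(k)
    return [k for b in buckets for k in b]

def radix_sort(keys, digits, base=10):
    """Sort digit by digit, least significant first; the stability of
    each pass makes the whole sort correct."""
    for d in range(digits):
        keys = bucket_sort_by_digit(keys, d, base)
    return keys
```

For the two-digit example in the text, radix_sort(keys, 2) first sorts by the least significant digit and then, stably, by the most significant one.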
In stage i, the numbers are sorted with respect to their ith log n-bit block, starting with the least significant block. This means that in each stage we have to sort n numbers, each of log n bits, i.e., n numbers in the range [1, n], which the stable bucket sort does in O(n) time. Since there are only c = O(1) stages, we get the following Theorem: n integers in the range [1, n^c], for any constant c, can be sorted in O(n) time.

Consider next the selection problem: given n elements and an integer i, find the ith smallest element. A simple algorithm for this problem could pick any input element x, partition the input into two parts, the first part being those input elements that are less than x and the second part consisting of input elements greater than x, identify the part that contains the element to be selected, and finally recursively perform an appropriate selection in the part containing the element of interest.
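A sketch of this partition-based selection in Python. This is our own rendering; we pick the partitioning element at random and, to handle duplicate keys, keep a third part of elements equal to it.

```python
import random

def select(a, i):
    """Return the i-th smallest (1-indexed) element of a.
    Expected O(n) time with a random pivot; worst case O(n^2)."""
    x = random.choice(a)                    # the partitioning element
    less = [k for k in a if k < x]          # elements smaller than x
    equal = [k for k in a if k == x]
    greater = [k for k in a if k > x]
    if i <= len(less):                      # answer lies in the first part
        return select(less, i)
    if i <= len(less) + len(equal):         # the answer is x itself
        return x
    return select(greater, i - len(less) - len(equal))
```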
This algorithm can be shown to have an expected, i.e., average case, run time of O(n). In general the run time of any divide-and-conquer algorithm will be the best if the sizes of the subproblems are as even as possible.
In this simple selection algorithm, it may happen that one of the two parts is empty at each level of recursion. So, even though this simple algorithm has a good average case run time, in the worst case it can take quadratic time, and we would be better off simply sorting with the merge sort. There is, however, a way of choosing the partitioning element that keeps even the worst case linear, as follows. Say we are given n numbers. We group these numbers such that there are five numbers in each group.
Find the median of each group.
Find also the median M of these group medians. Having found M, we partition the input into two parts X1 and X2. X1 consists of all the input elements that are less than M and X2 contains all the elements greater than M. This partitioning takes O(n) time, and we can also count the number of elements in X1 and X2 within the same time. The crucial fact is that neither part can be very large. This can be argued as follows.
Let the input be partitioned into the groups G1, G2, . . .. Assume without loss of generality that every group has exactly five elements, so that there are n/5 groups. There are n/10 groups whose medians are less than M, since M is the median of the group medians. In each such group there are at least three elements that are less than M. Therefore, there are at least (3/10)n input elements that are less than M. This in turn means that the size of X2 can be at most (7/10)n. Similarly, we can also show that the size of X1 is no more than (7/10)n. Thus we can complete the selection algorithm by performing an appropriate selection in either X1 or X2, recursively, depending on whether the element to be selected is in X1 or X2, respectively.
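Putting the pieces together, the median-of-medians selection just described might be sketched as follows (our own Python rendering; 1-indexed rank as in the text).

```python
def mom_select(a, i):
    """Deterministic selection: the i-th smallest (1-indexed) element of a,
    in O(n) worst case time."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Median of each group of five, then the median M of the group medians.
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
               for j in range(0, len(a), 5)]
    M = mom_select(medians, (len(medians) + 1) // 2)
    X1 = [k for k in a if k < M]      # at most 7n/10 elements
    X2 = [k for k in a if k > M]      # at most 7n/10 elements
    if i <= len(X1):
        return mom_select(X1, i)
    if i <= len(a) - len(X2):         # the answer equals M itself
        return M
    return mom_select(X2, i - (len(a) - len(X2)))
```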
Finding the medians of the groups takes O(n) time in all; it then takes T(n/5) time to identify the median of medians M, where T(n) denotes the worst case run time of the selection algorithm on n elements. Since the recursive call is on a part of size at most (7/10)n, T(n) satisfies T(n) ≤ T(n/5) + T(7n/10) + O(n), which solves to T(n) = O(n). This can be proved by induction.

Theorem 5 Selection from n elements can be performed in O(n) time in the worst case.

How should the run time of an algorithm be measured? Three different measures can be conceived of: the best case, the worst case, and the average case.
Typically, the average case run time of an algorithm is much smaller than the worst case. While computing the average case run time one assumes a distribution (e.g., that every input is equally likely) on the input space. If this distribution assumption does not hold, then the average case analysis may not be valid. Is it possible to achieve the average case run time without making any assumptions on the input space?
Randomized algorithms answer this question in the affirmative. They make no assumptions on the inputs.

A coloring of a graph is an assignment of a color to each vertex of the graph so that no two vertices connected by an edge have the same color.
It is not hard to see that our problem is one of coloring the graph of incompatible turns using as few colors as possible.
The problem of coloring graphs has been studied for many decades, and the theory of algorithms tells us a lot about this problem. Unfortunately, coloring an arbitrary graph with as few colors as possible is one of a large class of problems called "NP-complete problems," for which all known solutions are essentially of the type "try all possibilities." With care, we can be a little speedier than this, but it is generally believed that no algorithm to solve this problem can be substantially more efficient than this most obvious approach.
We are now confronted with the possibility that finding an optimal solution for the problem at hand is computationally very expensive. We can adopt one of three approaches. [Figure: graph showing incompatible turns, with a table of incompatible turns.]
If the graph is small, we might attempt to find an optimal solution exhaustively, trying all possibilities. This approach, however, becomes prohibitively expensive for large graphs, no matter how efficient we try to make the program. A second approach would be to look for additional information about the problem at hand.
It may turn out that the graph has some special properties, which make it unnecessary to try all possibilities in finding an optimal solution. The third approach is to change the problem a little and look for a good but not necessarily optimal solution. We might be happy with a solution that gets close to the minimum number of colors on small graphs, and works quickly, since most intersections are not as complex as the one in our example.
An algorithm that quickly produces good but not necessarily optimal solutions is called a heuristic. One reasonable heuristic for graph coloring is the following "greedy" algorithm. Initially we try to color as many vertices as possible with the first color, then as many as possible of the uncolored vertices with the second color, and so on.
To color vertices with a new color, we perform the following steps. Select some uncolored vertex and color it with the new color.
Scan the list of uncolored vertices. For each uncolored vertex, determine whether it has an edge to any vertex already colored with the new color. If there is no such edge, color the present vertex with the new color. This approach is called "greedy" because it colors a vertex whenever it can, without considering the potential drawbacks inherent in making such a move. There are situations where we could color more vertices with one color if we were less "greedy" and skipped some vertex we could legally color.
For example, consider the graph of the accompanying figure. The greedy algorithm would tell us to color 1 and 2 red, assuming we considered vertices in numerical order; a less greedy choice of red vertices could do better here. [Figure: a graph.]
As an example of the greedy approach applied to the graph of incompatible turns, suppose a first group of turns has already been colored blue; we cannot add BC to this group, but we can color DC blue. Now we start a second color, say by coloring BC red. Each other uncolored vertex has an edge to a red vertex, so no other vertex can be colored red.
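The two-step greedy procedure above can be sketched as follows. The graph representation (vertices as a list, edges as a set of unordered pairs) is our own choice for illustration.

```python
def greedy_coloring(vertices, edges):
    """Greedy graph coloring: repeatedly open a new color and give it to
    every still-uncolored vertex that has no edge to a vertex already
    carrying that color. edges is a set of frozensets {u, v}."""
    color_of = {}
    color = 0
    while len(color_of) < len(vertices):
        # Select some uncolored vertex and color it with the new color.
        this_color = [next(v for v in vertices if v not in color_of)]
        # Scan the remaining uncolored vertices.
        for v in vertices:
            if v in color_of or v in this_color:
                continue
            # Color v too if it has no edge to a vertex of the new color.
            if not any(frozenset((v, u)) in edges for u in this_color):
                this_color.append(v)
        for v in this_color:
            color_of[v] = color
        color += 1
    return color_of
```

On a 4-cycle (vertices 1-2-3-4-1), this heuristic happens to find the optimal 2-coloring; as the text notes, greed can also use more colors than necessary.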
Even if we suspect our problem can be solved on a computer, there is usually considerable latitude in several problem parameters. Often it is only by experimentation that reasonable values for these parameters can be found. If certain aspects of a problem can be expressed in terms of a formal model, it is usually beneficial to do so, for once a problem is formalized, we can look for solutions in terms of a precise model and determine whether a program already exists to solve that problem.
Even if there is no existing program, at least we can discover what is known about this model and use the properties of the model to help construct a good solution. Almost any branch of mathematics or science can be called into service to help model some problem domain. Problems essentially numerical in nature can be modeled by such common mathematical concepts as simultaneous linear equations. Symbol and text processing problems can be modeled by character strings and formal grammars.
Problems of this nature include compilation (the translation of programs written in a programming language into machine language) and information retrieval tasks such as recognizing particular words in lists of titles owned by a library. Once we have a suitable mathematical model for our problem, we can attempt to find a solution in terms of that model.
Our initial goal is to find a solution in the form of an algorithm, which is a finite sequence of instructions, each of which has a clear meaning and can be performed with a finite amount of effort in a finite length of time. An integer assignment statement is an example of an instruction that meets these requirements. In an algorithm, instructions can be executed any number of times, provided the instructions themselves indicate the repetition.
However, we require that, no matter what the input values may be, an algorithm terminate after executing a finite number of instructions. Thus, a program is an algorithm as long as it never enters an infinite loop on any input.
There is one aspect of this definition of an algorithm that needs some clarification. We said each instruction of an algorithm must have a "clear meaning" and must be executable with a "finite amount of effort." What is clear to one person may not be clear to another.
It is often difficult as well to prove that on any input, a sequence of instructions terminates, even if we understand clearly what each instruction means.
By argument and counterargument, however, agreement can usually be reached as to whether a sequence of instructions constitutes an algorithm. The burden of proof lies with the person claiming to have an algorithm. In addition to using Pascal programs as algorithms, we shall often present algorithms using a pseudo-language that is a combination of the constructs of a programming language together with informal English statements.
We shall use Pascal as the programming language, but almost any common programming language could be used in place of Pascal for the algorithms we shall discuss.
The following example illustrates many of the steps in our approach to writing a computer program.

Example 1. A mathematical model can be used to help design a traffic light for a complicated intersection of roads. To construct the pattern of lights, we shall create a program that takes as input a set of permitted turns at an intersection (continuing straight on a road is a "turn") and partitions this set into as few groups as possible such that all turns in a group are simultaneously permissible without collisions.
We shall then associate a phase of the traffic light with each group in the partition. By finding a partition with the smallest number of groups, we can construct a traffic light with the smallest number of phases. For example, consider an intersection where roads C and E are one-way and the others two-way. There are 13 turns one might make at this intersection.
Some pairs of turns, like AB (from A to B) and EC, can be carried out simultaneously, while others, like AD and EB, cause lines of traffic to cross and therefore cannot be carried out simultaneously. The light at the intersection must permit turns in such an order that AD and EB are never permitted at the same time, while the light might permit AB and EC to be made simultaneously.
We can model this problem with a mathematical structure known as a graph. A graph consists of a set of points called vertices, and lines connecting the points, called edges. For the traffic intersection problem we can draw a graph whose vertices represent turns and whose edges connect pairs of vertices whose turns cannot be performed simultaneously. For the intersection of our example, this is precisely the graph of incompatible turns discussed earlier.
The graph can aid us in solving the traffic light design problem.