In order to start manipulating graphs using DukeGuess, we first need to get a graph opened! In this example we will open up a graph representing the characters and relationships between characters in the first season of the popular show "24".
Start DukeGuess by double clicking on "DukeGuess.jar" in the "DukeGuess" folder.
DukeGuess now displays a file loader box. First, click the button labeled "..." and select the file called "twentyFour.gdf". Then click "Ok".
You should now have a graph that looks something like this:
What you see now is the main DukeGuess application. You will be doing all of your graph manipulations from this screen. The area showing blue blocks and green lines is called the Visualization area. In the visualization area you can zoom in and out, move nodes (squares) and edges (lines), and move the "camera" around.
The white area below the visualization area with the ">>>" prompt is called the Interpreter. The function of the Interpreter is to take Python commands and use them to change the way the graph looks, or to find out more information about the graph. We will go into more detail on how to use the Interpreter later.
In order to view more information about the graph, go to the "Display" menu at the top and click on "Information Window". DukeGuess now displays an information window to the left. When the mouse hovers over a node or edge, this window displays detailed information about it.
When DukeGuess first loads a graph, it automatically loads it with the "random" layout. This means that nodes are placed at random locations within the window. While the random layout can have interesting results, it is generally not the most helpful for examining the structure of a graph. Fortunately, DukeGuess has several built-in layouts that are interesting to view and simple to run. To access them, go to the "Layout" menu at the top of the window and select the layout you desire. In this case, let's select the "GEM" layout, which tends to spread the nodes out so that the structure is easier to see.
Using the Panel is fairly simple. Before you may use any of the buttons, though, you must select a node. You may do this in two different ways: either click on a node, or type its name into the text box and hit enter (Note that capitalization DOES matter). This will make the node "selected" and turn its color red. To select a different node, simply click on a different node or enter a different name into the text box. To clear a selected node without selecting another, click on the "Clear Current Node" button.
The buttons labeled "Get outdegree" and "Get Betweenness" operate in similar ways. Once you have selected a node, if you click on the outdegree button the outdegree of the selected node is printed to the text box. If you click on betweenness, the betweenness is printed to the text box.
For more information on outdegree and betweenness, see their Wikipedia articles: Degree on Wikipedia and Betweenness on Wikipedia.
This is a simple button that allows you to highlight all of the selected node's neighbors. A neighbor is a node that is directly connected to the selected node. This button will turn all the neighbors orange. If you would like to return them to their original color, either click the "Clear Current Node" button or click on a different node.
This button is a more powerful version of the "Highlight Neighbors" button. Instead of highlighting just neighbors, you can dictate how far away a node must be from your selected node in order to be highlighted orange. For example, if you enter "2" with Jack selected, it will highlight all the neighbors of Jack's neighbors with whom Jack does not have a direct connection. These nodes are a distance of 2 away.
When you click on the "Highlight Nodes X Edges Away" button, a box comes up asking you to enter the distance you would like to highlight. Either enter a distance or type "all". If you type "all", something a little bit different will happen. Instead of highlighting nodes at just one set distance, the program will color every node according to its distance from the selected node. For example, neighbors will be highlighted one color, nodes 2 away will be highlighted another color, nodes 3 away another color, and so on. Nodes that are not reachable from the selected node (a distance of infinity) will be highlighted black. A color key also pops up to help you identify the colors.
The last two buttons, "Save Group" and "Clear Group", have a specialized function. If you hold the shift key and left click on a node, you'll notice it changes color to a light blue. This means the node has been added to a group. To deselect a node from the group, shift-click it again. If you would like to save this group, click on the "Save Group" button. If you would like to completely clear the list, click on the "Clear Group" button.
Any groups that you save will be available for use in the Interpreter for the rest of your DukeGuess session.
The Interpreter is the most powerful part of the Guess program, but it is also the most complicated to learn how to use. It allows you to access the different qualities of individual nodes, groups of nodes, edges, and the entire graph.
In this section, code that you should type into the Interpreter will be in italics and separated from the rest of the text.
Go ahead and bring the Interpreter back into view by clicking on the "Interpreter" tab at the bottom of the Guess window.
Accessing Attributes
Attributes are something that we have alluded to so far, but have not thoroughly described. Both edges and nodes have several different attributes
associated with them. This is the information that is displayed in the Information Window when you hover over a node or edge. We can also
access this information using the Interpreter. We do this by using the '.' (dot) operator. For example, to access Jack's color, we would type:
Jack.color
The Interpreter then prints out Jack's color. Go ahead and try it. In this case, Jack's color is blue.
In order to access edges, we need to type a little more. In DukeGuess, edges are defined by their end nodes. DukeGuess uses the "-" symbol to represent the edge between
two nodes. For example, an edge between Jack and Kim would be represented as "Jack-Kim" or "Kim-Jack". We can then access this edge's color by typing:
(Jack-Kim).color
We have to put "Jack-Kim" within parentheses because the '.' operator has a higher priority than '-'. Without the parentheses, DukeGuess would try to find an edge between Jack and Kim's color instead of first finding the edge between Jack and Kim, and then finding that edge's color (which is green).
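The precedence rule is ordinary Python behavior, so we can see it outside DukeGuess as well. The following is a minimal sketch in plain Python 3; the Node and Edge classes are hypothetical stand-ins written for this illustration (they are not DukeGuess's own classes), and the colors are assumed.

```python
# Hypothetical stand-ins for illustration; these are NOT DukeGuess's real classes.
class Edge:
    def __init__(self, a, b, color):
        self.ends = (a.name, b.name)
        self.color = color


class Node:
    def __init__(self, name, color):
        self.name = name
        self.color = color

    def __sub__(self, other):
        # "node - node" yields the edge between them (assumed to be green).
        if not isinstance(other, Node):
            return NotImplemented
        return Edge(self, other, "green")


jack = Node("Jack", "blue")
kim = Node("Kim", "blue")

# With parentheses: build the edge first, then read the edge's color.
edge_color = (jack - kim).color
print(edge_color)  # green

# Without parentheses: Python evaluates kim.color first and then tries
# jack - "blue", which is not an edge lookup at all.
try:
    jack - kim.color
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

In this sketch the unparenthesized form fails with a TypeError; how DukeGuess itself reports the mistake may differ.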
It is also possible to change the value of attributes for nodes and edges. To do this, we use the '.' operator together with the equals sign, '=', to "assign" the attribute a new value. For an example of this, let's first make Jack purple, and then make the edge between Jack and Kim orange. Here is the code:
Jack.color = purple
(Jack-Kim).color = orange
Now it is easier to find Jack and the edge between Jack and Kim.
Here is a final example that will help you put together a couple of the things you have learned.
Let's say we want to investigate what impact the removal of Jack and all of his neighbors would have on the "24" graph. First,
make sure you have the "twentyFour.gdf" file loaded and that you have opened the Node Functions Panel. Select Jack and highlight his
neighbors. Now shift click Jack and all of his neighbors and click on "Save Group" to save these nodes to a group. Check the image below
to make sure you got all of the nodes.
Now save this group under the name "friends". To make it appear like we have removed Jack and his friends, we need to type some commands into the interpreter.
friends.visible = 0
You'll notice something strange about this line of code -- it doesn't contain any node names! In DukeGuess, if you have a group of nodes
you can access and change the same attribute for all nodes in the group by using the '.' (dot) operator on the group name the same way you would
use it on a node name.
You should end up with a graph that looks something like the one below. Clearly Jack and his friends are a key portion of this social network! Also notice that DukeGuess hides the edges attached to invisible nodes.
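To picture how a single assignment on a group can reach every node in it, here is a minimal sketch in plain Python 3. The Node and Group classes and the third node are hypothetical and written only for this illustration; DukeGuess's own group type may work differently.

```python
# Hypothetical sketch; DukeGuess's own group type may work differently.
class Node:
    def __init__(self, name):
        self.name = name
        self.visible = 1
        self.color = "blue"


class Group:
    def __init__(self, members):
        # Bypass our own __setattr__ when storing the member list.
        object.__setattr__(self, "members", list(members))

    def __setattr__(self, attr, value):
        # Assigning to the group assigns to every member.
        for node in self.members:
            setattr(node, attr, value)

    def __getattr__(self, attr):
        # Reading from the group collects the value from every member.
        return [getattr(node, attr) for node in self.members]


jack, kim, teri = Node("Jack"), Node("Kim"), Node("Teri")
friends = Group([jack, kim])

friends.visible = 0  # one assignment, applied to Jack and Kim only
print(jack.visible, kim.visible, teri.visible)  # 0 0 1
print(friends.color)  # ['blue', 'blue']
```

The node left out of the group keeps its original value, which is why only the saved group disappears from the picture.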
Steven Johnson's popular book, Everything Bad Is Good for You: How Today's Popular Culture Is Actually Making Us Smarter, argues that today's popular entertainment is far more complex than the culture of the past and requires more cognitive work of the pop culture consumer. As an example, he compares the social network of the characters in 24 (twentyFour.gdf) with that of the popular 80s show Dallas (dallas.gdf).
In what ways does the social network imply that 24 is indeed more complex?
This graph has six nodes (A-F) and eight edges. It can be represented by
the following dictionary:
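The figure and the dictionary itself are not reproduced here. As a minimal sketch in plain Python 3, one dictionary with six nodes and eight directed edges that is consistent with the sample runs later in this section could look like the following. The specific edges are an assumption, and node names are written as strings.

```python
# Assumed adjacency lists: each key is a node, and each value lists the
# nodes that its outgoing edges point to. The exact edges are an assumption.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}

node_count = len(graph)
edge_count = sum(len(targets) for targets in graph.values())
print(node_count, edge_count)  # 6 8
```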
In Gython, the extension of Python used in GUESS, there is a
-> operator that accesses a directed edge between two
nodes. Therefore, you can create the graph above with the
following code.
GUESS automatically defines a variable g that represents the
entire graph. g.nodes and g.edges are the
lists of all nodes and edges in the graph, respectively.
There are actually five different operators that can be used to represent
an edge between node1 and node2.
node1->node2   # directed edge (node1 to node2)
node1<-node2   # directed edge (node2 to node1)
node1-node2    # undirected edge
node1<->node2  # undirected edge (or bidirectional)
node1?node2    # any type of edge between node1 and node2
Nodes are objects. To access node and edge attributes, use the dot operator (".").
node1.color = blue            # Make node1 blue
(node1?node2).color = cyan    # Make all types of edges between node1 and node2 cyan
(node1->node2).weight = 5     # Make the edge from node1 to node2 have a weight of 5
node1.visible = 0             # Make node1 invisible
For numerical attributes, you may access the max or min for that attribute over all nodes or edges by typing the attribute name plus the dot operator and
min or max.
For example, weight.min returns the minimum weight for all
edges, and
salary.max returns the max salary for all nodes. See
our GUESS Quick Reference Sheet for more information on node and edge
attributes.
# returns list of nodes in path from start to end
# or None if no path exists
def find_path(start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    if start not in g.nodes:
        return None
    adj = start.getNeighbors()
    for node in adj:
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath: return newpath
    return None
One new feature here is the use of default arguments with
path=[]. The function can now be called with only two
arguments, as in find_path(node1, node2).
A sample run (using the graph above):
> find_path(A, D)
[A, B, C, D]
>
The second 'if' statement is necessary only in case there are nodes that are
listed as end points for arcs but that don't have outgoing arcs
themselves, and aren't listed in the graph at all. Such nodes could also
be contained in the graph, with an empty list of outgoing arcs, but
sometimes it is more convenient not to require this.
Note that while the user calls find_path() with two arguments, it calls
itself with a third argument: the path that has already been traversed.
The default value for this argument is the empty list, '[]', meaning no
nodes have been traversed yet. This argument is used to avoid cycles (the
first 'if' inside the 'for' loop). The 'path' argument is not modified:
the assignment path = path + [start] creates a new list. If we had
written path.append(start) instead, we would have modified the variable
path in the caller, with disastrous results.
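To try the same idea outside DukeGuess, here is a minimal sketch in plain Python 3 that works on a dictionary of adjacency lists instead of GUESS nodes. The dictionary is an assumed example graph, and the extra graph parameter and the path=None default are choices made for this sketch rather than part of the function above.

```python
def find_path(graph, start, end, path=None):
    """Return one path from start to end as a list, or None if there is none."""
    path = (path or []) + [start]  # builds a NEW list on every call
    if start == end:
        return path
    for node in graph.get(start, []):
        if node not in path:  # skip nodes already on this path (avoids cycles)
            found = find_path(graph, node, end, path)
            if found:
                return found
    return None


# Assumed example graph: keys are nodes, values are their outgoing neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}

print(find_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
print(find_path(graph, "D", "A"))  # None

# Why "+" rather than append: "+" leaves the original list alone,
# while append changes the list that the caller is still using.
original = ["A"]
extended = original + ["B"]
print(original)  # ['A']
original.append("B")
print(original)  # ['A', 'B']
```

Because each recursive call receives its own list, a dead-end branch cannot leave stray nodes in the path that a sibling branch goes on to use.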
A sample run:
> find_all_paths(A, D)
[[A, B, C, D], [A, B, D], [A, C, D]]
>
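The definition of find_all_paths is not shown above. One way it could be written, as a minimal sketch in plain Python 3 over an assumed dictionary of adjacency lists (not the GUESS version), is to collect every successful path instead of returning the first one:

```python
def find_all_paths(graph, start, end, path=None):
    """Return a list of every cycle-free path from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for node in graph.get(start, []):
        if node not in path:
            # Keep every path found through this neighbor, not just the first.
            paths.extend(find_all_paths(graph, node, end, path))
    return paths


# Assumed example graph: keys are nodes, values are their outgoing neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}

all_paths = find_all_paths(graph, "A", "D")
print(all_paths)  # [['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'D']]
```

When no path exists, this sketch returns an empty list rather than None.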
A sample run is below:
> find_shortest_path(graph, A, D)
[A, C, D]
>
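The definition of find_shortest_path is likewise not shown above. A minimal sketch in plain Python 3, again over an assumed dictionary of adjacency lists rather than GUESS nodes, keeps the shortest candidate found through any neighbor:

```python
def find_shortest_path(graph, start, end, path=None):
    """Return a path from start to end with the fewest edges, or None."""
    path = (path or []) + [start]
    if start == end:
        return path
    shortest = None
    for node in graph.get(start, []):
        if node not in path:
            candidate = find_shortest_path(graph, node, end, path)
            # Replace the current best only when the candidate is strictly shorter.
            if candidate and (shortest is None or len(candidate) < len(shortest)):
                shortest = candidate
    return shortest


# Assumed example graph: keys are nodes, values are their outgoing neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}

shortest = find_shortest_path(graph, "A", "D")
print(shortest)  # ['A', 'B', 'D']
```

On this assumed graph there are two equally short paths from A to D. The sketch returns the first one it finds, so the result can differ from the sample run while having the same length. This approach explores every path and becomes slow on large graphs.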
# visit each connected node in the graph (g) start from u in a depth-first manner
def dfs(u):
    stack = []
    addNodeField("visited",Types.BOOLEAN,FALSE) # add visited field if necessary
    stack.append(u) # append to back works as push
    while len(stack) > 0:
        v = stack.pop()
        v.visited = true
        adj = v.getNeighbors()
        for w in adj:
            if not w.visited:
                stack.append(w)
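To experiment with the stack-based traversal outside DukeGuess, here is a minimal sketch in plain Python 3. It assumes a dictionary of adjacency lists in place of GUESS nodes and uses a set in place of the visited field. It also returns the visit order so that the result can be inspected.

```python
def dfs(graph, start):
    """Return the nodes reachable from start in the order they are first visited."""
    order = []
    seen = set()
    stack = [start]  # append and pop at the end of a list behave as push and pop
    while stack:
        v = stack.pop()
        if v in seen:
            continue  # a node can be pushed more than once before it is visited
        seen.add(v)
        order.append(v)
        for w in graph.get(v, []):
            if w not in seen:
                stack.append(w)
    return order


# Assumed example graph: keys are nodes, values are their outgoing neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}

print(dfs(graph, "A"))  # ['A', 'C', 'D', 'B']
print(dfs(graph, "E"))  # ['E', 'F', 'C', 'D']
```

The last neighbor pushed is the first one explored, which is why C is visited before B when starting from A.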
Now, write a function bfs that traverses the graph from a node in a breadth-first manner. Hint: use a queue. You should also be able to keep track of the unweighted distance from the source as you proceed. We have started it for you below.
# visit each connected node in the graph (g) start from u in a breadth-first manner
def bfs(u):
    queue = []
    maxDist = len(g.nodes) # no path can be longer than the number of vertices
    addNodeField("dist",Types.INTEGER,Integer(maxDist)) # add field to store distance
In the 1970s, E.M. Rogers and D.L. Kincaid studied the diffusion of family planning methods in 24 villages in the Republic of Korea. In the DukeGUESS folder, there are two files, korea1.gdf and korea2.gdf, that represent the communication networks among women in two of these villages. In both networks, each vertex represents a woman in the village and an edge between two vertices indicates that the two women discussed family planning. The vertices also have other attributes:
(adopter == false).visible = false
Note that in GUESS, you can use 0 and 1 for false and true.
The village with the successful family planning program (korea1.gdf) is on the left below, and the unsuccessful village (korea2.gdf) is on the right. Looking at the graphs of these two networks, it is easy to see that their structures differ.
[Figures: korea1 (left) and korea2 (right)]
Call the shortest path between two vertices a geodesic. The distance between two vertices is the number of edges on that geodesic. The closeness centrality of a vertex is the number of other vertices divided by the sum of the distances between that vertex and all others.
The betweenness centrality is the proportion of all geodesics between pairs of other vertices that include this vertex.
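To make the two definitions concrete, here is a minimal sketch in plain Python 3 that computes both measures on a tiny undirected graph. The graph, the function names, and the reading of "proportion of all geodesics" as an average over pairs of other vertices are assumptions made for this sketch. It also assumes the graph is connected. It is not the routine DukeGuess uses.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (dist, sigma): edge distance and number of geodesics from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a geodesic to w
                sigma[w] += sigma[v]
    return dist, sigma


def closeness(graph, v):
    """Number of other vertices divided by the sum of distances from v."""
    dist, _ = bfs_counts(graph, v)
    total = sum(d for node, d in dist.items() if node != v)
    return (len(graph) - 1) / total


def betweenness(graph, v):
    """Average share of geodesics between pairs of other vertices that pass through v."""
    dist_v, sigma_v = bfs_counts(graph, v)
    others = [n for n in graph if n != v]
    shares = []
    for s, t in combinations(others, 2):
        dist_s, sigma_s = bfs_counts(graph, s)
        if t not in dist_s:
            continue  # no geodesic between s and t
        through = 0
        if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
            through = sigma_s[v] * sigma_v[t]
        shares.append(through / sigma_s[t])
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical undirected star graph: b is connected to a, c, and d.
star = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}

print(closeness(star, "b"), betweenness(star, "b"))  # 1.0 1.0
print(closeness(star, "a"), betweenness(star, "a"))  # 0.6 0.0
```

The center of the star is at distance 1 from everyone and lies on every geodesic between the other vertices, so it scores the maximum on both measures, while a leaf lies on none.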
A few tips:
pictured
to the right. To find out the name of that node enter
(label == "v31").name in the Interpreter
window. That should return [pajek4] for the korea1
network. You can now use pajek4 as the name of that
node.
When we compare these networks in terms of centrality and centralization, we get the following numbers:
As the numbers show, even though the unsuccessful graph had more connections per person, the successful graph is more centralized, especially when betweenness centralization is used. This centralized structure allows for better communication, resulting in the successful implementation of the family planning program by the network on the left.
Investigate whether adoption time is associated with the structural prestige rather than the centrality of doctors in the discussion network. Compute indices of prestige (indegree, restricted input domain with a maximum distance of 2, and proximity prestige) as well as the corresponding centrality measures in the undirected network. Use rank correlation and note that adoption time is higher when a doctor adopts later.
Another hypothesis states that friendship relations are more important than discussion relations for the adoption of a new drug because it is easier to persuade friends than people you only know professionally. Physicians with many direct or indirect friends would adopt sooner than physicians with less central positions in the friendship network. The file Galesburg_friends.gdf contains the friendship network between the doctors. Is the adoption time of the new drug related to prestige or centrality in the friendship network rather than in the discussion network?
[Figures: Galesburg_discussion (left) and Galesburg_friends (right)]