layout: false
class: title-slide-section-red, middle

# Big Data Storage: HDFS
Justin Post

---

# Digging Deeper

- Hopefully you now have an idea about the big data pipeline

- Data lakes, data warehouses, databases, etc.

- Next:
    + How is big data actually stored?
    + How do we access the big data?

---

# Big Data Storage

- Commonly used data storage systems
    + Hadoop Distributed File System (HDFS)
    + Amazon's Simple Storage Service (S3)
    + Google's Cloud Storage (GCS)
    + Azure's Blob Storage

- Most of these systems are HDFS compliant, as HDFS was the major storage method for a long time

---

# Hadoop

Hadoop is a framework for efficiently storing and processing large datasets

- Allows for clustering of multiple computers to analyze datasets in parallel

The [base Apache Hadoop framework](https://hadoop.apache.org/) is composed of the following modules:

- Hadoop Distributed File System (HDFS) – a distributed file system that stores data across machines
- Hadoop YARN – a platform responsible for managing computing resources in clusters and using them for scheduling users' applications
- Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing
- Hadoop Common – contains libraries and utilities needed by other Hadoop modules

<img src="data:image/png;base64,#img/hadoop.png" width="400px" style="display: block; margin: auto;" />

---

# HDFS

File system - system that performs file management (organization, retrieval, naming, etc.)
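
To make the file system definition above concrete, here is a minimal sketch of those three jobs (organization, retrieval, naming) as a toy in-memory class. `ToyFileSystem` and its method names are hypothetical and invented for illustration only. They are not part of Hadoop or any real library, and a real file system also handles disk layout, permissions, and much more.

```python
class ToyFileSystem:
    """Hypothetical in-memory file system: maps path names to file contents."""

    def __init__(self):
        self._files = {}  # path (str) -> contents (str)

    def write(self, path, contents):
        # Naming: store the contents under a path such as 'dickens/chap1.txt'
        self._files[path] = contents

    def read(self, path):
        # Retrieval: look up contents by path (raises KeyError if the path is unknown)
        return self._files[path]

    def rename(self, old_path, new_path):
        # Naming: move the contents to a new path
        self._files[new_path] = self._files.pop(old_path)

    def listdir(self, directory):
        # Organization: list the paths stored under a directory prefix
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))


fs = ToyFileSystem()
fs.write("dickens/chap1.txt", "chapter i treats of the place")
fs.write("dickens/chap2.txt", "chapter ii treats of oliver twists growth")
fs.rename("dickens/chap2.txt", "dickens/chapter2.txt")
print(fs.listdir("dickens"))
print(fs.read("dickens/chap1.txt"))
```

A distributed file system offers the same kind of interface but spreads the stored data over many machines.
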

Distributed file system - file system where the storage devices are physically dispersed (multiple machines for instance)

- An HDFS instance may consist of hundreds or thousands of server machines (a cluster)
- HDFS stores the data in blocks (say 128 MB chunks)
- Data is stored in multiple places for fault tolerance
- Works well for computations that can be split up, run in parallel, and combined

---

# HDFS Architecture

- A **NameNode** holds all of the information about where the data is stored
- Each node in the cluster usually has a **DataNode** that manages storage for the node's data

<!--The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.-->

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/hdfsarchitecture.png" alt="https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" width="400px" />
<p class="caption">https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html</p>
</div>

---

# HDFS Data Replication

Data is split into blocks and stored in multiple places

- The replication factor determines how many copies are made

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/hdfsdatanodes.png" alt="https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" width="500px" />
<p class="caption">https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html</p>
</div>

---

# HeartBeat and Balancing

HeartBeat - signal sent from a DataNode back to the NameNode

- If the NameNode sees no signal, the DataNode is considered dead

Balancing - when DataNodes fail, data may be under-replicated

- The NameNode will send signals to replicate the data and rebalance the replication

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/heartbeat.png" 
alt="https://www.researchgate.net/figure/HDFS-STRCUTURE-Advantages-It-has-very-high-bandwidth-to-support-map-reduce-jobs-It_fig1_303527953" width="500px" />
<p class="caption">https://www.researchgate.net/figure/HDFS-STRCUTURE-Advantages-It-has-very-high-bandwidth-to-support-map-reduce-jobs-It_fig1_303527953</p>
</div>

---

# Hadoop YARN - Yet Another Resource Negotiator

Hadoop YARN – a platform responsible for managing computing resources in clusters and using them for scheduling users' applications

- Client submits jobs
- Resource manager runs in the background to assign and manage resources to complete the job

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/yarn_architecture.gif" alt="https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/YARN.html" width="500px" />
<p class="caption">https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/YARN.html</p>
</div>

---

# Hadoop MapReduce

[MapReduce](https://www.ibm.com/topics/mapreduce#:~:text=MapReduce%20is%20a%20programming%20paradigm,tasks%20that%20Hadoop%20programs%20perform.) 
is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster

- Leverages Parallel Computing
    + Take computations that can be done independently
    + Run the computations simultaneously
        + on different processor cores
        + across many connected computers (i.e., on a cluster)
    + Combine results

---

# Parallel Computing Idea

<img src="data:image/png;base64,#img/serialProblem.gif" width="450px" style="display: block; margin: auto;" />

<hr>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/parallelProblem.gif" alt="https://computing.llnl.gov/tutorials/parallel_comp/" width="450px" />
<p class="caption">https://computing.llnl.gov/tutorials/parallel_comp/</p>
</div>

---

# Hadoop MapReduce

[MapReduce](https://www.ibm.com/topics/mapreduce#:~:text=MapReduce%20is%20a%20programming%20paradigm,tasks%20that%20Hadoop%20programs%20perform.) is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster

Basic MapReduce idea:

- Consider different chunks of data to be analyzed
- Use a **map** function to turn each chunk into zero or more key-value pairs
- Collect together all pairs with the same keys
- **Reduce** each collection of grouped values to produce an output for the corresponding key

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/mapreducewords.png" alt="https://www.todaysoftmag.com/article/1358/hadoop-mapreduce-deep-diving-and-tuning" width="500px" />
<p class="caption">https://www.todaysoftmag.com/article/1358/hadoop-mapreduce-deep-diving-and-tuning</p>
</div>

<!--As an analogy, you can think of map and reduce tasks as the way a census was conducted in Roman times, where the census bureau would dispatch its people to each city in the empire. Each census taker in each city would be tasked to count the number of people in that city and then return their results to the capital city. 
There, the results from each city would be reduced to a single count (sum of all cities) to determine the overall population of the empire. This mapping of people to cities, in parallel, and then combining the results (reducing) is much more efficient than sending a single person to count every person in the empire in a serial fashion.-->

---

# MapReduce Example

We've already more or less done this in our homework on counting the words in a book!

- Plan for a text document:
    + Create a dictionary with the words used as keys and their counts as values
    + This is the map part

- Use this map function across multiple text documents in parallel
    + Combine the resulting dictionaries by summing the counts for each word
    + This is the reduce part

---

# MapReduce Example

<img src="data:image/png;base64,#img/MapReduceOliverTwist.png" width="600px" style="display: block; margin: auto;" />

---

# MapReduce Example - Splitting up Oliver Twist

```python
import string

def find_chap(lines, marker):
    #find where this chapter starts and where the next one begins
    chap_start = lines.find(marker)
    chap_end = lines.find(marker, chap_start + 1)
    #the last chapter ends where the Project Gutenberg footer begins
    if chap_end == -1:
        chap_end = lines.find("End of the Project Gutenberg EBook")
    return [chap_start, chap_end]

def remove_char(lines):
    #replace punctuation
    for symbol in string.punctuation:
        lines = lines.replace(symbol, "")
    lines = lines.replace("\n", " ")
    return lines
```

---

# MapReduce Example - Splitting up Oliver Twist

```python
def save_chap(lines, chap = None):
    if chap is None:
        start_end = find_chap(lines, "CHAPTER I")
        chap = 1
    else:
        start_end = find_chap(lines, "CHAPTER")
    #get the chapter and turn it to lower case
    chap_text = lines[start_end[0]:start_end[1]].lower()
    #remove punctuation
    chap_text = remove_char(chap_text)
    with open('dickens/chap' + str(chap) + '.txt', 'w') as w:
        w.write(chap_text)
    chap += 1
    #stop when no chapters remain, otherwise recurse on the rest of the book
    if lines[(start_end[1] + 1):].find("CHAPTER") == -1:
        return
    else:
        save_chap(lines[start_end[1]:], chap = chap)

#read in the book as a string
with open('dickens/charles-dickens-oliver-twist.txt', 'r') as f:
    my_lines = f.read()

save_chap(my_lines)
```

---

# MapReduce Example - Counting Words

Now we can take in one of the chapters and count the words (our mapping function)

```python
def map_words(chap):
    word_count_dictionary = {}
    #splitting on single spaces means runs of spaces produce empty-string 'words'
    chap_split = chap.split(" ")
    for word in chap_split:
        if word in word_count_dictionary:
            word_count_dictionary[word] += 1
        else:
            word_count_dictionary[word] = 1
    return word_count_dictionary

with open('dickens/chap1.txt', 'r') as f:
    my_chap = f.read()

counted = map_words(my_chap)
for vals in list(counted.items())[:4]:
    print(vals)
```

```
## ('chapter', 2)
## ('i', 7)
## ('', 40)
## ('treats', 1)
```

---

# MapReduce Example - Counting Words

- We can construct an iterable with all the chapters and map our function to each chapter
- This could be parallelized across the chapters, yielding 53 dictionaries

```python
my_chap = []
for i in range(1, 54):
    with open('dickens/chap' + str(i) + '.txt', 'r') as f:
        my_chap.append(f.read())

mapped = list(map(map_words, my_chap))
for key, value in mapped[0].items():
    print(key, ":", value)
```

```
## chapter : 2 ## i : 7 ## : 40 ## treats : 1 ## of : 35 ## the : 75 ## place : 2 ## where : 4 ## oliver : 9 ## twist : 3 ## was : 17 ## born : 3 ## and : 35 ## circumstances : 1 ## attending : 1 ## his : 11 ## birth : 1 ## among : 1 ## other : 1 ## public : 1 ## buildings : 1 ## in : 22 ## a : 33 ## certain : 1 ## town : 1 ## which : 10 ## for : 7 ## many : 1 ## reasons : 1 ## it : 13 ## will : 3 ## be : 5 ## prudent : 1 ## to : 27 ## refrain : 1 ## from : 5 ## mentioning : 1 ## assign : 1 ## no : 5 ## fictitious : 1 ## name : 3 ## there : 3 ## is : 8 ## one : 2 ## anciently : 1 ## common : 1 ## most : 4 ## towns : 1 ## great : 2 ## or : 5 ## small : 1 ## wit : 1 ## workhouse : 4 ## this : 9 ## on : 10 ## day : 1 ## date : 1 ## need : 1 ## not : 4 ## trouble : 2 ## myself : 1 ## repeat : 1 ## inasmuch : 1 ## as : 8 ## can : 2 ## possible : 1 ## consequence : 1 ## reader : 1 ## stage : 1 ## business : 1 ## at : 4 ## all : 5 ## events 
: 1 ## item : 1 ## mortality : 1 ## whose : 1 ## prefixed : 1 ## head : 4 ## long : 3 ## time : 4 ## after : 2 ## ushered : 1 ## into : 2 ## world : 2 ## sorrow : 1 ## by : 11 ## parish : 4 ## surgeon : 7 ## remained : 1 ## matter : 1 ## considerable : 2 ## doubt : 1 ## whether : 1 ## child : 6 ## would : 6 ## survive : 1 ## bear : 1 ## any : 2 ## case : 1 ## somewhat : 1 ## more : 3 ## than : 4 ## probable : 1 ## that : 12 ## these : 1 ## memoirs : 1 ## never : 1 ## have : 12 ## appeared : 1 ## if : 5 ## they : 5 ## had : 12 ## being : 6 ## comprised : 1 ## within : 1 ## couple : 1 ## pages : 1 ## possessed : 2 ## inestimable : 1 ## merit : 1 ## concise : 1 ## faithful : 1 ## specimen : 1 ## biography : 1 ## extant : 1 ## literature : 1 ## age : 1 ## country : 1 ## although : 1 ## am : 1 ## disposed : 1 ## maintain : 1 ## itself : 1 ## fortunate : 1 ## enviable : 1 ## circumstance : 1 ## possibly : 1 ## befall : 1 ## human : 1 ## do : 2 ## mean : 1 ## say : 1 ## particular : 1 ## instance : 1 ## best : 1 ## thing : 1 ## could : 3 ## possibility : 1 ## occurred : 1 ## fact : 2 ## difficulty : 1 ## inducing : 1 ## take : 3 ## upon : 2 ## himself : 1 ## office : 1 ## respirationa : 1 ## troublesome : 2 ## practice : 1 ## but : 5 ## custom : 1 ## has : 2 ## rendered : 2 ## necessary : 1 ## our : 1 ## easy : 1 ## existence : 1 ## some : 2 ## he : 11 ## lay : 1 ## gasping : 1 ## little : 2 ## flock : 1 ## mattress : 1 ## rather : 2 ## unequally : 1 ## poised : 1 ## between : 2 ## next : 1 ## balance : 1 ## decidedly : 1 ## favour : 1 ## latter : 1 ## now : 2 ## during : 1 ## brief : 1 ## period : 1 ## been : 11 ## surrounded : 1 ## careful : 1 ## grandmothers : 1 ## anxious : 1 ## aunts : 1 ## experienced : 1 ## nurses : 1 ## doctors : 1 ## profound : 1 ## wisdom : 1 ## inevitably : 1 ## indubitably : 1 ## killed : 1 ## nobody : 2 ## however : 1 ## pauper : 1 ## old : 4 ## woman : 4 ## who : 3 ## misty : 1 ## an : 3 ## unwonted : 1 ## allowance : 1 ## beer : 1 ## did : 
2 ## such : 1 ## matters : 1 ## contract : 1 ## nature : 1 ## fought : 1 ## out : 3 ## point : 1 ## them : 2 ## result : 1 ## few : 1 ## struggles : 1 ## breathed : 1 ## sneezed : 1 ## proceeded : 2 ## advertise : 1 ## inmates : 1 ## new : 1 ## burden : 1 ## having : 2 ## imposed : 1 ## setting : 1 ## up : 4 ## loud : 1 ## cry : 1 ## reasonably : 1 ## expected : 2 ## male : 1 ## infant : 2 ## very : 2 ## useful : 1 ## appendage : 1 ## voice : 2 ## much : 1 ## longer : 1 ## space : 1 ## three : 1 ## minutes : 1 ## quarter : 1 ## gave : 1 ## first : 1 ## proof : 1 ## free : 1 ## proper : 2 ## action : 1 ## lungs : 1 ## patchwork : 1 ## coverlet : 1 ## carelessly : 1 ## flung : 1 ## over : 4 ## iron : 1 ## bedstead : 1 ## rustled : 1 ## pale : 1 ## face : 3 ## young : 4 ## raised : 2 ## feebly : 1 ## pillow : 2 ## faint : 1 ## imperfectly : 1 ## articulated : 1 ## words : 1 ## let : 1 ## me : 3 ## see : 2 ## die : 1 ## sitting : 1 ## with : 5 ## turned : 1 ## towards : 2 ## fire : 2 ## giving : 1 ## palms : 1 ## hands : 3 ## warm : 1 ## rub : 1 ## alternately : 1 ## spoke : 1 ## rose : 1 ## advancing : 1 ## beds : 1 ## said : 5 ## kindness : 1 ## might : 2 ## him : 2 ## oh : 1 ## you : 2 ## must : 1 ## talk : 1 ## about : 1 ## dying : 1 ## yet : 1 ## lor : 2 ## bless : 3 ## her : 13 ## dear : 6 ## heart : 3 ## interposed : 1 ## nurse : 4 ## hastily : 1 ## depositing : 1 ## pocket : 1 ## green : 3 ## glass : 1 ## bottle : 3 ## contents : 1 ## she : 11 ## tasting : 1 ## corner : 1 ## evident : 1 ## satisfaction : 1 ## when : 1 ## lived : 1 ## sir : 1 ## thirteen : 1 ## children : 1 ## own : 1 ## em : 1 ## dead : 1 ## except : 1 ## two : 1 ## wurkus : 1 ## shell : 1 ## know : 1 ## better : 1 ## way : 2 ## think : 1 ## what : 2 ## mother : 1 ## theres : 1 ## lamb : 1 ## apparently : 1 ## consolatory : 1 ## perspective : 1 ## mothers : 1 ## prospects : 1 ## failed : 1 ## producing : 1 ## its : 4 ## due : 1 ## effect : 1 ## patient : 1 ## shook : 1 ## stretched : 1 ## hand 
: 2 ## deposited : 1 ## arms : 1 ## imprinted : 1 ## cold : 1 ## white : 1 ## lips : 1 ## passionately : 1 ## forehead : 1 ## passed : 1 ## gazed : 1 ## wildly : 1 ## round : 1 ## shuddered : 1 ## fell : 2 ## backand : 1 ## died : 1 ## chafed : 1 ## breast : 1 ## temples : 1 ## blood : 1 ## stopped : 1 ## forever : 1 ## talked : 1 ## hope : 1 ## comfort : 1 ## strangers : 1 ## too : 2 ## mrs : 1 ## thingummy : 1 ## last : 2 ## ah : 2 ## poor : 2 ## so : 1 ## picking : 1 ## cork : 1 ## fallen : 1 ## stooped : 1 ## neednt : 1 ## mind : 1 ## sending : 1 ## cries : 1 ## putting : 1 ## gloves : 1 ## deliberation : 1 ## likely : 1 ## give : 1 ## gruel : 1 ## put : 1 ## hat : 1 ## pausing : 1 ## bedside : 1 ## door : 1 ## added : 1 ## goodlooking : 1 ## girl : 1 ## come : 1 ## brought : 1 ## here : 1 ## night : 1 ## replied : 1 ## overseers : 2 ## order : 1 ## found : 1 ## lying : 1 ## street : 1 ## walked : 2 ## distance : 1 ## shoes : 1 ## were : 1 ## worn : 1 ## pieces : 1 ## came : 1 ## going : 1 ## knows : 1 ## leaned : 1 ## body : 1 ## left : 2 ## story : 1 ## shaking : 1 ## weddingring : 1 ## goodnight : 1 ## medical : 1 ## gentleman : 1 ## away : 1 ## dinner : 1 ## once : 1 ## applied : 1 ## herself : 1 ## sat : 1 ## down : 1 ## low : 1 ## chair : 1 ## before : 1 ## dress : 2 ## excellent : 1 ## example : 1 ## power : 1 ## wrapped : 1 ## blanket : 1 ## hitherto : 1 ## formed : 1 ## only : 1 ## covering : 1 ## nobleman : 1 ## beggar : 1 ## hard : 1 ## haughtiest : 1 ## stranger : 1 ## assigned : 1 ## station : 1 ## society : 1 ## enveloped : 1 ## calico : 1 ## robes : 1 ## grown : 1 ## yellow : 1 ## same : 1 ## service : 1 ## badged : 1 ## ticketed : 1 ## oncea : 1 ## childthe : 1 ## orphan : 2 ## workhousethe : 1 ## humble : 1 ## halfstarved : 1 ## drudgeto : 1 ## cuffed : 1 ## buffeted : 1 ## through : 1 ## worlddespised : 1 ## pitied : 1 ## none : 1 ## cried : 2 ## lustily : 1 ## known : 1 ## tender : 1 ## mercies : 1 ## churchwardens : 1 ## perhaps : 1 ## 
louder : 1
```

---

# MapReduce Example - Counting Words

- Now we would need a reducer function
    + Takes in the dictionaries
    + Combines their counts for each word

```python
def word_reduce(dict1, dict2):
    combined = {}
    #words in dict1: add the dict2 count when the word appears in both
    for key in dict1.keys():
        if key in dict2:
            combined[key] = dict1[key] + dict2[key]
        else:
            combined[key] = dict1[key]
    #words that appear only in dict2
    for key in dict2.keys():
        if key not in dict1.keys():
            combined[key] = dict2[key]
    return combined
```

---

# MapReduce Example - Counting Words

- Now we would need a reducer function
    + Takes in the dictionaries
    + Combines their counts for each word

```python
with open('dickens/chap1.txt', 'r') as f:
    my_chap = f.read()
counted1 = map_words(my_chap)

with open('dickens/chap2.txt', 'r') as f:
    my_chap = f.read()
counted2 = map_words(my_chap)

temp = word_reduce(counted1, counted2)
for key, value in temp.items():
    print(key, ":", value)
```

```
## chapter : 3 ## i : 39 ## : 214 ## treats : 2 ## of : 148 ## the : 332 ## place : 4 ## where : 10 ## oliver : 47 ## twist : 13 ## was : 95 ## born : 3 ## and : 153 ## circumstances : 1 ## attending : 1 ## his : 51 ## birth : 1 ## among : 1 ## other : 8 ## public : 4 ## buildings : 1 ## in : 93 ## a : 165 ## certain : 2 ## town : 1 ## which : 30 ## for : 46 ## many : 4 ## reasons : 1 ## it : 55 ## will : 11 ## be : 28 ## prudent : 1 ## to : 120 ## refrain : 1 ## from : 13 ## mentioning : 2 ## assign : 1 ## no : 21 ## fictitious : 1 ## name : 7 ## there : 13 ## is : 18 ## one : 15 ## anciently : 1 ## common : 1 ## most : 7 ## towns : 1 ## great : 18 ## or : 32 ## small : 7 ## wit : 1 ## workhouse : 13 ## this : 33 ## on : 29 ## day : 6 ## date : 1 ## need : 2 ## not : 17 ## trouble : 2 ## myself : 2 ## repeat : 2 ## inasmuch : 1 ## as : 30 ## can : 2 ## possible : 3 ## consequence : 3 ## reader : 1 ## stage : 1 ## business : 5 ## at : 35 ## all : 19 ## events : 1 ## item : 1 ## mortality : 1 ## whose : 2 ## prefixed : 1 ## head : 9 ## long : 7 ## time : 9 ## after : 9 ## ushered : 2 ## into : 15 ## world : 4 ## 
sorrow : 1 ## by : 29 ## parish : 14 ## surgeon : 8 ## remained : 1 ## matter : 3 ## considerable : 2 ## doubt : 1 ## whether : 6 ## child : 13 ## would : 18 ## survive : 1 ## bear : 1 ## any : 10 ## case : 1 ## somewhat : 3 ## more : 14 ## than : 8 ## probable : 2 ## that : 50 ## these : 6 ## memoirs : 1 ## never : 9 ## have : 33 ## appeared : 1 ## if : 11 ## they : 27 ## had : 60 ## being : 12 ## comprised : 1 ## within : 2 ## couple : 1 ## pages : 1 ## possessed : 2 ## inestimable : 1 ## merit : 1 ## concise : 1 ## faithful : 1 ## specimen : 1 ## biography : 1 ## extant : 1 ## literature : 1 ## age : 3 ## country : 1 ## although : 2 ## am : 3 ## disposed : 1 ## maintain : 1 ## itself : 1 ## fortunate : 1 ## enviable : 1 ## circumstance : 2 ## possibly : 1 ## befall : 1 ## human : 1 ## do : 7 ## mean : 1 ## say : 9 ## particular : 1 ## instance : 1 ## best : 1 ## thing : 2 ## could : 8 ## possibility : 1 ## occurred : 1 ## fact : 2 ## difficulty : 2 ## inducing : 1 ## take : 10 ## upon : 9 ## himself : 4 ## office : 1 ## respirationa : 1 ## troublesome : 3 ## practice : 1 ## but : 18 ## custom : 1 ## has : 3 ## rendered : 3 ## necessary : 2 ## our : 2 ## easy : 1 ## existence : 1 ## some : 9 ## he : 62 ## lay : 3 ## gasping : 1 ## little : 11 ## flock : 1 ## mattress : 1 ## rather : 5 ## unequally : 1 ## poised : 1 ## between : 3 ## next : 8 ## balance : 1 ## decidedly : 2 ## favour : 1 ## latter : 3 ## now : 8 ## during : 1 ## brief : 2 ## period : 1 ## been : 21 ## surrounded : 1 ## careful : 1 ## grandmothers : 1 ## anxious : 1 ## aunts : 1 ## experienced : 1 ## nurses : 1 ## doctors : 2 ## profound : 1 ## wisdom : 2 ## inevitably : 1 ## indubitably : 1 ## killed : 1 ## nobody : 5 ## however : 4 ## pauper : 2 ## old : 7 ## woman : 8 ## who : 23 ## misty : 1 ## an : 14 ## unwonted : 1 ## allowance : 2 ## beer : 1 ## did : 4 ## such : 2 ## matters : 1 ## contract : 1 ## nature : 2 ## fought : 1 ## out : 11 ## point : 1 ## them : 6 ## result : 2 ## few : 1 ## 
struggles : 1 ## breathed : 1 ## sneezed : 1 ## proceeded : 2 ## advertise : 1 ## inmates : 2 ## new : 1 ## burden : 1 ## having : 6 ## imposed : 1 ## setting : 1 ## up : 13 ## loud : 1 ## cry : 4 ## reasonably : 1 ## expected : 3 ## male : 1 ## infant : 4 ## very : 26 ## useful : 2 ## appendage : 1 ## voice : 6 ## much : 4 ## longer : 1 ## space : 1 ## three : 6 ## minutes : 1 ## quarter : 4 ## gave : 5 ## first : 6 ## proof : 1 ## free : 1 ## proper : 3 ## action : 1 ## lungs : 1 ## patchwork : 1 ## coverlet : 1 ## carelessly : 1 ## flung : 1 ## over : 8 ## iron : 1 ## bedstead : 2 ## rustled : 1 ## pale : 3 ## face : 5 ## young : 6 ## raised : 3 ## feebly : 1 ## pillow : 2 ## faint : 2 ## imperfectly : 1 ## articulated : 1 ## words : 3 ## let : 3 ## me : 8 ## see : 6 ## die : 1 ## sitting : 3 ## with : 44 ## turned : 2 ## towards : 3 ## fire : 3 ## giving : 1 ## palms : 1 ## hands : 6 ## warm : 1 ## rub : 1 ## alternately : 1 ## spoke : 2 ## rose : 2 ## advancing : 2 ## beds : 1 ## said : 36 ## kindness : 1 ## might : 6 ## him : 42 ## oh : 1 ## you : 32 ## must : 1 ## talk : 1 ## about : 6 ## dying : 1 ## yet : 3 ## lor : 3 ## bless : 5 ## her : 20 ## dear : 9 ## heart : 6 ## interposed : 2 ## nurse : 5 ## hastily : 1 ## depositing : 1 ## pocket : 1 ## green : 3 ## glass : 3 ## bottle : 4 ## contents : 1 ## she : 18 ## tasting : 1 ## corner : 3 ## evident : 1 ## satisfaction : 1 ## when : 16 ## lived : 1 ## sir : 13 ## thirteen : 1 ## children : 6 ## own : 4 ## em : 4 ## dead : 1 ## except : 1 ## two : 9 ## wurkus : 1 ## shell : 2 ## know : 6 ## better : 1 ## way : 5 ## think : 6 ## what : 16 ## mother : 3 ## theres : 1 ## lamb : 1 ## apparently : 1 ## consolatory : 1 ## perspective : 1 ## mothers : 2 ## prospects : 1 ## failed : 1 ## producing : 1 ## its : 8 ## due : 1 ## effect : 1 ## patient : 1 ## shook : 1 ## stretched : 1 ## hand : 6 ## deposited : 2 ## arms : 1 ## imprinted : 1 ## cold : 3 ## white : 8 ## lips : 1 ## passionately : 1 ## forehead : 2 ## 
passed : 1 ## gazed : 2 ## wildly : 1 ## round : 5 ## shuddered : 1 ## fell : 4 ## backand : 1 ## died : 2 ## chafed : 1 ## breast : 2 ## temples : 1 ## blood : 1 ## stopped : 1 ## forever : 1 ## talked : 1 ## hope : 2 ## comfort : 1 ## strangers : 1 ## too : 8 ## mrs : 28 ## thingummy : 1 ## last : 6 ## ah : 3 ## poor : 8 ## so : 10 ## picking : 2 ## cork : 1 ## fallen : 1 ## stooped : 1 ## neednt : 1 ## mind : 1 ## sending : 1 ## cries : 1 ## putting : 2 ## gloves : 1 ## deliberation : 1 ## likely : 1 ## give : 2 ## gruel : 9 ## put : 2 ## hat : 4 ## pausing : 1 ## bedside : 1 ## door : 1 ## added : 4 ## goodlooking : 1 ## girl : 1 ## come : 8 ## brought : 3 ## here : 5 ## night : 4 ## replied : 10 ## overseers : 2 ## order : 2 ## found : 4 ## lying : 1 ## street : 1 ## walked : 3 ## distance : 1 ## shoes : 1 ## were : 18 ## worn : 1 ## pieces : 1 ## came : 2 ## going : 4 ## knows : 2 ## leaned : 1 ## body : 3 ## left : 3 ## story : 2 ## shaking : 2 ## weddingring : 1 ## goodnight : 1 ## medical : 1 ## gentleman : 17 ## away : 6 ## dinner : 2 ## once : 5 ## applied : 1 ## herself : 3 ## sat : 1 ## down : 4 ## low : 3 ## chair : 7 ## before : 6 ## dress : 2 ## excellent : 2 ## example : 1 ## power : 1 ## wrapped : 1 ## blanket : 1 ## hitherto : 1 ## formed : 1 ## only : 4 ## covering : 1 ## nobleman : 1 ## beggar : 1 ## hard : 2 ## haughtiest : 1 ## stranger : 1 ## assigned : 1 ## station : 1 ## society : 2 ## enveloped : 1 ## calico : 1 ## robes : 1 ## grown : 1 ## yellow : 1 ## same : 1 ## service : 1 ## badged : 1 ## ticketed : 1 ## oncea : 1 ## childthe : 1 ## orphan : 4 ## workhousethe : 1 ## humble : 1 ## halfstarved : 1 ## drudgeto : 1 ## cuffed : 1 ## buffeted : 1 ## through : 2 ## worlddespised : 1 ## pitied : 1 ## none : 1 ## cried : 3 ## lustily : 1 ## known : 3 ## tender : 3 ## mercies : 1 ## churchwardens : 1 ## perhaps : 5 ## louder : 1 ## ii : 2 ## twists : 2 ## growth : 1 ## education : 1 ## board : 15 ## eight : 3 ## ten : 4 ## months : 3 ## 
victim : 1 ## systematic : 1 ## course : 1 ## treachery : 1 ## deception : 1 ## hungry : 4 ## destitute : 1 ## situation : 2 ## duly : 1 ## reported : 1 ## authorities : 6 ## inquired : 7 ## dignity : 1 ## female : 4 ## then : 6 ## domiciled : 1 ## house : 5 ## impart : 1 ## consolation : 2 ## nourishment : 1 ## stood : 1 ## humility : 2 ## magnanimously : 1 ## humanely : 1 ## resolved : 1 ## should : 7 ## farmed : 1 ## dispatched : 1 ## branchworkhouse : 1 ## miles : 1 ## off : 3 ## twenty : 2 ## thirty : 1 ## juvenile : 1 ## offenders : 1 ## against : 1 ## poorlaws : 1 ## rolled : 1 ## floor : 2 ## without : 2 ## inconvenience : 1 ## food : 2 ## clothing : 1 ## under : 2 ## parental : 1 ## superintendence : 1 ## elderly : 2 ## received : 1 ## culprits : 1 ## consideration : 1 ## sevenpencehalfpenny : 2 ## per : 4 ## week : 4 ## sevenpencehalfpennys : 1 ## worth : 1 ## good : 6 ## diet : 2 ## deal : 2 ## may : 9 ## got : 9 ## quite : 4 ## enough : 2 ## overload : 1 ## stomach : 1 ## make : 4 ## uncomfortable : 1 ## experience : 1 ## knew : 1 ## accurate : 1 ## perception : 1 ## appropriated : 1 ## greater : 1 ## part : 2 ## weekly : 1 ## stipend : 1 ## use : 1 ## consigned : 1 ## rising : 1 ## parochial : 1 ## generation : 1 ## even : 1 ## shorter : 1 ## originally : 1 ## provided : 2 ## thereby : 1 ## finding : 1 ## lowest : 1 ## depth : 1 ## deeper : 1 ## still : 1 ## proving : 1 ## experimental : 3 ## philosopher : 2 ## everybody : 1 ## another : 6 ## theory : 1 ## horse : 2 ## able : 2 ## live : 2 ## eating : 1 ## demonstrated : 1 ## well : 9 ## straw : 1 ## unquestionably : 1 ## spirited : 1 ## rampacious : 1 ## animal : 1 ## nothing : 3 ## fourandtwenty : 1 ## hours : 1 ## comfortable : 1 ## bait : 1 ## air : 1 ## unfortunately : 1 ## philosophy : 1 ## protecting : 1 ## care : 4 ## delivered : 1 ## similar : 1 ## usually : 3 ## attended : 1 ## operation : 3 ## system : 3 ## moment : 1 ## contrived : 1 ## exist : 1 ## smallest : 1 ## portion : 1 ## weakest : 
1 ## perversely : 1 ## happen : 2 ## half : 3 ## cases : 2 ## either : 1 ## sickened : 1 ## want : 4 ## neglect : 1 ## halfsmothered : 1 ## accident : 2 ## miserable : 1 ## summoned : 1 ## gathered : 1 ## fathers : 1 ## occasionally : 1 ## interesting : 2 ## inquest : 1 ## overlooked : 1 ## turning : 1 ## inadvertently : 1 ## scalded : 1 ## death : 1 ## happened : 2 ## washingthough : 1 ## scarce : 1 ## anything : 3 ## approaching : 1 ## washing : 3 ## rare : 1 ## occurrence : 1 ## farmthe : 1 ## jury : 1 ## their : 7 ## heads : 2 ## ask : 2 ## questions : 1 ## parishioners : 1 ## rebelliously : 1 ## affix : 1 ## signatures : 1 ## remonstrance : 1 ## impertinences : 1 ## speedily : 1 ## checked : 1 ## evidence : 1 ## testimony : 1 ## beadle : 14 ## former : 1 ## whom : 2 ## always : 2 ## opened : 2 ## inside : 2 ## indeed : 2 ## invariably : 1 ## swore : 1 ## whatever : 1 ## wanted : 4 ## selfdevotional : 1 ## besides : 2 ## made : 8 ## periodical : 1 ## pilgrimages : 1 ## farm : 1 ## sent : 1 ## neat : 1 ## clean : 1 ## behold : 1 ## went : 1 ## people : 7 ## cannot : 1 ## farming : 1 ## produce : 1 ## extraordinary : 2 ## luxuriant : 1 ## crop : 1 ## ninth : 3 ## birthday : 3 ## thin : 3 ## diminutive : 1 ## stature : 1 ## circumference : 1 ## inheritance : 1 ## implanted : 1 ## sturdy : 1 ## spirit : 2 ## olivers : 2 ## plenty : 1 ## room : 6 ## expand : 1 ## thanks : 1 ## spare : 1 ## establishment : 1 ## attributed : 1 ## keeping : 1 ## coalcellar : 1 ## select : 1 ## party : 1 ## participating : 1 ## sound : 1 ## thrashing : 1 ## locked : 1 ## atrociously : 1 ## presuming : 1 ## mann : 27 ## lady : 1 ## unexpectedly : 1 ## startled : 1 ## apparition : 1 ## mr : 27 ## bumble : 29 ## striving : 1 ## undo : 1 ## wicket : 2 ## gardengate : 2 ## goodness : 1 ## gracious : 1 ## thrusting : 1 ## window : 1 ## wellaffected : 1 ## ecstasies : 2 ## joy : 1 ## susan : 1 ## brats : 1 ## upstairs : 1 ## wash : 1 ## directlymy : 1 ## alive : 1 ## how : 3 ## glad : 1 ## 
surely : 1 ## fat : 4 ## man : 4 ## choleric : 1 ## instead : 2 ## responding : 1 ## openhearted : 1 ## salutation : 1 ## kindred : 1 ## tremendous : 1 ## shake : 1 ## bestowed : 1 ## kick : 1 ## emanated : 1 ## leg : 1 ## beadles : 3 ## running : 1 ## outfor : 1 ## boys : 7 ## removed : 3 ## timeonly : 1 ## forgotten : 1 ## gate : 3 ## bolted : 1 ## account : 1 ## walk : 5 ## pray : 2 ## invitation : 1 ## accompanied : 2 ## curtsey : 1 ## softened : 1 ## churchwarden : 1 ## means : 1 ## mollified : 1 ## respectful : 1 ## conduct : 1 ## grasping : 2 ## cane : 3 ## keep : 2 ## officers : 1 ## waiting : 1 ## your : 5 ## porochial : 3 ## orphans : 1 ## are : 7 ## aweer : 1 ## delegate : 1 ## stipendiary : 1 ## im : 3 ## sure : 2 ## telling : 2 ## fond : 1 ## coming : 1 ## idea : 1 ## oratorical : 1 ## powers : 1 ## importance : 1 ## displayed : 1 ## vindicated : 1 ## relaxed : 1 ## calmer : 1 ## tone : 2 ## lead : 1 ## something : 1 ## parlour : 1 ## brick : 2 ## placed : 1 ## seat : 1 ## officiously : 1 ## cocked : 3 ## table : 6 ## wiped : 1 ## perspiration : 1 ## engendered : 1 ## glanced : 1 ## complacently : 1 ## smiled : 3 ## yes : 3 ## men : 3 ## dont : 2 ## offended : 1 ## observed : 1 ## captivating : 1 ## sweetness : 1 ## youve : 2 ## wouldnt : 1 ## mention : 1 ## drop : 5 ## somethink : 1 ## nor : 1 ## waving : 1 ## right : 3 ## dignified : 1 ## placid : 1 ## manner : 1 ## noticed : 1 ## refusal : 1 ## gesture : 1 ## just : 3 ## leetle : 2 ## water : 2 ## lump : 1 ## sugar : 1 ## coughed : 1 ## persuasively : 1 ## why : 2 ## obliged : 1 ## blessed : 1 ## infants : 1 ## daffy : 2 ## aint : 1 ## cupboard : 1 ## took : 7 ## gin : 2 ## ill : 2 ## deceive : 1 ## b : 1 ## following : 1 ## eyes : 5 ## process : 3 ## mixing : 1 ## couldnt : 1 ## suffer : 1 ## my : 3 ## approvingly : 1 ## humane : 2 ## set : 2 ## shall : 1 ## early : 1 ## opportunity : 1 ## drew : 2 ## feel : 1 ## stirred : 1 ## ginandwater : 3 ## drink : 1 ## health : 1 ## cheerfulness : 1 ## 
swallowed : 1 ## taking : 2 ## leathern : 1 ## pocketbook : 1 ## halfbaptized : 1 ## nine : 1 ## year : 2 ## today : 1 ## inflaming : 1 ## eye : 2 ## apron : 2 ## notwithstanding : 2 ## offered : 2 ## reward : 2 ## pound : 2 ## afterwards : 1 ## increased : 1 ## superlative : 1 ## supernatral : 1 ## exertions : 1 ## we : 4 ## discover : 1 ## father : 3 ## settlement : 1 ## condition : 1 ## astonishment : 2 ## moments : 1 ## reflection : 1 ## comes : 2 ## pride : 1 ## inwented : 1 ## fondlings : 1 ## alphabetical : 1 ## sswubble : 1 ## named : 2 ## ttwist : 1 ## unwin : 1 ## vilkins : 1 ## names : 1 ## ready : 1 ## end : 3 ## alphabet : 1 ## again : 3 ## z : 1 ## youre : 2 ## literary : 1 ## character : 1 ## evidently : 1 ## gratified : 1 ## compliment : 1 ## finished : 1 ## remain : 1 ## determined : 1 ## back : 2 ## fetch : 1 ## directly : 1 ## leaving : 2 ## purpose : 3 ## outer : 1 ## coat : 1 ## dirt : 1 ## encrusted : 1 ## scrubbed : 1 ## led : 2 ## benevolent : 1 ## protectress : 1 ## bow : 3 ## divided : 1 ## go : 4 ## along : 2 ## majestic : 1 ## anybody : 2 ## readiness : 1 ## glancing : 1 ## upward : 1 ## caught : 1 ## sight : 2 ## behind : 4 ## fist : 2 ## furious : 1 ## countenance : 2 ## hint : 2 ## often : 1 ## impressed : 2 ## deeply : 1 ## recollection : 1 ## cant : 1 ## sometimes : 1 ## sense : 2 ## feint : 1 ## feeling : 1 ## regret : 1 ## difficult : 1 ## boy : 12 ## call : 1 ## tears : 2 ## hunger : 3 ## recent : 1 ## illusage : 1 ## assistants : 3 ## naturally : 1 ## thousand : 1 ## embraces : 1 ## piece : 1 ## bread : 4 ## butter : 1 ## less : 1 ## seem : 1 ## slice : 2 ## browncloth : 1 ## cap : 1 ## wretched : 2 ## home : 1 ## kind : 1 ## word : 1 ## look : 1 ## lighted : 1 ## gloom : 1 ## years : 1 ## burst : 1 ## agony : 1 ## childish : 1 ## grief : 1 ## cottagegate : 1 ## closed : 1 ## companions : 3 ## misery : 2 ## friends : 1 ## ever : 1 ## loneliness : 1 ## wide : 1 ## sank : 1 ## childs : 1 ## strides : 1 ## firmly : 1 ## goldlaced : 
1 ## cuff : 1 ## trotted : 1 ## beside : 1 ## inquiring : 1 ## every : 3 ## mile : 1 ## nearly : 2 ## interrogations : 1 ## returned : 2 ## snappish : 1 ## replies : 1 ## temporary : 1 ## blandness : 1 ## awakens : 1 ## bosoms : 1 ## evaporated : 1 ## walls : 1 ## hour : 1 ## scarcely : 1 ## completed : 1 ## demolition : 1 ## second : 1 ## handed : 1 ## informed : 1 ## appear : 1 ## forthwith : 1 ## clearly : 1 ## defined : 1 ## notion : 1 ## astounded : 1 ## intelligence : 1 ## ought : 1 ## laugh : 1 ## tap : 2 ## wake : 1 ## lively : 1 ## bidding : 1 ## follow : 1 ## conducted : 1 ## large : 4 ## whitewashed : 1 ## gentlemen : 2 ## top : 1 ## seated : 1 ## armchair : 1 ## higher : 1 ## rest : 1 ## particularly : 1 ## red : 1 ## brushed : 1 ## lingering : 1 ## seeing : 1 ## fortunately : 1 ## bowed : 2 ## whats : 2 ## high : 4 ## frightened : 2 ## tremble : 1 ## causes : 1 ## answer : 2 ## hesitating : 1 ## whereupon : 1 ## waistcoat : 6 ## fool : 1 ## capital : 1 ## raising : 1 ## spirits : 1 ## ease : 1 ## listen : 1 ## suppose : 1 ## fooli : 1 ## thought : 2 ## hush : 1 ## spoken : 1 ## weeping : 1 ## bitterly : 1 ## crying : 2 ## prayers : 1 ## gruff : 1 ## feed : 1 ## youlike : 1 ## christian : 3 ## stammered : 1 ## unconsciously : 1 ## like : 1 ## marvellously : 1 ## prayed : 1 ## fed : 2 ## hadnt : 2 ## because : 1 ## taught : 2 ## educated : 1 ## trade : 2 ## redfaced : 1 ## youll : 1 ## begin : 1 ## pick : 1 ## oakum : 2 ## tomorrow : 1 ## morning : 3 ## six : 2 ## oclock : 1 ## surly : 1 ## combination : 1 ## both : 1 ## blessings : 1 ## simple : 1 ## direction : 1 ## hurried : 1 ## ward : 1 ## rough : 1 ## bed : 1 ## sobbed : 1 ## sleep : 2 ## novel : 1 ## illustration : 1 ## laws : 1 ## england : 1 ## paupers : 3 ## sleeping : 1 ## happy : 1 ## unconsciousness : 1 ## around : 1 ## arrived : 2 ## decision : 1 ## exercise : 1 ## material : 1 ## influence : 1 ## future : 1 ## fortunes : 1 ## members : 1 ## sage : 1 ## deep : 1 ## philosophical : 1 ## turn 
: 1 ## attention : 1 ## ordinary : 1 ## folks : 1 ## discoveredthe : 1 ## liked : 1 ## regular : 1 ## entertainment : 1 ## poorer : 1 ## classes : 2 ## tavern : 1 ## pay : 1 ## breakfast : 1 ## tea : 1 ## supper : 3 ## mortar : 1 ## elysium : 1 ## play : 1 ## work : 1 ## oho : 1 ## looking : 1 ## knowing : 1 ## fellows : 1 ## rights : 1 ## stop : 1 ## established : 1 ## rule : 1 ## alternative : 1 ## compel : 1 ## starved : 1 ## gradual : 1 ## quick : 1 ## view : 2 ## contracted : 1 ## waterworks : 1 ## unlimited : 1 ## supply : 2 ## cornfactor : 1 ## periodically : 1 ## quantities : 1 ## oatmeal : 1 ## issued : 1 ## meals : 1 ## onion : 1 ## twice : 1 ## roll : 1 ## sundays : 1 ## wise : 1 ## regulations : 1 ## reference : 1 ## ladies : 1 ## kindly : 1 ## undertook : 1 ## divorce : 1 ## married : 1 ## expense : 1 ## suit : 1 ## commons : 2 ## compelling : 1 ## support : 2 ## family : 2 ## theretofore : 1 ## done : 1 ## bachelor : 1 ## saying : 1 ## applicants : 1 ## relief : 2 ## started : 1 ## coupled : 1 ## longheaded : 1 ## inseparable : 1 ## full : 1 ## expensive : 1 ## increase : 1 ## undertakers : 1 ## bill : 3 ## necessity : 1 ## clothes : 1 ## fluttered : 1 ## loosely : 1 ## wasted : 1 ## shrunken : 1 ## forms : 1 ## twos : 1 ## number : 1 ## stone : 1 ## hall : 1 ## copper : 4 ## master : 7 ## dressed : 1 ## assisted : 1 ## women : 1 ## ladled : 1 ## mealtimes : 1 ## festive : 1 ## composition : 1 ## each : 2 ## porringer : 1 ## moreexcept : 1 ## occasions : 1 ## rejoicing : 1 ## ounces : 1 ## bowls : 2 ## polished : 1 ## spoons : 2 ## till : 1 ## shone : 1 ## performed : 1 ## sit : 1 ## staring : 1 ## eager : 1 ## devoured : 1 ## bricks : 1 ## composed : 1 ## employing : 1 ## themselves : 2 ## meanwhile : 1 ## sucking : 1 ## fingers : 1 ## assiduously : 1 ## catching : 1 ## stray : 1 ## splashes : 1 ## cast : 2 ## thereon : 1 ## generally : 1 ## appetites : 1 ## suffered : 1 ## tortures : 1 ## slow : 1 ## starvation : 1 ## voracious : 1 ## wild : 2 ## 
tall : 1 ## used : 1 ## sort : 1 ## kept : 1 ## cookshop : 1 ## hinted : 1 ## darkly : 1 ## unless : 1 ## basin : 2 ## diem : 1 ## afraid : 1 ## eat : 1 ## slept : 1 ## weakly : 1 ## youth : 1 ## implicitly : 1 ## believed : 1 ## council : 1 ## held : 1 ## lots : 1 ## evening : 2 ## places : 1 ## cooks : 1 ## uniform : 1 ## stationed : 1 ## ranged : 1 ## served : 1 ## grace : 1 ## short : 1 ## disappeared : 1 ## whispered : 1 ## winked : 1 ## while : 1 ## neighbors : 1 ## nudged : 1 ## desperate : 1 ## reckless : 1 ## spoon : 1 ## alarmed : 1 ## temerity : 1 ## please : 2 ## healthy : 1 ## stupefied : 1 ## rebel : 1 ## seconds : 1 ## clung : 1 ## paralysed : 1 ## wonder : 1 ## fear : 1 ## length : 1 ## aimed : 1 ## blow : 1 ## ladle : 1 ## pinioned : 1 ## arm : 1 ## shrieked : 1 ## aloud : 1 ## solemn : 1 ## conclave : 1 ## rushed : 1 ## excitement : 1 ## addressing : 1 ## limbkins : 2 ## beg : 1 ## pardon : 1 ## asked : 2 ## general : 1 ## start : 1 ## horror : 1 ## depicted : 1 ## compose : 1 ## yourself : 1 ## distinctly : 1 ## understand : 1 ## eaten : 1 ## allotted : 1 ## dietary : 1 ## hung : 3 ## controverted : 1 ## prophetic : 1 ## gentlemans : 1 ## opinion : 1 ## animated : 1 ## discussion : 1 ## ordered : 1 ## instant : 1 ## confinement : 1 ## pasted : 1 ## outside : 1 ## offering : 1 ## five : 2 ## pounds : 2 ## apprentice : 1 ## calling : 1 ## convinced : 2 ## life : 3 ## knocked : 1 ## read : 1 ## show : 1 ## sequel : 1 ## waistcoated : 1 ## mar : 1 ## interest : 1 ## narrative : 1 ## supposing : 1 ## possess : 1 ## ventured : 1 ## violent : 1 ## termination : 1 ``` --- # MapReduce Example - Counting Words We can run this function across all the 53 chapters using `functools.reduce()`! 
- Recall `reduce()` takes in a function of two variables and an iterable, applies the function repeatedly over the iterable, and returns the result

```python
import functools
functools.reduce(lambda x, y: x + y, range(1, 11))  # sum of the integers 1 through 10
```

```
## 55
```

---

# MapReduce Example - Counting Words

We can run this function across all the 53 chapters using `functools.reduce()`!

- Next we use our `word_reduce()` function with `functools.reduce()` to add up the counts across all the chapters!

```python
final = functools.reduce(word_reduce, mapped)
for key, val in list(final.items())[:10]:
    print(key, ":", val)
```

```
## chapter : 59
## i : 1604
##  : 7654
## treats : 4
## of : 3686
## the : 9272
## place : 111
## where : 178
## oliver : 727
## twist : 54
```

---

# Data Partitioning and Organization

How you save your data is important!

- From an efficiency perspective, you want the nodes in your cluster to have the data they need close to them rather than constantly shuffling data back and forth
    + If interested in chapter-specific things, you would want to store the data split by chapter
- If you have a big data set that you know you want to query by state, it is smart to store the data partitioned by state

---

# Hadoop

Hadoop is a framework for efficiently storing and processing large datasets

- Allows for clustering of multiple computers to analyze datasets in parallel

The [base Apache Hadoop framework](https://hadoop.apache.org/) is composed of the following modules:

- Hadoop Distributed File System (HDFS) – a distributed file-system that stores data across machines
- Hadoop YARN – a platform responsible for managing computing resources in clusters and using them for scheduling users' applications
- Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing
- Hadoop Common – contains libraries and utilities needed by other Hadoop modules

<img src="data:image/png;base64,#img/hadoop.png" width="400px" style="display: 
block; margin: auto;" />

---

# Hadoop Limitations

- HDFS usually requires a decent amount of 'on-prem' infrastructure and support
- Difficulties with scalability
    + Horizontal scaling: adding more machines (and so more total disk space)
    + Vertical scaling: adding more computational power (CPU, RAM) to existing machines

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#img/scaling.png" alt="https://bit.ly/3vjyHXJ" width="250px" />
<p class="caption">https://bit.ly/3vjyHXJ</p>
</div>

---

# Cloud Storage

- Commonly used data storage systems
    + Hadoop Distributed File System (HDFS)
    + Amazon's Simple Storage Service (S3)
    + Google's Cloud Storage (GCS)
    + Azure's Blob Storage

- Many companies are moving to cloud storage for big data
    + S3, GCS, and Blob storage use **object storage** instead of a distributed file system
    + Each object includes the data, its metadata, and a unique identifier

---

# Cloud Storage

- Commonly used data storage systems
    + Hadoop Distributed File System (HDFS)
    + Amazon's Simple Storage Service (S3)
    + Google's Cloud Storage (GCS)
    + Azure's Blob Storage

- Many companies are moving to cloud storage for big data
    + S3, GCS, and Blob storage use **object storage** instead of a distributed file system
    + Each object includes the data, its metadata, and a unique identifier

- You can still run a Hadoop MapReduce job against the cloud storage options though!
    + Drawback: it tends to be a bit slower
    + Bonus: it is often cheaper overall

---

# Recap

- Commonly used data storage systems
    + Hadoop Distributed File System (HDFS)
    + Amazon's Simple Storage Service (S3)
    + Google's Cloud Storage (GCS)
    + Azure's Blob Storage

- Hadoop modules: HDFS, Hadoop YARN, MapReduce, and Hadoop Common
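
---

# Object Storage - A Small Sketch

To make the object storage idea from the Cloud Storage slides concrete, here is a minimal in-memory sketch. It is only an illustration, not how S3, GCS, or Blob Storage are implemented: `StoredObject`, `bucket`, `put_object()`, and `get_object()` are hypothetical names made up for this slide. The point is that each object bundles the data, its metadata, and a unique identifier, and that objects live in a flat namespace rather than a directory tree.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical object: data + metadata + unique identifier."""
    data: bytes
    metadata: dict = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# A flat namespace: identifiers map straight to objects (no directory tree)
bucket = {}


def put_object(data, **metadata):
    """Store the data with its metadata and return the generated identifier."""
    obj = StoredObject(data=data, metadata=metadata)
    bucket[obj.object_id] = obj
    return obj.object_id


def get_object(object_id):
    """Look up an object by its unique identifier."""
    return bucket[object_id]


oid = put_object(b"chapter i treats of the place", book="Oliver Twist", chapter=1)
obj = get_object(oid)
print(obj.metadata)  # {'book': 'Oliver Twist', 'chapter': 1}
```

- The identifier is all you need to get the data and its metadata back, which is part of why object stores scale out easily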