databases project and need the explanation and answer to help me learn.
Having work related to Map Reduce. Are you interested? Need task 3 and 4 only.
Task and Mark distribution: 1. ER model and table generation (252. SQL programming (30%) 3. Sequential and distributed processing (254. Research report (20%) Notes: 1. You are expected to use the students can contact Centre for Academic Writing (CAW)2. Please notify your registry 3. Any student requiring an extension or deferral should follow the university process as outlined R model and table generation (25%) processing (25%) the CUHarvard referencing format. For support and advice on how this Centre for Academic Writing (CAW). registry course support team and module leader for disability supportAny student requiring an extension or deferral should follow the university process as outlined For support and advice on how this and module leader for disability support. Any student requiring an extension or deferral should follow the university process as outlined
7086CEM – Data Management Systems This assignment is made up of four parts: – Part A deals with database design, using E-R modelling. – Part B concerns database creation and querying, using SQL. – Part C covers parallel processing frameworks – Part D involves a research report Coursework requirements Please note that plagiarism (copying and pasting from sources) and collusion (submitting the same content as a fellow student) are detected automatically by Turnitin and flagged. You should be able to check for similarity. You are expected to submit your work. The report must be self-contained. Links to external datasets are not acceptable. Appendices are not required. Report structure Please follow the instructions below. You are expected: a) to include the name and the code of the module, the title of the CW, your names and your student Ids. b) to follow the structure of the CW brief, and present your work accordingly, i.e.: Part A 1. 2. …… Part B 1. 2. a) b) c) d) e) …… Submission process for the group CW is submitted as a group of two students as follows: a) You are expected to work as a group on ALL the parts of the CW. b) one student will submit individually in his/her name on Aula, the whole report, with both names on the front page, as a single pdf file, and c) the other student must submit individually in his/her name on Aula, one page only, with both names, as a pdf file, where he/she presents reflections on the collaborative work.
Part A. Conceptual modelling An equipment company wishes to create a database to support the hiring of tools and machinery to clients. The company has three types of equipment: power tools, such as drills and vacuum cleaners, plants such as excavators and scaffolding. Each piece of equipment is identified by a number. Power tools are described by their model and the voltage they use, whereas plants are classified by their model and their size in tonnes. Scaffolding can be traditional, aluminium or fibreglass; in addition, its width can be single or double. A large piece of equipment may be composed of smaller pieces of equipment. The company has various outlets and each has staff including a manager and several senior technicians who are responsible for supervising the work of allocated groups of technicians. A supervision record is also kept for a specific date. All employees are identified by their number, name, date of birth (DOB) and address. Furthermore, a record is kept on their employment records and their qualifications. Each outlet has a stock of equipment that may be hired by clients for varying periods of time, from a minimum of four hours to a maximum of six months. Each hire agreement between a client and the company is uniquely identified by using a hire number. Each client is identified by a number and a name. The company insists that each client must take out insurance cover for each equipment hire period. Each insurance cover is identified by a unique insurance number and includes the description of the insurance. The company wishes to keep a record of the member of staff who was in charge of a specific hire agreement. Each piece of equipment is checked for faults when it is returned by the client, and the faults/defects/damage recorded. The company keeps a record of the hire history of each client. 1. Create an ER diagram for the above scenario and indicate the cardinality of relationships and the nature of the associations (mandatory or optional). You should allocate adequate attributes to the entities of interest, especially the identifiers. (20%) 2. Generate, with justification, relational tables from the ER diagram. Indicate clearly the names of the tables, the attributes, the primary keys and the foreign keys. (5%) Guidance: i) Create the ER diagram and clearly identify any identifiers, and indicate the cardinality of relationships and the nature of the associations (mandatory or optional). ii) Generate tables and include primary and foreign keys. Use the schema notation; you do not have to produce SQL statements. Example of table generation in schema form: Course(courseId, courseName) Student (studentId, name, courseId*)
Part B: SQL programming Consider the following “Publication” Database schema, where AuthorId, CategoryId, PubIicationd, UserId are unique identifiers. Category (CategoryId, Type) Author (AuthorId, Name) Publisher(PublisherID, PublisherName) Publication (PublicationId, AuthorId, Title, CategorytId, PublishedYear, PublisherId) Request (UserId, PublicationId, RequestDate) User (UserId, Name, Email, Password) TABLE: Author AuthorId Name A011 Dingle R A012 Ransome A A013 Wardale R A014 Alexander T A015 Spurrier S TABLE: Publisher PublisherId PublisherName P01 Pearson P02 HarperCollins P03 Simon and Schuster TABLE: Category CategoryId Type C911 Short stories C912 Journal articles C913 Biography C914 Illustrations
TABLE: Publication PubId AuthorId Title CategoryId PublishedYear PublisherId P001 A011 The Blue Treacle C911 1911 P01 P002 A012 In Aleppo Once C911 2001 P01 P003 A012 Illustrating Arthur Ransome C914 1973 P03 P004 A012 Ransome the Artist C914 1994 P03 P005 A014 Bohemia In London C912 2008 P02 P006 A011 The Best of Childhood C911 2002 P01 P007 A015 Distilled Enthusiasms C912 2010 P02 TABLE: Request UserId PublicationId RequestDate U016 P001 05-Oct-2020 U241 P001 28-Sep-2020 U55 P002 08-Sep-2020 U016 P004 06-Oct-2020 U121 P002 23-Sep-2020 TABLE: User UserId Name Email Password U111 Kenderine J KenderineJ@hotmail.com Kenj2 U241 Wang F WangF@hotmail.com Wanf05 U55 Flavel K FlavelK@hotmail.com Flak77 U016 Zidane Z ZidaneZ@hotmail.com Zidz13 U121 Keita R KeitaR@hotmail.com Keir22
1. Use appropriate data types and write the SQL statements to create the tables defined in the schema above. (10%) 2. Write SQL Statements to return the following data from the Publication database. a) The names and the email of the users and the dates of all their requests. (4%) b) The details of the publications that were requested during September 2020 in descending order of requested dates. The details should include publication ids, titles and dates on which publications were requested. (4%) c) The names of the users, the title of the publications they requested, and the names of their authors. (4%) d) The number of publications requested for each of the categories (e.g. ‘Illustrations’). (4%) e) The names of users who requested more than one publication. (4%) Guidance: Please use standard SQL in Oracle Live SQL. Indicate clearly the primary keys and the foreign keys. State the SQL statements and give the results. Select and use appropriate data. The presentation of each query should have a text summary which includes i) the query itself, ii) the corresponding SQL statement solution, iii) the result of the execution of the statement and iv) evidence that you have used standard SQL and implemented each statement on a database (use screenshots, no need for appendices). A small data sample is given. When appropriate, you can create and insert additional data records in order to make sure that the queries return results. Links to external datasets or code and are not acceptable. The report must be self-contained.
Part C. Sequential and parallel processing Consider a university data store about module coursework (CW) submissions by students. For each CW submission for a specific module, each record of the data store holds the following data: 1. Module code 2. Module name 3. Semester id 4. Year 5. CW title 6. Student id 7. Student name 8. CW submission date 9. Actual date of CW submission by student Examples of records would have the following values: (7086CEM, Data Management Systems, S1, 2022, Database Management, 10101010, Lionel Messi, 28/11/2022, 30/11/2022) (7053CEM, Big Data and Visualisation, S2, 2021, Data Analysis, 07070707, Cristiano Ronaldo, 21/10/2021, 25/10/2021) (7071CEM, Information Retrieval, S3, 2020, Comparative Evaluation, 11011010, Zinedine Zidane, 09/09/2020, 25/10/2020) A record will contain a list of key-value pairs, as presented in the lectures. A block will consist of a sequence of records. You are expected to create similar additional records for this part in order to produce a mapReduce solution in question 2, below. For each module, the university Registry would like to determine the number of CW submissions which were submitted late, by one day or more. 1. Assuming that the data is stored in a relational database, produce with justification a) the SQL statement to create the corresponding table for the data store and b) the SQL statement to determine the number of CW submissions that were submitted late by one day or more, for each module. (You do not need to implement the SQL statements in Oracle Live SQL). (5%) 2. Assuming that the data is too large to be processed in a centralised manner, and that it is stored in an ordinary file, produce a decentralised solution, equivalent to 1.b) above, which applies MapReduce to the data processing. Justify your decisions and all the steps of your solution. (20%) Guidance: You should study carefully the examples of mapReduce covered in the lecture notes. You should consider the structure of the key in the (key, value) pair in the original record and in the mapping stage. This is not a programming exercise. The solution should follow the structure of the solutions given in the lecture notes.
Part D. Research report Write in your own words a brief report where you introduce and discuss data warehouses. You should cover the rationale for data warehouses, their structure and their use. The maximum length of the report is 700 words. Longer reports will be penalised. (20%) Guidance: Use your own words for the report. Copying and pasting is plagiarism. You should include relevant references. The maximum length of the report is 700 words. Longer reports will be penalised.
Marking Rubric Marking Scheme Grade Part A Part B Part C Part D <40 Incorrect interpretation of scenario and Incomplete formulation of solution Limited identification of entities and poor annotation of relationships Incorrect generation of relational tables Limited or absent rationale Poor interpretation of requirements and of queries DDL and DML SQL statements limited in scope Incomplete and incorrect SQL statements Absence of rationale Partial understanding of requirements and partially correct SQL formulation Partial understanding of context and relevance of parallel processing Incomplete steps in the application of MapReduce Partial justification of design decisions Lack of understanding of requirements Inadequate identification of issues Incompetent understanding of structural and processing components Poorly written essay 40-49 Partial interpretation of scenario and formulation of solution Partially correct ER diagram with relevant entities and relationships Partial consistency in the generation of the relational tables Partial justification of design decisions Basic understanding of requirements and partially correct interpretation Relevant use of DDL and DML statements in solution formulation Partially complete SQL statements Limited justification of solution Partial understanding of requirements and partially correct SQL formulation Partial understanding of context and relevance of parallel processing Incomplete steps in the application of MapReduce Partial justification of design decisions Partial interpretation of requirements Limited presentation of key issues Relevant description of structural and processing components Mostly descriptive essay 50-59 Adequate and consistent interpretation of scenario and satisfactory conceptual modelling Mostly correct generation of ER diagram with relevant entities and relationships Relatively competent generation of the relational tables Adequately justified design decisions Adequate understanding of requirements and mostly correct interpretation Adequate use of DDL and DML SQL statements in solution formulation Mostly complete SQL statements Adequate justification of solution Adequate understanding of requirements and correct SQL formulation Adequate presentation of context of application of parallel processing Mostly correct application of the different steps of MapReduce Adequately justified solution Adequate interpretation of requirements Key issues well identified and partially addressed Adequate presentation of key structural and processing components Adequately written essay 60-69 Good and clear interpretation of scenario and good solution formulation of the two parts Correct and complete identification of entities and relationships Consistent generation of relational tables Good interpretation of requirements and good formulation of solution Correct use of DDL and DML statements in solution formulation Complete and relevant treatment of queries Good justification of decisions Good solution to the initial query in terms of SQL statements Focused presentation of context of application of parallel processing Clearly stated and correct sequential steps of MapReduce Well expressed rationale Good understanding and statement of requirements Well focused presentation of main issues Good description of structural and processing components Well presented Well expressed rationale essay with some reflection 70+ Very good and clear interpretation of scenario and excellent solution formulation of the two parts Correct and coherent identification of well annotated entities and types of relationships Logical and consistent generation of relational tables Excellent justification of design decisions Excellent interpretation of requirements and very good formulation of solution Excellent use of DDL and DML statements in solution formulation Complete treatment of queries and SQL formulation Very good rationale Very good solution to the initial query in terms of SQL statements Excellent presentation on the need for an overall parallel solution Clear deployment and annotation of the sequential steps of MapReduce Excellent rationale Excellent understanding and interpretation of requirements Very good identification and formulation of key issues Relevant and specific presentation of structural and processing components Reflective writing supported by an excellent structure